import copy
import re
from io import BytesIO

import jinja2
from docx.shared import Inches
from docxtpl import DocxTemplate, Listing, InlineImage

from crc import session
from crc.api.common import ApiError
from crc.models.file import CONTENT_TYPES, FileModel, FileDataModel
from crc.models.workflow import WorkflowModel
from crc.scripts.script import Script
from crc.services.file_service import FileService
from crc.services.workflow_processor import WorkflowProcessor


class CompleteTemplate(Script):

    def get_description(self):
        return """Using the Jinja template engine, takes data available in the current task, and uses it to populate
a word document that contains Jinja markup. Please see https://docxtpl.readthedocs.io/en/latest/
for more information on exact syntax.
Takes two arguments:
1. The name of an MS Word docx file to use as a template.
2. The 'code' of the IRB Document as set in the irb_documents.xlsx file.
"""

    def do_task_validate_only(self, task, study_id, workflow_id, *args, **kwargs):
        """For validation only, process the template, but do not store it in the database."""
        workflow = session.query(WorkflowModel).filter(WorkflowModel.id == workflow_id).first()
        self.process_template(task, study_id, workflow, *args, **kwargs)

    def do_task(self, task, study_id, workflow_id, *args, **kwargs):
        workflow = session.query(WorkflowModel).filter(WorkflowModel.id == workflow_id).first()
        final_document_stream = self.process_template(task, study_id, workflow, *args, **kwargs)
        file_name = args[0]
        irb_doc_code = args[1]
        FileService.add_workflow_file(workflow_id=workflow_id,
                                      task_spec_name=task.get_name(),
                                      name=file_name,
                                      content_type=CONTENT_TYPES['docx'],
                                      binary_data=final_document_stream.read(),
                                      irb_doc_code=irb_doc_code)

    def process_template(self, task, study_id, workflow=None, *args, **kwargs):
        """Entry point, mostly worried about wiring it all up."""
        if len(args) < 2 or len(args) > 3:
            raise ApiError(code="missing_argument",
                           message="The CompleteTemplate script requires 2 or 3 arguments. The first argument is "
                                   "the name of the docx template to use. The second "
                                   "argument is a code for the document, as "
                                   "set in the reference document %s. The optional third argument is a "
                                   "comma-delimited list of fields that contain image file IDs."
                                   % FileService.DOCUMENT_LIST)
        task_study_id = task.workflow.data[WorkflowProcessor.STUDY_ID_KEY]
        file_name = args[0]

        if task_study_id != study_id:
            raise ApiError(code="invalid_argument",
                           message="The given task does not match the given study.")

        file_data_model = None
        if workflow is not None:
            # Get the workflow specification file with the given name.
            file_data_models = FileService.get_spec_data_files(
                workflow_spec_id=workflow.workflow_spec_id,
                workflow_id=workflow.id,
                name=file_name)
            if len(file_data_models) > 0:
                file_data_model = file_data_models[0]
            else:
                raise ApiError(code="invalid_argument",
                               message="Unable to locate a file with the given name.")

        # Get images from file/files fields
        if len(args) == 3:
            image_file_data = self.get_image_file_data(args[2], task)
        else:
            image_file_data = None

        return self.make_template(BytesIO(file_data_model.data), task.data, image_file_data)

    def get_image_file_data(self, fields_str, task):
        image_file_data = []
        images_field_str = re.sub(r'[\[\]]', '', fields_str)
        images_field_keys = [v.strip() for v in images_field_str.strip().split(',')]
        for field_key in images_field_keys:
            if field_key in task.data:
                v = task.data[field_key]
                file_ids = v if isinstance(v, list) else [v]

                for file_id in file_ids:
                    if isinstance(file_id, str) and file_id.isnumeric():
                        file_id = int(file_id)

                    if file_id is not None and isinstance(file_id, int):
                        if not task.workflow.data[WorkflowProcessor.VALIDATION_PROCESS_KEY]:
                            # Get the actual image data
                            image_file_model = session.query(FileModel).filter_by(id=file_id).first()
                            image_file_data_model = FileService.get_file_data(file_id, image_file_model)
                            if image_file_data_model is not None:
                                image_file_data.append(image_file_data_model)

                    else:
                        raise ApiError(
                            code="not_a_file_id",
                            message="The CompleteTemplate script requires 2-3 arguments. The third argument should "
                                    "be a comma-delimited list of File IDs")

        return image_file_data

    def make_template(self, binary_stream, context, image_file_data=None):
        doc = DocxTemplate(binary_stream)
        doc_context = copy.deepcopy(context)
        doc_context = self.rich_text_update(doc_context)
        doc_context = self.append_images(doc, doc_context, image_file_data)
        jinja_env = jinja2.Environment(autoescape=True)
        try:
            doc.render(doc_context, jinja_env)
        except Exception as e:
            print(e)
        target_stream = BytesIO()
        doc.save(target_stream)
        target_stream.seek(0)  # Move to the beginning of the stream.
        return target_stream

    def append_images(self, template, context, image_file_data):
        context['images'] = {}
        if image_file_data is not None:
            for file_data_model in image_file_data:
                fm = file_data_model.file_model
                if fm is not None:
                    context['images'][fm.id] = {
                        'name': fm.name,
                        'url': '/v1.0/file/%s/data' % fm.id,
                        'image': self.make_image(file_data_model, template)
                    }

        return context

    def make_image(self, file_data_model, template):
        return InlineImage(template, BytesIO(file_data_model.data), width=Inches(6.5))

    def rich_text_update(self, context):
        """This is a bit of a hack. If newline characters exist in the data, we want
        them to come out in the final document without requiring someone to anticipate them in the
        template. Ideally we would use the 'RichText' feature of the docxtpl library, but
        that requires escaping both here and in the docx template. The docxtpl library also
        provides a 'Listing' that only needs to be applied on the way in, so the
        template doesn't have to think about it. So running with that for now."""
        # Walk the content recursively and wrap any string that contains a
        # newline character in a Listing.
        if isinstance(context, dict):
            for k, v in context.items():
                context[k] = self.rich_text_update(v)
        elif isinstance(context, list):
            for i in range(len(context)):
                context[i] = self.rich_text_update(context[i])
        elif isinstance(context, str) and '\n' in context:
            return Listing(context)
        return context