Solved PDF Formatting Error When Assembling DOCX Files Using docassemble
Recently, I started using docassemble again to automate legal documents for myself and other lawyers. Today, I solved a formatting error that occurred when I asked docassemble to include several DOCX templates in a master DOCX template. The final DOCX file that docassemble generated was formatted perfectly, but the PDF was not. The PDF had several extra page breaks that turned what should have been a two-page final product into four pages.
While researching the problem, I learned that docassemble uses LibreOffice by default to convert DOCX files into PDFs, and sometimes there are incompatibilities between Microsoft Word and LibreOffice. docassemble allows developers to use ConvertAPI or CloudConvert instead, but these solutions are not free.
I asked for help on docassemble’s Slack group. I’m grateful for the advice that Quinten Steenhuis gave, which included:
LibreOffice is free, and you can usually troubleshoot layout issues by editing your file in LibreOffice and making sure it works there before you upload it to Docassemble.
I downloaded LibreOffice, but I could not find any layout issues.
I was ready to give up and give users the option to download only the final DOCX file when I remembered reading the following in docassemble’s documentation:
Note that it is important to use the p form of Jinja2 markup, or else the contents of the included document will not be visible. The template text must also be by itself on a line, and the include_docx_template() function must be by itself within a Jinja2 p tag, and not combined with the + operator with any other text.
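To make the rule quoted above concrete, here is a minimal Python sketch that flags paragraphs in which the include tag is not alone. It is not part of docassemble: the function name find_bad_include_paragraphs and the sample strings are hypothetical, and it assumes you have already extracted each paragraph's text from the template as a plain string. It also assumes the arguments to include_docx_template() contain no nested parentheses. Because it looks only at text, it would not catch a section break sharing the paragraph, which has no text of its own; for that, turn on formatting marks in Word or LibreOffice.

```python
import re

# A paragraph passes only if it consists of exactly one Jinja2 "p" tag that
# wraps a single include_docx_template(...) call and nothing else.
# Assumption: the call's arguments contain no nested parentheses.
VALID_INCLUDE = re.compile(
    r"^\s*\{\{p\s+include_docx_template\([^()]*\)\s*\}\}\s*$"
)


def find_bad_include_paragraphs(paragraphs):
    """Return the indexes of paragraphs that mention include_docx_template
    but are not a lone {{p include_docx_template(...) }} tag."""
    bad = []
    for index, text in enumerate(paragraphs):
        if "include_docx_template" in text and not VALID_INCLUDE.match(text):
            bad.append(index)
    return bad


# Hypothetical paragraph texts taken from a parent template.
sample = [
    "{{p include_docx_template('child.docx') }}",            # fine
    "{{p include_docx_template('child.docx') }} Signature",  # extra text
    "{{ include_docx_template('child.docx') }}",             # missing the p form
    "{{p include_docx_template('a.docx') + 'more text' }}",  # combined with +
    "An ordinary paragraph.",                                 # ignored
]

print(find_bad_include_paragraphs(sample))
```

Paragraphs that never mention include_docx_template are skipped, so the check can be run over every paragraph of a parent template without producing noise.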
The error was mine, not docassemble’s or LibreOffice’s. As the following screen capture shows, I had Section Break (Next Page) in the same paragraph as the include instruction:
I placed Section Break (Next Page) on its own line (as the next image shows), and now the final PDF file that docassemble generates has no layout problems:
The takeaway might seem to be just this: Read the documentation. But it’s also this: Reach out to others when you run into something unexpected. And this: Reread the documentation, and reread it again when you run into something unexpected.
Update 12/3/2021
I ran into a related problem when including child templates in a master, parent template. I was trying to insert a DOCX file with one image that takes up the entire page. I was getting extra pages not only in the PDF, but also in the DOCX file.
To discover the source of the error, I reinserted each child template one by one and examined both the PDF and the DOCX that docassemble generated.
I learned that docassemble’s inclusion syntax was adding an extra paragraph break. To avoid having an extra page, the image must be small enough to allow at least two paragraph breaks below it.