Create PDF/A Documents with pdfLaTeX

To guarantee long-term archiving of PDF (Portable Document Format) documents, ISO standards called PDF/A have been established at different levels. PDF/A is often required by publishers, especially for scientific articles and theses, cf. qucosa. An important premise is the absence of references to external resources, i.e., documents must be self-contained. Additionally, JavaScript and encryption are not allowed. PDF/A-1 does not support transparency, whereas PDF/A-2 allows it. A detailed description of the PDF/A levels can be found in the corresponding Wikipedia article.

Validate PDF Documents

Documents with confidential contents cannot be sent to an online validation service. Offline validation is provided by, e.g., 3-Heights™ PDF Validator. You may generate a 30-day evaluation key and add it to the license manager. The library can be used through different interfaces. To access it from Java, three conditions must be met, provided you follow the recommended architecture and the apps in the directory Samples: VALA.jar has to be on the classpath, the system property java.library.path must point to the directory containing PDFValidatorAPI.dll, and the PDF file to be validated is passed as an argument. After the validation has run, the results are shown, grouped into categories and indicating each problem type and the document page it occurred on. Conveniently formatted output can be produced with:

System.out.printf("p%d: %s (%dx)\n", err.getPageNo(), err.getMessage(), err.getCount());

Achieve PDF/A Compliance for LaTeX Documents

According to qucosa, the current level worth striving for is PDF/A-2a, followed by PDF/A-2b. Since support for PDF/A-2a in the workflow described below is labeled experimental, we focus on PDF/A-2b. The first step is to check whether included files, especially vector graphics, embed the fonts and color profiles they use, where applicable. This is best achieved by providing those files as PDF documents themselves. Strategies for some common validation problems are:

PDF/A compliance of the output of the actual TeX document can be achieved with the package pdfx. Specify the required level as a package option: \usepackage[a-2b]{pdfx}. Adobe Acrobat Reader now shows the blue indicator bar for PDF/A compliance. Unfortunately, this does not imply that all requirements are already fulfilled.

PDF/A indicator bar in Adobe Acrobat Reader DC.
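
A minimal document using the pdfx setup described above might look as follows. This is only a sketch: the file name thesis.tex and the document class are assumptions for illustration, and loading pdfx early in the preamble follows the usual recommendation rather than anything specific to this workflow.

```latex
% thesis.tex (hypothetical file name), compiled with pdflatex
\documentclass{article}

% Load pdfx early; the option selects the PDF/A level (here PDF/A-2b).
% hyperref is loaded by pdfx, so it is not requested separately.
\usepackage[a-2b]{pdfx}

\begin{document}
Hello, PDF/A.
\end{document}
```

The resulting PDF should still be checked with a validator, since the indicator bar alone does not prove compliance.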

To provide the mandatory metadata, create a file with the same base name as your TeX document, but with the extension .xmpdata, e.g. thesis.xmpdata for thesis.tex. The elements \Author, \Title, \Keywords, \Subject and \CopyrightURL should be sufficient to begin with, cf. Add metadata in pdf as type pdf/a. At the same time, you should clean up the metadata specified in \hypersetup. The package hyperref is a dependency of pdfx and does not have to be loaded explicitly.
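
As an illustration, a metadata file for a document named thesis.tex could look like the sketch below. All values are placeholders, not taken from a real document; \sep is the separator that pdfx provides for multiple authors or keywords.

```latex
% thesis.xmpdata (placeholder values, adapt to your document)
\Title{An Example Thesis Title}
\Author{Jane Doe\sep John Doe}
\Keywords{PDF/A\sep pdfLaTeX\sep archiving}
\Subject{Short description of the document's content}
\CopyrightURL{https://example.org/license}
```

Since pdfx takes the title, author and keywords from this file, the corresponding pdftitle, pdfauthor and pdfkeywords keys can usually be dropped from \hypersetup.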

Special Validation Problems

A follow-up error caused by fonts that are not embedded (see above) is: The CharSet of the font <font name> must contain the name <character name>. It disappears once the font embedding is fixed. Two other problems, which are fixed in pdfx version 1.5.6, but were reported in earlier versions, are: