Creation of PDF/A-Compliant Documents: Introductory Information on the PDF/A Standard
1 Introduction
Creating PDF/A-compliant files is not always easy. Misunderstandings arise when no PDF file can be created according to the PDF/A standard or when the resulting file does not match the appearance of the original file. There are reasons for this, which are explained here in order to answer in advance the legitimate questions that may arise when creating PDF/A-compliant files.
This document first compiles information sources on PDF/A and then explains what PDF/A is. The main aim of this explanation is to describe why the PDF/A standard can be difficult to achieve in some cases. This is supplemented by an explanation of the different specifications of the PDF/A standard. Finally, validation options are presented and the challenge of subsequent conversion is described.
2 Literature / Information
There is a lot of online information on the PDF/A standard. The PDF ASSOCIATION and the PDF/A COMPETENCE CENTER offer material on it, in particular PDF/A KOMPAKT (Drümmer, Oettler and Seggern, 2007) and PDF/A KOMPAKT 2.0 (Oettler, 2013), along with FAQs, further links and other information. WIKIPEDIA also offers a good overview of the PDF/A standard, but it is not enough to grasp the concept of the standard. The properties of the PDF/A variants listed there are important, especially for the creation of PDF/A-compliant files.
VALIDATION OF PDF/A (2011) provides an overview of the validation of PDF/A-compliant files. Friese's (2014) discussion of JHOVE as a validation tool should also be mentioned in this context.
There are also many instructions on how to create PDF/A-compliant files. As the creation of PDF/A-compliant files from LaTeX is very difficult, here is an example of a page that outlines the problem area quite well.
Before explaining how PDF/A-compliant files can be created, the following section explains what PDF/A means. Without an understanding of this standard, the creation of PDF/A-compliant files can be very difficult.
3 What is PDF/A?
PDF/A is a special PDF standard that requires certain document properties from a PDF file in order to guarantee that the document can be reproduced in a few decades' time. If a document contains content that does not comply with the standard, a PDF/A-compliant file cannot be created.
"The aim of the PDF/A standard is to ensure that PDF documents can be created whose visual appearance is preserved over time" (Drümmer, Oettler and Seggern, 2007, p.9).
What makes a PDF/A? PDF/A KOMPAKT gives the following examples of what must and must not be included:
"Required: a "must" is something like complete access to all elements belonging to the document. An example: Fonts must be embedded, a reference to the font provided is not sufficient. If a reader does not have the required font on their computer in 10 years' time, special characters or symbols, for example, may not be displayed." (Drümmer, Oettler and Seggern, 2007, p.9)
"Prohibited: There are also PDF features that should be avoided. They are prohibited because they undermine the desired consistency, such as interactive elements or PDF layers. Such features prevent the uniqueness that a valid PDF/A file must achieve. In the case of a PDF document with layers, the question may arise in years to come as to which layer should apply and which should not. This decision must be made now, at the time of PDF creation." (Drümmer, Oettler and Seggern, 2007, p.9)
The Wikipedia article on the PDF/A standard provides more detailed information here.
In this document, PDF files that comply with the PDF/A standard are referred to as PDF/A files. A PDF/A file is not a separate format but a "normal" PDF file that meets certain additional requirements. Neither a specific file extension nor the fact that the file can be opened in a specific program confirms that a file is a PDF/A file.
An indication that at least an attempt has been made to create a PDF/A file is the blue bar shown when the file is opened, which contains the following text:
- The opened file complies with the PDF/A standard. It has been opened read-only to prevent changes (ADOBE ACROBAT PRO)
- This file requires conformity with the PDF/A standard and has been opened read-only to prevent changes (ADOBE READER)
To ensure that the file really conforms to the PDF/A standard, the conformity must be checked in ADOBE ACROBAT PRO (or another program).
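The blue bar reflects a claim that the file makes about itself, not a verified result. As a rough illustration, the following Python sketch looks for that claim. It assumes the claim is stored as the `pdfaid:part` and `pdfaid:conformance` entries of the file's XMP metadata and that this metadata is stored uncompressed and in an ASCII-compatible encoding; the function name `claimed_pdfa_level` is hypothetical. It says nothing about whether the file actually conforms.

```python
import re
from typing import Optional

# XMP allows both spellings: <pdfaid:part>1</pdfaid:part> and pdfaid:part="1".
_PART = re.compile(rb"""pdfaid:part\s*(?:=\s*["']|>)\s*(\d)""")
_CONF = re.compile(rb"""pdfaid:conformance\s*(?:=\s*["']|>)\s*([A-Za-z])""")


def claimed_pdfa_level(data: bytes) -> Optional[str]:
    """Return the PDF/A level a file claims for itself (e.g. '1b'), or None.

    This only reads the claim; it does not validate anything.
    """
    part = _PART.search(data)
    if part is None:
        return None  # no PDF/A claim found in the raw bytes
    conf = _CONF.search(data)
    level = conf.group(1).decode("ascii").lower() if conf else ""
    return part.group(1).decode("ascii") + level
```

A call such as `claimed_pdfa_level(open("thesis.pdf", "rb").read())` (file name chosen for illustration) returns the claimed level. A validation program is still needed to confirm it.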
Ultimately, a PDF file is only PDF/A-compliant if it passes the relevant validation routines. However, it is not easy for the "average consumer" to check whether a PDF file is a PDF/A file because
- Validation programs must be available
- Programs that can create PDF/A files must be available.
Further information on this can be found in the Validation section.
In addition, PDF/A is not just PDF/A. There are several versions that allow different content:
- PDF/A-1b
- PDF/A-1a
- PDF/A-2a
- PDF/A-2b
- PDF/A-2u
There are of course reasons for the different versions, which are important for the correct creation of the file. The differences are therefore explained here with the aim of formulating a practical guide to PDF/A creation.
Note: If a document is still to be displayed correctly decades from now, it is best to create the document in such a way that a PDF/A-compliant file can be created without any problems.
4 PDF/A versions
4.1 PDF/A-1 - PDF/A-2
One of the main distinguishing features is the underlying PDF version:
- PDF/A-1 is based on PDF 1.4
- PDF/A-2 is based on PDF 1.7
This means that PDF/A-1 files only allow content that may be contained in PDF 1.4 files, whereas PDF/A-2 files also allow content that may be contained in PDF 1.7 files. For example, if the file contains objects with transparency, PDF/A-2 should be selected. An overview of the PDF specifications can be found in PDF/A KOMPAKT (Drümmer, Oettler and Seggern, 2007, p.8).
Although PDF/A-2 permits some additional elements compared with PDF/A-1, it is criticized for other properties. It is therefore not possible to give a clear recommendation as to which PDF/A version should be used. The choice depends very much on the contents of the source file, on the programs with which the source file was created and on the programs with which the PDF is to be created. If possible, the following hierarchy should be observed, with PDF/A-2a being the most desirable:
- PDF/A-2a
- PDF/A-2b
- PDF/A-1a
- PDF/A-1b
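The hierarchy above can be applied mechanically once it is known which levels a given toolchain can actually produce for a given document. The following Python sketch is only an illustration; the names `PREFERENCE` and `best_level` are hypothetical, and determining the achievable levels remains the difficult part.

```python
# Most desirable level first, in the order of the hierarchy given in the text.
PREFERENCE = ["2a", "2b", "1a", "1b"]


def best_level(achievable):
    """Return the most desirable PDF/A level among those that can be produced.

    `achievable` is any collection of level names such as {"1b", "2b"}.
    Returns None if none of the listed levels is achievable.
    """
    for level in PREFERENCE:
        if level in achievable:
            return level
    return None
```

For example, if a program can only write PDF/A-1b and PDF/A-2b, `best_level({"1b", "2b"})` selects `"2b"`.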
But what do these standards mean? These specifications are explained in the following section.
4.2 PDF/A-1b, PDF/A-1a
There are two conformance levels of PDF/A-1:
- a (Level A, "Accessible"): unambiguous visual reproducibility, plus mapping of the text to Unicode and content structuring of the document so that, for accessibility, it can be read aloud by a screen reader.
- b (Level B, "Basic"): unambiguous visual reproducibility only.
An overview of the differences between the two conformance levels can be found in PDF/A KOMPAKT (Drümmer, Oettler and Seggern, 2007, p.13). Reference can also be made to Wikipedia.
An important feature of PDF/A-1b is that only purely visual reproducibility is guaranteed (Drümmer, Oettler and Seggern, 2007, p.13). It is therefore possible that text passages
- are not searchable
- cannot be copied and pasted into other documents.
Particularly after subsequent conversions, these defects may not be noticed immediately, because the text is still displayed correctly to the reader's eye. Similar behavior occurs when PDF/A files are created via GHOSTSCRIPT or POSTSCRIPT.
4.3 PDF/A-2b, PDF/A-2a
There are three conformance levels of PDF/A-2:
- PDF/A-2a: fully implements all requirements of ISO 19005-2, in particular all structural and semantic properties.
- PDF/A-2b: minimum requirement for a PDF/A-2 file, guarantees the correct appearance of the document for long-term archiving.
- PDF/A-2u: like 2b, plus: the entire text is mapped in Unicode so that the entire text can be indexed and displayed.
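The conformance levels of sections 4.2 and 4.3 differ in what they guarantee beyond visual appearance. The following Python sketch summarizes them as a lookup table; the names `Guarantees`, `LEVELS` and `guarantees` are hypothetical, and the table is a simplification of the descriptions above, not a restatement of the ISO text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guarantees:
    visual: bool        # appearance is reproducible
    unicode_text: bool  # text is mapped to Unicode (searchable, copyable)
    structure: bool     # content is structured, e.g. for screen readers


# One entry per conformance level letter, as described in the text.
LEVELS = {
    "a": Guarantees(visual=True, unicode_text=True, structure=True),
    "u": Guarantees(visual=True, unicode_text=True, structure=False),
    "b": Guarantees(visual=True, unicode_text=False, structure=False),
}


def guarantees(flavour: str) -> Guarantees:
    """Look up the guarantees for a flavour such as '1b' or '2u'."""
    part, level = flavour[0], flavour[1:].lower()
    if part == "1" and level == "u":
        raise ValueError("PDF/A-1 has no level u")
    return LEVELS[level]
```

This makes the practical consequence explicit: with `guarantees("1b")` or `guarantees("2b")`, `unicode_text` is false, so searchable and copyable text is not guaranteed.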
5 Validation
In order to determine whether a PDF/A file is correct, or to determine which content causes the creation of a PDF/A to fail, the file can be validated. There are various programs available for this purpose. This document only describes validation programs that are free to use or that are included in relatively widespread paid software.
5.1 veraPDF
"veraPDF is an open source conformance checker for PDF/A files. It is designed to help archives and libraries check that their PDF/A collections conform to the appropriate ISO 19005 archiving standard specification."
This program can be downloaded and installed free of charge. Installation instructions can be found in the file veraPDFPDFAConformanceCheckerGUI.pdf on GitHub.
To start the program with the graphical user interface, the file verapdf-gui.bat must be called on Windows systems. This file is usually located in the verapdf folder in the personal user directory if no other location was specified during installation.
The results of the validation can be called up and saved as an HTML or XML file. The respective errors are described in these reports and the places where they occur are named. It is not possible to correct the errors with veraPDF.
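veraPDF can also be run from the command line, which is convenient when many files have to be checked. The following Python sketch builds and runs such a call. It assumes that a `verapdf` launcher can be found on the PATH and that the `--flavour` and `--format` options behave as described in the command-line help of the installed version; both should be confirmed with `verapdf --help` before relying on this. The function names are hypothetical.

```python
import subprocess


def verapdf_command(pdf_path, flavour="1b", report_format="xml",
                    executable="verapdf"):
    """Build the argument list for one validation run.

    Assumption: --flavour selects the PDF/A level to check against and
    --format selects the report format. Verify with `verapdf --help`.
    """
    return [executable, "--flavour", flavour, "--format", report_format, pdf_path]


def validate(pdf_path, **options):
    """Run the validator and return its report text (standard output)."""
    result = subprocess.run(
        verapdf_command(pdf_path, **options),
        capture_output=True,
        text=True,
    )
    return result.stdout
```

On Windows, the full path to the launcher script in the verapdf folder may have to be passed as `executable`.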
5.2 3-Heights™ PDF Validator Online Tool
This online validation program checks whether a PDF file is PDF/A-compliant. All PDF/A specifications can be checked. A log is also output that describes which elements are not PDF/A-compliant. However, the errors are not localized directly in the file, so the place in the document where an error occurs has to be found manually.
5.3 Preflight
Preflight is a validation tool in Adobe Acrobat Professional. This tool has the advantage that the places in the document where an error is located are also displayed. This makes it very easy to make corrections (changing the font, etc.). In addition, the error messages from Preflight are commented on in detail in PDF/A KOMPAKT (Drümmer, Oettler and Seggern, 2007).
5.4 JHOVE
JHOVE is a program that can be used to validate PDF files, among other things. It is not really suitable for validating PDF/A files, but it is listed here for the sake of completeness. Friese (2014) has explained the usability of JHOVE quite clearly.
5.5 Others
There is other validation software (PDF/A Competence Center, 2011), but these programs are subject to a charge and cannot be presented here.
5.6 Notes
It is possible that certain content cannot be saved as PDF/A. In that case, a decision must be made as to what is more important:
- Preservation of the original content and its presentation -> risk that the document can no longer be displayed after a few years
- Preservation of the content that can be displayed over a long period of time -> loss of the content that prevents this.
6 Conversion
In some cases, it is not possible to create a PDF/A from the original files, for example because the source file no longer exists or because the program used cannot create PDF/A files. In these cases, a PDF/A file can only be created by converting a "normal" PDF. It must be expected that some functions or properties will no longer be available, for example:
- Full text searchability
In PDF/A-1b files, text in fonts that cannot be embedded is inserted as an image in order to preserve the document at least visually. The relevant text sections can then no longer be found in a search.
- Falsification of images
If images in a document do not comply with the PDF/A standard (compression, transparency, etc.), they are no longer displayed correctly.
It is therefore necessary to consider what is more important before conversion:
- Preservation of the original content and its presentation -> risk that the document can no longer be displayed after a few years
- Preservation of the content that can be displayed and shown over a long period of time.
7 Conclusions
Although it has already been mentioned several times, it must be emphasized once again that the most important goal when creating PDF/A files is that the content can be displayed for a long time. This is not possible with all content in PDF files, so care must be taken to use "simple" content when creating documents. Otherwise, the file cannot be included in a long-term archive that guarantees that files can be displayed even after decades. This applies to content such as
- fonts
- graphics
- metadata
For fonts, care should be taken to ensure that
- no proprietary fonts are used
- no special characters are used
- "normal" fonts are used as much as possible
- the respective software actually embeds the font completely and not just as a subset
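Whether a font is embedded, and whether only part of it is embedded, can usually be read from the font entries inside the PDF. The following Python sketch shows the idea on plain dictionaries that stand in for already parsed font data; it does not read a PDF itself. It assumes that an embedded font program appears under one of the keys `/FontFile`, `/FontFile2` or `/FontFile3` of the font descriptor, and that partially embedded fonts carry a six-letter prefix followed by `+` in their name. The function name `font_status` is hypothetical.

```python
import re

# Keys under which a font descriptor references an embedded font program.
EMBED_KEYS = ("/FontFile", "/FontFile2", "/FontFile3")

# Partially embedded fonts are conventionally named like "/ABCDEF+Arial".
SUBSET_TAG = re.compile(r"^/?[A-Z]{6}\+")


def font_status(font_name, descriptor):
    """Classify one font as 'not embedded', 'embedded (subset)' or 'embedded (full)'.

    `descriptor` is a plain dict standing in for the parsed font descriptor.
    """
    if not any(key in descriptor for key in EMBED_KEYS):
        return "not embedded"
    if SUBSET_TAG.match(font_name):
        return "embedded (subset)"
    return "embedded (full)"
```

A PDF library would be needed to obtain the font names and descriptors from a real file; a validator such as Preflight reports the same information without any programming.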
For graphics, it should be noted that
- transparency is not permitted in PDF/A-1
- JPEG 2000 compression is not permitted in PDF/A-1
- fonts in graphics are often not embedded correctly
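The first two points can be checked on the image entries of a PDF. The following Python sketch again works on plain dictionaries that stand in for parsed image data. It assumes that JPEG 2000 compression shows up as the filter name `/JPXDecode` and that a soft mask (one common form of image transparency) shows up as an `/SMask` entry; other forms of transparency are not covered. The function name `pdfa1_image_problems` is hypothetical.

```python
def pdfa1_image_problems(image_dict):
    """List properties of one image that would conflict with PDF/A-1.

    `image_dict` is a plain dict standing in for the parsed image dictionary.
    """
    problems = []

    filters = image_dict.get("/Filter", [])
    if isinstance(filters, str):
        filters = [filters]  # a single filter may be given without a list
    if "/JPXDecode" in filters:
        problems.append("JPEG 2000 compression (/JPXDecode)")

    if image_dict.get("/SMask") not in (None, "/None"):
        problems.append("soft mask, i.e. transparency (/SMask)")

    return problems
```

An empty result only means that these two problems were not found, not that the image is PDF/A-compliant.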
It is generally recommended to integrate graphics as a simple image in a PDF/A file. These are readable by humans and offer few sources of error. However, this means that textual content can no longer be found using a full text search or extracted using copy and paste.
For metadata, it can only be recommended to check in the respective creation software that only standard metadata is inserted. These should be PDF/A-compliant. Due to the variety of programs and creation routines, the metadata is quite individual, which prevents general advice on troubleshooting.
8 Literature
Drümmer, O., Oettler, A. and Seggern, D. von, 2007. PDF/A kompakt: digitale Langzeitarchivierung mit PDF. [online] Callas Software GmbH. Available at: <http://www.pdfa.org/wp-content/uploads/08/pdfa_kompakt_pdfa1b.pdf>.
Friese, Y., 2014. Ensuring long-term availability: PDF validation by JHOVE? / PDF Association. [online] Available at: <http://www.pdfa.org/2014/12/langzeitverfugbarkeitsichern-pdf-validierung-durch-jhove/?lang=de> [Accessed 25 Mar. 2015].
Oettler, A., 2013. PDF/A compact 2.0: PDF for long-term archiving. Available at: <http://www.pdfa.org/wp-content/uploads/2013/03/PDFA-kompakt-2_0_screen.pdf>.
PDF/A Competence Center, 2011. Validation of PDF/A / PDF Association. [online] Available at: <http://www.pdfa.org/2011/09/validierung-von-pdfa/?lang=de> [Accessed 27 Mar. 2015].