# EPM v1.0: PDF/A Compliance Checklist for Implementers
This checklist covers what a producer must handle to embed an EPM v1.0 manifest in a host PDF and legitimately claim PDF/A-3 or PDF/A-4f conformance.
EPM itself does not define PDF internals. This checklist identifies the PDF-layer obligations that EPM producers must satisfy — obligations that packaging libraries (such as pikepdf) may handle partially but not completely.
## Shared requirements (PDF/A-3 and PDF/A-4f)

These obligations apply regardless of which conformance level is targeted.

### Host PDF

- [ ] The host PDF must not carry a PDF `Encrypt` dictionary. Any form of PDF-level encryption (password or certificate) voids PDF/A conformance. Application-layer encryption of the EPM payload (declared via `payload.encoding.encryption`) is unaffected by this prohibition; it operates entirely within the embedded JSON bitstream.
- [ ] The host PDF must carry XMP metadata declaring its conformance level. The `pdfaid:part` and `pdfaid:conformance` fields in the `http://www.aiim.org/pdfa/ns/id/` namespace must be set correctly. Packaging libraries typically do not write this automatically; the producer is responsible.
  - PDF/A-3: `pdfaid:part = 3`, `pdfaid:conformance = B` (or `U` or `A`)
  - PDF/A-4f: `pdfaid:part = 4`, `pdfaid:conformance = F`
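
The XMP requirement can be illustrated with a minimal, standard-library-only sketch that builds just the PDF/A identification part of an XMP packet. The function name `pdfaid_xmp` is hypothetical, and the packet still has to be stored as the document catalog's `Metadata` stream by whatever PDF library the producer uses (real packets usually carry other XMP properties too). The optional `rev` argument rests on an assumption: ISO 19005-4 defines a `pdfaid:rev` property (the year of the standard's edition) for PDF/A-4, which should be confirmed against the standard and against veraPDF before relying on it.

```python
import xml.etree.ElementTree as ET
from typing import Optional

PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Conformance letters accepted for the two targets covered by this checklist.
ALLOWED_CONFORMANCE = {3: {"A", "B", "U"}, 4: {"F"}}


def pdfaid_xmp(part: int, conformance: str, rev: Optional[str] = None) -> str:
    """Return a minimal XMP packet containing only the pdfaid identification."""
    if conformance not in ALLOWED_CONFORMANCE.get(part, set()):
        raise ValueError(f"unsupported target: part={part}, conformance={conformance}")
    # Assumption: PDF/A-4 identification also carries pdfaid:rev (e.g. "2020");
    # confirm against ISO 19005-4 before relying on it.
    rev_line = f"   <pdfaid:rev>{rev}</pdfaid:rev>\n" if rev else ""
    return (
        '<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
        '<x:xmpmeta xmlns:x="adobe:ns:meta/">\n'
        f' <rdf:RDF xmlns:rdf="{RDF_NS}">\n'
        f'  <rdf:Description rdf:about="" xmlns:pdfaid="{PDFAID_NS}">\n'
        f"   <pdfaid:part>{part}</pdfaid:part>\n"
        f"   <pdfaid:conformance>{conformance}</pdfaid:conformance>\n"
        f"{rev_line}"
        "  </rdf:Description>\n"
        " </rdf:RDF>\n"
        "</x:xmpmeta>\n"
        '<?xpacket end="w"?>'
    )


if __name__ == "__main__":
    # Sanity check: the packet parses and exposes the declared part number.
    root = ET.fromstring(pdfaid_xmp(3, "B"))
    print(root.find(f".//{{{PDFAID_NS}}}part").text)  # 3
```

For a PDF/A-4f host the call would be `pdfaid_xmp(4, "F", rev="2020")`, subject to the `pdfaid:rev` assumption above.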
### EPM JSON file: embedded file stream

- [ ] The EPM manifest must be serialized as UTF-8 JSON and embedded as a PDF embedded file stream.
- [ ] The embedded file stream dictionary must include a `Subtype` key whose value is the MIME type `application/json`.
- [ ] The embedded file stream dictionary must include a `Params` dictionary containing at minimum a `ModDate` key (a PDF date string giving the file's last modification date).
- [ ] The file specification dictionary must include an `AFRelationship` key. The value must be one of the closed list defined in ISO 19005-3 Annex E: `Source`, `Data`, `Alternative`, `Supplement`, `Unspecified`. (`EncryptedPayload`, `FormData`, and `Schema` are PDF 2.0 additions available in PDF/A-4f but not in PDF/A-3.) For EPM, `Data` or `Supplement` is typically appropriate; `Unspecified` is valid when the relationship is not characterized.
- [ ] The file specification dictionary should include a `Desc` key with a human-readable description of the EPM manifest. ISO 19005-3 recommends it, and it aids inspection tooling.
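
To make the placement of each key concrete, the sketch below models the embedded file stream dictionary and the file specification dictionary as plain Python dicts. This is not a PDF library API: the dict layout, the helper names `pdf_date` and `epm_file_objects`, and the file name `epm.json` are illustrative assumptions. `pdf_date` formats a timezone-aware `datetime` in the ISO 32000 date syntax `D:YYYYMMDDHHmmSSOHH'mm`.

```python
from datetime import datetime, timedelta, timezone


def pdf_date(dt: datetime) -> str:
    """Format an aware datetime as a PDF date string, e.g. D:19981223195200-08'00."""
    offset = dt.utcoffset()
    if offset is None:
        raise ValueError("pdf_date needs a timezone-aware datetime")
    minutes = int(offset.total_seconds()) // 60
    sign = "+" if minutes >= 0 else "-"
    hours, mins = divmod(abs(minutes), 60)
    return f"D:{dt:%Y%m%d%H%M%S}{sign}{hours:02d}'{mins:02d}"


def epm_file_objects(json_bytes: bytes, modified: datetime, relationship: str = "Data"):
    """Model the embedded file stream and file specification dictionaries as dicts."""
    ef_stream = {
        "Type": "EmbeddedFile",
        # A PDF name; in file syntax the "/" is escaped: /application#2Fjson
        "Subtype": "application/json",
        "Params": {"ModDate": pdf_date(modified)},
        "data": json_bytes,  # the stream body: the UTF-8 JSON manifest
    }
    filespec = {
        "Type": "Filespec",
        "F": "epm.json",
        "UF": "epm.json",
        "EF": {"F": ef_stream, "UF": ef_stream},  # links the filespec to the stream
        "AFRelationship": relationship,           # Data, Supplement or Unspecified for EPM
        "Desc": "EPM v1.0 manifest",              # recommended, human-readable
    }
    return ef_stream, filespec


if __name__ == "__main__":
    pacific = timezone(timedelta(hours=-8))
    print(pdf_date(datetime(1998, 12, 23, 19, 52, tzinfo=pacific)))  # D:19981223195200-08'00
```

Requiring an aware `datetime` is a deliberate choice: a naive timestamp would silently produce a date string with the wrong offset.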
### Document-level `AF` array (not written automatically by most libraries)

- [ ] The document catalog must contain an `AF` (Associated Files) array that references the file specification dictionary of the EPM manifest. This is what makes the embedded file an "Associated File" under the standard, and it is what EPM's discovery mechanism depends on.

Most PDF packaging libraries, including pikepdf, write the `EmbeddedFiles`
name tree entry automatically but do not write the catalog-level `AF`
array. Producers must add this explicitly.
Example (pikepdf):
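```python
import pikepdf

# `pdf` is the open pikepdf.Pdf; `filespec` is the AttachedFileSpec already added
# via pdf.attachments['epm.json']. Reference that same filespec from the catalog AF array.
pdf.Root.AF = pdf.make_indirect(pikepdf.Array([filespec.obj]))
```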
### `EmbeddedFiles` name tree

- [ ] The EPM JSON file must appear in the `EmbeddedFiles` name tree in the document catalog's `Names` dictionary. Most packaging libraries handle this automatically when using their attachments API (e.g., `pdf.attachments['epm.json'] = filespec` in pikepdf).
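
Both catalog-level entries must point at the same file specification. A minimal check over a plain-dict model of the catalog might look like the following; it is not a PDF library API, and the function name `catalog_problems` and the dict layout are hypothetical. Object identity (`is`) stands in for "references the same indirect object", and only a single-node name tree is modeled.

```python
def catalog_problems(root: dict, filespec: dict) -> list[str]:
    """Return catalog-level checklist violations for the EPM file spec (empty = OK)."""
    problems = []
    # Identity (`is`) stands in for "refers to the same indirect object".
    if not any(entry is filespec for entry in root.get("AF", [])):
        problems.append("catalog AF array does not reference the EPM file specification")
    tree = root.get("Names", {}).get("EmbeddedFiles", {})
    # A leaf name-tree node holds a flat [key1, value1, key2, value2, ...] array;
    # intermediate nodes (Kids) are not modeled here.
    values = tree.get("Names", [])[1::2]
    if not any(value is filespec for value in values):
        problems.append("EPM file specification missing from the EmbeddedFiles name tree")
    return problems


if __name__ == "__main__":
    fs = {"Type": "Filespec", "F": "epm.json"}
    catalog = {"Names": {"EmbeddedFiles": {"Names": ["epm.json", fs]}}}
    print(catalog_problems(catalog, fs))  # only the missing AF array is reported
```

The example catalog mirrors the library behavior described above: the name tree entry is present, but the `AF` array was never written.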
## PDF/A-3 specific requirements

PDF/A-3 is based on ISO 32000-1 (PDF 1.7). The Associated Files mechanism was introduced in PDF/A-3 and is defined there as an extension to PDF 1.7.

- [ ] The `AFRelationship` value must be from the closed PDF/A-3 list: `Source`, `Data`, `Alternative`, `Supplement`, `Unspecified`. The PDF 2.0 additions (`EncryptedPayload`, `FormData`, `Schema`) are not valid in PDF/A-3.
- [ ] The `AF` entry in the document catalog must reference only embedded files, never external file references. PDF/A-3 prohibits external file references for associated files.
- [ ] The `AF` key may only appear in the locations explicitly identified in ISO 19005-3 Table E.1: document catalog, page dictionary, image XObject, form XObject, logical structure element, annotation dictionary, or marked content section. The `DPart` location (available in PDF 2.0) is not permitted.
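
The `AFRelationship` rules for both target profiles, plus the EPM-specific recommendation from this checklist, fit in a small lookup. This sketch uses only the value lists given in this document; `classify_af_relationship` and its return strings are hypothetical names, and it deliberately ignores any value forms the standards may allow beyond these lists.

```python
# AFRelationship values as listed in this checklist.
PDFA3_VALUES = frozenset({"Source", "Data", "Alternative", "Supplement", "Unspecified"})
PDFA4F_VALUES = PDFA3_VALUES | {"EncryptedPayload", "FormData", "Schema"}
# Values this checklist recommends for an EPM manifest, whatever the payload.
EPM_VALUES = frozenset({"Data", "Supplement", "Unspecified"})

PROFILES = {"PDF/A-3": PDFA3_VALUES, "PDF/A-4f": PDFA4F_VALUES}


def classify_af_relationship(value: str, profile: str) -> str:
    """Return 'ok', 'invalid', or 'not-for-epm' for an EPM file specification."""
    if value not in PROFILES[profile]:
        return "invalid"      # not permitted by the target profile at all
    if value not in EPM_VALUES:
        return "not-for-epm"  # permitted, but not an appropriate EPM relationship
    return "ok"


if __name__ == "__main__":
    for value in ("Data", "Schema", "EncryptedPayload"):
        print(value,
              classify_af_relationship(value, "PDF/A-3"),
              classify_af_relationship(value, "PDF/A-4f"))
```

Keeping "invalid" separate from "not-for-epm" lets a producer fail hard on a conformance error while only warning about a valid but unsuitable relationship such as `Source`.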
## PDF/A-4f specific requirements

PDF/A-4f is based on ISO 32000-2 (PDF 2.0). Associated Files are native to the base PDF 2.0 specification: the embedding model is the same as in PDF/A-3, but it is defined by ISO 32000-2 itself rather than by an annex to the PDF/A standard.

- [ ] The document catalog's `Names` dictionary must contain an `EmbeddedFiles` key. PDF/A-4f explicitly requires this; it is not merely recommended. A PDF/A-4f file without an `EmbeddedFiles` entry in the name dictionary is non-conformant even if files are associated via the `AF` array.
- [ ] The additional `AFRelationship` values introduced in PDF 2.0 (`EncryptedPayload`, `FormData`, `Schema`) are valid in PDF/A-4f. Note: `EncryptedPayload` describes a PDF-level encrypted payload construct defined in ISO 32000-2 and is distinct from EPM's application-layer `payload.encoding.encryption` declaration. Do not use `EncryptedPayload` as the `AFRelationship` for an EPM manifest; the EPM JSON is not a PDF encrypted payload object.
- [ ] External file references remain prohibited in PDF/A-4f, consistent with PDF/A-3.
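
Because both profiles prohibit external references for associated files, a producer can flag any `AF` entry that does not actually embed its file. In the plain-dict model used here (an illustrative assumption, as is the function name `external_af_entries`), a file specification embeds its file only when its `EF` dictionary holds an `F` or `UF` stream; a bare string file specification, or a dictionary without `EF`, refers to a file outside the PDF.

```python
from typing import Union

FileSpec = Union[str, dict]  # a file specification is a string or a dictionary


def external_af_entries(af_array: list[FileSpec]) -> list[FileSpec]:
    """Return AF entries that reference a file outside the PDF instead of embedding it."""
    external = []
    for spec in af_array:
        if isinstance(spec, str):
            external.append(spec)  # a string file specification names an external file
            continue
        ef = spec.get("EF", {})
        # An embedded file is reached through the EF dictionary's F or UF stream.
        if "F" not in ef and "UF" not in ef:
            external.append(spec)
    return external
```

An empty result means every associated file is embedded; anything returned would void conformance in either profile.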
## Encryption boundary: summary

| Layer | PDF/A-3 | PDF/A-4f |
|---|---|---|
| PDF `Encrypt` dictionary on host | Prohibited | Prohibited |
| `AFRelationship` = `EncryptedPayload` | Not valid (PDF 1.7) | Valid, but not applicable to EPM |
| EPM `payload.encoding.encryption` | Permitted | Permitted |
| Embedded JSON file itself encrypted | Prohibited (voids PDF/A conformance) | Prohibited |

The EPM payload may be application-layer encrypted. The EPM JSON wrapper, the embedded file stream, and the host PDF must all be unencrypted.
## Validation

Neither PDF/A-3 nor PDF/A-4f conformance is self-certifying. Producers should validate output using an independent validator before claiming conformance.

- veraPDF is the reference open-source PDF/A validator and covers PDF/A-1 through PDF/A-4. For EPM implementers, this means both PDF/A-3 and PDF/A-4f host documents can be validated with the same toolchain. It is available at verapdf.org, and its validation model and rules are documented at docs.verapdf.org/validation.
- A PDF that carries correct `pdfaid` XMP metadata but fails veraPDF validation is not conformant, regardless of the metadata claim.
- For implementation testing, validate at least one intentionally passing and one intentionally failing file per target profile (PDF/A-3 and PDF/A-4f) in CI so that regressions are detected early.
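
The CI recommendation in the last item can be sketched without tying it to a particular validator invocation. Here the caller supplies `validate(path, flavour) -> bool`; in a real pipeline it would wrap veraPDF (whose command-line `--flavour` option selects the profile, e.g. `3b` or `4f`), but report parsing is left out because it depends on the output format chosen. The `Fixture` class, both helper functions, and the fixture file names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Fixture:
    path: str
    flavour: str       # validator profile id, e.g. "3b" or "4f" (assumed veraPDF ids)
    expect_pass: bool  # True for the intentionally valid file


def regressions(fixtures: Iterable[Fixture],
                validate: Callable[[str, str], bool]) -> list[Fixture]:
    """Return fixtures whose validation result differs from the expected outcome."""
    return [f for f in fixtures if validate(f.path, f.flavour) != f.expect_pass]


def coverage_gaps(fixtures: Iterable[Fixture], targets=("3b", "4f")) -> list[str]:
    """Return target flavours lacking either a passing or a failing fixture."""
    fixtures = list(fixtures)
    gaps = []
    for flavour in targets:
        seen = {f.expect_pass for f in fixtures if f.flavour == flavour}
        if seen != {True, False}:
            gaps.append(flavour)
    return gaps
```

A CI job would fail whenever either list is non-empty: a regression means the producer's output (or a fixture) changed behavior, and a coverage gap means one profile is not being exercised in both directions.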
## Understanding the encryption boundary

The PDF/A encryption prohibition is frequently misread as a blanket ban on any encrypted content inside a PDF/A file. It is not. The prohibition is precisely scoped, and understanding that scope is essential for EPM implementers whose payloads may be application-layer encrypted.

### What the standard actually prohibits
ISO 19005-3 §6.1.3 (PDF/A-3) and ISO 19005-4 §6.1 (PDF/A-4) both prohibit
use of the PDF Encrypt dictionary as defined in ISO 32000-1 §7.6 and
ISO 32000-2 §7.6 respectively. The Encrypt dictionary is the mechanism by
which a PDF file requires a password or certificate to be opened, decrypted,
and rendered. When present, it applies to the entire file structure.
The prohibition exists for a single, clearly stated reason: long-term preservation requires that a conforming reader can access the document's content without any external dependency. An encrypted PDF requires a key. If that key is lost — which, over archival timescales, is likely — the document becomes permanently inaccessible. The standard eliminates this risk by prohibiting the mechanism entirely.
This intent is confirmed by multiple authoritative sources:
- The Library of Congress format description for PDF/A-3 states that PDF/A "differs from PDF by prohibiting features unsuitable for long-term archiving, such as font linking and encryption," with encryption understood as the PDF access-control mechanism. (loc.gov)
- The NDSA 2014 report *The Benefits and Risks of the PDF/A-3 File Format for Archival Institutions* frames the preservation concern as future accessibility of the PDF container and its visible content, not the internal state of embedded file bitstreams. (digitalpreservation.gov)
- The PDF Association's *PDF/A in a Nutshell 2.0* describes the prohibition as applying to the document-level encryption mechanism, consistent with the self-containment and independence principles underlying all PDF/A versions.
### What the standard does not govern
The PDF/A encryption prohibition has no jurisdiction over the content of
embedded file bitstreams. A PDF/A-3 or PDF/A-4f validator inspects the PDF
file structure — dictionaries, streams, cross-reference tables, and the
Encrypt entry. It does not inspect, parse, or evaluate the semantic content
of embedded files.
An embedded file whose bytes happen to be encrypted at the application layer
is, from the PDF/A standard's perspective, an opaque bitstream. Its internal
state is outside the standard's scope. The standard's requirements for an
embedded file, such as a MIME type (`Subtype`) and a `ModDate` (in `Params`)
on the embedded file stream dictionary and an `AFRelationship` on the file
specification dictionary, all apply to those dictionaries; none apply to the
content of the bitstream itself.
This is the same principle that permits embedding a password-protected ZIP archive, an AES-encrypted binary, or any other format whose internal structure the standard cannot and does not evaluate.
### How EPM uses this boundary
EPM v1.0 places encrypted content entirely on the application side of this boundary. When a payload is encrypted:
- The encryption is performed by the EPM producer before the payload is base64-encoded into `payload.content`.
- The encrypted bytes are then base64-encoded, producing a plain text string.
- That plain text string lives inside a JSON field inside the EPM manifest.
- The EPM manifest is a plain, unencrypted JSON file.
- That JSON file is embedded as a plain, unencrypted PDF embedded file stream.
- The host PDF carries no `Encrypt` dictionary.

At every level that the PDF/A standard examines, the content is fully
accessible and unencrypted. The encryption exists only within the semantic
content of payload.content — a region the standard neither inspects nor
restricts.
EPM declares the encryption explicitly via payload.encoding.encryption,
providing consumers with the information they need to reverse the transform.
This declaration is part of the EPM specification (§11.5), not a PDF/A
requirement. It is good practice regardless of compliance context.
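
The layering above can be sketched with the Python standard library. The sketch assumes the producer supplies its own `encrypt` callable (the standard library has no block cipher, so none is chosen here) and an encryption declaration shaped as the EPM specification (§11.5) requires; only the two manifest fields named in this document are shown, and a real manifest carries others. `build_epm_manifest_bytes` is a hypothetical name.

```python
import base64
import json
from typing import Callable


def build_epm_manifest_bytes(payload: bytes,
                             encrypt: Callable[[bytes], bytes],
                             encryption_decl: dict) -> bytes:
    """Apply the EPM layering and return the UTF-8 JSON bytes to embed."""
    ciphertext = encrypt(payload)                            # application-layer encryption
    content = base64.b64encode(ciphertext).decode("ascii")  # now a plain text string
    manifest = {
        "payload": {
            "content": content,                           # text inside a JSON field
            "encoding": {"encryption": encryption_decl},  # declared per EPM spec §11.5
        },
        # ...other EPM manifest fields omitted in this sketch...
    }
    # The manifest itself stays plain, unencrypted UTF-8 JSON.
    return json.dumps(manifest, ensure_ascii=False).encode("utf-8")
```

The returned bytes are what gets embedded as the file stream body. Every layer a PDF/A validator examines is plain text or an unencrypted stream; only the value of `payload.content` is ciphertext, and even that is carried as base64 text.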
### The EncryptedPayload AFRelationship: a potential source of confusion
PDF 2.0 (ISO 32000-2) introduced a new AFRelationship value:
EncryptedPayload. This value is available in PDF/A-4f (which is based on
PDF 2.0) but not in PDF/A-3 (which is based on PDF 1.7).
The name may suggest a connection to EPM's encrypted payload model. There is
none. EncryptedPayload is a PDF 2.0 construct that describes a specific
PDF-native encrypted object structure: the encrypted payload document of the
unencrypted wrapper mechanism defined in ISO 32000-2 §7.6.7, which embeds
encrypted content using PDF's own encryption infrastructure. It is
unrelated to EPM's application-layer encryption model
and is not an appropriate AFRelationship value for an EPM manifest.
EPM manifests should use Data, Supplement, or Unspecified as the
AFRelationship value, regardless of whether the payload is encrypted.
### Summary
The following statements are all simultaneously true and standards-consistent:
- A PDF/A-3 or PDF/A-4f file must not use the PDF `Encrypt` dictionary.
- An EPM manifest embedded in a PDF/A-3 or PDF/A-4f file must not itself be encrypted at the PDF stream level.
- An EPM payload carried inside `payload.content` may be application-layer encrypted without affecting PDF/A conformance.
- A producer embedding an EPM with an encrypted payload can legitimately claim PDF/A-3 or PDF/A-4f conformance, provided all other requirements in this checklist are satisfied.
## Normative references
- ISO 19005-3:2012 — PDF/A-3 (iso.org)
- ISO 19005-4:2020 — PDF/A-4 (iso.org)
- ISO 32000-1:2008 — PDF 1.7 (iso.org)
- ISO 32000-2:2020 — PDF 2.0 (iso.org)
- EPM v1.0 Specification (epmstandard.org/spec)
- EPM v1.0 PDF Embedding Profile (epmstandard.org/pdf-profile)
## Informative references
- PDF/A-3 Format Description — Library of Congress (loc.gov)
- PDF/A-4f Format Description — Library of Congress (loc.gov)
- The Benefits and Risks of the PDF/A-3 File Format for Archival Institutions — National Digital Stewardship Alliance, February 2014 (digitalpreservation.gov)
- PDF 2.0 Application Note 002: Associated Files — PDF Association (pdfa.org)
- Understanding Private Data in PDF/A — PDF Association Application Note, June 2024 (pdfa.org)
- veraPDF — Reference PDF/A validator (verapdf.org)