
EPM v1.0 — PDF/A Compliance Checklist for Implementers

This checklist covers what a producer must handle to embed an EPM v1.0 manifest in a host PDF and legitimately claim PDF/A-3 or PDF/A-4f conformance.

EPM itself does not define PDF internals. This checklist identifies the PDF-layer obligations that EPM producers must satisfy — obligations that packaging libraries (such as pikepdf) may handle partially but not completely.


Shared requirements (PDF/A-3 and PDF/A-4f)

These obligations apply regardless of which conformance level is targeted.

Host PDF

  • [ ] The host PDF must not carry a PDF Encrypt dictionary. Any form of PDF-level encryption — password or certificate — voids PDF/A conformance. Application-layer encryption of the EPM payload (declared via payload.encoding.encryption) is unaffected by this prohibition; it operates entirely within the embedded JSON bitstream.

  • [ ] The host PDF must carry XMP metadata declaring its conformance level. The pdfaid:part and pdfaid:conformance fields in the http://www.aiim.org/pdfa/ns/id/ namespace must be set correctly. Packaging libraries typically do not write this automatically; the producer is responsible.

  • PDF/A-3: pdfaid:part = 3, pdfaid:conformance = B (or U or A)

  • PDF/A-4f: pdfaid:part = 4, pdfaid:conformance = F, plus pdfaid:rev set to the four-digit revision year (2020 for ISO 19005-4:2020)
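
The identification fields above can be sketched with the standard library alone. This is a minimal illustration, not a complete XMP packet: the function name pdfaid_description and the ALLOWED_CONFORMANCE table are hypothetical, and the optional rev argument is included on the understanding that PDF/A-4 also expects a pdfaid:rev year. The element it returns still has to be placed inside a full XMP packet and written to the catalog's Metadata stream by whatever PDF library is in use (pikepdf exposes XMP editing through pdf.open_metadata()).

```python
import xml.etree.ElementTree as ET
from typing import Optional

PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Conformance letters this checklist lists for each part (hypothetical helper table).
ALLOWED_CONFORMANCE = {3: {"A", "B", "U"}, 4: {"F"}}

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("pdfaid", PDFAID_NS)


def pdfaid_description(part: int, conformance: str, rev: Optional[int] = None) -> str:
    """Return an rdf:Description element carrying the pdfaid fields."""
    if conformance not in ALLOWED_CONFORMANCE.get(part, set()):
        raise ValueError(f"pdfaid:conformance {conformance!r} not listed for part {part}")
    desc = ET.Element(f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": ""})
    ET.SubElement(desc, f"{{{PDFAID_NS}}}part").text = str(part)
    if rev is not None:
        # Assumption: only needed for PDF/A-4; omitted when rev is None.
        ET.SubElement(desc, f"{{{PDFAID_NS}}}rev").text = str(rev)
    ET.SubElement(desc, f"{{{PDFAID_NS}}}conformance").text = conformance
    return ET.tostring(desc, encoding="unicode")


if __name__ == "__main__":
    print(pdfaid_description(3, "B"))
    print(pdfaid_description(4, "F", rev=2020))
```

Rejecting unlisted part/conformance pairs at build time catches a mismatched claim before the file reaches a validator.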

EPM JSON file — embedded file stream

  • [ ] The EPM manifest must be serialized as UTF-8 JSON and embedded as a PDF embedded file stream.

  • [ ] The embedded file stream dictionary must include a Subtype key whose value is the MIME type application/json.

  • [ ] The embedded file stream dictionary must include a Params dictionary containing at minimum a ModDate key (a PDF date string for the file's last modification date).

  • [ ] The file specification dictionary must include an AFRelationship key. The value must be one of the closed list defined in ISO 19005-3 Annex E: Source, Data, Alternative, Supplement, Unspecified. (EncryptedPayload, FormData, and Schema are PDF 2.0 additions available in PDF/A-4f but not in PDF/A-3.) For EPM, Data or Supplement is typically appropriate; Unspecified is valid when the relationship is not characterised.

  • [ ] The file specification dictionary should include a Desc key with a human-readable description of the EPM manifest. Recommended by ISO 19005-3; aids inspection tooling.
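
The ModDate value in the Params dictionary is a PDF date string rather than an ISO 8601 timestamp. The sketch below formats a timezone-aware datetime in the ISO 32000-1 form D:YYYYMMDDHHmmSSOHH'mm'. The helper name pdf_date is hypothetical, and libraries that accept a datetime directly may make it unnecessary; it is shown only to make the expected shape of the value concrete.

```python
from datetime import datetime, timedelta, timezone


def pdf_date(dt: datetime) -> str:
    """Format an aware datetime as a PDF date string (ISO 32000-1 form)."""
    offset = dt.utcoffset()
    if offset is None:
        # A naive datetime has no defined offset; refuse rather than guess.
        raise ValueError("pdf_date requires a timezone-aware datetime")
    total_minutes = int(offset.total_seconds() // 60)
    sign = "-" if total_minutes < 0 else "+"
    hours, minutes = divmod(abs(total_minutes), 60)
    return f"{dt:%Y%m%d%H%M%S}".join(["D:", f"{sign}{hours:02d}'{minutes:02d}'"])


if __name__ == "__main__":
    print(pdf_date(datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)))
    print(pdf_date(datetime.now(timezone(timedelta(hours=2)))))
```

Requiring an aware datetime keeps the recorded offset explicit, which matters when the file is inspected years later on a machine in a different time zone.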

Document-level AF array — not written automatically by most libraries

  • [ ] The document catalog must contain an AF array (Associated Files) that references the file specification dictionary of the EPM manifest. This is what makes the embedded file an "Associated File" under the standard and what EPM's discovery mechanism depends on.

Most PDF packaging libraries, including pikepdf, write the EmbeddedFiles name tree entry automatically but do not write the catalog-level AF array. Producers must add this explicitly.

Example (pikepdf):

    pdf.Root.AF = pdf.make_indirect(pikepdf.Array([filespec.obj]))

EmbeddedFiles name tree

  • [ ] The EPM JSON file must appear in the EmbeddedFiles name tree in the document catalog's Names dictionary. Most packaging libraries handle this automatically when using their attachments API (e.g., pdf.attachments['epm.json'] = filespec in pikepdf).

PDF/A-3 specific requirements

PDF/A-3 is based on ISO 32000-1 (PDF 1.7). The Associated Files mechanism was introduced in PDF/A-3 and is defined there as an extension to PDF 1.7.

  • [ ] AFRelationship value must be from the closed PDF/A-3 list: Source, Data, Alternative, Supplement, Unspecified. The PDF 2.0 additions (EncryptedPayload, FormData, Schema) are not valid in PDF/A-3.

  • [ ] The AF entry in the document catalog must reference only embedded files, never external file references. PDF/A-3 prohibits external file references for associated files.

  • [ ] The AF key may only appear in the locations explicitly identified in ISO 19005-3 Table E.1: document catalog, page dictionary, image XObject, form XObject, logical structure element, annotation dictionary, or marked content section. The DPart location (available in PDF 2.0) is not permitted.


PDF/A-4f specific requirements

PDF/A-4f is based on ISO 32000-2 (PDF 2.0). Associated Files are native to the base PDF 2.0 specification; the embedding model is the same but the normative grounding is stronger.

  • [ ] The document catalog's Names dictionary must contain an EmbeddedFiles key. PDF/A-4f explicitly requires this; it is not merely recommended. A PDF/A-4f file without an EmbeddedFiles entry in the Names dictionary is non-conformant even if files are associated via the AF array.

  • [ ] The additional AFRelationship values introduced in PDF 2.0 — EncryptedPayload, FormData, Schema — are valid in PDF/A-4f. Note: EncryptedPayload describes a PDF-level encrypted payload construct defined in ISO 32000-2 and is distinct from EPM's application-layer payload.encoding.encryption declaration. Do not use EncryptedPayload as the AFRelationship for an EPM manifest; the EPM JSON is not a PDF encrypted payload object.

  • [ ] External file references remain prohibited in PDF/A-4f, consistent with PDF/A-3.
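
The AFRelationship rules for the two profiles can be captured in a small pre-flight check. This is a sketch using only the value lists stated in this checklist; the function name check_af_relationship and the profile labels "PDF/A-3" and "PDF/A-4f" are hypothetical conveniences, not identifiers from any standard or library.

```python
from typing import List

PDFA3_VALUES = frozenset({"Source", "Data", "Alternative", "Supplement", "Unspecified"})
PDF20_ADDITIONS = frozenset({"EncryptedPayload", "FormData", "Schema"})

ALLOWED = {
    "PDF/A-3": PDFA3_VALUES,
    "PDF/A-4f": PDFA3_VALUES | PDF20_ADDITIONS,
}

# Values this checklist recommends for an EPM manifest.
EPM_RECOMMENDED = frozenset({"Data", "Supplement", "Unspecified"})


def check_af_relationship(profile: str, value: str) -> List[str]:
    """Return a list of problems; an empty list means the value is acceptable."""
    problems = []
    if value not in ALLOWED[profile]:
        problems.append(f"{value} is not a valid AFRelationship for {profile}")
    elif value == "EncryptedPayload":
        problems.append("EncryptedPayload is valid here but must not be used for an EPM manifest")
    elif value not in EPM_RECOMMENDED:
        problems.append(f"{value} is valid but not one of the values recommended for EPM")
    return problems


if __name__ == "__main__":
    for profile, value in [("PDF/A-3", "Data"), ("PDF/A-3", "Schema"), ("PDF/A-4f", "EncryptedPayload")]:
        print(profile, value, check_af_relationship(profile, value) or "ok")
```

Returning messages instead of raising lets a producer report every issue in one pass and decide separately which ones are fatal.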


Encryption boundary — summary

| Layer | PDF/A-3 | PDF/A-4f |
|---|---|---|
| PDF Encrypt dictionary on host | Prohibited | Prohibited |
| AFRelationship = EncryptedPayload | Not valid (PDF 1.7) | Valid but not applicable to EPM |
| EPM payload.encoding.encryption | Permitted | Permitted |
| Embedded JSON file itself encrypted | Prohibited (voids PDF/A conformance) | Prohibited |

The EPM payload may be application-layer encrypted. The EPM JSON wrapper, the embedded file stream, and the host PDF must all be unencrypted.


Validation

Neither PDF/A-3 nor PDF/A-4f conformance is self-certifying. Producers should validate output using an independent validator before claiming conformance.

  • veraPDF is the reference open-source PDF/A validator and covers PDF/A-1 through PDF/A-4. For EPM implementers, this means both PDF/A-3 and PDF/A-4f host documents can be validated with the same toolchain. Available at verapdf.org. Validation model and rules are documented at docs.verapdf.org/validation.

  • A PDF that carries correct pdfaid XMP metadata but fails veraPDF validation is not conformant, regardless of the metadata claim.

  • For implementation testing, validate at least one intentionally passing and one intentionally failing file per target profile (PDF/A-3 and PDF/A-4f) in CI so that regressions are detected early.
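
One way to organise the pass/fail fixtures in CI is sketched below. It assumes the veraPDF command-line tool is installed as verapdf and that its --flavour option accepts 3b and 4f; confirm both against the installed version. The fixture paths, the FLAVOURS mapping (which picks level B for PDF/A-3), and the function names are hypothetical. How the validator's result is turned into a boolean is deliberately left to the injected validate callable, since that depends on the report format chosen.

```python
from typing import Callable, List, Sequence, Tuple

# Assumed veraPDF flavour identifiers for the two target profiles.
FLAVOURS = {"PDF/A-3": "3b", "PDF/A-4f": "4f"}

# (fixture path, profile, expected to pass?) -- hypothetical file names.
FIXTURES: List[Tuple[str, str, bool]] = [
    ("fixtures/pdfa3_pass.pdf", "PDF/A-3", True),
    ("fixtures/pdfa3_fail.pdf", "PDF/A-3", False),
    ("fixtures/pdfa4f_pass.pdf", "PDF/A-4f", True),
    ("fixtures/pdfa4f_fail.pdf", "PDF/A-4f", False),
]


def verapdf_command(pdf_path: str, profile: str, executable: str = "verapdf") -> List[str]:
    """Build the argument list for validating one file against one profile."""
    return [executable, "--flavour", FLAVOURS[profile], pdf_path]


def find_regressions(
    validate: Callable[[str, str], bool],
    fixtures: Sequence[Tuple[str, str, bool]] = FIXTURES,
) -> List[str]:
    """Return the fixtures whose validation outcome differs from the expectation."""
    return [path for path, profile, expected in fixtures if validate(path, profile) != expected]


if __name__ == "__main__":
    print(verapdf_command("out/host.pdf", "PDF/A-4f"))
```

Keeping an intentionally failing fixture per profile guards against a validator step that silently passes everything, for example because the tool was not actually invoked.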


Understanding the encryption boundary

The PDF/A encryption prohibition is frequently misread as a blanket ban on any encrypted content inside a PDF/A file. It is not. The prohibition is precisely scoped, and understanding that scope is essential for EPM implementers whose payloads may be application-layer encrypted.

What the standard actually prohibits

ISO 19005-3 §6.1.3 (PDF/A-3) and ISO 19005-4 §6.1 (PDF/A-4) both prohibit use of the PDF Encrypt dictionary as defined in ISO 32000-1 §7.6 and ISO 32000-2 §7.6 respectively. The Encrypt dictionary is the mechanism by which a PDF file requires a password or certificate to be opened, decrypted, and rendered. When present, it applies to the entire file structure.

The prohibition exists for a single, clearly stated reason: long-term preservation requires that a conforming reader can access the document's content without any external dependency. An encrypted PDF requires a key. If that key is lost — which, over archival timescales, is likely — the document becomes permanently inaccessible. The standard eliminates this risk by prohibiting the mechanism entirely.

This intent is confirmed by multiple authoritative sources:

  • The Library of Congress format description for PDF/A-3 states that PDF/A "differs from PDF by prohibiting features unsuitable for long-term archiving, such as font linking and encryption," with encryption understood as the PDF access-control mechanism. (loc.gov)

  • The NDSA 2014 report The Benefits and Risks of the PDF/A-3 File Format for Archival Institutions frames the preservation concern as future accessibility of the PDF container and its visible content — not the internal state of embedded file bitstreams. (digitalpreservation.gov)

  • The PDF Association's PDF/A in a Nutshell 2.0 describes the prohibition as applying to the document-level encryption mechanism, consistent with the self-containment and independence principles underlying all PDF/A versions.

What the standard does not govern

The PDF/A encryption prohibition has no jurisdiction over the content of embedded file bitstreams. A PDF/A-3 or PDF/A-4f validator inspects the PDF file structure — dictionaries, streams, cross-reference tables, and the Encrypt entry. It does not inspect, parse, or evaluate the semantic content of embedded files.

An embedded file whose bytes happen to be encrypted at the application layer is, from the PDF/A standard's perspective, an opaque bitstream. Its internal state is outside the standard's scope. The standard requires only that the embedded file stream dictionary carries a MIME type (Subtype) and a ModDate, and that the file specification dictionary carries an AFRelationship. It says nothing about the content of the bitstream itself.

This is the same principle that permits embedding a password-protected ZIP archive, an AES-encrypted binary, or any other format whose internal structure the standard cannot and does not evaluate.

How EPM uses this boundary

EPM v1.0 places encrypted content entirely on the application side of this boundary. When a payload is encrypted:

  • The EPM producer encrypts the payload before any encoding takes place.
  • The encrypted bytes are then base64-encoded into payload.content, producing a plain text string.
  • That plain text string lives inside a JSON field inside the EPM manifest.
  • The EPM manifest is a plain, unencrypted JSON file.
  • That JSON file is embedded as a plain, unencrypted PDF embedded file stream.
  • The host PDF carries no Encrypt dictionary.

At every level that the PDF/A standard examines, the content is fully accessible and unencrypted. The encryption exists only within the semantic content of payload.content — a region the standard neither inspects nor restricts.

EPM declares the encryption explicitly via payload.encoding.encryption, providing consumers with the information they need to reverse the transform. This declaration is part of the EPM specification (§11.5), not a PDF/A requirement. It is good practice regardless of compliance context.
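
The layering described above can be illustrated with a short sketch. Only the field paths payload.content and payload.encoding.encryption come from this document; the function name build_manifest, the label "example-cipher", and the random bytes standing in for real ciphertext are all placeholders, and a real EPM manifest carries more fields than shown here.

```python
import base64
import json
import os


def build_manifest(ciphertext: bytes, encryption_label: str) -> bytes:
    """Wrap already-encrypted bytes in a plain UTF-8 JSON manifest."""
    manifest = {
        "payload": {
            # Base64 turns the opaque ciphertext into a plain ASCII string.
            "content": base64.b64encode(ciphertext).decode("ascii"),
            # Declares the transform so a consumer knows how to reverse it.
            "encoding": {"encryption": encryption_label},
        }
    }
    # The manifest itself stays unencrypted: ordinary JSON, UTF-8 encoded.
    return json.dumps(manifest).encode("utf-8")


if __name__ == "__main__":
    fake_ciphertext = os.urandom(32)  # stand-in for the output of a real cipher
    data = build_manifest(fake_ciphertext, "example-cipher")
    parsed = json.loads(data.decode("utf-8"))
    assert base64.b64decode(parsed["payload"]["content"]) == fake_ciphertext
    print(parsed["payload"]["encoding"])
```

The bytes returned by build_manifest are what gets embedded as the file stream, so everything the PDF layer sees is readable JSON even though payload.content is not meaningful without the key.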

The EncryptedPayload AFRelationship — a potential source of confusion

PDF 2.0 (ISO 32000-2) introduced a new AFRelationship value: EncryptedPayload. This value is available in PDF/A-4f (which is based on PDF 2.0) but not in PDF/A-3 (which is based on PDF 1.7).

The name may suggest a connection to EPM's encrypted payload model. There is none. EncryptedPayload is a PDF 2.0 construct that describes a specific PDF-native encrypted object structure defined in ISO 32000-2 §7.6.7 — a mechanism for embedding encrypted content using PDF's own encryption infrastructure. It is unrelated to EPM's application-layer encryption model and is not an appropriate AFRelationship value for an EPM manifest.

EPM manifests should use Data, Supplement, or Unspecified as the AFRelationship value, regardless of whether the payload is encrypted.

Summary

The following statements are all simultaneously true and standards-consistent:

  • A PDF/A-3 or PDF/A-4f file must not use the PDF Encrypt dictionary.
  • An EPM manifest embedded in a PDF/A-3 or PDF/A-4f file must not itself be encrypted at the PDF stream level.
  • An EPM payload carried inside payload.content may be application-layer encrypted without affecting PDF/A conformance.
  • A producer embedding an EPM with an encrypted payload can legitimately claim PDF/A-3 or PDF/A-4f conformance, provided all other requirements in this checklist are satisfied.

Normative references

Informative references

  • PDF/A-3 Format Description — Library of Congress (loc.gov)
  • PDF/A-4f Format Description — Library of Congress (loc.gov)
  • The Benefits and Risks of the PDF/A-3 File Format for Archival Institutions — National Digital Stewardship Alliance, February 2014 (digitalpreservation.gov)
  • PDF 2.0 Application Note 002: Associated Files — PDF Association (pdfa.org)
  • Understanding Private Data in PDF/A — PDF Association Application Note, June 2024 (pdfa.org)
  • veraPDF — Reference PDF/A validator (verapdf.org)