
EPM v1.0 PDF Embedding Profile

This document is informative. It does not define additional conformance classes beyond EPM v1.0 itself.

Its purpose is to record the current implementation direction for embedding and discovering an EPM v1.0 manifest inside a PDF.

The broader standards effort associated with this draft is being organized under the GitHub organization document-data-standards. This profile does not yet declare a final canonical repository URL for EPM under that organization.

1. Scope

EPM v1.0 is PDF-first, but the core specification intentionally does not redefine PDF internals. This profile narrows that gap by describing a practical PDF integration pattern built on ISO 32000-2 embedded file and Associated Files features.

The profile is intentionally small. It is meant to help implementations converge on the same basic placement and discovery behavior without turning EPM v1.0 into a full PDF engineering standard.

1.1 Minimum Supported PDF Profile

The current minimum supported archival/interchange host profile for EPM is PDF/A-3. Current implementation guidance and compliance documentation in this repository cover PDF/A-3 and PDF/A-4f.

This boundary is intentional. The current EPM profile depends on two ideas together:

  • embedding the EPM as a real embedded file; and
  • associating that embedded file with the document through the Associated Files model.

Earlier versions of PDF introduced related features incrementally, but not the full model EPM currently prefers.

For accessible background on this version history, see the Library of Congress format descriptions for PDF/A-1, PDF/A-3, PDF 1.4, and PDF 1.7.

1.2 Short PDF History Relevant to EPM

  • PDF 1.3 introduced file attachment annotations, which are page-associated attachments typically shown with a paperclip icon.
  • PDF 1.4 introduced the EmbeddedFiles name tree in the document catalog.
  • PDF/A-1 was based on PDF 1.4 and did not permit embedded files at all, so it could not support the arbitrary embedded-file model EPM needs.
  • PDF/A-2 allowed embedded files, but only when those attachments were themselves PDF/A-conforming.
  • PDF/A-3, like PDF/A-2 based on PDF 1.7 (ISO 32000-1), was the first PDF/A version to allow embedding files in arbitrary formats such as JSON or XML.
  • PDF/A-3 also introduced the Associated Files model that allows embedded content to be related to specific PDF objects.
  • PDF 2.0 later standardized Associated Files in the base PDF specification.
  • PDF/A-4, based on PDF 2.0, carries the Associated Files model forward; its PDF/A-4f conformance level is the one that permits embedded files in arbitrary formats.

For EPM, this means older PDF features such as page-level file attachments are historically relevant, but they are not the preferred interoperability target.

2. Profile Goals

  • Make the EPM discoverable without private conventions.
  • Preserve the idea that the PDF remains the presentation artifact and the EPM remains the machine-readable wrapper.
  • Prefer standard PDF attachment mechanisms over custom object layouts.
  • Keep extraction logic deterministic enough that independent implementations will look in the same place first.

3. Embedded Object Model

The EPM should be serialized as a standalone JSON file and embedded in the PDF as an embedded file stream.

The embedded file should represent the EPM wrapper object itself, not only the decoded payload content.

The manifest should be associated with the PDF through the document-level Associated Files mechanism rather than through page-level association or private metadata alone.

This means the current preferred model is:

  • one PDF document;
  • one or more embedded JSON files, each containing one EPM v1.0 object; and
  • document-level associated-file relationships connecting those JSON files to the host document.

Implementations should embed the manifest as JSON with a stable, predictable filename.

Recommended characteristics:

  • filename: epm.json;
  • media type: application/json; and
  • document-level association rather than page-level association.

The filename recommendation is not normative, but it simplifies inspection and extraction tooling.
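
The following Python sketch is illustrative only. It models, as plain dictionaries rather than real PDF objects, what a writer might produce for a single epm.json attachment: an embedded file stream carrying the application/json media type, a file specification, a document-level /AF entry, and an EmbeddedFiles name-tree entry. The key names come from ISO 32000-2; the helper name build_epm_attachment and the /Unspecified AFRelationship value are assumptions, since this profile does not define a required AFRelationship value.

```python
import hashlib
import json


def build_epm_attachment(manifest: dict, filename: str = "epm.json") -> dict:
    """Model the catalog-level objects for one embedded EPM (hypothetical helper).

    Dictionaries stand in for PDF objects; a real writer would create
    indirect objects through its PDF library instead.
    """
    # The embedded file holds the whole EPM wrapper object, serialized as UTF-8 JSON.
    data = json.dumps(manifest, ensure_ascii=False).encode("utf-8")

    embedded_file_stream = {
        "/Type": "/EmbeddedFile",
        # Media type application/json written as a PDF name ("/" escaped as #2F).
        "/Subtype": "/application#2Fjson",
        # /CheckSum is the 16-byte MD5 digest of the uncompressed file bytes.
        "/Params": {"/Size": len(data), "/CheckSum": hashlib.md5(data).digest()},
        "data": data,
    }
    file_spec = {
        "/Type": "/Filespec",
        "/F": filename,
        "/UF": filename,
        "/EF": {"/F": embedded_file_stream, "/UF": embedded_file_stream},
        # Placeholder only: this profile does not require a particular value.
        "/AFRelationship": "/Unspecified",
    }
    catalog = {
        # Document-level association: the file spec is listed in the catalog's /AF array.
        "/AF": [file_spec],
        # The same file spec is also registered in the EmbeddedFiles name tree.
        "/Names": {"/EmbeddedFiles": {"/Names": [filename, file_spec]}},
    }
    return catalog
```

The point of the sketch is that one file specification is reachable from both the catalog's /AF array and the EmbeddedFiles name tree, so association and attachment do not drift apart.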

4.1 Filename Rules

EPM filenames are subject to the following cross-platform rules:

  • The filename MUST end with .json.
  • The base name SHOULD consist only of alphanumeric characters (A–Z, a–z, 0–9), hyphens, and underscores. This character set is valid on Windows, Linux, and macOS without OS-specific handling.
  • The base name SHOULD NOT match a Windows reserved device name in any letter case: CON, PRN, AUX, NUL, COM1 through COM9, and LPT1 through LPT9. The recommended naming patterns (epm.json, epm-{descriptor}.json) are unlikely to conflict with these names, but this constraint applies to any custom filename.
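
Because these rules are purely lexical, a writer can check them before embedding anything. A minimal Python sketch, assuming the .json suffix check is case-sensitive; check_epm_filename is a hypothetical helper name, not part of EPM.

```python
import re

# Windows reserved device names, compared case-insensitively below.
RESERVED = {"CON", "PRN", "AUX", "NUL"} | {f"COM{i}" for i in range(1, 10)} | {f"LPT{i}" for i in range(1, 10)}
# Base-name character set recommended above: A-Z, a-z, 0-9, hyphen, underscore.
SAFE_BASE = re.compile(r"[A-Za-z0-9_-]+")


def check_epm_filename(name: str) -> list[str]:
    """Return rule violations for an EPM filename; an empty list means no findings."""
    # Assumption: the suffix check is case-sensitive, so "epm.JSON" fails.
    if not name.endswith(".json"):
        return ["MUST: filename must end with .json"]
    problems = []
    base = name[: -len(".json")]
    if not SAFE_BASE.fullmatch(base):
        problems.append("SHOULD: base name should use only A-Z, a-z, 0-9, '-' and '_'")
    if base.upper() in RESERVED:
        problems.append("SHOULD NOT: base name matches a Windows reserved device name")
    return problems
```

For example, check_epm_filename("epm-invoice.json") returns an empty list, while check_epm_filename("con.json") reports the reserved-name finding.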

4.2 Multiple EPM Manifests in One PDF

PDF's EmbeddedFiles name tree maps unique name strings to file specifications, and producers conventionally use the attachment filename as that key. Embedding two files under the same filename in the same PDF is therefore a PDF validity issue under ISO 32000-2, not merely an EPM convention concern. Producers embedding multiple EPMs in a single PDF MUST use a distinct filename for each.

When multiple EPMs are embedded, the RECOMMENDED naming pattern is:

epm-{descriptor}.json

where {descriptor} is a short, lowercase label reflecting the payload type or domain — for example, epm-report.json, epm-invoice.json, or epm-supplemental.json. This parallels how payload.type conveys content identity: the filename carries a human-readable domain signal while the manifest_id field carries the machine-readable unique identifier.

Consumers MUST NOT rely on filenames alone for EPM discovery or identity. The manifest_id field is the canonical unique identifier for each EPM, and discovery proceeds through Associated Files followed by inspection of the discovery fields (epm_version, manifest_id, payload.type) rather than filename matching.
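
A writer can derive names that follow the epm-{descriptor}.json pattern and refuse to embed two manifests under the same name. In this sketch, epm_filename and assign_filenames are hypothetical helpers, and treating a collision as an error rather than silently renaming is a design assumption.

```python
import re


def epm_filename(descriptor: str) -> str:
    """Build an epm-{descriptor}.json filename from a free-form label (hypothetical helper)."""
    # Lowercase, then collapse each run of characters outside [a-z0-9_] into one hyphen.
    slug = re.sub(r"[^a-z0-9_]+", "-", descriptor.lower()).strip("-")
    if not slug:
        raise ValueError(f"descriptor {descriptor!r} yields an empty filename label")
    return f"epm-{slug}.json"


def assign_filenames(descriptors: list[str]) -> list[str]:
    """Return one distinct filename per descriptor; a duplicate is an error, not renamed."""
    names = [epm_filename(d) for d in descriptors]
    seen = set()
    for name in names:
        if name in seen:
            raise ValueError(f"duplicate embedded filename: {name}")
        seen.add(name)
    return names
```

Raising on collision keeps the mapping from descriptor to filename predictable; identity still comes from manifest_id, not from these names.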

5. Discovery Order

Consumers that want to locate EPMs in a PDF should use the following discovery order:

  1. Enumerate document-level Associated Files entries that resolve to embedded JSON files.
  2. For each candidate JSON file, read the discovery header fields used for quick identification: the top-level epm_version and manifest_id fields and the nested payload.type field.
  3. Treat candidates with that discovery header as EPM candidates and expose them in a manifest index for user or implementation selection.
  4. If no candidate exposes the discovery header, treat the PDF as not containing discoverable EPMs.
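
Steps 2 through 4 can be sketched in Python as follows. The sketch assumes a PDF library has already performed step 1 and returned each document-level Associated Files entry as a (filename, bytes) pair; that input shape and the function names are assumptions.

```python
import json
from typing import Optional


def read_discovery_header(data: bytes) -> Optional[dict]:
    """Return the discovery header of a candidate JSON attachment, or None if absent."""
    try:
        obj = json.loads(data.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None  # not UTF-8 JSON, so not an EPM candidate
    if not isinstance(obj, dict):
        return None
    payload = obj.get("payload")
    payload_type = payload.get("type") if isinstance(payload, dict) else None
    if "epm_version" not in obj or "manifest_id" not in obj or payload_type is None:
        return None
    return {"epm_version": obj["epm_version"], "manifest_id": obj["manifest_id"], "payload_type": payload_type}


def build_manifest_index(af_attachments: list[tuple[str, bytes]]) -> list[dict]:
    """Inspect each document-level AF attachment and index the EPM candidates.

    af_attachments is assumed to be (filename, bytes) pairs already resolved
    from the catalog's /AF array by a PDF library (step 1, not shown).
    """
    index = []
    for filename, data in af_attachments:
        header = read_discovery_header(data)
        if header is not None:
            # The filename is kept only as a display hint; manifest_id is the identity.
            index.append({**header, "filename": filename})
    return index  # an empty list means no discoverable EPMs
```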

If a consumer chooses to run full schema or semantic validation, that validation is applied per candidate EPM. This profile does not require a PDF-level assertion that all embedded JSON attachments are valid EPMs.

Consumers should not rely on page annotations, XMP metadata, or private name-tree conventions as the primary discovery mechanism for EPM v1.0.

6. Relationship to the Payload

The embedded EPM is the primary object this profile describes.

The payload described by payload.content remains inside the JSON manifest as base64 text. Under the current EPM v1.0 model, the profile does not require the decoded payload to also be embedded as a second PDF attachment.

Implementations may choose to embed additional related files in a PDF for other reasons, but those files are outside the EPM v1.0 wrapper model unless later guidance standardizes them.

This keeps the present model aligned with the core specification:

  • exactly one payload is described by the manifest;
  • the payload is carried inside payload.content; and
  • the PDF embedding layer carries the manifest, not a parallel second transport model.

7. Extraction Expectations

An implementation extracting EPM from a PDF should:

  1. locate candidate manifests according to the discovery order above;
  2. build a manifest index exposing at least manifest_id and payload.type for each candidate;
  3. read selected embedded JSON bytes as EPMs;
  4. optionally validate selected objects against the EPM schema and any chosen supplemental checks;
  5. base64-decode payload.content; and
  6. if declared transforms are to be reversed, decompress first and then decrypt.
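
Steps 5 and 6 might look like this in Python. Only payload.encoding.encryption is named in this profile; the compression field and its "gzip" value are hypothetical placeholders, and decryption is left to a caller-supplied function because this profile does not specify an algorithm. The sketch follows the transform order exactly as step 6 states it.

```python
import base64
import gzip
from typing import Callable, Optional


def decode_payload(manifest: dict, decrypt: Optional[Callable[[bytes], bytes]] = None) -> bytes:
    """Base64-decode payload.content, then reverse declared transforms.

    ASSUMPTION: "compression" == "gzip" is a hypothetical field and value used
    for illustration; the real names come from the EPM v1.0 core specification.
    """
    payload = manifest["payload"]
    encoding = payload.get("encoding", {})
    # Strict decoding: characters outside the base64 alphabet raise binascii.Error.
    data = base64.b64decode(payload["content"], validate=True)
    # Step 6 as stated in this profile: decompress first, then decrypt.
    if encoding.get("compression") == "gzip":
        data = gzip.decompress(data)
    if encoding.get("encryption"):
        if decrypt is None:
            raise ValueError("payload is encrypted but no decrypt function was supplied")
        data = decrypt(data)
    return data
```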

If validation is performed, failures should be reported per candidate object rather than as a single PDF-level pass/fail condition.
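
Per-candidate reporting can be as simple as returning one result per candidate. The checks below are a hypothetical minimal subset (payload is an object and payload.content is base64 text); a real implementation would apply the EPM v1.0 schema and any chosen supplemental checks instead. validate_candidates is a hypothetical name.

```python
import base64


def validate_candidates(candidates: list[dict]) -> list[dict]:
    """Return one validation result per candidate EPM (hypothetical minimal checks)."""
    results = []
    for position, epm in enumerate(candidates):
        errors = []
        payload = epm.get("payload")
        if not isinstance(payload, dict):
            errors.append("payload must be an object")
        elif not isinstance(payload.get("content"), str):
            errors.append("payload.content must be a base64 string")
        else:
            try:
                base64.b64decode(payload["content"], validate=True)
            except ValueError:  # binascii.Error is a subclass of ValueError
                errors.append("payload.content is not valid base64")
        # Each candidate gets its own result; there is no combined PDF-level pass/fail.
        results.append({"position": position, "manifest_id": epm.get("manifest_id"),
                        "valid": not errors, "errors": errors})
    return results
```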

8. Writer Expectations

An implementation writing EPM into a PDF should:

  1. produce a valid EPM v1.0 JSON object;
  2. serialize it as UTF-8 JSON;
  3. embed that JSON as an embedded file stream;
  4. associate it with the document through document-level Associated Files; and
  5. avoid writing contradictory duplicate copies of the manifest elsewhere in the PDF.

Writers SHALL ensure each embedded EPM has a distinct manifest_id when multiple EPMs are embedded in the same PDF.
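
A writer-side preflight for step 2 and the distinct manifest_id rule might look like this sketch. Keying the input by filename guarantees distinct filenames; serialize_epms is a hypothetical helper name, and the compact JSON separators are an arbitrary choice.

```python
import json


def serialize_epms(manifests: dict[str, dict]) -> dict[str, bytes]:
    """Serialize EPMs keyed by embedded filename to UTF-8 JSON bytes (hypothetical helper).

    Raises ValueError if a manifest lacks manifest_id or two manifests share one.
    """
    owner_of_id: dict[str, str] = {}
    out: dict[str, bytes] = {}
    for filename, manifest in manifests.items():
        mid = manifest.get("manifest_id")
        if not mid:
            raise ValueError(f"{filename} has no manifest_id")
        if mid in owner_of_id:
            raise ValueError(f"manifest_id {mid!r} used by both {owner_of_id[mid]} and {filename}")
        owner_of_id[mid] = filename
        # ensure_ascii=False writes non-ASCII text as literal UTF-8 instead of \u escapes.
        out[filename] = json.dumps(manifest, ensure_ascii=False, separators=(",", ":")).encode("utf-8")
    return out
```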

9. Non-Goals of This Profile

This profile does not currently define:

  • a required AFRelationship value for the EPM;
  • XMP metadata requirements;
  • digital signature handling;
  • encrypted-host-PDF handling;
  • incremental update behavior for revised manifests; or
  • required PDF-level validity outcomes when a PDF contains both valid and invalid EPM candidates.

Those topics are deferred until implementation testing shows that additional standardization is necessary.

9.1 Encryption Boundary

PDF/A-3 and PDF/A-4 conforming files prohibit PDF-level encryption (the Encrypt dictionary is not permitted under ISO 19005-3 and ISO 19005-4). For EPM implementations targeting these profiles, the host PDF will be openly readable without a password.

This constraint does not prevent encrypting the payload. EPM v1.0's payload.encoding.encryption field describes encryption applied to the payload bytes inside payload.content, not to the PDF container. A consumer reads the EPM manifest as a plain JSON file from the PDF and decrypts the payload bytes as a separate step.

Current implementation work in this repository is designed for unencrypted host PDFs. See Implementation Checklist for further information on encryption boundary analysis and compliance requirements for both PDF/A-3 and PDF/A-4f scenarios.

10. Current Recommendation

For current implementation work, the preferred interoperable pattern is simple:

  • embed one or more UTF-8 JSON files containing EPM v1.0 manifests;
  • attach them at document scope using PDF Associated Files; and
  • discover them by Associated Files first, then by reading discovery header fields for fast identification (epm_version, manifest_id, payload.type).

Because this pattern depends on arbitrary-format embedded files and Associated Files, PDF/A-3 is the current minimum supported archival/interchange host profile for EPM experiments; PDF/A-4f is also supported where implementation and validation requirements are met.

It is narrow enough to implement now and specific enough to support consistent embedding and extraction experiments.