Embedded Payload Manifest (EPM v1.0) Draft Specification
1. Status
This document is a working draft of EPM v1.0.
The canonical home for this specification is epmstandard.org. The final publication URL and repository structure under that domain have not yet been established.
The key words "MUST", "MUST NOT", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
Where EPM v1.0 uses terms and representations already defined by established standards, it SHOULD reuse those definitions rather than create parallel terminology.
In particular:
- media type strings follow RFC 2045 and the IANA media type registry;
- base64 representation follows RFC 4648; and
- PDF embedding is expected to rely on ISO 32000-2 embedded file and Associated Files mechanisms rather than EPM-specific embedding rules.
2. Referenced Standards
EPM v1.0 relies on a small set of external standards concepts rather than redefining them.
- RFC 2119, Key words for use in RFCs to Indicate Requirement Levels, for interpretation of normative keywords.
- RFC 2045, Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies, together with the IANA media type registry, for the syntax and meaning of payload media types.
- RFC 3986, Uniform Resource Identifier (URI): Generic Syntax, for the form of stable identifiers used in manifest fields.
- RFC 4648, The Base16, Base32, and Base64 Data Encodings, for the base64 representation used by payload.content and for the base16 (hex) representation used by payload.integrity.hash.value.
- RFC 6920, Naming Things with Hashes, for the IANA Named Information Hash Algorithm identifier conventions used by payload.integrity.hash.algorithm.
- ISO 32000-2 PDF embedded file and Associated Files mechanisms, including embedded file streams and AFRelationship, for host-document embedding behavior.
These references define the underlying concepts. EPM v1.0 defines only how those concepts are used inside this wrapper model.
3. Terms and Definitions
The following terms are used in this specification. Terms not defined here are used according to their meanings in the referenced standards listed in Section 2.
base64
Base64 encoding as defined in RFC 4648. payload.content SHALL be stored as a base64-encoded string. Producers SHALL use the base64 alphabet defined in RFC 4648 §4 without line breaks or whitespace (see §11.4).
consumer
An implementation that reads and processes EPMs.
decoded payload bytes
The bytes obtained by base64-decoding payload.content, prior to reversing any declared compression or encryption transforms. Both payload.size and payload.integrity.hash.value are computed over the decoded payload bytes.
discovery fields
The three fields epm_version, manifest_id, and payload.type, which enable fast identification of EPMs within a host document without requiring full validation.
document type
The semantic category of an EPM's payload, identified by a URI in the payload.doc_type field. Distinct from the media type, which identifies the encoding format; the document type identifies what the payload represents in domain terms. Producers SHOULD use Schema.org type URIs where applicable. See §11.3.
EPM
The JSON object defined by this specification, carrying the manifest identifier, schema reference, and one structured payload. An EPM is the primary artifact produced by a producer and processed by a consumer.
hash algorithm identifier
A string naming a cryptographic hash algorithm. Producers SHOULD use lowercase IANA Named Information Hash Algorithm identifiers as defined in RFC 6920, such as sha-256 or sha-512.
host document
The PDF or other document that carries one or more EPMs as embedded files. In EPM v1.0, the host document is a PDF conforming to PDF/A-3 or PDF/A-4f.
media type
A structured identifier for the nature and format of content, as defined by RFC 2045 and the IANA media type registry. Used by payload.type to identify the embedded structured data.
payload
The structured data carried inside payload.content, represented as a base64-encoded byte sequence within the EPM.
payload schema
An externally defined schema referenced by payload.schema, governing the structure of the decoded payload content. EPM v1.0 does not define payload schemas; it references them.
producer
An implementation that creates EPMs.
URI
Uniform Resource Identifier, as defined in RFC 3986. Used for manifest_id, schema.id, and payload.schema.id. schema.id is fixed to the canonical EPM v1.0 identifier (see §10.2); it serves as a stable wrapper-schema identifier and need not be dereferenceable. manifest_id and payload.schema.id are producer-chosen URIs that also need not be dereferenceable but SHOULD be stable.
wrapper schema
The EPM schema definition referenced by the top-level schema object, used to validate the structure of an EPM. Distinct from the payload schema, which governs the payload content.
4. Purpose
EPM v1.0 defines a small, machine-readable manifest-wrapper for a structured payload embedded in a host document. Its immediate purpose is to preserve structured data inside a PDF so that the data remains discoverable, identifiable, and machine-processable even when the PDF is treated as a presentation-first artifact.
The motivating problem is straightforward: many source workflows begin with structured data, but that structure is lost when the workflow emits a PDF for transport, review, or archival use. In some environments this leads to sidecar delivery patterns in which a PDF and a separate structured file must be transported together. EPM v1.0 proposes a single-file alternative in which one self-contained EPM can travel with the PDF itself.
In this model, the host document remains the presentation layer while the embedded EPM carries the machine-readable data layer.
EPM v1.0 is intended to say, "you can do this, and here is how to describe it," not "you must do this, and here is how we enforce it."
5. Scope
EPM v1.0 is PDF-first.
In version 1, this specification defines a self-contained wrapper object. It does not redefine PDF embedding mechanisms and does not replace PDF embedded file or Associated Files features. Instead, it standardizes how one embedded payload is carried and described inside a single EPM that can itself be embedded in the host document.
An informative PDF integration profile for the current recommended embedding and discovery pattern appears in docs/pdf-profile.md.
For the current repository profile, the minimum supported archival/interchange host profile is PDF/A-3. Current implementation guidance and compliance documentation in this repository cover PDF/A-3 and PDF/A-4f. This is because the recommended EPM embedding model relies on arbitrary embedded files plus document-level Associated Files semantics, rather than only older page-level attachment mechanisms.
Other host formats may adopt the same manifest pattern later, but they are outside the conformance scope of EPM v1.0.
EPM v1.0 therefore defines a narrow wrapper layer, not a replacement for the surrounding standards stack.
6. Goals
- Preserve structured data that would otherwise be lost when a source document becomes a PDF.
- Provide deterministic identification and interpretation of the embedded payload.
- Provide optional declarations for encoding transformations such as compression and encryption.
- Provide optional integrity metadata that implementations may use for verification if they choose to do so.
- Remain small enough to embed, implement, and validate easily.
7. Non-Goals
- General-purpose file attachment management.
- Multi-payload packaging in version 1.
- Host-format-specific embedding rules beyond the descriptive manifest.
- Provenance, signatures, or trust-chain semantics.
- Vendor extension mechanisms or extensibility hooks. In v1, unknown properties are invalid. Extension support is planned for a future version.
8. Informative Rationale
EPM v1.0 is intentionally limited to one structured payload per manifest. That constraint keeps the standard small, reduces ambiguity, and aligns with the current objective of pairing a presentation document with one machine-readable representation or extension of that document.
The payload may use any media type that the producer and consumer understand. EPM v1.0 does not standardize the payload schema itself. It standardizes the wrapper used to carry that payload and the metadata used to describe it.
EPM v1.0 also avoids becoming a PDF-specific engineering profile. It depends only on the existence of a host mechanism that can carry an embedded object or file. In version 1, PDF is the target host format because it is common, stable, and widely used for presentation workflows.
This separation follows the pattern seen in hybrid-document workflows: the host document preserves presentation, while the embedded EPM preserves machine-readable structure. EPM v1.0 standardizes the wrapper and the payload metadata. It does not standardize PDF internals, attachment mechanics, or editor behavior.
This design deliberately prefers substance over novelty. Where an accepted standard already defines a concept EPM needs, EPM v1.0 should reference and inherit that concept unless there is a clear interoperability reason to narrow it, simplify it, or make a different rule explicit.
Examples include media type syntax, base64 representation, and PDF attachment semantics. EPM v1.0 only introduces new structure where existing standards do not already provide a sufficiently small, domain-neutral, single-payload wrapper.
9. Reused Concepts
EPM v1.0 reuses existing concepts from other standards where practical.
- Payload type reuses the media type model used across MIME and the IANA media type registry.
- Payload content reuses the familiar base64 representation model defined by RFC 4648 rather than inventing a new binary-in-JSON convention.
- Host-document embedding in PDF reuses ISO 32000-2 embedded file and Associated Files mechanisms.
- External payload schemas remain external. EPM v1.0 does not redefine domain schemas when an existing schema already governs the payload content.
When EPM v1.0 narrows one of these reused concepts, the narrowing is intentional and specific to the EPM wrapper model. For example, EPM v1.0 requires a single payload and requires payload.content, even though related standards may permit multiple parts, detached references, or optional embedded content.
10. Core Model
An EPM carries exactly one payload.
That payload SHALL be associated with the host document. It is not intended to serve as an arbitrary attachment inventory.
That association may preserve structured source data, describe the document in a structured form, or extend the document with related structured data. Those relationship patterns are informative examples, not separate conformance classes in EPM v1.0.
The manifest contains four top-level members:
- epm_version: the version of the EPM specification.
- manifest_id: a stable URI identifying this EPM. The urn:uuid: form is RECOMMENDED. The identifier need not be dereferenceable.
- schema: the identifier and version of the EPM wrapper schema definition used to validate the manifest.
- payload: the content and descriptive metadata for the single embedded payload.
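As a minimal sketch of the four-member shape described above, the following Python assembles a manifest as a plain dictionary. The helper name build_minimal_epm, the sample payload, and the choice of a random UUID are illustrative assumptions, not part of this specification.

```python
import base64
import json
import uuid


def build_minimal_epm(payload_bytes: bytes, media_type: str) -> dict:
    """Assemble the four top-level members around one payload (hypothetical helper)."""
    return {
        "epm_version": "1.0",
        # urn:uuid: is the RECOMMENDED form for manifest_id.
        "manifest_id": uuid.uuid4().urn,
        "schema": {
            "id": "https://epmstandard.org/schema/epm-1",
            "version": "1.0",
        },
        "payload": {
            "type": media_type,
            # b64encode emits the RFC 4648 section 4 alphabet with no line breaks.
            "content": base64.b64encode(payload_bytes).decode("ascii"),
        },
    }


manifest = build_minimal_epm(b'{"total": 42}', "application/json")
print(json.dumps(manifest, indent=2))
```

Optional members such as payload.doc_type, payload.encoding, and payload.integrity are left out here to keep the sketch to the required shape.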
10.1 Quick-Discovery Contract
For fast manifest identification in host documents that may contain multiple embedded JSON attachments, an EPM SHALL expose the following discovery fields at top level:
- epm_version;
- manifest_id; and
- payload.type.
payload.doc_type is OPTIONAL but RECOMMENDED. When present, it enables semantic classification of the payload at discovery time without decoding payload.content.
payload.schema.id is OPTIONAL but RECOMMENDED when the payload conforms to an externally defined domain schema that consumers are expected to recognize.
This quick-discovery contract supports fast identification and indexing. It does not, by itself, establish that a manifest is fully valid under all EPM v1.0 rules. Discovery is a pre-validation step: a candidate that exposes these three fields can still fail full schema validation, for example by lacking payload.content, which is required for conformance but is not part of the discovery contract. Consumers SHOULD report such candidates per §12.3 rather than silently discard them.
Each EPM embedded in the same host document SHALL have a distinct manifest_id. Consumers that discover duplicate manifest_id values within a single host document SHOULD surface this as a conflict and SHOULD NOT silently select one candidate over another.
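The discovery check and the duplicate-identifier rule can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the input is assumed to be JSON values already extracted from the host document and parsed.

```python
from collections import Counter


def is_discovery_candidate(obj: object) -> bool:
    """True when a parsed JSON value exposes the three discovery fields."""
    if not isinstance(obj, dict):
        return False
    payload = obj.get("payload")
    return (
        isinstance(obj.get("epm_version"), str)
        and isinstance(obj.get("manifest_id"), str)
        and isinstance(payload, dict)
        and isinstance(payload.get("type"), str)
    )


def duplicate_manifest_ids(candidates: list[dict]) -> list[str]:
    """Return manifest_id values that occur more than once in one host document."""
    counts = Counter(c["manifest_id"] for c in candidates)
    return sorted(mid for mid, n in counts.items() if n > 1)
```

A non-empty result from duplicate_manifest_ids would be surfaced to the caller as a conflict rather than resolved by picking one candidate. Passing the discovery check says nothing about full validity.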
10.2 Schema Identifiers
EPM v1.0 uses two distinct schema identifier fields:
- schema.id identifies the wrapper schema, that is, the EPM schema definition used to validate the structure of the EPM itself. For EPM v1.0, schema.id SHALL be https://epmstandard.org/schema/epm-1. This value is the canonical identifier for the EPM v1.0 wrapper schema and matches the $id declared in the schema file. Consumers MAY use this field to confirm that a candidate JSON attachment is a conforming EPM v1.0 manifest before proceeding with full validation. Consumers SHALL NOT treat a failed HTTP retrieval of this URI as a validation failure; the URI functions as a stable identifier, not a retrieval endpoint.
- payload.schema.id, when present, identifies an externally defined schema governing the content of the decoded payload. This field is not constrained to a canonical value; producers SHOULD use a stable URI that consumers are expected to recognize. payload.schema.id SHALL NOT equal the canonical EPM wrapper schema identifier https://epmstandard.org/schema/epm-1; setting it to the wrapper schema URI is not meaningful and would imply the payload is another EPM, which EPM v1.0 does not support.
payload.schema.id is an identifier field, not a guaranteed dereferenceable retrieval URL. It SHOULD be a stable URI chosen so that implementations can treat it as a durable identifier across versions and deployments.
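A consumer-side check of the two identifier rules might look like the sketch below. It compares strings only and never attempts to retrieve either URI. The function name and the wording of the messages are assumptions for illustration, and the manifest is assumed to be a parsed JSON object whose schema and payload members are objects.

```python
EPM_V1_SCHEMA_ID = "https://epmstandard.org/schema/epm-1"


def check_schema_identifiers(manifest: dict) -> list[str]:
    """Return a list of identifier problems; an empty list means none found."""
    problems = []
    # schema.id is compared as an opaque identifier, never fetched over HTTP.
    if manifest.get("schema", {}).get("id") != EPM_V1_SCHEMA_ID:
        problems.append("schema.id is not the canonical EPM v1.0 identifier")
    payload_schema = manifest.get("payload", {}).get("schema")
    if payload_schema is not None and payload_schema.get("id") == EPM_V1_SCHEMA_ID:
        problems.append("payload.schema.id must not equal the wrapper schema identifier")
    return problems
```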
11. Payload Rules
11.1 Payload Cardinality
An EPM SHALL carry exactly one payload.
11.2 Payload Type
The payload type SHALL be a media type string identifying the embedded structured data.
The syntax and interpretation of that media type SHOULD follow RFC 2045 and the IANA media type registry rather than EPM-specific rules.
EPM v1.0 does not restrict that structured data to a single domain-specific schema. Any structured embedded payload is in scope for EPM v1.0, provided the manifest describes exactly one payload in version 1.
For discovery and indexing, payload.type is the canonical payload classification field in EPM v1.0.
EPM v1.0 does not enforce media type syntax in the JSON Schema. Schema-level patterns for RFC 2045 media types are prone to false rejections of valid types that include parameters, whitespace, or uppercase characters. Producers are expected to use standard RFC 2045 implementations. The reference validator applies a supplemental lexical check for basic type/subtype format verification.
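One way to perform a basic type/subtype check without rejecting parameters or uppercase characters is sketched below. The reference validator's actual check is not reproduced in this document, so the pattern and the function name are assumptions; the token character class follows the RFC 2045 token definition (printable ASCII except space and tspecials).

```python
import re

# RFC 2045 token: printable ASCII except space and ()<>@,;:\"/[]?=
_TOKEN = r"[!#$%&'*+\-.^_`{|}~0-9A-Za-z]+"
_TYPE_SUBTYPE = re.compile(rf"{_TOKEN}/{_TOKEN}")


def looks_like_media_type(value: str) -> bool:
    """Lexical check of the type/subtype part only (hypothetical helper)."""
    # Parameters such as "; charset=utf-8" are split off and not inspected.
    essence = value.split(";", 1)[0].strip()
    return _TYPE_SUBTYPE.fullmatch(essence) is not None
```

This check does not consult the IANA registry, so it accepts well-formed but unregistered types.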
11.3 Payload Document Type
The doc_type field is OPTIONAL.
When present, doc_type SHALL be a URI identifying the semantic document type of the payload. It is distinct from payload.type, which identifies the encoding format using a media type string. Where payload.type answers "how is this encoded?", doc_type answers "what does this payload represent?".
Producers SHOULD include doc_type when the payload represents a recognized document type. Producers SHOULD use Schema.org type URIs where applicable, such as https://schema.org/Report, https://schema.org/Invoice, https://schema.org/Dataset, or https://schema.org/MedicalRecord. The Schema.org type hierarchy is available at schema.org/docs/full.html. A stable domain-specific URI MAY be used when no suitable Schema.org type exists.
When doc_type is present, it is part of the quick-discovery contract and enables consumers to semantically classify candidate manifests without decoding or inspecting payload.content. In practice, a given producer will emit a small, fixed set of doc_type values determined by the payload categories that producer supports.
11.4 Payload Content
The payload content SHALL be present in every valid EPM. payload.content SHALL be non-empty. An EPM with no payload content has no purpose and is invalid; the schema enforces this with minLength: 1.
content SHALL contain the payload as it exists inside the EPM.
content SHALL be stored as a base64-encoded string regardless of the underlying payload media type. Producers SHALL encode payload.content using the base64 alphabet defined in RFC 4648 §4 without line breaks or whitespace.
This is an intentional simplification of representation within the EPM wrapper. It does not replace or redefine base64 itself, and implementations SHOULD interpret that representation according to RFC 4648.
EPM v1.0 schema validation does not enforce base64 compliance. Malformed base64 in payload.content will produce a decode failure at processing time, which is the appropriate enforcement point. The reference validator's semantic checks will surface decode failures when integrity or size verification is attempted.
Consumers SHALL base64-decode content before applying any declared compression or encryption reversal steps.
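The encoding and decoding rules in this section map directly onto Python's standard base64 module, as the following sketch shows. The helper names are hypothetical. Strict decoding with validate=True is a design choice of this sketch so that embedded whitespace or other non-alphabet characters are reported instead of silently skipped.

```python
import base64
import binascii


def encode_content(data: bytes) -> str:
    """Base64-encode payload bytes with the standard alphabet and no line breaks."""
    return base64.b64encode(data).decode("ascii")


def decode_content(content: str) -> bytes:
    """Strictly decode payload.content, raising ValueError on malformed input."""
    try:
        return base64.b64decode(content, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("payload.content is not valid base64") from exc
```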
11.5 Encoding
The encoding object describes transformations or character encoding relevant to the payload.
Encoding declarations are descriptive transport metadata for payload.content as stored in the EPM. EPM v1.0 does not require producers or consumers to prove that a declared compression or encryption transform was actually applied, nor does it define key management, algorithm approval, or confidentiality guarantees.
The encoding object is OPTIONAL. Producers SHOULD include it only when one or more encoding-related declarations are relevant to the payload. When encoding is present, it SHALL declare at least one of compression, encryption, or charset. Producers SHALL NOT include an empty encoding object; omit the object entirely when no encoding declarations apply.
If both encryption and compression are declared, producers SHALL apply compression before encryption. Consumers that reverse both declared transforms SHALL reverse them in the opposite order after base64-decoding payload.content: decrypt first, then decompress.
This processing rule is a specification-level conformance requirement. It cannot be established through JSON Schema validation alone.
The complete transform pipeline is as follows. Base64 is always the outermost layer — it wraps the result of all other transforms on the producer side and is the first step reversed on the consumer side.
Producer pipeline:
1. Begin with the original payload bytes.
2. If compression is declared: compress the bytes using the declared algorithm.
3. If encryption is declared: encrypt the result of step 2 using the declared algorithm.
4. Base64-encode the final result. This encoded string is payload.content.
Consumer pipeline:
1. Base64-decode payload.content.
2. If encryption is declared: decrypt the decoded bytes.
3. If compression is declared: decompress the result of step 2.
4. The result is the original payload bytes.
payload.size and payload.integrity.hash.value both describe the result of consumer step 1 — the base64-decoded bytes — before any declared compression or encryption is reversed.
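The two pipelines can be sketched in Python as shown below. Several things here are assumptions made for illustration: gzip stands in for whatever compression algorithm is declared, the encrypt and decrypt arguments are caller-supplied callables because EPM v1.0 defines no cipher or key management, and the function names are hypothetical.

```python
import base64
import gzip
from typing import Callable, Optional

Transform = Callable[[bytes], bytes]


def produce_content(original: bytes, compress: bool = False,
                    encrypt: Optional[Transform] = None) -> str:
    """Producer side: compress, then encrypt, then base64-encode."""
    data = original
    if compress:
        data = gzip.compress(data)  # producer step 2
    if encrypt is not None:
        data = encrypt(data)  # producer step 3
    # Producer step 4: base64 is always the outermost layer.
    return base64.b64encode(data).decode("ascii")


def consume_content(content: str, compressed: bool = False,
                    decrypt: Optional[Transform] = None) -> bytes:
    """Consumer side: base64-decode, then decrypt, then decompress."""
    data = base64.b64decode(content, validate=True)  # consumer step 1
    if decrypt is not None:
        data = decrypt(data)  # consumer step 2
    if compressed:
        data = gzip.decompress(data)  # consumer step 3
    return data
```

In this sketch, payload.size and payload.integrity.hash.value would both be computed over the value of data immediately after consumer step 1.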
- compression MAY be omitted if no compression is used.
- encryption MAY be omitted if no encryption is used.
- charset MAY be omitted when not applicable.
When charset is declared, producers SHOULD use names consistent with the IANA character set registry.
The charset field is meaningful only for payloads whose content is character-encoded text, such as application/json or text/plain. Producers SHOULD NOT declare charset for binary media types where character encoding is not applicable, such as application/octet-stream or image/png. Declaring charset on a binary payload has no standard meaning and may mislead consumers attempting to process the encoding declarations.
If a consumer encounters charset declared for a payload whose media type is not character-based, the consumer MAY ignore the charset value without treating the manifest as invalid. Consumers SHOULD NOT reject an otherwise conforming manifest solely because charset appears alongside a binary media type.
11.6 Integrity
The integrity object is OPTIONAL.
If present, it SHALL contain:
- a hash object containing algorithm and value.
If integrity is present, hash.value SHALL be computed over the decoded payload bytes. Consumers SHALL base64-decode payload.content before computing or verifying the hash.
hash.value SHALL be a lowercase hexadecimal (base16, per RFC 4648 §8) encoding of the digest output.
When declaring hash.algorithm, producers SHOULD use lowercase IANA Named Information Hash Algorithm identifiers such as sha-256 or sha-512. Consumers SHOULD recognize at minimum sha-256. If a consumer encounters an unknown algorithm, it SHOULD treat integrity verification as unsupported rather than attempt a fallback.
This approach ensures that hash values are consistent across conforming producers regardless of base64 encoding conventions.
Integrity declarations are descriptive metadata that provide enough information for optional verification. EPM v1.0 does not require consumers to perform integrity verification as a condition of conformance.
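Optional verification along these lines can be sketched with Python's hashlib. The mapping table, the function name, and the four result strings are assumptions of this sketch; the table covers only a few IANA Named Information identifiers and maps them to the names hashlib expects.

```python
import base64
import hashlib

# Partial mapping from IANA Named Information identifiers to hashlib names.
_HASHLIB_NAMES = {"sha-256": "sha256", "sha-384": "sha384", "sha-512": "sha512"}


def verify_integrity(payload: dict) -> str:
    """Return 'not-declared', 'unsupported', 'match', or 'mismatch'."""
    integrity = payload.get("integrity")
    if integrity is None:
        return "not-declared"
    name = _HASHLIB_NAMES.get(integrity["hash"]["algorithm"])
    if name is None:
        # Unknown algorithm: report as unsupported, do not fall back.
        return "unsupported"
    # The digest covers the base64-decoded bytes, before any decrypt/decompress.
    decoded = base64.b64decode(payload["content"], validate=True)
    digest = hashlib.new(name, decoded).hexdigest()  # lowercase hex
    return "match" if digest == integrity["hash"]["value"] else "mismatch"
```

A 'mismatch' result is an integrity verification failure, which is distinct from a schema-conformance failure.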
11.7 Payload Size
The size property is OPTIONAL.
If present, it SHALL describe the exact byte length of payload.content after base64-decoding. This aligns size with the integrity model, which also operates over the decoded payload bytes.
EPM v1.0 does not set a maximum payload size; acceptable size limits are an implementation concern rather than a conformance rule.
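Implementations SHOULD consider payload size carefully. Base64 encoding adds roughly one third to the stored length of the payload, and large embedded media, with or without compression or encryption, may materially increase host document size.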
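A semantic check of a declared size can be sketched as follows. The function name and message text are hypothetical; the comparison is made against the length of the base64-decoded bytes, not the length of the encoded string.

```python
import base64
from typing import Optional


def size_mismatch(payload: dict) -> Optional[str]:
    """Return a description of the mismatch, or None when size is absent or correct."""
    declared = payload.get("size")
    if declared is None:
        return None  # size is OPTIONAL
    actual = len(base64.b64decode(payload["content"], validate=True))
    if declared != actual:
        return f"payload.size declares {declared} bytes but content decodes to {actual}"
    return None
```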
11.8 Payload Schema
The payload.schema object is OPTIONAL.
If present, it SHALL contain:
- id; and
- version.
Producers SHOULD include payload.schema when the payload content conforms to an externally defined schema that consumers are expected to recognize.
When such a schema already exists, producers SHOULD prefer referencing it over restating its semantics inside EPM-specific fields.
11.9 Metadata
The metadata object is OPTIONAL and informational. It does not affect conformance or payload processing. When metadata is present, it SHALL contain at least one property. Producers SHALL omit metadata entirely rather than include an empty metadata object.
Implementations MAY include additional keys in metadata beyond those defined in this specification. Consumers MUST NOT reject a manifest solely because metadata contains unrecognized keys. Unrecognized metadata keys MUST be ignored and MUST NOT affect payload processing or validation.
12. Conformance Expectations
Schema validation is part of EPM v1.0 conformance, but it is not the whole of conformance.
The JSON Schema defines the permitted manifest structure for EPM v1.0. Specification-level rules that describe processing behavior, transform order, and integrity interpretation remain normative even when they cannot be mechanically enforced by JSON Schema alone.
The repository validation suite applies supplemental lexical checks beyond JSON Schema validation. These checks are informative and implementation-oriented; they do not replace the normative requirements of this specification. The following supplemental checks are currently applied:
- manifest_id: expected to match URI form scheme:rest with no whitespace.
- schema.id and payload.schema.id: expected to be URI-shaped identifiers.
- payload.type: expected to be a media type in type/subtype form per RFC 2045.
- payload.encoding.compression, payload.encoding.encryption, payload.encoding.charset: when present, expected to be identifier-like values (alphanumeric, dots, hyphens, underscores).
- payload.integrity.hash.algorithm: expected to be an identifier-like value.
- payload.metadata.created_at, payload.metadata.modified_at: when present, expected to be ISO 8601 date-time strings.
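Some of the checks listed above could be approximated with regular expressions, as in the sketch below. The exact patterns used by the repository validation suite are not given in this document, so every pattern here is an assumption; in particular, DATE_TIME accepts only one common ISO 8601 profile (full date, time, and either Z or a numeric offset).

```python
import re

URI_SHAPE = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*:\S+")  # scheme:rest, no whitespace
IDENTIFIER_LIKE = re.compile(r"[A-Za-z0-9._\-]+")
DATE_TIME = re.compile(
    r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})"
)


def lexical_problems(manifest: dict) -> list[str]:
    """Return the names of fields that fail a lexical check (hypothetical helper)."""
    problems = []
    payload = manifest.get("payload", {})
    if not URI_SHAPE.fullmatch(manifest.get("manifest_id", "")):
        problems.append("manifest_id")
    for key in ("compression", "encryption", "charset"):
        value = payload.get("encoding", {}).get(key)
        if value is not None and not IDENTIFIER_LIKE.fullmatch(value):
            problems.append(f"payload.encoding.{key}")
    for key in ("created_at", "modified_at"):
        value = payload.get("metadata", {}).get(key)
        if value is not None and not DATE_TIME.fullmatch(value):
            problems.append(f"payload.metadata.{key}")
    return problems
```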
The repository also includes a non-normative reference validator script at scripts/validate_epm.py. That tool reflects the repository's current validation profile by combining schema validation with the supplemental lexical checks and semantic checks for declared size and integrity metadata. Its behavior is informative and implementation-oriented; it does not replace the normative requirements of this specification.
12.1 Producers
An EPM v1.0 producer SHALL:
- emit epm_version as 1.0;
- emit a stable and distinct manifest_id URI for each EPM embedded in the same host document;
- emit exactly one payload object;
- emit payload.content;
- set schema.id to https://epmstandard.org/schema/epm-1, the canonical EPM v1.0 wrapper schema identifier;
- when declaring payload.schema, use a stable URI for payload.schema.id, the payload schema identifier;
- declare encoding transformations only when they are actually relevant; and
- provide complete integrity information if integrity is declared.
When the payload represents a recognized document type, a producer SHOULD also emit payload.doc_type as a stable URI identifying the semantic type.
EPM v1.0 does not require producers to prove that declared encryption or integrity metadata is cryptographically sound. Such validation is implementation-specific.
EPM v1.0 conformance is defined per EPM. This specification does not require a host PDF that contains one or more candidate attachments to prove that every candidate attachment is a valid EPM.
Where producers rely on existing payload standards, they SHOULD preserve those standards' semantics in the payload itself and use EPM only as the surrounding wrapper.
When producers declare compression, encryption, or charset, they SHOULD use stable, widely recognized identifiers rather than ad hoc labels. EPM v1.0 does not yet define its own registry for these values.
12.2 Consumers
An EPM v1.0 consumer SHALL:
- reject manifests that do not conform to the schema for the declared version;
- treat unknown top-level and payload properties as invalid in EPM v1.0;
- treat metadata as informational only, ignoring unrecognized metadata keys; and
- avoid assuming that compression, encryption, or integrity are present unless explicitly declared.
If payload.content is missing, the manifest is not a valid EPM.
An EPM v1.0 consumer MAY verify declared compression, encryption, or integrity metadata, but such verification is outside core conformance requirements.
An EPM v1.0 consumer MAY satisfy structural validation through a conforming JSON Schema draft 2020-12 validator, but successful schema validation alone does not establish that all processing requirements of this specification were followed.
12.3 Error Handling
Discovery-then-Validation Failures
When a consumer discovers a candidate EPM through the PDF Associated Files mechanism — one that exposes the discovery fields epm_version, manifest_id, and payload.type — but that candidate subsequently fails full schema or semantic validation, the consumer SHOULD:
- report the validation failure for that candidate, including the manifest_id if readable and the nature of the failure (schema non-conformance, semantic error, or processing failure); and
- continue discovery and processing of any remaining candidates in the same host document.
A validation failure on one candidate MUST NOT be treated as a PDF-level failure that blocks access to other candidates. EPM v1.0 conformance is defined per manifest, not per host document. Valid candidates MUST remain accessible and selectable even when other candidates in the same PDF are invalid, non-conforming, or unrecognized.
Consumers SHOULD surface per-candidate results to calling applications or users so that the full discovery outcome is visible. Silent discard of failed candidates is strongly discouraged: the caller cannot distinguish between "no EPMs present" and "EPMs present but all invalid" if failures are not reported.
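The per-candidate behavior described in this subsection can be sketched as a loop that records one result per candidate and never stops at the first failure. The CandidateResult type, the function name, and the convention that the supplied validate callable raises ValueError on failure are all assumptions of this sketch.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CandidateResult:
    manifest_id: Optional[str]
    valid: bool
    failure: Optional[str] = None


def process_candidates(raw_candidates: list[bytes],
                       validate: Callable[[dict], None]) -> list[CandidateResult]:
    """Validate each candidate independently and report every outcome."""
    results = []
    for raw in raw_candidates:
        manifest_id = None
        try:
            manifest = json.loads(raw)
            if not isinstance(manifest, dict):
                raise ValueError("candidate is not a JSON object")
            manifest_id = manifest.get("manifest_id")
            validate(manifest)  # assumed to raise ValueError on failure
            results.append(CandidateResult(manifest_id, True))
        except ValueError as exc:
            # One bad candidate never blocks the remaining ones.
            results.append(CandidateResult(manifest_id, False, str(exc)))
    return results
```

Because failed candidates stay in the returned list, a caller can tell "no EPMs present" (an empty list) apart from "EPMs present but all invalid".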
Specific Error Conditions
An EPM v1.0 consumer SHALL reject malformed manifests and manifests that do not conform to the schema for the declared EPM version, consistent with §12.2.
If a consumer encounters an unknown compression or encryption algorithm declaration, it SHOULD treat that declaration as unsupported rather than infer a fallback transform.
If integrity verification is attempted and the declared integrity value does not match the computed hash of the decoded payload bytes, the result SHOULD be treated as an integrity verification failure rather than as a schema-conformance failure.
If a consumer encounters an unknown hash algorithm, it SHOULD treat integrity verification as unsupported rather than attempt a fallback.
If a consumer encounters a payload.content value that cannot be decoded as valid base64, it SHOULD treat this as an irrecoverable processing error for that manifest.
13. Versioning
EPM v1.0 is intentionally constrained.
epm_version governs conformance to the EPM specification itself.
Each value of epm_version defines a distinct conformance class. A manifest is valid under the schema for its declared version only. EPM v1.0 does not define backward compatibility for future versions; each future version will define its own schema and its own canonical schema.id. Implementations encountering an unknown epm_version value SHOULD treat the manifest as not conforming to any known EPM version rather than inferring compatibility.
schema.version identifies the version of the EPM wrapper schema definition referenced by the manifest.
In EPM v1.0, implementations SHOULD treat epm_version and schema.version as aligned. Both SHALL carry 1.0 in conforming EPM v1.0 manifests.
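A consumer that supports only EPM v1.0 might apply the versioning rules as in the sketch below. The function name and message wording are hypothetical. An unknown epm_version is reported as not conforming to any known version, with no attempt to infer compatibility.

```python
def version_problems(manifest: dict) -> list[str]:
    """Check epm_version and schema.version for an EPM v1.0-only consumer."""
    version = manifest.get("epm_version")
    if version != "1.0":
        # No compatibility is inferred for versions this consumer does not know.
        return [f"unknown epm_version {version!r}: not a known EPM version"]
    if manifest.get("schema", {}).get("version") != "1.0":
        return ["schema.version must be 1.0 in an EPM v1.0 manifest"]
    return []
```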
Future versions MAY add new capabilities, including broader host-format guidance or support for multiple payloads, but such changes SHALL be versioned explicitly rather than inferred.
14. Design Position
EPM v1.0 should not blaze a new trail unnecessarily.
Its design position is:
- inherit accepted definitions where they already solve the problem;
- constrain them only where EPM needs a smaller interoperability surface; and
- introduce new structure only for the wrapper behavior that existing standards do not already define.
That approach reduces conceptual drift, improves implementer comprehension, and makes future alignment with adjacent standards easier.
In practical terms, EPM v1.0 reuses RFC 2045 media type conventions, RFC 4648 base64 representation, and ISO 32000-2 PDF embedding concepts, while narrowing the model to one required embedded payload carried by one EPM.
For present implementation work, the repository's preferred PDF profile is to embed the EPM itself as one document-level associated JSON file and to discover it through PDF Associated Files before falling back to candidate JSON inspection.