
ISO 19005-1:2005 standard


DOCUMENT INFORMATION

Basic information

Title Document Management — Electronic Document File Format For Long-Term Preservation — Part 1: Use Of PDF 1.4 (PDF/A-1)
Institution International Organization for Standardization
Field Document Management
Type Standard
Year of publication 2005
City Geneva
Pages 36
File size 302.16 KB


Structure

  • 5.1 General
  • 5.2 Level A conformance
  • 5.3 Level B conformance
  • 5.4 Conforming readers
  • 6.1 File structure
  • 6.2 Graphics
  • 6.3 Fonts
  • 6.4 Transparency
  • 6.5 Annotations
  • 6.6 Actions
  • 6.7 Metadata
  • 6.8 Logical structure
  • 6.9 Interactive Forms

Content

Reference number ISO 19005-1:2005(E). © ISO 2005. INTERNATIONAL STANDARD ISO 19005-1, First edition 2005-10-01. Document management — Electronic document file format for long t[.]

General

ISO 19005 specifies the PDF/A-1 file format for electronic documents, requiring compliance with the PDF Reference as modified by this part of ISO 19005. Conforming PDF/A-1 files can use any valid PDF feature not explicitly prohibited by ISO 19005. However, features from PDF specifications earlier than Version 1.4 that are not described in the PDF Reference should not be used. Neither the version number in the header of a PDF file nor the value of the Version key in the document catalog dictionary shall be used in determining whether a file is in accordance with this part of ISO 19005.

NOTE 1 A conforming file is not obligated to use any PDF feature other than those explicitly required by PDF Reference or this part of ISO 19005.

NOTE 2 The proper mechanism by which a file can presumptively identify itself as being a PDF/A-1 file of a given conformance level is described in 6.7.11.

Level A conformance

Level A conforming files shall adhere to all of the requirements of this part of ISO 19005. A file meeting this conformance level is said to be a “conforming PDF/A-1a file.”

Level B conformance

ISO 19005 establishes a Level B conformance level to address the diverse preservation needs of users of PDF files. Files that conform to this level must comply with all requirements of ISO 19005, with the exception of sections 6.3.8 and 6.8. Such files are referred to as "conforming PDF/A-1b files."

Level B conformance requirements ensure the long-term preservation of a file's visual appearance, but such files may lack the internal information necessary for maintaining the document's logical structure and reading order. In contrast, Level A conformance imposes greater responsibilities on content creators, offering enhanced document preservation and accessibility, particularly for users with physical impairments.

Conforming readers

A conforming reader must adhere to the functional behavior requirements outlined in ISO 19005. These requirements are defined as general functional standards applicable to all conforming readers, without specifying particular technical designs, user interfaces, or implementation details.

The rendering of conforming files must adhere to the guidelines outlined in the PDF Reference, along with additional requirements specified in ISO 19005. Conforming readers may disregard features from PDF specifications prior to Version 1.4 that are not explicitly mentioned in the PDF Reference.

Readers conforming to PDF/A-1 must accurately read and process files according to their designated conformance levels. Level A readers are equipped to handle both Level A and B files, while Level B readers are specifically designed to process only Level B conforming files.

File structure

6.1.2 to 6.1.13 address overall file format issues and the base elements that form the general structure of a conforming file

The % character of the file header shall occur at byte offset 0 of the file

The file header must be succeeded by a comment that starts with a % character, followed by at least four characters, where each character's encoded byte value exceeds 127 in decimal.

Encoded character byte values exceeding decimal 127 at the start of a file are utilized by different software tools and protocols to identify the file as containing 8-bit binary data, which must be preserved during processing.
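The two header rules above can be checked directly on the raw bytes of a file. The following is a minimal sketch, not a complete validator; the function name `header_problems` and the message strings are hypothetical, and it assumes the binary comment sits on the line immediately after the header.

```python
import re

def header_problems(data: bytes) -> list:
    """Return a list of problems found in the first two lines of a PDF file."""
    problems = []
    # The '%' of the "%PDF-" header must sit at byte offset 0.
    if not data.startswith(b"%PDF-"):
        problems.append("header does not start at byte offset 0")
    # Split off the first two lines on any PDF end-of-line marker.
    lines = re.split(rb"\r\n|\r|\n", data, maxsplit=2)
    second = lines[1] if len(lines) > 1 else b""
    # The comment needs '%' followed by at least four bytes above 127.
    if not (second[:1] == b"%" and len(second) >= 5
            and all(byte > 127 for byte in second[1:5])):
        problems.append("missing binary-marker comment after header")
    return problems
```

A file that begins with `%PDF-1.4` followed by a line such as `%` plus four high-bit bytes yields an empty list.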

The file trailer dictionary must include the ID keyword, while the keyword "Encrypt" is prohibited. Additionally, no data is allowed after the final end-of-file marker, except for a single optional end-of-line marker.

The file trailer referred to here is either the last trailer dictionary in a PDF file, as described in PDF Reference 3.4.4 and 3.4.5, or the first page trailer in a linearized PDF file, as described in PDF Reference F.2. In linearized PDFs, the ID keyword must appear in both trailers, and its value must be identical in each instance.

NOTE The explicit prohibition of the Encrypt keyword has the implicit effect of disallowing encryption and password-protected access permissions.
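As an illustration of the trailer rules, the sketch below assumes the trailer dictionary has already been parsed into a plain Python `dict` by some PDF library (that parsing step is not shown). The function name `trailer_problems` and the message strings are hypothetical.

```python
def trailer_problems(trailer: dict, data: bytes) -> list:
    """Check a parsed trailer dictionary and the bytes after the last %%EOF."""
    problems = []
    if "ID" not in trailer:
        problems.append("trailer lacks ID")
    if "Encrypt" in trailer:
        problems.append("trailer contains Encrypt")
    # Only a single optional end-of-line marker may follow the final %%EOF.
    marker = data.rfind(b"%%EOF")
    tail = data[marker + 5:] if marker >= 0 else None
    if tail not in (b"", b"\r\n", b"\r", b"\n"):
        problems.append("missing %%EOF or unexpected data after it")
    return problems
```

For a linearized file the same check would need to run on both trailers, with an extra comparison of the two ID values.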

In a cross reference subsection header the starting object number and the range shall be separated by a single SPACE character (20h)

The xref keyword and the cross reference subsection header shall be separated by a single EOL marker
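Both spacing rules for the cross-reference table can be expressed as a single regular expression. This is a sketch under one extra assumption that the text above does not state: the subsection header line is itself terminated by an EOL marker. The names `XREF_HEADER` and `xref_header_ok` are hypothetical.

```python
import re

# "xref", one EOL marker, start object number, one SPACE, count, EOL marker.
XREF_HEADER = re.compile(rb"xref(?:\r\n|\r|\n)(\d+) (\d+)(?:\r\n|\r|\n)")

def xref_header_ok(chunk: bytes) -> bool:
    """True if chunk starts with a well-formed xref keyword and subsection header."""
    return XREF_HEADER.match(chunk) is not None
```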

Any object whose offset is not referenced in the cross reference table shall be exempt from all requirements of this part of ISO 19005

A document information dictionary may be defined in a conforming file. If defined, its elements shall be consistent with analogous XMP metadata properties as specified in 6.7.3.

Hexadecimal strings shall contain an even number of non-white-space characters, each in the range 0 to 9, A to F or a to f.

The stream keyword must be followed by either a carriage return and line feed character sequence or a single line feed character. Additionally, the endstream keyword should be preceded by an end-of-line marker.

The Length key in the stream dictionary must correspond to the byte count in the file that appears after the LINE FEED character and before the EOL marker, just prior to the endstream keyword.

NOTE 1 These requirements remove potential ambiguity regarding the ending of stream content
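The relationship between the Length value and the stream delimiters can be sketched as follows. The function `stream_length_ok` is a hypothetical helper that works on the raw bytes of a single stream object and takes the declared Length as an argument; it assumes the word "stream" does not appear inside the dictionary itself.

```python
import re

def stream_length_ok(obj: bytes, declared_length: int) -> bool:
    """Compare the declared Length with the bytes between stream and endstream."""
    # The stream keyword must be followed by CR LF or by a single LF.
    start = re.search(rb"stream(\r\n|\n)", obj)
    end = obj.rfind(b"endstream")
    if start is None or end < start.end():
        return False
    body = obj[start.end():end]
    # Exactly one EOL marker before endstream is excluded from the count.
    for eol in (b"\r\n", b"\n", b"\r"):
        if body.endswith(eol):
            return len(body) - len(eol) == declared_length
    return False
```

Note that stream data ending in a CR byte followed by an LF marker is indistinguishable from a CR LF marker in this sketch, which is one reason the declared Length has to be treated as authoritative.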

A stream object dictionary shall not contain the F, FFilter, or FDecodeParms keys.

Prohibiting these keys prevents the inclusion of external document content, thereby eliminating external dependencies that could complicate preservation efforts.

The object number and generation number shall be separated by a single white-space character. The generation number and obj keyword shall be separated by a single white-space character.

The object number and endobj keyword shall each be preceded by an EOL marker. The obj and endobj keywords shall each be followed by an EOL marker.

Linearization shall be permitted but any linearization information supplied within a file should be ignored by conforming readers.

The LZWDecode filter shall not be permitted

NOTE The use of the LZW compression algorithm has been subject to intellectual property constraints

A file specification dictionary, as defined in PDF Reference 3.10.2, shall not contain the EF key. A file’s name dictionary, as defined in PDF Reference 3.6.3, shall not contain the EmbeddedFiles key.

These keys are used to encapsulate files of arbitrary content types within a PDF. Prohibiting them implicitly prevents the inclusion of embedded files, which could create external dependencies and complicate preservation efforts.

A conforming file shall not violate any of the architectural limits specified in PDF Reference Table C.1

NOTE By complying with these limits, a conforming file is compatible with the widest possible range of readers

The document catalog dictionary shall not contain a key with the name OCProperties

NOTE The explicit prohibition of the OCProperties key, which is allowed in PDF 1.5 [19], has the implicit effect of disallowing optional content that generates alternative renderings of a document.
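Several of the file structure rules above reduce to "dictionary X shall not contain key Y". A lookup table is one way to collect them. In this sketch the dictionary kinds ("trailer", "stream" and so on) are hypothetical labels, the dictionaries are assumed to be already parsed into Python `dict` objects, and FDecodeParms is spelled as the key appears in the PDF Reference.

```python
# Keys prohibited by the file structure rules, grouped by dictionary kind.
FORBIDDEN_KEYS = {
    "trailer": {"Encrypt"},
    "stream": {"F", "FFilter", "FDecodeParms"},
    "filespec": {"EF"},
    "names": {"EmbeddedFiles"},
    "catalog": {"OCProperties"},
}

def forbidden_keys(kind: str, dictionary: dict) -> list:
    """Return the prohibited keys present in a dictionary of the given kind."""
    return sorted(FORBIDDEN_KEYS.get(kind, set()) & set(dictionary))

def uses_lzw(stream_dict: dict) -> bool:
    """True if the stream's Filter entry (a name or an array) names LZWDecode."""
    filters = stream_dict.get("Filter", [])
    if isinstance(filters, str):
        filters = [filters]
    return "LZWDecode" in filters
```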

Graphics

6.2.2 to 6.2.10 describe restrictions placed on both conforming files and readers. They are intended to address graphical rendering issues that do not involve fonts and interactive elements.

A conforming file can define the color characteristics of the intended rendering device through a PDF/A-1 OutputIntent. This is an OutputIntent dictionary, as defined by PDF Reference 9.10.4, that is included in the file's OutputIntents array and has GTS_PDFA1 as the value of its S key and a valid ICC profile stream as the value of its DestOutputProfile key.

If a file's OutputIntents array has multiple entries, all entries with a DestOutputProfile key must have the same value for that key, which should be a valid ICC profile stream.
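The OutputIntents rules can be sketched over a list of parsed dictionaries. This is a simplified model: each entry is a Python `dict`, the ICC profile stream is represented by any comparable value, and ICC profile validity is not checked. The function name and messages are hypothetical.

```python
def output_intent_problems(output_intents: list) -> list:
    """Check the OutputIntents array against the PDF/A-1 OutputIntent rules."""
    problems = []
    # A PDF/A-1 OutputIntent has S = GTS_PDFA1 and needs a DestOutputProfile.
    for entry in output_intents:
        if entry.get("S") == "GTS_PDFA1" and "DestOutputProfile" not in entry:
            problems.append("GTS_PDFA1 entry lacks DestOutputProfile")
    # Every entry that has a DestOutputProfile must carry the same value.
    profiles = [e["DestOutputProfile"] for e in output_intents
                if "DestOutputProfile" in e]
    if any(profile != profiles[0] for profile in profiles):
        problems.append("entries use different DestOutputProfile values")
    return problems
```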

All colors must be defined in a device-independent way, either through a direct device-independent color space or via an OutputIntent. A compliant file can utilize any color space outlined in the PDF Reference, with specific restrictions noted in sections 6.2.3.2 to 6.2.3.4.

Specifying color in a device-independent manner, as outlined in section 6.2.3, ensures predictable color rendering based on a colorimetric definition, eliminating the need for external assumptions or information. This approach also allows for the association of a colorimetric definition with device-dependent color data.

All ICCBased colour spaces shall be embedded as ICC profile streams as described in PDF Reference 4.5

A conforming reader shall render ICCBased colour spaces as specified by the ICC specification, and shall not use the Alternate colour space specified in an ICC profile stream dictionary

A conforming file may use either the DeviceRGB or the DeviceCMYK color space, but not both. If an uncalibrated color space is employed, the file must include a PDF/A-1 OutputIntent as specified in section 6.2.2. DeviceRGB is permissible only if the file contains a PDF/A-1 OutputIntent that specifies an RGB color space.

DeviceCMYK may be used only if the file has a PDF/A-1 OutputIntent that uses a CMYK colour space
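The device colour space rules combine into a small decision procedure. In the sketch below, `used` is the set of colour space names found in a file and `intent_space` is the colour space of the PDF/A-1 OutputIntent profile ("RGB", "CMYK", or `None` when there is no OutputIntent). Both parameters and the function name are hypothetical simplifications.

```python
def device_colour_problems(used: set, intent_space) -> list:
    """Apply the uncalibrated colour space rules to the colour spaces in use."""
    problems = []
    device = used & {"DeviceGray", "DeviceRGB", "DeviceCMYK"}
    if {"DeviceRGB", "DeviceCMYK"} <= device:
        problems.append("both DeviceRGB and DeviceCMYK are used")
    if device and intent_space is None:
        problems.append("uncalibrated colour space without PDF/A-1 OutputIntent")
    if "DeviceRGB" in device and intent_space not in (None, "RGB"):
        problems.append("DeviceRGB requires an RGB OutputIntent")
    if "DeviceCMYK" in device and intent_space not in (None, "CMYK"):
        problems.append("DeviceCMYK requires a CMYK OutputIntent")
    return problems
```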

When rendering a DeviceGray colour specification in a file whose OutputIntent is an RGB profile, a conforming reader shall convert the DeviceGray colour specification to RGB by the method described in PDF Reference.

When rendering a DeviceGray colour specification in a file whose OutputIntent is a CMYK profile, a conforming reader shall convert the DeviceGray colour specification to DeviceCMYK by the method described in PDF Reference 6.2.2

When rendering colours specified in a device-dependent colour space a conforming reader shall use the file’s PDF/A-1 OutputIntent dictionary, as defined in 6.2.2, as the source colour space

6.2.3.4 Separation and DeviceN colour spaces

A conforming reader shall obey the following rules when rendering colour spaces based on DeviceN or Separation:

⎯ If the colorants in the colour space are exclusively Cyan, Magenta, Yellow, and Black, and the file includes an OutputIntent that is a CMYK profile, then these colorants must be treated as components of the colour space defined by the PDF/A-1 OutputIntent dictionary, as outlined in section 6.2.2. In this case, the Alternate colour space shall not be used.

⎯ If the output device does not support the Separation colour space or DeviceN colourants, the Alternate colour space shall be used.

The Alternate colour space of a Separation or DeviceN colour space shall obey all restrictions on colour spaces specified in 6.2.3.2 and 6.2.3.3

An Image dictionary shall not contain the Alternates key or the OPI key

If an Image dictionary contains the Interpolate key, its value shall be false

Use of the Intent key shall conform to the rules given in 6.2.9

A form XObject dictionary shall not contain any of the following:

⎯ the Subtype2 key with a value of PS;

⎯ the PS key.

In earlier PDF versions, the Subtype2 key with a value of PS and the PS key were used to define arbitrary executable PostScript code streams, which could disrupt reliable and predictable rendering.

A conforming file shall not contain any reference XObjects

NOTE Reference XObjects refer to arbitrary document content in external PDF files, creating external dependencies that complicate preservation efforts

A conforming file shall not contain any PostScript XObjects

NOTE PostScript XObjects contain arbitrary executable PostScript code streams that have the potential to interfere with reliable and predictable rendering

An ExtGState dictionary must not include the TR key, and must not include the TR2 key with any value other than Default. Additionally, a conforming reader is permitted to disregard any occurrence of the HT key within an ExtGState dictionary.

Use of the RI key shall conform to the rules of 6.2.9

Where a rendering intent is specified, its value shall be one of the four values defined in PDF Reference: RelativeColorimetric, AbsoluteColorimetric, Perceptual or Saturation.

NOTE The default rendering intent is RelativeColorimetric

A content stream shall not contain any operators not defined in PDF Reference even if such operators are bracketed by the BX/EX compatibility operators

Use of the ri operator shall conform to the rules of 6.2.9

NOTE 1 Content streams are used for page descriptions, including the Contents stream of a page object and the stream of a form XObject. They are also used for the appearance streams of annotations, including form fields (Widget annotations).

NOTE 2 In earlier versions of the PDF format a PostScript operator PS was defined. As this operator is not defined in PDF Reference, its use is implicitly prohibited by 6.2.10.

Fonts

The requirements outlined in sections 6.3.2 to 6.3.8 aim to guarantee that the future rendering of textual content in a conforming file replicates the original static appearance on a glyph-by-glyph basis, while also enabling the recovery of semantic properties for each character within the text.

All fonts used in a conforming file shall conform to the font specifications defined in PDF Reference 5.5

In the context of ISO 19005, multiple master fonts are classified as a specific type of Type 1 fonts. Consequently, any requirements that apply to Type 1 fonts are also applicable to multiple master fonts.

Writers must ensure that all fonts comply with the necessary standards, as outlined in ISO 19005. However, this section does not specify how to determine font conformance.

For any given composite (Type 0) font referenced within a conforming file, the CIDSystemInfo entries of its CIDFont and CMap dictionaries shall be compatible, as described in PDF Reference 5.6.2; in other words, the Registry and Ordering strings of the CIDSystemInfo dictionaries for that font shall be identical, unless the value of the CMap dictionary UseCMap key is Identity-H or Identity-V.
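The compatibility test for a composite font is a comparison of two pairs of strings with one exemption. A minimal sketch, assuming the two CIDSystemInfo dictionaries are available as Python `dict` objects and that the exemption is carried by the CMap key that the PDF Reference spells UseCMap; the function name is hypothetical.

```python
def cid_system_info_compatible(cidfont_info: dict, cmap_info: dict,
                               use_cmap=None) -> bool:
    """True if the Registry and Ordering strings match or the font is exempt."""
    # Identity-H and Identity-V are exempt from the comparison.
    if use_cmap in ("Identity-H", "Identity-V"):
        return True
    return (cidfont_info.get("Registry"), cidfont_info.get("Ordering")) == \
           (cmap_info.get("Registry"), cmap_info.get("Ordering"))
```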

All Type 2 CIDFonts must include a CIDToGIDMap entry in the CIDFont dictionary, which can either be a stream mapping CIDs to glyph indices or the name Identity, as outlined in PDF Reference Table 5.13.

All CMaps in a conforming file, excluding Identity-H and Identity-V, must be embedded as specified in PDF Reference 5.6.4. Additionally, for embedded CMaps, the integer value of the WMode entry in the CMap dictionary must match the WMode value in the embedded CMap stream.

The font programs for all fonts used within a conforming file shall be embedded within that file, as defined in PDF Reference 5.8, except when the fonts are used exclusively with text rendering mode 3. A font is considered to be in use when at least one of its glyphs is referenced from one or more of the following content streams:

⎯ the Contents stream of a page object;

⎯ the stream of a Form XObject;

⎯ the appearance stream of an annotation, including form fields;

⎯ the content stream of a Type 3 font glyph;

⎯ the stream of a tiling pattern

Only fonts that are legally embeddable in a file for unlimited, universal rendering shall be used

All conforming readers shall use the embedded fonts, rather than other locally resident, substituted or simulated fonts, for rendering

Text rendering mode 3, as outlined in PDF Reference 5.2.5, indicates that glyphs are neither stroked nor filled, nor are they utilized as a clipping boundary. Consequently, any font designated for exclusive use in this mode is not rendered and is therefore not subject to the embedding requirement.

Type 3 fonts are exempt from the requirements of section 6.3.4, as they are always embedded within PDF files, although their embedding mechanism differs from that of PDF Reference 5.8. In contrast, the 14 standard Type 1 fonts do not have any exemptions from these requirements. Additionally, the specifications for font program metadata can be found in section 6.7.10.

Font subsets are permissible if the embedded font programs include glyph definitions for all characters used in the file. By embedding these font programs, any compliant reader can accurately reproduce all glyphs as they were originally published, without relying on potentially temporary external resources.

ISO 19005 prohibits the embedding of fonts that require special agreements with copyright holders, as this creates significant challenges for archives in verifying the existence, validity, and longevity of such agreements.

As stated in 6.3.4, embedded font programs shall define all font glyphs referenced for rendering within a conforming file. Type 0 CIDFont, Type 1 and TrueType font subsets, as described in PDF Reference 5.5.3, may be used as long as the embedded font programs define all appropriate glyphs.

For all Type 1 font subsets referenced within a conforming file, the font descriptor dictionary shall include a CharSet string listing the character names defined in the font subset, as described in PDF Reference Table 5.18.

For all CIDFont subsets referenced within a conforming file, the font descriptor dictionary shall include a CIDSet stream identifying which CIDs are present in the embedded CIDFont file, as described in PDF Reference.

NOTE The use of font subsets allows a potentially substantial reduction in the size of conforming files

For every font embedded in a conforming file, the glyph width information stored in the Widths entry of the font dictionary and in the embedded font program shall be consistent

NOTE This requirement is necessary to ensure predictable font rendering, regardless of whether a given reader uses the metrics in the Widths entry or those in the font program.

All non-symbolic TrueType fonts shall specify MacRomanEncoding or WinAnsiEncoding as the value of the Encoding entry in the font dictionary.

Symbolic TrueType fonts must not include an Encoding entry in their font dictionary, and their font programs should have a "cmap" table that contains precisely one encoding.

NOTE This requirement makes normative the suggested guidelines described in PDF Reference 5.5.5.

6.3.8 is applicable only for files meeting Level A conformance. For Level B conformance the requirements of 6.3.8 can be ignored.

The font dictionary must contain a ToUnicode entry, which is a CMap stream object that maps character codes to Unicode values, as outlined in PDF Reference 5.9, unless the font satisfies one of the following three conditions:

⎯ fonts that use the predefined encodings MacRomanEncoding, MacExpertEncoding or WinAnsiEncoding, or that use the predefined Identity-H or Identity-V CMaps;

⎯ Type 1 fonts whose character names are taken from the Adobe standard Latin character set or the set of named characters in the Symbol font, as outlined in Appendix D of the PDF Reference;

⎯ Type 0 fonts whose descendant CIDFont uses the Adobe-GB1, Adobe-CNS1, Adobe-Japan1 or Adobe-Korea1 character collections.

NOTE Unicode mapping allows the retrieval of semantic properties about every character referenced in the file.

Transparency

If an SMask key appears in an ExtGState or XObject dictionary, its value shall be None

A Group object with an S key with a value of Transparency shall not be included in a form XObject

The following keys, if present in an ExtGState object, shall have the values shown:

The use of transparency in conforming files is prohibited, but similar visual effects can be achieved through alternative methods such as pre-rendered data or flattened vector objects. Employing these techniques allows a file to remain compliant with PDF/A-1.

Annotations

Conforming interactive readers must include a feature that displays the values of the Contents key from annotation dictionaries, in accordance with the rendering behavior specified by the PDF Reference and modified by ISO 19005.

NOTE This part of ISO 19005 does not prescribe the specific behaviour or technical implementation details that interactive readers may use to implement this functional requirement

Annotation types not defined in PDF Reference shall not be permitted. Additionally, the FileAttachment, Sound and Movie types shall not be permitted.

NOTE Support for multimedia content is outside the scope of this part of ISO 19005

An annotation dictionary shall not contain the CA key with a value other than 1.0

An annotation dictionary shall contain the F key. The F key’s Print flag bit shall be set to 1 and its Hidden, Invisible and NoView flag bits shall be set to 0.
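The F key is an integer bit field, so the flag rule above is a pair of mask tests. The bit positions used in this sketch come from the annotation flags table in the PDF Reference rather than from the text above (Invisible is bit 1, Hidden bit 2, Print bit 3, NoView bit 6, counted from the low-order bit). The function name is hypothetical.

```python
# Annotation flag masks (bit positions 1, 2, 3 and 6, low-order bit first).
INVISIBLE = 1 << 0
HIDDEN = 1 << 1
PRINT = 1 << 2
NO_VIEW = 1 << 5

def annotation_flags_ok(f_value: int) -> bool:
    """True if Print is set and Hidden, Invisible and NoView are all clear."""
    return bool(f_value & PRINT) and not f_value & (INVISIBLE | HIDDEN | NO_VIEW)
```

A value of 4 (Print only) passes, and so does 28 (Print, NoZoom and NoRotate), since the NoZoom and NoRotate bits are not restricted by this check.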

Text annotations should set the NoZoom and NoRotate flag bits of the F key to 1

Restrictions on annotation flags limit the use of hidden or non-printable annotations. However, the NoZoom and NoRotate flags are allowed, enabling annotation types to function similarly to standard text annotations. According to PDF Reference 8.4.5, text annotations inherently display NoZoom and NoRotate behavior, and explicitly setting these flags removes any ambiguity between the annotation dictionary settings and how readers interpret them.

An annotation dictionary shall not contain the C array or the IC array unless the colour space of the DestOutputProfile in the PDF/A-1 OutputIntent dictionary, defined in 6.2.2, is RGB.

NOTE 2 These provisions ensure that the device colour spaces used in annotations by mechanisms other than an appearance stream are indirectly defined by means of the PDF/A-1 OutputIntent

An annotation dictionary with the AP key must have an appearance dictionary as its value, which should exclusively include the N key. The value of the N key is a stream that specifies the annotation's appearance.

NOTE 3 All of the provisions of 6.5.3 apply to all annotation types, including the Widget type used for form fields.

Actions

The Launch, Sound, Movie, ResetForm, ImportData and JavaScript actions shall not be permitted. Additionally, the deprecated set-state and no-op actions shall not be permitted. Named actions other than NextPage, PrevPage, FirstPage, and LastPage shall not be permitted. In response to each of the four allowed named actions, conforming interactive readers shall perform the appropriate action described in PDF Reference.

Interactive form fields shall not perform actions of any type
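The action restrictions amount to two lookups on an action dictionary's S (type) and N (name) entries. In this sketch the action dictionary is assumed to be parsed into a Python `dict`, and the S values SetState and NoOp for the deprecated set-state and no-op actions are an assumption based on the PDF Reference naming. The function name is hypothetical, and the separate rule for form fields is not covered.

```python
FORBIDDEN_ACTIONS = {"Launch", "Sound", "Movie", "ResetForm", "ImportData",
                     "JavaScript", "SetState", "NoOp"}
ALLOWED_NAMED_ACTIONS = {"NextPage", "PrevPage", "FirstPage", "LastPage"}

def action_permitted(action: dict) -> bool:
    """True if the action type (and name, for Named actions) is permitted."""
    kind = action.get("S")
    if kind in FORBIDDEN_ACTIONS:
        return False
    if kind == "Named":
        return action.get("N") in ALLOWED_NAMED_ACTIONS
    return True
```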

This section of ISO 19005 does not cover support for multimedia content. The ResetForm action alters the visual presentation of a form, while the ImportData action allows for the importation of form data from an external file. Additionally, JavaScript actions can execute arbitrary code, which may disrupt consistent and reliable rendering.

NOTE 2 Additional requirements for interactive form fields are specified in 6.9

A Widget annotation dictionary and a document catalog dictionary must not contain an AA entry for an additional-actions dictionary.

The additional-actions dictionaries can define arbitrary JavaScript actions. The explicit prohibition of the AA entry implicitly prevents JavaScript actions that could create external dependencies and complicate preservation efforts.

Interactive readers can opt to make hyperlinks non-actionable. In addition to the rendering behavior specified by the PDF Reference and modified by ISO 19005, they must also offer a mechanism to display the F and D keys from a GoToR action dictionary, the URI key from a URI action dictionary, and the F key from a SubmitForm action dictionary.

Hyperlinks can transfer execution control away from an interactive reader, so this subclause allows readers to opt out of making them actionable. To ensure complete archival disclosure of conforming files, interactive readers need a way to reveal the destinations of all hyperlinks. However, ISO 19005 does not specify the exact behavior or technical implementation that interactive readers should adopt to fulfill this requirement.

Metadata

Sections 6.7.2 to 6.7.11 detail the requirements for metadata in conforming files, which are crucial for effective file management throughout their life cycle. Metadata plays a vital role in the identification and description of files, as well as in addressing relevant technical and administrative aspects. Consequently, authors of conforming files may also have to meet domain-specific metadata requirements established outside of this part of ISO 19005. This part of ISO 19005 provides a structured and consistent framework that accommodates a wide range of metadata needs.

The document catalog dictionary of a conforming file must include the Metadata key, with its value adhering to the XMP Specification. All embedded metadata properties should be in XMP format, except for document information dictionary entries lacking XMP equivalents, as outlined in section 6.7.3. Additionally, properties specified in XMP format should use either the predefined schemas defined in XMP Specification 4, or extension schemas that comply with XMP Specification 4 and 6.7.8. Metadata object stream dictionaries shall not contain the Filter key.

The explicit ban on the Filter key implicitly ensures that the contents of XMP metadata streams remain as plain text, making them accessible to tools that do not support PDF formats.

NOTE 2 An extension schema is any XMP schema that is not defined in XMP Specification

A document information dictionary may be included in a conforming file, and if present, all entries with similar properties in the specified XMP schemas must be embedded in XMP format with corresponding values. However, any entries not specified in Table 1 should not be embedded using predefined XMP schema properties.

NOTE 1 Since a document information dictionary is allowed within a conforming file, it is possible for a single file to be both PDF/A-1 (ISO 19005-1) and PDF/X (ISO 15930-4 and ISO 15930-6 [11]) conformant.

Table 1 — Crosswalk between document information dictionary and XMP properties

Entry          PDF type      Property          XMP type
Title          text string   dc:title          Text
Author         text string   dc:creator        seq Text
Subject        text string   dc:subject        Text
Keywords       text string   pdf:Keywords      Text
Creator        text string   xmp:CreatorTool   Text
Producer       text string   pdf:Producer      Text
CreationDate   date          xmp:CreateDate    Date
ModDate        date          xmp:ModifyDate    Date

The XML namespace URIs for the Dublin Core (dc), PDF, and XMP prefixes are essential for metadata specification The dc prefix is defined by the URI , while the PDF prefix is identified by , and the XMP prefix is represented by These namespaces facilitate the organization and interoperability of metadata across various applications and platforms.

The values of document information dictionary entries must match their corresponding XMP properties. For properties transitioning from the PDF text string type to the XMP Text type, this equivalence is determined on a character-by-character basis, regardless of encoding, by comparing the numeric ISO/IEC 10646-1 code points of the characters.

The explicit requirement for equivalence between document information dictionary entries and their corresponding XMP properties ensures a clear and unambiguous interpretation of the property's value.

The dc:creator property in XMP metadata must be represented as an ordered Text array of length one, containing a single entry with one or more names. The equivalence between Author and dc:creator is determined on a character-by-character basis, regardless of encoding, by comparing the numeric ISO/IEC 10646-1 code points for the characters.

EXAMPLE 1 The document information dictionary entry:

/Author (Peter, Paul, and Mary)

is equivalent to the XMP property:

<dc:creator><rdf:Seq><rdf:li>Peter, Paul, and Mary</rdf:li></rdf:Seq></dc:creator>
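The character-by-character comparison of Author and dc:creator can be sketched with the standard library. This is an illustration only: the helper names are hypothetical, the XMP packet is assumed to be available as an XML string, and byte strings without a UTF-16 byte order mark are decoded as Latin-1, which only approximates PDFDocEncoding.

```python
import xml.etree.ElementTree as ET

NS = {"dc": "http://purl.org/dc/elements/1.1/",
      "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#"}

def decode_pdf_text_string(raw: bytes) -> str:
    """Decode a PDF text string (UTF-16BE with BOM, else roughly PDFDocEncoding)."""
    if raw.startswith(b"\xfe\xff"):
        return raw[2:].decode("utf-16-be")
    return raw.decode("latin-1")  # assumption: approximation of PDFDocEncoding

def author_matches(raw_author: bytes, xmp_xml: str) -> bool:
    """Compare Author with a one-entry dc:creator sequence by code points."""
    root = ET.fromstring(xmp_xml)
    entries = [li.text or "" for li in
               root.findall(".//dc:creator/rdf:Seq/rdf:li", NS)]
    if len(entries) != 1:
        return False
    author = decode_pdf_text_string(raw_author)
    return [ord(c) for c in author] == [ord(c) for c in entries[0]]
```

Because both sides are reduced to lists of code points, the comparison does not depend on whether the PDF string was stored in a single-byte encoding or in UTF-16.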

Date properties consist of a variable-length sequence of temporal components (year, month, day, hour, minute, and second), so their granularity varies.

The PDF date type is defined in PDF Reference 3.8.2, and the XMP Date type must adhere to Date and Time Formats. Value equivalence is determined on a component-by-component basis relative to Coordinated Universal Time (UTC), taking local time zone offsets into account.
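One way to compare a PDF date with an XMP date is to convert both to UTC instants. The sketch below is a simplification: the function name is hypothetical, missing components are filled with defaults (so differences in granularity are not detected), and a date without a time zone is treated as UTC, which is an assumption.

```python
import re
from datetime import datetime, timedelta, timezone

PDF_DATE = re.compile(
    r"D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([+\-Z])(?:(\d{2})'(?:(\d{2})'?)?)?)?")

def pdf_date_to_utc(text: str) -> datetime:
    """Convert a PDF date string such as D:20040408091132-05'00' to UTC."""
    match = PDF_DATE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a PDF date: {text!r}")
    # Missing month and day default to 1, missing time parts to 0.
    parts = [int(value) if value else default
             for value, default in zip(match.groups()[:6], (1, 1, 1, 0, 0, 0))]
    sign, hours, minutes = match.group(7, 8, 9)
    offset = timedelta(hours=int(hours or 0), minutes=int(minutes or 0))
    if sign == "-":
        offset = -offset
    local = datetime(*parts, tzinfo=timezone(offset))
    return local.astimezone(timezone.utc)
```

The XMP side can be parsed with `datetime.fromisoformat` when it carries a full timestamp with a numeric offset, after which the two aware `datetime` values compare equal if they denote the same instant.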

EXAMPLE 2 The document information dictionary entries:

/CreationDate (D:20040402)
/ModDate (D:20040408091132-05'00')

are equivalent to the XMP properties:

All XMP schemas must establish normalization rules for their properties. Metadata properties within these schemas that include normalization rules require that property values be entered, saved, and maintained in the specified normalized format. This ensures seamless interchange and consistent interpretation of metadata by compliant readers.

The bytes and the encoding attributes shall not be used in the header of an XMP packet

NOTE Both the bytes and encoding attributes are deprecated in XMP Specification.

A conforming file must include one or more metadata properties to characterize and identify it. ISO 19005 does not specify a particular identification scheme; identifiers can be externally based, like an International Standard Book Number (ISBN) or a Digital Object Identifier (DOI), or internally based, such as a Globally Unique Identifier (GUID) or Universally Unique Identifier (UUID). These identifiers can also be assigned during workflow operations.

Identifiers can be recorded with the xmp:Identifier, xmpMM:DocumentID, xmpMM:VersionID and xmpMM:RenditionClass properties, or with properties from an extension schema. Any identification system is acceptable as long as it adheres to XMP requirements and the relevant section of ISO 19005.

Any alteration to a conforming file, including the addition of an xmpMM:History entry as outlined in section 6.7.7, necessitates an update to the changing identifier in the file trailer dictionary ID key, as specified in PDF Reference 9.3.

NOTE The XML namespace URI for the xmp prefix is ; the namespace URI for the xmpMM prefix is

To accurately document all significant user actions involved in creating, transforming, or instantiating a conforming file, it is essential to record each action in the xmpMM:History property.

⎯ the action, parameters and when fields shall be specified;

⎯ the softwareAgent field should be specified;

⎯ the instanceID field shall not be specified
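The field rules for a history entry separate hard requirements ("shall") from a recommendation ("should"). A minimal sketch, assuming each xmpMM:History entry has been parsed into a Python `dict` keyed by field name; the function name and message strings are hypothetical.

```python
def history_entry_issues(entry: dict):
    """Return (errors, warnings) for one xmpMM:History entry."""
    # action, parameters and when are required.
    errors = [f"missing required field: {name}"
              for name in ("action", "parameters", "when") if name not in entry]
    # instanceID is prohibited.
    if "instanceID" in entry:
        errors.append("instanceID shall not be specified")
    # softwareAgent is recommended, so its absence is only a warning.
    warnings = [] if "softwareAgent" in entry else ["softwareAgent should be specified"]
    return errors, warnings
```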

NOTE 1 The XML namespace URI for the prefix xmpMM is

Logical structure

Subclause 6.8 is applicable only for files meeting Level A conformance. For Level B conformance the requirements of 6.8 can be ignored.

The requirements outlined in sections 6.8.2 to 6.8.8 aim to guarantee the retrieval of textual content from a compliant file in the natural reading order of the language used. They also ensure that the characters within each word can be accessed in their correct sequence. Additionally, these provisions facilitate the recovery of higher-level semantic information related to the logical structure of the document.

PDF/A-1 writers must avoid adding any structural or semantic information that is not explicitly or implicitly found in the source material just to meet conformance requirements. This includes elements such as structure hierarchy, natural language specifications, alternative descriptions, non-textual annotations, replacement text, and expansions of abbreviations and acronyms.

NOTE It is inadvisable for writers to generate structural or semantic information using automated processes without appropriate verification

A Level A conforming file shall meet all of the requirements set forth for Tagged PDF in PDF Reference 9.7

NOTE Tagged PDF defines conventions for explicitly declaring and describing the logical structural aspects of document content

The document catalog dictionary shall include a MarkInfo dictionary whose sole entry, Marked, shall have a value of true

NOTE This setting indicates that the file conforms to the Tagged PDF conventions.
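As a rough illustration of the MarkInfo requirement, the Python sketch below models the document catalog as a nested dictionary (a hypothetical in-memory representation, not the output of any particular PDF library) and tests that MarkInfo exists, has Marked as its only entry, and that the value is the boolean true.

```python
def has_conforming_markinfo(catalog):
    """True if catalog["MarkInfo"] is a dict whose sole entry is Marked = true."""
    mark_info = catalog.get("MarkInfo")
    return (
        isinstance(mark_info, dict)
        and set(mark_info) == {"Marked"}   # Marked is the only key present
        and mark_info["Marked"] is True    # a real boolean, not 1 or "true"
    )
```

The identity test against True is deliberate: in Python the integer 1 compares equal to True, which would otherwise let a non-boolean value pass.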

Pagination features, including running heads and page numbers, along with cosmetic layout elements like footnote rules and background screens, should be categorized as pagination, layout, and page artifacts, as outlined in PDF Reference 9.7.2.

For languages and script systems that normally use space characters to indicate word breaks, the following additional restriction shall apply:

In show strings, word breaks are marked by one or more space characters between individual words. If a word ends at the boundary of a show string, additional space characters must be added at the end. It is important to note that a single word can extend across multiple show strings, with word breaks indicated solely by the presence of space characters, rather than the show string boundaries. Additionally, a sequence of two or more consecutive space characters is considered equivalent to a single space for indicating word breaks.
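The word-break rule can be sketched in a few lines of Python. The example assumes the show strings have already been decoded to text and are supplied in reading order; the function name is hypothetical.

```python
def words_from_show_strings(show_strings):
    """Recover the word sequence from an ordered list of show strings.

    Show string boundaries are ignored: the strings are concatenated first,
    so a word may span several of them. Only space characters break words,
    and a run of several spaces counts as a single break.
    """
    text = "".join(show_strings)
    return [word for word in text.split(" ") if word]


words = words_from_show_strings(["Long-te", "rm ", "preser", "vation  of", " files"])
```

Here "Long-te" and "rm " join into one word because no space separates them, while the double space inside "vation  of" yields a single break.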

The logical structure of the conforming file shall be described by a structure hierarchy rooted in the StructTreeRoot entry of the document catalog dictionary, as described in PDF Reference 9.6.

Each structure element dictionary in the structure hierarchy shall have a Type entry with the name value of StructElem.

Writers of conforming files should strive to accurately represent a document's logical structure hierarchy with maximum detail. This involves utilizing standard structure types for grouping elements, including block-level, paragraph-like, list, table, inline-level, link, and illustration elements, as outlined in PDF Reference 9.7.4.

A clear outline of a document's logical structure is essential for maximizing its semantic value, especially when it comes to rendering or migrating to different data formats.

The definition of block-level structuring elements should follow the strongly structured paradigm as described in PDF Reference 9.7.4.

All non-standard structure types must be mapped to the closest functionally equivalent standard type as outlined in PDF Reference 9.7.4 within the role map dictionary of the structure tree root. This mapping can be indirect, allowing a non-standard type to connect to another non-standard type, but it ultimately must conclude with a standard type.
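The indirect mapping described above amounts to following a chain of role map entries until a standard type is reached. The Python sketch below assumes the role map is available as a plain dictionary of names; STANDARD_TYPES is a deliberately abbreviated, illustrative subset of the standard structure types, and the function name is hypothetical.

```python
# Illustrative subset only; the full list is defined in the PDF Reference.
STANDARD_TYPES = frozenset({"Document", "Sect", "Div", "P", "H1", "L", "LI",
                            "Table", "Span", "Link", "Figure"})


def resolve_structure_type(name, role_map, standard_types=STANDARD_TYPES):
    """Follow role map entries from name until a standard type is reached."""
    seen = set()
    while name not in standard_types:
        if name in seen:
            raise ValueError(f"circular role mapping at {name!r}")
        seen.add(name)
        if name not in role_map:
            raise ValueError(f"{name!r} is not mapped to a standard type")
        name = role_map[name]
    return name


role_map = {"Sidebar": "Aside", "Aside": "Div"}
resolved = resolve_structure_type("Sidebar", role_map)
```

The set of visited names guards against a role map whose entries point at each other and never terminate in a standard type.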

The default natural language for all text in a file should be specified by the Lang entry in the document catalog dictionary

All text in a file that is not in the default language must be marked with a Lang property in a marked-content sequence or included as a Lang entry in a structure element dictionary, as outlined in PDF Reference 9.8.1.

The presence of the Lang entry in the document catalog dictionary, structure element dictionary, or property list indicates that its value must be a language identifier, as specified by RFC 1766, which outlines the Tags for the Identification of Languages, according to PDF Reference 9.8.1.
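A purely syntactic check of a Lang value can be sketched as follows. The pattern reflects the RFC 1766 tag grammar (a primary tag of one to eight letters followed by optional hyphen-separated subtags of one to eight letters each); it does not verify that a tag is actually registered, and the function name is hypothetical.

```python
import re

# RFC 1766 grammar: 1*8ALPHA *( "-" 1*8ALPHA )
_RFC1766_TAG = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z]{1,8})*")


def is_rfc1766_tag(value):
    """True if value is syntactically a language tag under RFC 1766."""
    return _RFC1766_TAG.fullmatch(value) is not None
```

For example, "en", "en-US" and "x-unknown" pass this check, whereas "en_US" fails because the underscore is not a valid separator.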

Text strings in Unicode that do not match the default language of the file or the language specified by the closest enclosing structure element must indicate their language using the internal escape sequence outlined in PDF Reference 3.8.1.

The distinction between foreign words and those commonly integrated into a language is complex. These requirements aim to ensure clear and unambiguous semantic interpretation of textual content in the future.

All structural elements that contain content without a natural textual equivalent, such as images and formulas, must provide an alternative text description through the Alt entry in the structure element dictionary, as outlined in PDF Reference 9.8.2.

NOTE Alternate descriptions provide textual descriptions that aid in the proper interpretation of otherwise opaque non-textual content

For annotation types that lack text display, it is essential to include the Contents key in the annotation dictionary, providing an alternative description of the annotation's content in a format that is easily understandable by humans.

All non-standard textual structure elements, such as custom characters or inline graphics, must include replacement text through the ActualText entry in the structure element dictionary, as outlined in PDF Reference 9.8.3.

NOTE Replacement text provides textual equivalents that aid in the proper interpretation of otherwise opaque, unusual representations of textual components

6.8.8 Expansions of abbreviations and acronyms

All abbreviations and acronyms in text must be formatted using a Span tag that includes a textual expansion, as outlined in PDF Reference 9.8.4.

NOTE Abbreviation and acronym expansion provides textual equivalents that aid in the proper interpretation of otherwise opaque nomenclature.

6.9 Interactive Forms

The intent of the requirements of this subclause is to ensure that there is no ambiguity about the rendering of form fields.

A conforming reader must not utilize form fields to alter the displayed representation of the page or the content of the file. Additionally, neither a Widget annotation dictionary nor a Field dictionary should include the A or AA keys.

The NeedAppearances flag of the interactive form dictionary shall either not be present or shall be false.

Each form field must be linked to an appearance dictionary that defines how the field's data is displayed. A compliant reader is required to present the field based on the appearance dictionary, independent of the actual form data.

NOTE Requiring an appearance dictionary ensures the reliable rendering of the form
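The form constraints in this subclause can be gathered into one illustrative check. The Python sketch below assumes the interactive form dictionary and each field or widget dictionary are modelled as plain dictionaries, and that the appearance dictionary is stored under the AP key; the function name and the message strings are hypothetical.

```python
def check_form_constraints(acro_form, fields):
    """Return a list of problems found in the form dictionaries."""
    problems = []
    # NeedAppearances shall be absent or false.
    if acro_form.get("NeedAppearances", False) is not False:
        problems.append("NeedAppearances must be absent or false")
    for index, field in enumerate(fields):
        # Neither A nor AA may appear in a field or widget dictionary.
        for key in ("A", "AA"):
            if key in field:
                problems.append(f"field {index}: prohibited key {key}")
        # Every field needs an appearance dictionary.
        if "AP" not in field:
            problems.append(f"field {index}: no appearance dictionary (AP)")
    return problems


problems = check_form_constraints(
    {"NeedAppearances": True},
    [{"AP": {"N": "stream 12"}}, {"A": {"S": "JavaScript"}}],
)
```

An empty result means none of the three constraints was violated in the dictionaries supplied; it says nothing about how a reader actually renders the fields.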

Sections A.1 to A.3 offer a convenient overview of how the PDF/A-1 requirements differ from the PDF Reference, but they do not encompass all PDF/A-1 requirements. The complete normative requirements are detailed in Clauses 2 to 6. In case of any discrepancies between this informative annex and the normative text, the normative text is the authoritative source for the requirements.

Tables A.1 and A.2 outline the PDF 1.4 operators, objects, and keys that differ from the PDF Reference in relation to PDF/A-1 conformance as specified in ISO 19005. These tables provide the status of each operator, object, or key, along with the normative clause that defines that status. The status values used are:

⎯ Required: The operator, object or key is required in conforming files.

⎯ Prohibited: The operator, object or key is prohibited from conforming files.

⎯ Restricted: The operator, object or key may appear in conforming files, but only subject to specific constraints on its use, contents or value.

⎯ Recommended: The operator, object or key should appear in conforming files.

⎯ Ignored: The operator, object or key may appear in conforming files but is ignored by conforming readers.

When a PDF dictionary object is referenced in tables without explicitly listing its keys, all keys within that object and its descendants inherit their status from the object displayed in the table. An object is considered a descendant of another object, known as its ancestor, if any of the following is true:

⎯ the object is the value of a key in the ancestor object;

⎯ the ancestor is an array and the object is an element of that array;

⎯ the object is a descendant of a descendant of the ancestor object
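The recursive definition of a descendant can be expressed as a short traversal. The Python sketch below assumes PDF dictionaries are modelled as dict objects and PDF arrays as list objects; the function name is hypothetical.

```python
def iter_descendants(obj, _seen=None):
    """Yield every descendant of obj under the three rules listed above."""
    if isinstance(obj, dict):
        children = list(obj.values())   # values of keys in the ancestor
    elif isinstance(obj, list):
        children = obj                  # elements of an ancestor array
    else:
        return                          # scalars have no descendants
    seen = set() if _seen is None else _seen
    if id(obj) in seen:
        return                          # already expanded; guards against cycles
    seen.add(id(obj))
    for child in children:
        yield child
        # Descendants of a descendant are descendants as well.
        yield from iter_descendants(child, seen)


ancestor = {"Font": {"F1": {"Subtype": "Type1"}}, "Kids": [{"Type": "Page"}]}
descendants = list(iter_descendants(ancestor))
```

The identity-based guard matters because real PDF object graphs commonly contain cycles, for example a page that refers back to its parent.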

All operators defined in PDF Reference for use in Contents streams may be included in a conforming file, subject to the conditions shown in Table A.1.

Operator Status Subclause

RG Restricted 6.2.3

rg Restricted 6.2.3

ri Restricted 6.2.9

Operators not defined in PDF Reference Prohibited 6.2.10

In a conforming PDF file, all objects and keys specified in the PDF Reference can be included, provided they meet the conditions outlined in Table A.2. Certain requirements for keys pertain to specific key/value pairs, with the corresponding value indicated after the key.

Table A.2 — Object and key status

Object Key (and value) Status Subclause

Contents Recommended (for Level A conformance of non-textual annotations)

Artifact property list dictionary Recommended (for Level A conformance) 6.8.3

Lang Recommended (for Level A conformance) 6.8.4

StructTreeRoot Recommended (for Level A conformance) 6.8.3.3

ToUnicode Required a (for Level A conformance) 6.3.8

Encoding Prohibited (if symbolic font)

Font file stream Metadata Recommended 6.7.10

FontFile or FontFile2 or FontFile3

MarkInfo Marked true Required (for Level A conformance) 6.8.2.2

PDF/A output intent dictionary DestOutputProfile Restricted 6.2.2

Span dictionary E Recommended (for Level A conformance) 6.8.8

Lang Recommended (for Level A conformance of non-default language content)

ActualText Recommended (for Level A conformance of non-standard elements)

Alt Recommended (for Level A conformance of non-textual elements) 6.8.5 Structure element dictionary

Type StructElem Required (for Level A conformance) 6.8.3.3

SMask Restricted 6.4

a There are three specific exemptions from this status defined in 6.3.8.

B.1 Use of non-XMP metadata

The use of non-XMP metadata at the file level is not recommended due to the lack of assurance for its preservation. It is preferable to convert any existing non-XMP metadata to XMP format, embed it within the file, and document this conversion in the xmpMM:History property. Additionally, the xmpMM:History property should indicate any non-XMP elements that remain unconverted.

Failure to preserve metadata may cause problems in locating, interpreting, managing, and authenticating a file in the future, which may in turn diminish or cancel its archival value.

Languages must be identified using registered identifiers from ISO 639-1, ISO 3166-1, or IANA. Private use identifiers are only appropriate when a language lacks a defined identifier in these registries. If a language is completely unknown, the identifier x-unknown should be utilized.

NOTE The use of ISO 639-1 [3], ISO 3166-1 [5] and IANA-registered identifiers is defined in RFC 1766, Tags for the Identification of Languages, which PDF uses as the basis for its language identifiers. ISO 639-2 [24] defines three-letter language identifiers that are not allowed under RFC 1766.

B.3 Recommendations for capturing or converting documents to PDF/A

This Best Practices statement offers recommendations for converting documents to PDF/A format for archival preservation, ensuring that the resulting files maintain their quality and integrity. Archival institutions and organizations with long-term preservation needs should promote the use of Level A conformance as outlined in section 5.1, along with the additional guidelines provided.

ISO 15489-1:2001 emphasizes the importance of organizations creating and maintaining authentic, reliable, and usable records to support ongoing business operations, comply with regulatory requirements, and ensure accountability. It is crucial for organizations to protect the integrity of these records for as long as necessary.

The regulatory framework for submitting documents to an archival institution encompasses requirements, standards, and policies for electronic documents, which include quality rules like minimum image resolution and compression restrictions. It is crucial to avoid processes that alter or dispose of approved data. To ensure archival preservation, the quality and integrity of documents must be maintained in compliance with legal and regulatory requirements when they are captured or converted to PDF/A format.

To address the essential requirement for archiving, it is crucial that PDF/A capture or conversion methods accurately reproduce the original document's content and quality in the compliant file. Below are examples of software development guidelines that achieve this objective.

Writers of conforming files must avoid lossy compression, subsampling, downsampling, or any processes that alter the content or degrade the quality of the source data.

Software should not replace the original scanned text in bit-mapped images with searchable text generated by optical character recognition when converting documents from paper or image formats to conforming files.

NOTE Optical character recognition processes may involve loss of data through imprecise interpretation of scanned characters

[1] ANSI X3.4, Information Systems — Coded Character Sets — 7-Bit American National Standard Code for Information Interchange (7-Bit ASCII)

[2] ECMA-6, 7-Bit coded Character Set, Available from Internet

[3] ISO 639-1, Codes for the representation of names of languages — Part 1: Alpha-2 code

[4] ISO 2108, Information and documentation — International standard book number (ISBN)

[5] ISO 3166-1, Codes for the representation of names of countries and their subdivisions — Part 1: Country codes

[6] ISO/IEC 10646-1:2000/Amd 1:2002, Amendment 1: Mathematical symbols and other characters

[7] ISO/IEC 10646-2:2001, Information technology — Universal Multiple-Octet Coded Character Set (UCS) — Part 2: Supplementary Planes

[8] ISO/IEC 14492:2001, Information technology — Lossy/lossless coding of bi-level images

NOTE This International Standard is equivalent to ITU-T Recommendation T.88, Information technology — Coded representation of picture and audio information — Lossy/lossless coding of bi-level images

[9] ISO 15489-1:2001, Information and documentation — Records management — Part 1: General

[10] ISO/TR 15801, Electronic imaging — Information stored electronically — Recommendations for trustworthiness and reliability

[11] ISO 15930-6, Graphic technology — Prepress digital data exchange using PDF — Part 6: Complete exchange of printing data suitable for colour-managed workflows using PDF 1.4 (PDF/X-3)

[12] ISO/TR 18492, Long-term preservation of electronic document-based information

ISO 18509-1 outlines the specifications for electronic archival storage, focusing on the design and operation of information processing systems It emphasizes the importance of ensuring the integrity and long-term accessibility of recordings stored within these systems.

Date posted: 12/04/2023, 18:17