XML Information Set

XML Information Set (XML Infoset) is a W3C specification describing an abstract data model of an XML document in terms of a set of information items.[1] The definitions in the XML Information Set specification are meant to be used in other specifications that need to refer to the information in a well-formed XML document.

An XML document has an information set if it is well-formed and satisfies the namespace constraints. There is no requirement for an XML document to be valid in order to have an information set.

An information set can contain up to eleven different types of information items:

  1. The Document Information Item (always present)
  2. Element Information Items
  3. Attribute Information Items
  4. Processing Instruction Information Items
  5. Unexpanded Entity Reference Information Items
  6. Character Information Items
  7. Comment Information Items
  8. The Document Type Declaration Information Item
  9. Unparsed Entity Information Items
  10. Notation Information Items
  11. Namespace Information Items
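
As a rough illustration of several of these item kinds, the following Python sketch parses a small document with the standard library's `xml.dom.minidom` and tallies what it finds. The mapping from DOM node types to information items is an approximation assumed for this example (the DOM and the infoset are separate models), and `SAMPLE` and `count_items` are hypothetical names. The infoset defines one character information item per character, so the length of each text node is counted.

```python
from collections import Counter
from xml.dom import Node, minidom

# Approximate correspondence between DOM node types and infoset item kinds
# (an assumption made for this sketch, not a mapping defined by the specification).
ITEM_KIND = {
    Node.DOCUMENT_NODE: "document",
    Node.ELEMENT_NODE: "element",
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
    Node.COMMENT_NODE: "comment",
}

SAMPLE = '<!-- greeting --><?render fast?><msg lang="en">Hi!</msg>'


def count_items(xml_text):
    """Tally information items of each kind found in a well-formed document."""
    counts = Counter()
    stack = [minidom.parseString(xml_text)]
    while stack:
        node = stack.pop()
        if node.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            # One character information item per character of text.
            counts["character"] += len(node.data)
        elif node.nodeType in ITEM_KIND:
            counts[ITEM_KIND[node.nodeType]] += 1
        if node.nodeType == Node.ELEMENT_NODE:
            # The sample declares no namespaces, so every attribute counts.
            counts["attribute"] += len(node.attributes)
        stack.extend(node.childNodes)
    return counts


item_counts = count_items(SAMPLE)
```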

XML was initially developed without a formal definition of its infoset. The infoset was formalised only by later work beginning in 1999, first published as a separate W3C Working Draft at the end of December that year.[2] The Second Edition of the Infoset Recommendation was adopted on 4 February 2004.[3] If a 2.0 version of the XML standard is ever published, it is likely that this would absorb the Infoset recommendation as an integral part of that standard.

Infoset augmentation

Infoset augmentation or infoset modification refers to the process of modifying the infoset during schema validation, for example by adding default attributes. The augmented infoset is called the post-schema-validation infoset, or PSVI.[4]

Infoset augmentation is somewhat controversial: critics claim that it violates modularity and tends to cause interoperability problems, because applications receive different information depending on whether or not validation has been performed.[5]
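
The interoperability concern can be sketched in Python. The standard library has no XML Schema validator, so the example below only imitates augmentation: `DEFAULTS` is a hypothetical table standing in for attribute defaults that a schema would declare, and `augment` is an illustrative helper, not part of any real validator.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for schema-declared defaults:
# element name -> {attribute name: default value}
DEFAULTS = {"item": {"currency": "USD"}}

DOC = '<order><item price="5"/><item price="7" currency="EUR"/></order>'


def augment(root, defaults=DEFAULTS):
    """Add any missing default attributes, imitating infoset augmentation."""
    for elem in root.iter():
        for name, value in defaults.get(elem.tag, {}).items():
            if name not in elem.attrib:
                elem.set(name, value)
    return root


plain = ET.fromstring(DOC)               # what a non-validating consumer sees
augmented = augment(ET.fromstring(DOC))  # what a "validating" consumer sees

plain_currencies = [item.get("currency") for item in plain.iter("item")]
augmented_currencies = [item.get("currency") for item in augmented.iter("item")]
```

Two consumers reading the same document obtain different values for the first `item`, which is the kind of divergence the criticism describes.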

Infoset augmentation is supported by XML Schema but not RELAX NG.


Typically, an XML Information Set is serialized as XML.[6] There are also serialization formats for Binary XML, CSV,[7] and JSON.[8]


  1. W3C XML Infoset
  2. "XML Information Set" (Working Draft ed.). W3C. 20 December 1999.
  3. "XML Information Set" (Second ed.). W3C. 4 February 2004.
  4. XML Schema 1.1 Part 1: Structures
  5. "RELAX NG and W3C XML Schema". James Clark. 4 June 2002. Archived 27 September 2007 at the Wayback Machine.
  6. "Extensible Markup Language (XML)". W3C. Retrieved 9 October 2014.
  7. XmlCsvReader Implementation
  8. Apache CXF JSON Support


David Megginson

David Megginson (born 1964) is a Canadian computer software consultant and developer, specializing in open-source software development and application. He was the lead developer and original maintainer of the Simple API for XML, or SAX, a leading streaming API for XML.

Megginson has been part of the SGML and then XML communities since 1991.

For the World Wide Web Consortium, he served as chair of the XML Information Set Working Group and as a member of both the XML Working Group and XML Co-ordination Group.

In 2000, Sun Microsystems and JavaPro magazine awarded Megginson the Java Technology Achievement Award for Outstanding Individual Contribution to the Java Community.

He made significant contributions to other open source software projects including FlightGear, a cross-platform flight simulator making use of XML, the NewsML Toolkit library for NewsML, the XMLWriter libraries for Perl and Java, RDF Filter, and SGMLSpm, a mid-1990s precursor to many XML functionalities.

He is also known for posting the first response to Andrew S. Tanenbaum's Usenet post "Linux is obsolete", which then evolved into the famous Tanenbaum-Torvalds debate.

He is an instrument-rated private pilot, and maintains weblogs about technology and small-plane aviation. Formerly employed by the University of Ottawa, he maintains his consulting and development practice in Ottawa, Ontario.

Fast Infoset

Fast Infoset (or FI) is an international standard that specifies a binary encoding format for the XML Information Set (XML Infoset) as an alternative to the XML document format. It aims to provide more efficient serialization than the text-based XML format.

FI works much like lossless compression of XML, analogous to gzip: although the original formatting is lost, no information is lost in converting from XML to FI and back to XML. Whereas the purpose of general compression is to reduce physical data size, FI aims to optimize both document size and processing performance.
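
The distinction between formatting and information can be illustrated without a Fast Infoset codec. The Python sketch below uses canonical XML (`xml.etree.ElementTree.canonicalize`, available from Python 3.8) as a rough proxy: two documents that differ in quoting, whitespace inside tags, attribute order, and empty-element syntax reduce to the same canonical form. Canonical XML is a separate specification and is used here only as an assumption-laden stand-in for "same information"; the sample documents are hypothetical.

```python
from xml.etree.ElementTree import canonicalize

# Two serializations that differ only in formatting details.
doc_a = "<point y='2' x=\"1\"/>"
doc_b = '<point   x="1"\n  y="2"></point>'

# Canonicalization normalizes quoting, attribute order, and empty-element
# syntax, so both inputs yield the same canonical text.
same_information = canonicalize(doc_a) == canonicalize(doc_b)
```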

The Fast Infoset specification is defined by both the ITU-T and the ISO standards bodies. FI is officially defined in ITU-T Rec. X.891 and ISO/IEC 24824-1, and entitled Fast Infoset. The standard was published by ITU-T on May 14, 2005, and by ISO on May 4, 2007. The Fast Infoset standard document can be downloaded from the ITU website. Though the document does not assert intellectual property (IP) restrictions on implementation or use, page ii warns that it has received notices and the subject may not be completely free of IP assertions.

A common misconception is that FI requires ASN.1 tool support. Although the formal specification uses ASN.1 notation, the standard includes Encoding Control Notation (ECN) and ASN.1 tools are not required by implementations.

An alternative to FI is FleXPath.


ISO/IEC JTC 1/SC 6

ISO/IEC JTC 1/SC 6 Telecommunications and information exchange between systems is a standardization subcommittee of the Joint Technical Committee ISO/IEC JTC 1 of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) that develops and facilitates standards within the field of telecommunications and information exchange between systems. ISO/IEC JTC 1/SC 6 was established in 1964, following the creation of a Special Working Group under ISO/TC 97 on Data Link Control Procedures and Modem Interfaces. The international secretariat of ISO/IEC JTC 1/SC 6 is the Korean Agency for Technology and Standards (KATS), located in the Republic of Korea.

Information set

Information set may refer to:

Information set (game theory), in game theory, a set that, for a particular player, establishes all the possible moves that could have taken place in the game so far, given what that player has observed

XML Information Set or Infoset, a W3C specification dealing with XML documents

List of XML markup languages

This is a list of notable XML markup languages.

List of web service specifications

There are a variety of specifications associated with web services. These specifications are in varying degrees of maturity and are maintained or supported by various standards bodies and entities. These specifications are the basic web services framework established by first-generation standards represented by WSDL, SOAP, and UDDI. Specifications may complement, overlap, and compete with each other. Web service specifications are occasionally referred to collectively as "WS-*", though there is not a single managed set of specifications that this consistently refers to, nor a recognized owning body across them all.

"WS-*" is a prefix used to indicate specifications associated with web services and there exist many WS-* standards including WS-Addressing, WS-Discovery, WS-Federation, WS-Policy, WS-Security, and WS-Trust. This page includes many of the specifications that might be considered a part of "WS-*".


SOAP

SOAP (abbreviation for Simple Object Access Protocol) is a messaging protocol specification for exchanging structured information in the implementation of web services in computer networks. Its purpose is to provide extensibility, neutrality and independence. It uses XML Information Set for its message format, and relies on application layer protocols, most often Hypertext Transfer Protocol (HTTP) or Simple Mail Transfer Protocol (SMTP), for message negotiation and transmission.

SOAP allows processes running on disparate operating systems (such as Windows and Linux) to communicate using Extensible Markup Language (XML). Since Web protocols like HTTP are installed and running on all operating systems, SOAP allows clients to invoke web services and receive responses independent of language and platforms.
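
As a minimal sketch of what such a message looks like, the following Python code builds a SOAP 1.2 envelope with the standard library. The envelope namespace is the one defined for SOAP 1.2; the application namespace `http://example.org/stock`, the `GetPrice` and `Symbol` elements, and the `build_envelope` helper are hypothetical and exist only for illustration. No message is actually sent.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace
APP_NS = "http://example.org/stock"                  # hypothetical application namespace


def build_envelope(symbol):
    """Return a serialized SOAP envelope carrying a hypothetical GetPrice request."""
    ET.register_namespace("env", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, f"{{{APP_NS}}}GetPrice")
    ET.SubElement(request, f"{{{APP_NS}}}Symbol").text = symbol
    return ET.tostring(envelope, encoding="unicode")


message = build_envelope("ACME")

# A receiver on any platform can recover the same structure from the text.
parsed = ET.fromstring(message)
symbol = parsed.findtext(f"{{{SOAP_NS}}}Body/{{{APP_NS}}}GetPrice/{{{APP_NS}}}Symbol")
```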

World Wide Web Consortium

The World Wide Web Consortium (W3C) is the main international standards organization for the World Wide Web (abbreviated WWW or W3).

Founded and currently led by Tim Berners-Lee, the consortium is made up of member organizations which maintain full-time staff for the purpose of working together in the development of standards for the World Wide Web. As of 19 November 2018, the World Wide Web Consortium (W3C) had 476 members. The W3C also engages in education and outreach, develops software and serves as an open forum for discussion about the Web.


XInclude

XInclude is a generic mechanism for merging XML documents, by writing inclusion tags in the "main" document to automatically include other documents or parts thereof. The resulting document becomes a single composite XML Information Set. The XInclude mechanism can be used to incorporate content from either XML files or non-XML text files.
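
Python's standard library offers limited XInclude support in `xml.etree.ElementInclude`, which is enough for a small sketch of both cases (XML and plain text). To keep the example self-contained, the included resources live in a dictionary and are served by a custom loader; `FRAGMENTS`, `memory_loader`, and the file names are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical in-memory "files" standing in for external resources.
FRAGMENTS = {
    "chapter.xml": "<chapter><title>Included chapter</title></chapter>",
    "note.txt": "plain text note",
}


def memory_loader(href, parse, encoding=None):
    """Return a parsed element for parse="xml", or the raw text for parse="text"."""
    data = FRAGMENTS[href]
    if parse == "xml":
        return ET.fromstring(data)
    return data


MAIN = """<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="chapter.xml"/>
  <note><xi:include href="note.txt" parse="text"/></note>
</book>"""

root = ET.fromstring(MAIN)
# Replaces each xi:include element in place with the content it refers to.
ElementInclude.include(root, loader=memory_loader)

included_title = root.findtext("chapter/title")
included_note = root.findtext("note")
```

After processing, `root` is a single tree with no remaining `xi:include` elements.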


XML

Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. The W3C's XML 1.0 Specification and several other related specifications, all of them free open standards, define XML. The design goals of XML emphasize simplicity, generality, and usability across the Internet. It is a textual data format with strong support via Unicode for different human languages. Although the design of XML focuses on documents, the language is widely used for the representation of arbitrary data structures such as those used in web services.

Several schema systems exist to aid in the definition of XML-based languages, while programmers have developed many application programming interfaces (APIs) to aid the processing of XML data.

XML-binary Optimized Packaging

XML-binary Optimized Packaging (XOP) is a mechanism defined for the serialization of XML Information Sets (infosets) that contain binary data, as well as deserialization back into the XML Information Set.

XML schema

An XML schema is a description of a type of XML document, typically expressed in terms of constraints on the structure and content of documents of that type, above and beyond the basic syntactical constraints imposed by XML itself. These constraints are generally expressed using some combination of grammatical rules governing the order of elements, Boolean predicates that the content must satisfy, data types governing the content of elements and attributes, and more specialized rules such as uniqueness and referential integrity constraints.

There are languages developed specifically to express XML schemas. The document type definition (DTD) language, which is native to the XML specification, is a schema language that is of relatively limited capability, but that also has other uses in XML aside from the expression of schemas. Two more expressive XML schema languages in widespread use are XML Schema (with a capital S) and RELAX NG.

The mechanism for associating an XML document with a schema varies according to the schema language. The association may be achieved via markup within the XML document itself, or via some external means.

XML tree

XML documents have a hierarchical structure and can conceptually be interpreted as a tree structure, called an XML tree.

XML documents must contain a root element (one that is the ancestor of all other elements). All elements in an XML document can contain sub-elements, text and attributes. The tree represented by an XML document starts at the root element and branches to the lowest level of elements.
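
The tree view can be made concrete with a short Python sketch using `xml.etree.ElementTree`. The sample document and the helper names `outline` and `height` are hypothetical; the code simply walks from the root element down to the lowest level of elements.

```python
import xml.etree.ElementTree as ET

DOC = (
    "<library>"
    "<shelf id='a'><book>Dune</book><book>Emma</book></shelf>"
    "<shelf id='b'/>"
    "</library>"
)


def outline(elem, depth=0):
    """Return one indented line per element, in document order."""
    lines = ["  " * depth + elem.tag]
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines


def height(elem):
    """Number of element levels from this element down to its deepest descendant."""
    return 1 + max((height(child) for child in elem), default=0)


root = ET.fromstring(DOC)
tree_outline = outline(root)
tree_height = height(root)
```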

Although there is no consensus on the terminology used for XML trees, at least two standard terminologies have been released by the W3C:

The terminology used in the XPath Data Model

The terminology used in the XML Information Set

XPath defines a syntax, named XPath expressions, that identifies one or more internal components (elements, attributes, etc.) of an XML document. XPath is widely used to access XML-encoded data.
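
A brief Python sketch shows path expressions selecting components of a document. Note that `xml.etree.ElementTree` implements only a limited subset of XPath, so this is an approximation of full XPath behaviour; the sample catalog is hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """<catalog>
  <book lang="en"><title>Dune</title></book>
  <book lang="fr"><title>Candide</title></book>
  <book lang="en"><title>Emma</title></book>
</catalog>"""

root = ET.fromstring(DOC)

# Attribute predicate: titles of books whose lang attribute equals "en".
english_titles = [t.text for t in root.findall("./book[@lang='en']/title")]

# Positional predicate: title of the second book child (positions start at 1).
second_title = root.findtext("./book[2]/title")
```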

The XML Information Set, or XML infoset, describes an abstract data model for XML documents in terms of information items. It is often used in the specifications of XML languages, for its convenience in describing constraints on constructs those languages allow.

XQuery and XPath Data Model

The XQuery and XPath Data Model (XDM) is the data model shared by the XPath 2.0, XSLT 2.0 and XQuery programming languages. It is a W3C recommendation and forms an integral part of all three languages. Originally, it was based on the XPath 1.0 data model which in turn is based on the XML Information Set.

The XDM consists of flat sequences of zero or more items of different types. Items can be typed or untyped and include atomic values as well as XML nodes (with elements, attributes and text nodes, etc.). Instances of the XDM can optionally be XML schema-validated.

