Standard Generalized Markup Language

The Standard Generalized Markup Language (SGML; ISO 8879:1986) is a standard for defining generalized markup languages for documents. ISO 8879 Annex A.1 defines generalized markup:

Generalized markup is based on two postulates:

  • Markup should be declarative: it should describe a document's structure and other attributes, rather than specify the processing to be performed on it. Declarative markup is less likely to conflict with unforeseen future processing needs and techniques.
  • Markup should be rigorous so that the techniques available for processing rigorously-defined objects like programs and databases can be used for processing documents as well.

HTML was theoretically an example of an SGML-based language until HTML5, which defines its own parsing rules because browsers, for compatibility reasons, cannot parse HTML as SGML.

DocBook SGML and LinuxDoc are examples which were used almost exclusively with actual SGML tools.

Standard Generalized Markup Language (SGML)
Filename extension: .sgml
Internet media type: application/sgml, text/sgml
Uniform Type Identifier (UTI): public.xml
Developed by: ISO
Type of format: Markup language
Extended from: GML
Extended to: HTML, XML
Standard: ISO 8879

Standard versions

SGML is an ISO standard: "ISO 8879:1986 Information processing – Text and office systems – Standard Generalized Markup Language (SGML)", of which there are three versions:

  • Original SGML, which was accepted in October 1986, followed by a minor Technical Corrigendum.
  • SGML (ENR), in 1996, resulted from a Technical Corrigendum to add extended naming rules allowing arbitrary-language and -script markup.
  • SGML (ENR+WWW or WebSGML), in 1998, resulted from a Technical Corrigendum to better support XML and WWW requirements.

SGML is part of a trio of enabling ISO standards for electronic documents developed by ISO/IEC JTC1/SC34[1][2] (ISO/IEC Joint Technical Committee 1, Subcommittee 34 – Document description and processing languages):

  • SGML (ISO 8879)—Generalized markup language
    • SGML was reworked in 1998 into XML, a successful profile of SGML. Full SGML is rarely found or used in new projects.
  • DSSSL (ISO/IEC 10179)—Document processing and styling language based on Scheme.
    • DSSSL was reworked into W3C XSLT and XSL-FO which use an XML syntax. Nowadays, DSSSL is rarely used in new projects apart from Linux documentation.
  • HyTime—Generalized hypertext and scheduling.[3]
    • HyTime was partially reworked into W3C XLink. HyTime is rarely used in new projects.

SGML is supported by various technical reports, in particular

  • ISO/IEC TR 9573 – Information processing – SGML support facilities – Techniques for using SGML[4]
    • Part 13: Public entity sets for mathematics and science
      • In 2007, the W3C MathML working group agreed to assume the maintenance of these entity sets.

History

SGML descended from IBM's Generalized Markup Language (GML), which Charles Goldfarb, Edward Mosher, and Raymond Lorie developed in the 1960s. Goldfarb, editor of the international standard, coined the “GML” term using their surname initials.[5] Goldfarb also wrote the definitive work on SGML syntax in "The SGML Handbook".[6] The syntax of SGML is closer to the COCOA format. As a document markup language, SGML was originally designed to enable the sharing of machine-readable large-project documents in government, law, and industry. Many such documents must remain readable for several decades—a long time in the information technology field. SGML also was extensively applied by the military, and the aerospace, technical reference, and industrial publishing industries. The advent of the XML profile has made SGML suitable for widespread application for small-scale, general-purpose use.

[Figure: A fragment of the Oxford English Dictionary (1985), showing SGML markup]

Document validity

SGML (ENR+WWW) defines two kinds of validity. According to the revised Terms and Definitions of ISO 8879 (from the public draft[7]):

A conforming SGML document must be either a type-valid SGML document, a tag-valid SGML document, or both. Note: A user may wish to enforce additional constraints on a document, such as whether a document instance is integrally-stored or free of entity references.

A type-valid SGML document is defined by the standard as

An SGML document in which, for each document instance, there is an associated document type declaration (DTD) to whose DTD that instance conforms.

A tag-valid SGML document is defined by the standard as

An SGML document, all of whose document instances are fully tagged. There need not be a document type declaration associated with any of the instances. Note: If there is a document type declaration, the instance can be parsed with or without reference to it.

Terminology

Tag-validity was introduced in SGML (ENR+WWW) to support XML, which allows two kinds of document that can be parsed without a grammar: documents with no DOCTYPE declaration, and documents whose DOCTYPE declaration makes no XML Infoset contributions to the document. The standard calls this fully tagged. Integrally stored reflects the XML requirement that elements end in the same entity in which they started. Reference-free reflects the HTML requirement that entity references are for special characters and do not contain markup. SGML validity commentary, especially commentary that was made before 1997 or that is unaware of SGML (ENR+WWW), covers type-validity only.

The SGML emphasis on validity supports the requirement for generalized markup that markup should be rigorous. (ISO 8879 A.1)

Syntax

An SGML document may have three parts:

  1. the SGML Declaration,
  2. the Prologue, containing a DOCTYPE declaration with the various markup declarations that together make a Document Type Definition (DTD), and
  3. the instance itself, containing one top-most element and its contents.

An SGML document may be composed from many entities (discrete pieces of text). In SGML, the entities and element types used in the document may be specified with a DTD; the character sets, features, delimiter sets, and keywords are specified in the SGML Declaration to create the concrete syntax of the document.

Although full SGML allows implicit markup and some other kinds of tags, the XML specification (section 2, "Documents") states:

Each XML document has both a logical and a physical structure. Physically, the document is composed of units called entities. An entity may refer to other entities to cause their inclusion in the document. A document begins in a "root" or document entity. Logically, the document is composed of declarations, elements, comments, character references, and processing instructions, all of which are indicated in the document by explicit markup.

For introductory information on a basic, modern SGML syntax, see XML. The following material concentrates on features not in XML and is not a comprehensive summary of SGML syntax.

Optional features

SGML generalizes and supports a wide range of markup languages as found in the mid-1980s. These ranged from terse Wiki-like syntaxes to RTF-like bracketed languages to HTML-like matching-tag languages. SGML did this with a relatively simple default reference concrete syntax, augmented with a large number of optional features that could be enabled in the SGML Declaration. Not every SGML parser can necessarily process every SGML document. Because each processor's System Declaration can be compared to the document's SGML Declaration, it is always possible to know whether a document is supported by a particular processor.

Many SGML features relate to markup minimization. Other features relate to concurrent (parallel) markup (CONCUR), to linking processing attributes (LINK), and to embedding SGML documents within SGML documents (SUBDOC).
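The comparison between a document's SGML Declaration and a processor's System Declaration can be illustrated with a minimal sketch. It assumes each declaration has been reduced to a plain set of enabled feature names; the function name and the two example sets are hypothetical, and real declarations also carry capacities, quantities, and syntax details that are ignored here.

```python
def unsupported_features(document_features, system_features):
    """Return the features a document enables that a processor does not support."""
    return sorted(set(document_features) - set(system_features))


# Hypothetical feature sets; the names are the SGML features mentioned above.
document = {"OMITTAG", "SHORTTAG", "SUBDOC"}
processor = {"OMITTAG", "SHORTTAG", "LINK"}

missing = unsupported_features(document, processor)
print(missing)  # ['SUBDOC']
```

An empty result would indicate that, at the level of optional features, the processor can handle the document.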

The notion of customizable features was not appropriate for Web use, so one goal of XML was to minimize optional features. However, XML's well-formedness rules cannot support Wiki-like languages, leaving them unstandardized and difficult to integrate with non-text information systems.

Concrete and abstract syntaxes

The usual (default) SGML concrete syntax resembles this example, which is the default HTML concrete syntax:

<QUOTE TYPE="example">
  typically something like <ITALICS>this</ITALICS>
</QUOTE>

SGML provides an abstract syntax that can be implemented in many different types of concrete syntax. Although the markup norm is to use angle brackets as start-tag and end-tag delimiters in an SGML document (per the standard-defined reference concrete syntax), it is possible to use other characters, provided a suitable concrete syntax is defined in the document's SGML declaration.[8] For example, an SGML interpreter might be programmed to parse GML, wherein tags are delimited with a leading colon and a trailing full stop, and an :e prefix denotes an end tag, as in :xmp.Hello, world:exmp. According to the reference concrete syntax, letter case (upper or lower) is not distinguished in tag names; thus the three tags (i) <quote>, (ii) <QUOTE>, and (iii) <quOtE> are equivalent. (NOTE: A concrete syntax might change this rule via the NAMECASE NAMING declarations.)
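The separation between abstract and concrete syntax can be sketched as a tokenizer whose delimiters are parameters rather than constants. The following is a simplified illustration, not an SGML parser: the `ConcreteSyntax` class, its field names (borrowed loosely from the SGML delimiter roles), and the `tokenize` function are all hypothetical, attributes are not handled, and case folding merely approximates the reference behaviour for tag names.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ConcreteSyntax:
    """Hypothetical stand-in for the delimiter part of an SGML Declaration."""
    stago: str               # start-tag open delimiter
    etago: str               # end-tag open delimiter
    tagc: str                # tag close delimiter
    fold_names: bool = True  # treat tag names case-insensitively


REFERENCE = ConcreteSyntax(stago="<", etago="</", tagc=">")
GML_LIKE = ConcreteSyntax(stago=":", etago=":e", tagc=".")


def tokenize(text, syntax):
    """Split text into ('start' | 'end' | 'data', value) tuples."""
    name = r"[A-Za-z][A-Za-z0-9-]*"
    # The end-tag alternative is tried first so that "</" wins over "<".
    pattern = re.compile(
        r"(?P<end>{etago}(?P<ename>{name}){tagc})"
        r"|(?P<start>{stago}(?P<sname>{name}){tagc})".format(
            etago=re.escape(syntax.etago),
            stago=re.escape(syntax.stago),
            tagc=re.escape(syntax.tagc),
            name=name,
        )
    )
    tokens, pos = [], 0
    for match in pattern.finditer(text):
        if match.start() > pos:
            tokens.append(("data", text[pos:match.start()]))
        kind = "end" if match.group("end") else "start"
        raw = match.group("ename") if kind == "end" else match.group("sname")
        tokens.append((kind, raw.upper() if syntax.fold_names else raw))
        pos = match.end()
    if pos < len(text):
        tokens.append(("data", text[pos:]))
    return tokens


# Both concrete syntaxes yield the same abstract token stream.
print(tokenize("<xmp>Hello, world</XMP>", REFERENCE))
print(tokenize(":xmp.Hello, world:exmp.", GML_LIKE))
```

Under these assumptions both calls produce the same three tokens. One known limitation of the sketch: with the GML-like delimiters, a start tag whose name begins with "e" would be misread as an end tag, an ambiguity a real processor resolves by knowing the declared element names.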

Markup minimization

SGML has features for reducing the number of characters required to mark up a document, which must be enabled in the SGML Declaration. SGML processors need not support every available feature, thus allowing applications to tolerate many types of inadvertent markup omissions; however, SGML systems usually are intolerant of invalid structures. XML is intolerant of syntax omissions, and does not require a DTD for checking well-formedness.

OMITTAG

Both start tags and end tags may be omitted from a document instance, provided:

  1. the OMITTAG feature is enabled in the SGML Declaration,
  2. the DTD indicates that the tags are permitted to be omitted,
  3. (for start tags) the element has no associated required (#REQUIRED) attributes, and
  4. the tag can be unambiguously inferred by context.

For example, if OMITTAG YES is specified in the SGML Declaration (enabling the OMITTAG feature), and the DTD includes the following declarations:

<!ELEMENT chapter - - (title, section+)>
<!ELEMENT title o o (#PCDATA)>
<!ELEMENT section - - (title, subsection+)>

then this excerpt:

<chapter>Introduction to SGML
<section>The SGML Declaration
<subsection>
...

which omits two <title> tags and two </title> tags, would represent valid markup.

Note also that omitting tags is optional – the same excerpt could be tagged like this:

<chapter><title>Introduction to SGML</title>
<section><title>The SGML Declaration</title>
<subsection>
...

and would still represent valid markup.
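How a processor might imply the omitted tags in the excerpt above can be sketched as follows. This is a deliberately reduced model, assuming the DTD has been hand-translated into the small hypothetical tables below (`FIRST_CHILD`, `TEXT_ONLY`, `START_OMISSIBLE`, `END_OMISSIBLE`); a real SGML parser derives this information from the content models and performs validation and error handling that this sketch omits.

```python
# Hypothetical, hand-simplified view of the DTD declarations above.
FIRST_CHILD = {"chapter": "title", "section": "title"}  # required first child
TEXT_ONLY = {"title"}        # elements whose content is (#PCDATA)
START_OMISSIBLE = {"title"}  # first "o" in <!ELEMENT title o o ...>
END_OMISSIBLE = {"title"}    # second "o"


def imply_tags(tokens):
    """Insert implied <title> and </title> tags into a token stream."""
    out, stack = [], []  # stack holds [name, children_seen] per open element

    def open_element(name):
        if stack:
            stack[-1][1] += 1
        stack.append([name, 0])
        out.append("<%s>" % name)

    def close_text_elements():
        # A new tag cannot occur inside a text-only element, so its
        # omissible end tag is implied here.
        while stack and stack[-1][0] in TEXT_ONLY and stack[-1][0] in END_OMISSIBLE:
            out.append("</%s>" % stack.pop()[0])

    for kind, value in tokens:
        if kind == "data":
            # Text at the very start of an element implies its first child.
            if stack and stack[-1][1] == 0:
                first = FIRST_CHILD.get(stack[-1][0])
                if first in START_OMISSIBLE:
                    open_element(first)
            out.append(value)
        elif kind == "start":
            close_text_elements()
            open_element(value)
        else:  # explicit end tag
            if stack[-1][0] != value:
                close_text_elements()
            out.append("</%s>" % stack.pop()[0])
    return "".join(out)


minimized = [
    ("start", "chapter"), ("data", "Introduction to SGML"),
    ("start", "section"), ("data", "The SGML Declaration"),
    ("start", "subsection"),
]
print(imply_tags(minimized))
```

Feeding the fully tagged variant of the excerpt through the same function yields the same output, which mirrors the point that tag omission is optional.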

Note: The OMITTAG feature is unrelated to the tagging of elements whose declared content is EMPTY as defined in the DTD:

<!ELEMENT image - o EMPTY>

Elements defined like this have no end tag, and specifying one in the document instance would result in invalid markup. In this regard they differ syntactically from XML empty elements.

SHORTREF

Tags can be replaced with delimiter strings, for a terser markup, via the SHORTREF feature. This markup style is now associated with wiki markup, e.g. wherein two equals-signs (==), at the start of a line, are the “heading start-tag”, and two equals signs (==) after that are the “heading end-tag”.

SHORTTAG

SGML markup languages whose concrete syntax enables the SHORTTAG VALUE feature do not require attribute values containing only alphanumeric characters to be enclosed within quotation marks, either double " " (LIT) or single ' ' (LITA), so that the previous markup example could be written:

<QUOTE TYPE=example>
  typically something like <ITALICS>this</>
</QUOTE>

One feature of SGML markup languages is the "presumptuous empty tagging", such that the empty end tag </> in <ITALICS>this</> "inherits" its value from the nearest previous full start tag, which, in this example, is <ITALICS> (in other words, it closes the most recently opened item). The expression is thus equivalent to <ITALICS>this</ITALICS>.
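The rule that an empty end tag closes the most recently opened element can be sketched with a stack. The function below is a hypothetical illustration that assumes well-nested input in the reference concrete syntax; it does not handle elements with declared content EMPTY, omitted tags, or other minimization features.

```python
import re

# Matches a start tag, a named end tag, or the empty end tag "</>".
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9.-]*)?([^<>]*)>")


def expand_empty_end_tags(text):
    """Replace each "</>" with the end tag of the most recently opened element."""
    open_elements = []

    def replace(match):
        closing, name = match.group(1), match.group(2)
        if not closing:
            if name:
                open_elements.append(name)
            return match.group(0)
        current = open_elements.pop()  # the element being closed
        return "</%s>" % current if name is None else match.group(0)

    return TAG.sub(replace, text)


source = "<QUOTE TYPE=example>\n  typically something like <ITALICS>this</>\n</QUOTE>"
print(expand_empty_end_tags(source))
```

Applied to the example above, the sketch rewrites `</>` as `</ITALICS>` and leaves the other tags unchanged.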

NET

Another feature is the NET (Null End Tag) construction: <ITALICS/this/, which is structurally equivalent to <ITALICS>this</ITALICS>.
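As a rough illustration, the NET construction can be expanded into a full start tag and end tag with a single substitution. The sketch assumes tags without attributes and element content that contains neither a slash nor nested markup; the function name is hypothetical, and the pattern is meant for SGML input rather than XML, where the `/>` sequence has a different meaning.

```python
import re

# A tag name followed by "/" (start-tag close), content, then "/" (NET).
NET = re.compile(r"<([A-Za-z][A-Za-z0-9.-]*)/([^/<]*)/")


def expand_net(text):
    """Rewrite <NAME/content/ as <NAME>content</NAME>."""
    return NET.sub(r"<\1>\2</\1>", text)


print(expand_net("typically something like <ITALICS/this/"))
# typically something like <ITALICS>this</ITALICS>
```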

Other features

Additionally, the SHORTTAG NETENABL IMMEDNET feature allows shortening tags surrounding an empty text value, but forbids shortening full tags:

<QUOTE></QUOTE>

can be written as

<QUOTE// <!-- not a typo! -->

wherein the first slash ( / ) stands for the NET-enabling “start-tag close” (NESTC), and the second slash stands for the NET. NOTE: XML defines NESTC with a / and NET with a > (angle bracket); hence the corresponding construct in XML appears as <QUOTE/>.

The third feature is 'text on the same line', allowing a markup item to be ended with a line-end; especially useful for headings and such, requiring using either SHORTREF or DATATAG minimization. For example, if the DTD includes the following declarations:

<!ELEMENT lines (line*)>
<!ELEMENT line O - (#PCDATA)>
<!ENTITY   line-tagc  "</line>">
<!SHORTREF one-line "&#RE;&#RS;" line-tagc>
<!USEMAP   one-line line>

(and "&#RE;&#RS;" is a short-reference delimiter in the concrete syntax), then:

<lines>
first line
second line
</lines>

is equivalent to:

<lines>
<line>first line</line>
<line>second line</line>
</lines>
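The effect of this short-reference map can be approximated in a few lines. The sketch below is only a coarse model, assuming that each record inside `<lines>` becomes one `line` element; it hard-codes the element names from the example, and the function name is hypothetical. A real SGML parser would instead recognize the record-end and record-start delimiters, substitute the mapped entity, and infer the omitted `<line>` start tag from the DTD.

```python
def expand_lines(text):
    """Wrap each record between <lines> and </lines> in <line>...</line>."""
    out, inside = [], False
    for record in text.split("\n"):
        if record == "<lines>":
            inside = True
            out.append(record)
        elif record == "</lines>":
            inside = False
            out.append(record)
        elif inside:
            # The line end plays the role of the short reference mapped to
            # the end tag; the start tag is the one the DTD allows omitting.
            out.append("<line>%s</line>" % record)
        else:
            out.append(record)
    return "\n".join(out)


print(expand_lines("<lines>\nfirst line\nsecond line\n</lines>"))
```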

Formal characterization

SGML has many features that defied convenient description with the popular formal automata theory and the contemporary parser technology of the 1980s and the 1990s. The standard warns in Annex H:

The SGML model group notation was deliberately designed to resemble the regular expression notation of automata theory, because automata theory provides a theoretical foundation for some aspects of the notion of conformance to a content model. No assumption should be made about the general applicability of automata to content models.

A report on an early implementation of a parser for basic SGML, the Amsterdam SGML Parser,[9] notes

the DTD-grammar in SGML must conform to a notion of unambiguity which closely resembles the LL(1) conditions

and specifies various differences.

There appears to be no definitive classification of full SGML against a known class of formal grammar. Plausible classes may include tree-adjoining grammars and adaptive grammars.

XML is described as being generally parsable like a two-level grammar for non-validated XML and a Conway-style pipeline of coroutines (lexer, parser, validator) for valid XML.[10] The SGML productions in the ISO standard are reported to be LL(3) or LL(4).[11] XML-class subsets are reported to be expressible using a W-grammar.[12] According to one paper,[13] and probably considered at an information set or parse tree level rather than a character or delimiter level:

The class of documents that conform to a given SGML document grammar forms an LL(1) language. … The SGML document grammars by themselves are, however, not LL(1) grammars.

The SGML standard does not define SGML with formal data structures such as parse trees. However, an SGML document is constructed of a rooted directed acyclic graph (RDAG) of physical storage units known as “entities”, which is parsed into an RDAG of structural units known as “elements”. The physical graph is loosely characterized as an entity tree, but entities might appear multiple times. Similarly, the structure graph is loosely characterized as an element tree, but the ID/IDREF markup allows arbitrary arcs.

The results of parsing can also be understood as a data tree in different notations; where the document is the root node, and entities in other notations (text, graphics) are child nodes. SGML provides apparatus for linking to and annotating external non-SGML entities.

The SGML standard describes it in terms of maps and recognition modes (s9.6.1). Each entity, and each element, can have an associated notation or declared content type, which determines the kinds of references and tags which will be recognized in that entity and element. Also, each element can have an associated delimiter map (and short reference map), which determines which characters are treated as delimiters in context. The SGML standard characterizes parsing as a state machine switching between recognition modes. During parsing, there is a stack of maps that configure the scanner, while the tokenizer relates to the recognition modes.

Parsing involves traversing the dynamically-retrieved entity graph, finding/implying tags and the element structure, and validating those tags against the grammar. An unusual aspect of SGML is that the grammar (DTD) is used both passively — to recognize lexical structures, and actively — to generate missing structures and tags that the DTD has declared optional. End- and start- tags can be omitted, because they can be inferred. Loosely, a series of tags can be omitted only if there is a single, possible path in the grammar to imply them. It was this active use of grammars that made concrete SGML parsing difficult to formally characterize.

SGML uses the term validation for both recognition and generation. XML does not use the grammar (DTD) to change delimiter maps or to inform the parse modes, and does not allow tag omission; consequently, XML validation of elements is not active in the sense that SGML validation is active. SGML without a DTD (e.g. simple XML) is a grammar or a language; SGML with a DTD is a metalanguage. SGML with an SGML declaration is, perhaps, a meta-metalanguage, since it is a metalanguage whose declaration mechanism is a metalanguage.

SGML has an abstract syntax implemented by many possible concrete syntaxes; however, this is not the same usage as in an abstract syntax tree or a concrete syntax tree. In the SGML usage, a concrete syntax is a set of specific delimiters, while the abstract syntax is the set of names for the delimiters. The XML Infoset corresponds more to the programming language notion of abstract syntax introduced by John McCarthy.

Derivatives

XML

The W3C XML (Extensible Markup Language) is a profile (subset) of SGML designed to ease the implementation of the parser compared to a full SGML parser, primarily for use on the World Wide Web. In addition to disabling many SGML options present in the reference syntax (such as omitting tags and nested subdocuments) XML adds a number of additional restrictions on the kinds of SGML syntax. For example, despite enabling SGML shortened tag forms, XML does not allow unclosed start or end tags. It also relied on many of the additions made by the WebSGML Annex. XML currently is more widely used than full SGML. XML has lightweight internationalization based on Unicode. Applications of XML include XHTML, XQuery, XSLT, XForms, XPointer, JSP, SVG, RSS, Atom, XML-RPC, RDF/XML, and SOAP.

HTML

While HTML was developed partially independently and in parallel with SGML, its creator, Tim Berners-Lee, intended it to be an application of SGML. The design of HTML (Hyper Text Markup Language) was therefore inspired by SGML tagging, but, since no clear expansion and parsing guidelines were established, most actual HTML documents are not valid SGML documents. Later, HTML was reformulated (version 2.0) to be more of an SGML application; however, the HTML markup language has many legacy-handling and exception-handling features that differ from SGML's requirements. HTML 4 is an SGML application that fully conforms to ISO 8879 – SGML.[14]

The charter for the 2006 revival of the World Wide Web Consortium HTML Working Group says, "the Group will not assume that an SGML parser is used for 'classic HTML'".[15] Although HTML syntax closely resembles SGML syntax with the default reference concrete syntax, HTML5 abandons any attempt to define HTML as an SGML application, explicitly defining its own parsing rules,[16] which more closely match existing implementations and documents. It does, however, define an alternative XHTML serialization, which conforms to XML and therefore to SGML as well.[17]

OED

The second edition of the Oxford English Dictionary (OED) is entirely marked up with an SGML-based markup language, using the LEXX text editor.[18]

The third edition is marked up as XML.

Others

Other document markup languages are partly related to SGML and XML but, because they cannot be parsed, validated, or otherwise processed using standard SGML and XML tools, they are not considered either SGML or XML languages; the Z Format markup language for typesetting and documentation is an example.

Several modern programming languages support tags as primitive token types, or now support Unicode and regular expression pattern-matching. An example is the Scala programming language.

Applications

Document markup languages defined using SGML are called "applications" by the standard; many pre-XML SGML applications were the proprietary property of the organizations that developed them, and thus unavailable on the World Wide Web. The following list is of pre-XML SGML applications.

  • Text Encoding Initiative (TEI) is an academic consortium that designs, maintains, and develops technical standards for digital-format textual representation applications.
  • DocBook is a markup language originally created as an SGML application, designed for authoring technical documentation; DocBook currently is an XML application.
  • CALS (Continuous Acquisition and Life-cycle Support) is a US Department of Defense (DoD) initiative for electronically capturing military documents and for linking related data and information.
  • HyTime defines a set of hypertext-oriented element types that allow SGML document authors to build hypertext and multimedia presentations.
  • EDGAR (Electronic Data-Gathering, Analysis, and Retrieval) system effects automated collection, validation, indexing, acceptance, and forwarding of submissions, by companies and others, who are legally required to file data and information forms with the US Securities and Exchange Commission (SEC).
  • LinuxDoc. Documentation for Linux packages has used the LinuxDoc SGML DTD and DocBook XML DTD.
  • AAP DTD is a document type definition for scientific documents, defined by the Association of American Publishers.
  • SGMLguid was an early SGML document type definition created, developed and used at CERN.

Open-source implementations

Significant open-source implementations of SGML have included:

  • ASP-SGML
  • ARC-SGML, by Standard Generalized Markup Language Users', 1991, C language
  • SGMLS, by James Clark, 1993, C language
  • Project YAO, by Yuan-ze Institute of Technology, Taiwan, with Charles Goldfarb, 1994, object
  • SP by James Clark, C++ language

SP and Jade, the associated DSSSL processors, are maintained by the OpenJade project, and are common parts of Linux distributions. A general archive of SGML software and materials resides at SUNET. The original HTML parser class, in Sun Microsystems' implementation of Java, is a limited-features SGML parser, using SGML terminology and concepts.

References

  1. ISO. "JTC 1/SC 34 – Document description and processing languages". ISO. Retrieved 2009-12-25.
  2. ISO JTC1/SC34. "JTC 1/SC 34 – Document Description and Processing Languages". Retrieved 2009-12-25.
  3. ISO/IEC 10744 – Hytime
  4. "ISO/IEC TR 9573" (PDF). ISO. 1991. Retrieved 5 December 2017.
  5. Goldfarb, Charles F. (1996). "The Roots of SGML – A Personal Recollection". Retrieved July 7, 2007.
  6. Goldfarb, Charles F. (1990). "The SGML Handbook".
  7. Terms and Definitions of ISO 8879 draft
  8. Wohler, Wayne (July 21, 1998). "SGML Declarations". Retrieved August 17, 2009.
  9. Egmond (December 1989). "The Implementation of the Amsterdam SGML Parser" (PDF).
  10. Carroll, Jeremy J. (November 26, 2001). "CoParsing of RDF & XML" (PDF). Hewlett-Packard. Retrieved October 9, 2009.
  11. "SGML: Grammar Productions".
  12. "Re: Other whitespace problems was Re: Whitespace rules (v2)".
  13. Bruggemann-Klein. "Compiler-Construction Tools and Techniques for SGML parsers: Difficulties and Solutions".
  14. "HTML 4–4 Conformance: requirements and recommendations". Retrieved 2009-12-30.
  15. Lilley, Chris; Berners-Lee, Tim (February 6, 2009). "HTML Working Group Charter". Retrieved April 19, 2007.
  16. "HTML5 — Parsing HTML documents". World Wide Web Consortium. October 28, 2014. Retrieved June 29, 2015.
  17. Dubost, Karl (January 15, 2008). "HTML 5, one vocabulary, two serializations". Questions & Answers blog. W3C. Retrieved February 25, 2009.
  18. Cowlishaw, M. F. (1987). "LEXX—A programmable structured editor". IBM Journal of Research and Development. IBM. 31 (1): 73. doi:10.1147/rd.311.0073.

.sgm

.sgm may refer to the following file formats:

Standard Generalized Markup Language

Encoded Archival Description Document, an XML standard for encoding archival finding aids, maintained by the Library of Congress and the Society of American Archivists

SoftQuad XMetaL File, for the SoftQuad Software XMetaL

Visual Boy Advance Saved State File, VisualBoyAdvance

Charles Goldfarb

Charles F. Goldfarb is known as the father of Standard Generalized Markup Language (SGML) and grandfather of HTML and the World Wide Web. He co-invented the concept of markup languages.

In 1969 Charles Goldfarb, leading a small team at IBM, developed the first markup language, called Generalized Markup Language, or GML. Goldfarb coined the term GML, an initialism for the three researchers, Charles Goldfarb, Ed Mosher and Ray Lorie, who worked on the project. In 1974, he designed SGML and subsequently wrote the first SGML parser, ARCSGML. Goldfarb went on working to turn SGML into the ISO 8879 standard, and served as its editor in the standardization committee.

Goldfarb holds a J.D. from Harvard Law School. He worked at IBM's Almaden Research Center for many years and is now an independent consultant based in Belmont, California.

Empty element

An empty element may be:

An empty HTML element, one with tag(s) but no content (HTML element § Empty element)

An empty XML element, one with tag(s) but no content (XML § Key terminology)

An empty SGML element, one with tag(s) but no content (Standard Generalized Markup Language § EMPTY)

Frank E. Grizzard Jr.

Frank E. Grizzard Jr. is an American historian, writer, and documentary editor. He was born in 1954 in Emporia, Virginia, graduating from Greensville County High School in 1971. He earned B.A. degrees in history and religious studies from the Virginia Commonwealth University, and M.A. and Ph.D. degrees in history from the University of Virginia. His doctoral dissertation, Documentary History of the Construction of the Buildings at the University of Virginia, 1817–1828, consisting of a lengthy narrative and more than 1,750 documents chronicling the construction of Thomas Jefferson's architectural masterpiece, the Academical Village, became the first electronic dissertation to be placed online when it was completed in 1996. The dissertation was tagged in the Standard Generalized Markup Language (SGML) while Grizzard was a fellow at the University of Virginia's Institute for Advanced Technologies in the Humanities (IATH).

Grizzard spent fifteen years at The Papers of George Washington editorial project at the University of Virginia, editing volumes in the Revolutionary War Series and overseeing the project's computer initiatives. While at the Washington Papers, he was responsible for placing online the 39-volume edition of The Writings of George Washington from the Original Manuscript Sources 1745–1799, edited between 1931 and 1944 by John C. Fitzpatrick, the Assistant Chief of the Manuscripts Division of the Library of Congress. (Fitzpatrick's Writings of Washington, justly celebrated as the "first systematic effort to transcribe and publish the entirety of Washington’s personal papers," is being superseded by the more comprehensive Papers of George Washington.)

Grizzard served on the Board of Directors of the Albemarle Charlottesville Historical Society (1999–2007) and as the Society's Vice-President (2002–2003), President (2003–2005), and Past-President (2005–2007). In 2004 Grizzard joined the Board of Directors of the Prism, a Virginia nonprofit music association that for 40 years hosted acoustic Americana and World Music at its intimate Coffeehouse in Charlottesville. For several years Grizzard hosted radio programs at WTJU in Charlottesville, including "The Old Home Place" (a traditional and gospel Bluegrass show), and "Just 'Nuther" (a 3-hour artist showcase of various genres).

In addition to his own writings and work on historical documentary editions, Grizzard has written two encyclopedias, edited two history journals, and has been responsible for bringing massive collections of historical documents to the internet. The Association for Documentary Editing awarded Grizzard its Distinguished Service Award, in 1999, for his contributions to the Association's computer initiatives.

Grizzard left the University of Virginia in 2005 to set up the Lee Family Digital Archive (LFDA), a long-term project aimed at creating an online edition of the papers of the prominent Lee family of Virginia. The LFDA was affiliated with Washington and Lee University before moving to Stratford Hall, which now administers the site. As Director of the LFDA, which is producing a historical edition covering about 350 years of American history, Grizzard oversaw all aspects of the project, including the search for Lee-related documents; the transcription, annotation, and electronic markup of documents; and project fundraising.

At both the University of Virginia and at Washington and Lee University, Grizzard coordinated lecture series bringing together more than three dozen prominent historians and writers to speak about various historical subjects, including George Washington, Thomas Jefferson, Colonial Jamestown, and Robert E. Lee.

IBM Generalized Markup Language

Generalized Markup Language (GML) is a set of macros that implement intent-based (procedural) markup tags for the IBM text formatter, SCRIPT. SCRIPT/VS is the main component of IBM's Document Composition Facility (DCF). A starter set of tags in GML is provided with the DCF product.

ISO/IEC JTC 1/SC 34

ISO/IEC JTC 1/SC 34, Document description and processing languages, is a subcommittee of the ISO/IEC JTC 1 joint technical committee, a collaborative effort of the International Organization for Standardization and the International Electrotechnical Commission. The subcommittee develops and facilitates standards within the field of document description and processing languages. The international secretariat of ISO/IEC JTC 1/SC 34 is the Japanese Industrial Standards Committee (JISC), located in Japan.

MDO

MDO may refer to:

MDO (band), a Puerto Rican boy band, a spin-off from Menudo

Marine Diesel Oil, a type of fuel oil used in the maritime field

Medium density overlay, a type of plywood

Multidisciplinary design optimization, a field of engineering

Mixed-domain oscilloscope, a kind of oscilloscope that allows combining information from time and frequency domain

Markup Declaration Open, an SGML delimiter

MIL-STD-2361

This military standard established the Standard Generalized Markup Language (SGML) and the Extensible Markup Language (XML) requirements for use in Army digital publications. Within this military standard, Army publications SGML/XML requirements are separated by publication types. There are specified sections for administrative publications, training and doctrine publications, technical and equipment publications and Global Combat Support System-Army (GCSS-A). This new publication of the standard contains the XML requirements for Technical Manuals (TM) developed in accordance with the functional requirements contained in MIL-STD-40051-1 and MIL-STD-40051-2, GCSS-A collection and reporting of maintenance data developed in accordance with MIL-STD-3008, and administrative publications developed in accordance with AR 25-30.

The XML requirements are applicable for the development, acquisition, and delivery of Electronic and Interactive Electronic Publications (EP/IEP) such as Electronic and Interactive Electronic Technical Manuals (ETM/IETM). The previous SGML for training and doctrine publications functional requirements, developed in accordance with TRADOC Reg 350-70 and TRADOC Reg 25-36, remain unchanged. Specific Interactive Multimedia Instruction (IMI) functionality is currently contained in MIL-PRF-29612, The Development and Acquisition of Training Data Products and TRADOC Reg 350-70, Systems Approach to Training Management, Processes, and Products.

National Digital Library Program

The Library of Congress National Digital Library Program (NDLP) is assembling a digital library of reproductions of primary source materials to support the study of the history and culture of the United States. Launched in 1995 after a five-year pilot project, the program began digitizing selected collections of Library of Congress archival materials that chronicle the nation's rich cultural heritage. In order to reproduce collections of books, pamphlets, motion pictures, manuscripts and sound recordings, the Library has created a wide array of digital entities: bitonal document images, grayscale and color pictorial images, digital video and audio, and searchable e-texts. To provide access to the reproductions, the project developed a range of descriptive elements: bibliographic records, finding aids, and introductory texts and programs, as well as indexing the full texts for certain types of content.

The reproductions were produced with a variety of tools: image scanners, digital cameras, devices that digitize audio and video, and human labor for rekeying and encoding texts. American Memory employs national-standard and well established industry-standard formats for many digital reproductions, e.g., texts encoded with Standard Generalized Markup Language (SGML) and images stored in Tagged Image File Format (TIFF) files or compressed with the Joint Photographic Experts Group (JPEG) algorithm. In other cases, the lack of well established standards has led to the use of emerging formats, e.g., RealAudio (for audio), QuickTime (for moving images), and MrSID (for maps). Technical information by types of material and by individual collections is also available at this site.

PCDATA

Parsed Character Data (PCDATA) is a data definition that originated in Standard Generalized Markup Language (SGML) and is also used in Extensible Markup Language (XML) Document Type Definitions (DTDs) to designate mixed-content XML elements.
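
As a minimal sketch of what a mixed-content declaration looks like in practice, the Python snippet below parses a small XML document whose internal DTD subset declares `para` as `#PCDATA` interleaved with `emph` children. The element names, the sample text, and the helper `mixed_content` are hypothetical and chosen only for illustration. The standard-library parser used here reads the DTD but does not validate against it, so the declaration serves as documentation of intent rather than an enforced constraint.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the internal DTD subset declares <para> as mixed
# content, i.e. character data (#PCDATA) interleaved with <emph> elements.
DOC = """<!DOCTYPE para [
  <!ELEMENT para (#PCDATA | emph)*>
  <!ELEMENT emph (#PCDATA)>
]>
<para>Markup should be <emph>declarative</emph> and <emph>rigorous</emph>.</para>"""


def mixed_content(elem):
    """Return the element's content as an ordered list of text runs and children."""
    parts = []
    if elem.text:                      # character data before the first child
        parts.append(elem.text)
    for child in elem:
        parts.append(("element", child.tag, child.text))
        if child.tail:                 # character data following this child
            parts.append(child.tail)
    return parts


para = ET.fromstring(DOC)              # parses the DTD but does not validate
parts = mixed_content(para)
for part in parts:
    print(repr(part))
```

The text runs and the `emph` children come back in document order, which is the defining property of mixed content.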

SDIF (disambiguation)

SDIF is the Sound Description Interchange Format.

SDIF or S.D.I.F. may also refer to:

Six Days in Fallujah, a video game by Atomic Games

SGML Document Interchange Format (SDIF), ISO 9069:1988, a format for interchanging Standard Generalized Markup Language documents

SGML entity

In the Standard Generalized Markup Language (SGML), an entity is a primitive data type, which associates a string with either a unique alias (such as a user-specified name) or an SGML reserved word (such as #DEFAULT). Entities are foundational to the organizational structure and definition of SGML documents. The SGML specification defines numerous entity types, which are distinguished by keyword qualifiers and context. An entity string value may variously consist of plain text, SGML tags, and/or references to previously-defined entities. Certain entity types may also invoke external documents. Entities are called by reference.
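
The idea of an entity as a named string that is expanded where it is referenced can be sketched with XML, which inherited general entities from SGML. The Python standard library has no SGML parser, so the example below is an XML rendition and not SGML proper. The entity names `sgml` and `std` and the `memo` element are hypothetical. The second entity's value refers to the first, illustrating a reference to a previously defined entity.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with two internal general entities declared in the
# DTD internal subset. The value of "std" references the earlier "sgml".
DOC = """<!DOCTYPE memo [
  <!ENTITY sgml "Standard Generalized Markup Language">
  <!ENTITY std  "&sgml; (ISO 8879)">
]>
<memo>This memo concerns &std;.</memo>"""

memo = ET.fromstring(DOC)   # the parser expands &std; and, within it, &sgml;
text = memo.text
print(text)
```

The document body only ever mentions `&std;`; the replacement text is supplied by the parser when the reference is resolved.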

Scientigo

Scientigo is a United States company based in Charlotte, North Carolina that began asserting patent claims over XML technology in 2005. Since SGML (Standard Generalized Markup Language), from which XML (Extensible Markup Language) is derived, traces its origins to the 1960s, and the patents were applied for in 1997, the notion that Scientigo's patents cover XML has been rejected by patent attorneys and other commentators, including Microsoft. The company has purchased a 51% stake in the URL find.com. According to Scientigo-issued press releases, it plans to use find.com for its tigo|search technology. Scientigo received a BERTL award in 2006 for document content retrieval and search technologies.

Steven DeRose

Steven J. DeRose (born 1960) is a computer scientist noted for his contributions to computational linguistics and to key standards related to document processing, mostly around ISO's Standard Generalized Markup Language (SGML) and W3C's Extensible Markup Language (XML).

His contributions include the following:

HyTime

Text Encoding Initiative

XPath (editor)

XPointer (editor)

XLink (editor)

OSIS (chairman)

XML

He served as Chief Scientist of the Scholarly Technology Group, and Adjunct Associate Professor of Computer Science, at Brown University. While there he received NSF and NEH grants and contributed heavily to the Open eBook and Encoded Archival Description standards. Previously, he was co-founder and Chief Scientist at Electronic Book Technologies, Inc., where he designed the first SGML browser (DynaText), which earned 11 US patents and won Seybold and other awards.

His 1987 article with James Coombs and Allen Renear, "Markup Systems and the Future of Scholarly Text Processing", is a seminal source for the theory of markup systems, and has been widely cited and reprinted.

The article "What is Text, Really?" has also been widely cited and reprinted, and led to several follow-on articles. In addition, he has published two books (Making Hypermedia Work: A User's Guide to HyTime and The SGML FAQ Book), as well as articles in a variety of journals, magazines, and proceedings.

He has given papers and tutorials at the ACM Hypertext Conference and various SGML and XML conferences, a keynote address at the ACM Conference on Very Large DataBases (VLDB), and a plenary talk at the Text Encoding Initiative 10 Conference.

In computational linguistics, he is known for pioneering the use of dynamic programming methods for part-of-speech tagging (DeRose 1988, 1990).

Structured document

A structured document is an electronic document where some method of embedded coding, such as mark-up, is used to give the whole, and parts, of the document various structural meanings according to a schema. A structured document whose mark-up conforms to its schema and obeys the syntax rules of its mark-up language is "well-formed".

The Standard Generalized Markup Language (SGML) pioneered the concept of structured documents.

As of 2009 the most widely used markup language, in all its evolving forms, is HTML, which is used to structure documents according to various Document Type Definition (DTD) schemas defined and described by the W3C, which continually reviews, refines and evolves the specifications.

XML is the universal format for structured documents and data on the Web.

XHTML

Extensible Hypertext Markup Language (XHTML) is part of the family of XML markup languages. It mirrors or extends versions of the widely used Hypertext Markup Language (HTML), the language in which Web pages are formulated.

While HTML, prior to HTML5, was defined as an application of Standard Generalized Markup Language (SGML), a flexible markup language framework, XHTML is an application of XML, a more restrictive subset of SGML. XHTML documents are well-formed and may therefore be parsed using standard XML parsers, unlike HTML, which requires a lenient HTML-specific parser. XHTML 1.0 became a World Wide Web Consortium (W3C) Recommendation on January 26, 2000. XHTML 1.1 became a W3C Recommendation on May 31, 2001. The standard known as XHTML5 is being developed as an XML adaptation of the HTML5 specification.
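
The difference between a standard XML parser and a lenient HTML-specific one can be sketched with two Python standard-library modules. The two sample strings, the helper `is_well_formed`, and the `TagCollector` class are hypothetical and exist only for this illustration. Note that `html.parser` is a forgiving tokenizer, not a full implementation of the HTML5 parsing algorithm, so it stands in here only as an example of a parser that tolerates unclosed tags.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical samples: the XHTML closes every element (note <br/>),
# the HTML leaves <br>, <p>, <body> and <html> unclosed.
XHTML = ('<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title></head>'
         '<body><p>line one<br/>line two</p></body></html>')
HTML = '<html><head><title>t</title><body><p>line one<br>line two'


def is_well_formed(markup):
    """Return True if a standard XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Lenient tokenizer that records every start tag it encounters."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


collector = TagCollector()
collector.feed(HTML)       # does not raise, despite the unclosed elements
collector.close()

print(is_well_formed(XHTML), is_well_formed(HTML))
print(collector.tags)
```

The XML parser rejects the HTML sample outright, while the HTML tokenizer still reports every start tag it saw.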

Yuri Rubinsky

Yuri Ivan Rubinsky was a writer, software executive, and well-known promoter of the Standard Generalized Markup Language (SGML), which was the basis for the now-ubiquitous XML. In Canada, he is probably best known as founding co-director of the influential Banff Publishing Workshop and for his work in applying technology to help visually impaired people. He died at age 43 on January 21, 1996 after suffering a massive and unexpected heart attack at his home in Toronto, Canada. The Yuri Rubinsky Memorial Award was created in his memory.

