Thursday, September 28, 2017

XML

James Clark invented the name Extensible Markup Language and its abbreviation XML. He has been quoted as saying, "XML isn't going to win any prizes for technical elegance. But it was just simple enough that it could get broad acceptance, and it has just enough Standard Generalized Markup Language (SGML) stuff in it that the SGML community felt they could embrace it." (SGML is a standard indicating how to specify a document language or tag set.)


The XML Recommendation defines a methodology for tag creation. It specifies neither tag semantics nor a specific tag set. That is, XML specifies structure, not meaning. You can define an infinite number of markup languages based on the XML Recommendation standards.


Once defined, tags are mixed with character data to form an "XML document." An XML document can take numerous forms. For example, it can be a logical structure within a computer program or an external file in the traditional sense. Likewise, an XML document can be sent as a data stream, reported as a database result set, or dynamically generated by one application and sent to another.

 XML is case sensitive.

Elements and tags

Element:
<Today> Breaking news </Today>

Tag:
<Today>

Self-closing (empty-element) tag:
<Today/>

Attribute
An attribute is a property of an element.
The attribute includes both the name and the value.
<Today answer="42">Some info</Today>
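
As a minimal sketch of how a program sees these parts, the attribute example above can be read with Python's standard xml.etree.ElementTree module. The choice of Python and of this module is an assumption made for illustration; the notes themselves do not prescribe a parser.

```python
import xml.etree.ElementTree as ET

# Parse the attribute example from the notes above.
element = ET.fromstring('<Today answer="42">Some info</Today>')

print(element.tag)            # the element name
print(element.text)           # the character data between the tags
print(element.attrib)         # all attributes as a dict of name -> value
print(element.get("answer"))  # one attribute value (always a string)
```

Attribute values always come back as strings; converting "42" to a number is up to the application.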

Well-Formed 
  •  Every start-tag must have a matching end-tag, or be a self-closing tag.
  •  Tags can’t overlap; elements must be properly nested.
  •  XML documents can have only one root element.
  •  Element names must obey XML naming conventions.
  •  XML is case sensitive.
  •  XML will keep whitespace in your PCDATA.

Well-formedness is the minimum requirement necessary for an XML document. It includes various syntactic constraints, such as every start-tag must have a matching end-tag and the document must have exactly one root element. If a document is not well formed, it is not an XML document. Parsers that encounter a malformed document are required to report the error and stop parsing.
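
The rules above can be tried out with a short sketch, assuming Python's standard xml.etree.ElementTree parser. The helper name is_well_formed and the sample documents are made up for this illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text, False on a fatal syntax error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical samples, one per rule from the list above.
samples = {
    "matching tags": "<Today>Breaking news</Today>",
    "self-closing tag": "<Today/>",
    "missing end-tag": "<Today>Breaking news",
    "overlapping tags": "<a><b></a></b>",
    "two root elements": "<a/><b/>",
    "case mismatch": "<Today>Breaking news</today>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well formed' if ok else 'not well formed'}")
```

Only the first two samples are accepted. The case-mismatch sample fails because <Today> and </today> are different names to the parser.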

XML Declaration
The following are all legal XML declarations.

<?xml version="1.0"?>
<?xml version="1.0" encoding="UTF-8"?>
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<?xml version="1.0" standalone="yes"?>

Each XML declaration has up to three attributes.

  • version (required): the version of XML in use. This is almost always 1.0; XML 1.1 also exists but is rarely used.
  • encoding (optional): the character encoding in which the document is written.
  • standalone (optional): whether or not the external DTD subset makes important contributions to the document's infoset. The default is no.
    • yes specifies that this document exists entirely on its own, without depending on any other files.
    • no indicates that the document may depend on an external DTD.
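
To see what the encoding attribute does in practice, here is a small sketch assuming Python's standard xml.etree.ElementTree module. The sample documents are made up: the same text is stored once as ISO-8859-1 bytes and once as UTF-8 bytes, and the declaration tells the parser how to decode each.

```python
import xml.etree.ElementTree as ET

# "café" in ISO-8859-1: the é is the single byte 0xE9.
latin1_doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><Today>caf\xe9</Today>'

# The same text in UTF-8: the é becomes the two bytes 0xC3 0xA9.
utf8_doc = '<?xml version="1.0" encoding="UTF-8"?><Today>café</Today>'.encode("utf-8")

latin1_text = ET.fromstring(latin1_doc).text
utf8_text = ET.fromstring(utf8_doc).text

# Different bytes on disk, identical text after parsing.
print(latin1_doc == utf8_doc)
print(latin1_text == utf8_text)
```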
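
Processing Instruction
A processing instruction passes information to the application. It starts with <?, followed immediately by a target name:
<?target some instruction?>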

DTD - Document Type Definitions
<!DOCTYPE name [
<!ELEMENT name (first, middle, last)>
<!ELEMENT first (#PCDATA)>
<!ELEMENT middle (#PCDATA)>
<!ELEMENT last (#PCDATA)>
]>
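
The content model (first, middle, last) says that a name element holds exactly those three children, in that order. Python's standard xml.etree.ElementTree parser reads the DOCTYPE but does not validate against it, so the sketch below checks the model by hand. The function matches_name_model and the sample documents are hypothetical, and this is not a substitute for a validating parser.

```python
import xml.etree.ElementTree as ET

# The DTD from the notes above, used as an internal subset.
DTD = """<!DOCTYPE name [
<!ELEMENT name (first, middle, last)>
<!ELEMENT first (#PCDATA)>
<!ELEMENT middle (#PCDATA)>
<!ELEMENT last (#PCDATA)>
]>"""

def matches_name_model(xml_body):
    """Hand-written check of the (first, middle, last) content model."""
    root = ET.fromstring(DTD + xml_body)
    child_names = [child.tag for child in root]
    # (#PCDATA) children may hold text only, so they must have no child elements.
    children_are_text_only = all(len(child) == 0 for child in root)
    return (
        root.tag == "name"
        and child_names == ["first", "middle", "last"]
        and children_are_text_only
    )

valid = "<name><first>John</first><middle>Q</middle><last>Public</last></name>"
invalid = "<name><last>Public</last><first>John</first></name>"

print(matches_name_model(valid))
print(matches_name_model(invalid))
```

Both samples are well formed, but only the first follows the order the DTD requires.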

CDATA
Everything after the <![CDATA[ and before the ]]> is treated as literal character data; the parser does not interpret it as markup.
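
A short sketch of the effect, assuming Python's standard xml.etree.ElementTree module (the note element is made up for the example): text inside a CDATA section arrives as plain character data, exactly as if every special character had been escaped.

```python
import xml.etree.ElementTree as ET

# Inside CDATA, "<" and "&" are ordinary characters, not markup.
with_cdata = ET.fromstring(
    "<note><![CDATA[<greeting>Hello & welcome</greeting>]]></note>"
)

# Without CDATA, the same characters have to be escaped.
with_escapes = ET.fromstring(
    "<note>&lt;greeting&gt;Hello &amp; welcome&lt;/greeting&gt;</note>"
)

print(with_cdata.text)
print(len(with_cdata))                       # number of child elements
print(with_cdata.text == with_escapes.text)  # both spellings give the same text
```

The note element has no child elements in either case: the text that looks like a greeting tag is just character data.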

XML Namespaces
A namespace is declared with an xmlns:ppp attribute, where ppp is the prefix you choose (xmlns stands for XML NameSpace).

We can declare more than one namespace for an element, but only one can be the default.
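
The sketch below shows one default namespace and one prefixed namespace on the same element, assuming Python's standard xml.etree.ElementTree module. The element names and the example.com URIs are hypothetical.

```python
import xml.etree.ElementTree as ET

doc = """
<order xmlns="http://example.com/orders"
       xmlns:ppp="http://example.com/people">
  <item>Book</item>
  <ppp:name>Ada</ppp:name>
</order>
"""
root = ET.fromstring(doc)

# ElementTree expands every name to the form {namespace-URI}local-name.
tags = [element.tag for element in root.iter()]
for tag in tags:
    print(tag)

# Searching needs a prefix-to-URI mapping. The prefixes "o" and "p" are local
# to this code and need not match the prefixes used in the document.
ns = {"o": "http://example.com/orders", "p": "http://example.com/people"}
person = root.find("p:name", ns).text
item = root.find("o:item", ns).text
print(person, item)
```

Unprefixed elements such as item fall into the default namespace, while ppp:name belongs to the namespace bound to the ppp prefix. What identifies a namespace is the URI, not the prefix.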

Immediately following the XML declaration is the Document Type Declaration, which opens with:
<!DOCTYPE name [

Built-In Entities
&amp;—the & character
&lt;—the < character
&gt;—the > character
&apos;—the ' character
&quot;—the " character
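
As a rough illustration of the round trip, the sketch below escapes a string with the five built-in entities and then lets a parser turn them back into characters. It assumes Python's standard xml.sax.saxutils and xml.etree.ElementTree modules; the sample sentence and the line element are made up.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = 'Tom & Jerry say "1 < 2"'

# escape() always replaces &, < and >. Quote characters are only replaced
# when they are passed in the optional mapping.
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})
print(escaped)

# The parser replaces each entity reference with its character again.
parsed = ET.fromstring(f"<line>{escaped}</line>").text
print(parsed == raw)
```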

XML Schemas

W3C XML Schemas allow you to describe the structure of an XML document.

XML Schema divides data types into two broad categories: simple and complex.

Examples of the schema root element, which may use no prefix, the xs prefix, or the xsd prefix:
<schema xmlns="http://www.w3.org/2001/XMLSchema">
or
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
or
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
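
To make the simple/complex split concrete, here is a tiny made-up schema read as ordinary XML with Python's standard xml.etree.ElementTree module. The standard library cannot validate documents against a schema, so this sketch only inspects the declarations: an element with a complexType child is treated as complex, and anything else as simple. The element names first and name are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

schema_text = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="first" type="xs:string"/>
  <xs:element name="name">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="first"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""
schema = ET.fromstring(schema_text)

# Classify each top-level element declaration.
kinds = {}
for declaration in schema.findall(f"{{{XS}}}element"):
    is_complex = declaration.find(f"{{{XS}}}complexType") is not None
    kinds[declaration.get("name")] = "complex" if is_complex else "simple"

print(kinds)
```

Here first holds only text (a simple type), while name contains another element (a complex type).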



Definitions:

Markup
The term "markup" has its origins in the publishing industry. In traditional publishing, markup happens after the writing is complete but before the book goes to typesetting. An editor annotates the text with handwritten instructions for the typesetter. These instructions, which specify the layout, are known as markup. Many contemporary word processing programs insert electronic markup automatically as the user creates the text.

XML 
Is an open-specification, platform-independent, extensible, and increasingly successful profile of the Standard Generalized Markup Language (SGML) [ISO 8879].


Well-Formed Documents
"Well formed" has an exact meaning in XML. A well-formed document adheres to the syntax rules specified by the XML 1.0 Recommendation. If the document is not well formed and an error appears in the XML syntax, the XML processor stops and reports the presence of a fatal error.

DTD
Document Type Definitions (DTDs) are important in data exchange. Parties exchanging data must agree on a format, and a DTD allows the specification of that format.

The DTD defines the grammar and vocabulary of a markup language, specifying what is and what is not allowed to appear in a document—for example, which tags can appear in the document and how they must nest within one another.

This grammar is known as a Document Type Definition. A DTD defines the allowable building blocks of an XML document; that is, it defines the document structure with a list of permissible elements, attributes, nestings, and so on.
<?xml version="1.0"?>
<!DOCTYPE memo SYSTEM "memo.dtd">

The term "PCDATA" stands for "Parsed Character DATA": text that the parser scans for markup and entity references.

CDATA
CDATA is text that will not be analyzed by a parser, except to look for the CDATA termination string ]]>.
<![CDATA[<greeting>Hello, world!</greeting>]]>

Namespaces
The intent of XML namespaces is to eliminate naming conflicts in XML documents that contain element types and attributes from multiple XML languages.
<?xml version="1.0"?>
<name xmlns:family='http://mypage.com/classification'>
  • The prefix "xmlns" is used only for namespace bindings and is not itself bound to any namespace name.

The Extensible Stylesheet Language (XSL)
The W3C Recommendation [XSL] defines the XSL language for expressing stylesheets. XSL builds on the prior work on Cascading Style Sheets Level 1 [CSS1] and Level 2 [CSS2] and the Document Style Semantics and Specification Language [ISO 10179].

XSL was developed to give designers greater control over needed features during pagination of documents and to provide an equivalent "frame"-based structure for browsing on the Web. It also incorporates the use of the XLink language [Xlink] to insert elements into XML documents that create and describe links between resources.

XML Schema 
A schema is a model for describing the structure of information.
XML Schema is a way of describing the allowable structure and content of XML documents.
xmlns:xs="http://www.w3.org/2001/XMLSchema"

An XML processor
Is more commonly called a parser.

PCDATA
Parsed Character DATA: text that the parser scans for markup and entity references.

whitespace
Space character, new lines (what you get when you hit the Enter key), and tabs.
Whitespace inside XML elements is preserved, just as it is inside the HTML <PRE> tag.
Whitespace added only for readability is called extraneous whitespace; it is not actually part of an element's PCDATA.
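
A quick sketch of both cases, assuming Python's standard xml.etree.ElementTree module (the sample documents are made up): whitespace inside character data reaches the application unchanged, and indentation between elements is also reported, so it is up to the application to decide what to discard.

```python
import xml.etree.ElementTree as ET

# Whitespace inside PCDATA is handed over exactly as written.
text = ET.fromstring("<Today>\n  Breaking   news\n</Today>").text
print(repr(text))

# Collapsing runs of whitespace is an application decision, not the parser's.
collapsed = " ".join(text.split())
print(collapsed)

# Indentation between elements is still reported by the parser.
root = ET.fromstring("<a>\n  <b/>\n</a>")
indent_before_b = root.text     # whitespace between <a> and <b/>
indent_after_b = root[0].tail   # whitespace between <b/> and </a>
print(repr(indent_before_b), repr(indent_after_b))
```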

A URI (Uniform Resource Identifier) 
  • URL (Uniform Resource Locator)
  • URN (Uniform Resource Name)
    • URNs exist to provide a persistent, location-independent name for a resource. Example: urn:foo:a123,456

Resources:

XML formatter without validation - http://www.webtoolkitonline.com/xml-formatter.html