XML Notes
XML BASICS
What is XML?
XML stands for eXtensible Markup Language. It is a popular markup language used to store
and transport structured data in a format that is both human-readable and machine-readable.
XML was designed to be self-descriptive and extensible: users define their own tags,
elements, and document structure, which makes it highly flexible and customizable for various
data representation purposes.
The syntax of XML is based on a set of rules that define how elements should be structured.
Each XML document consists of a prologue, which may include the XML declaration, followed
by the document's root element. The content within the document is enclosed within tags, which
normally come in pairs: an opening tag and a closing tag. The opening tag contains the element
name, while the closing tag contains the same name preceded by a forward slash ("/"). An
element with no content may instead be written as a single self-closing tag, such as <br/>.
For example:
<title>Sample Book</title>
XML is widely used in various applications and industries, including web development (e.g.,
RSS feeds, configuration files), data exchange between different platforms and systems, as
well as in representing hierarchical data structures in databases and documents. XML has
been a foundational technology for web services like SOAP (Simple Object Access Protocol),
but newer technologies like JSON have become more popular for certain use cases due to
their simplicity and compactness.
Points to remember:
o XML was created to provide an easy way to store and transport self-describing data.
o XML tags are not predefined. You must define your own tags.
There are three important characteristics of XML that make it useful in a variety of
systems and solutions −
XML is extensible − XML allows you to create your own self-descriptive tags,
or language, that suits your application.
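XML carries the data, but does not present it − XML allows you to store the
data irrespective of how it will be presented.
XML is a public standard − XML was developed by the World Wide Web Consortium
(W3C) and is available as an open standard.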
What is Mark-up?
Mark-up refers to the practice of adding special annotations or tags to a text document to
provide additional information about the structure, formatting, or semantics of the content.
The purpose of mark-up is to instruct how the document should be displayed, processed, or
understood by various systems, applications, or users.
In mark-up languages, specific symbols or keywords (mark-up tags) are inserted within the
text to define the elements and their attributes. These tags are typically enclosed within angle
brackets ("<" and ">") and come in pairs: an opening tag and a closing tag. The opening tag
contains the element's name, and the closing tag includes the same name preceded by a
forward slash ("/"). The content that falls between the opening and closing tags is affected by
the markup's instructions.
Rich Media: Mark-up is also employed in defining rich media content, such as SVG
(Scalable Vector Graphics), which uses XML-based markup to describe vector
graphics.
Mark-up languages play a crucial role in enabling the interoperability of data and content
across different platforms, devices, and applications. They provide a standardized way of
representing information and allow computers to interpret and process the data accurately.
Different markup languages cater to specific use cases, and the choice of markup language
depends on the requirements and the context in which it will be used.
History of XML
XML's history dates back to the late 1960s and early 1970s when the need for a standardized
way of representing and exchanging data across different systems and platforms emerged.
However, the real development of XML as we know it today began in the mid-1990s. Here's a
brief history of XML:
1. SGML (Standard Generalized Markup Language): The roots of XML can be
traced back to SGML, a standard for defining markup languages. SGML was
introduced in the 1980s as a standard for defining the structure of documents with
markup tags. SGML allowed the definition of custom tags and was used in various
industries, including publishing and documentation.
2. HTML (Hypertext Markup Language): In the early 1990s, Tim Berners-Lee
developed HTML as an application of SGML to create documents for the World Wide Web.
HTML provided a way to structure web pages using predefined tags, making it easier
to create and display content in early web browsers.
3. The Need for a More Flexible Standard: As the web evolved, the limitations of
HTML became evident. There was a growing need for a more flexible and extensible
markup language that could represent a wide range of data and be easily parsed by
different systems. This led to the development of XML.
4. XML 1.0 Specification: In 1996, the World Wide Web Consortium (W3C) formed
the XML Working Group to develop a standard for XML. In February 1998, the first
official XML 1.0 specification was released, defining the syntax rules and guidelines
for creating XML documents. XML allowed users to create their own custom tags,
making it suitable for various data representation needs.
5. Adoption and Application: XML quickly gained popularity due to its versatility and
ease of use. It became the preferred format for data interchange and storage, with
applications in web services, configuration files, data exchange between applications,
and more.
6. XPath, XSLT, and Other XML Technologies: Over time, various XML-related
technologies were developed to complement XML's capabilities. XPath was
introduced as a query language for navigating XML documents, while XSLT enabled
the transformation of XML data into different formats. These technologies further
enhanced the usability and power of XML.
7. JSON's Emergence: Despite XML's widespread adoption, in the mid-2000s, a new
data interchange format called JSON (JavaScript Object Notation) gained popularity
due to its simplicity and compactness. JSON became the preferred format for certain
use cases, particularly in web APIs, due to its more straightforward syntax and smaller
data size compared to XML.
Despite the rise of JSON, XML continues to be used extensively, especially in domains that
require more complex data structures and where data self-description is critical. XML's rich
tooling and support for schema validation make it valuable in various industries, and it
remains an essential part of the web and data interchange technologies.
Origins of XML
The origins of XML (eXtensible Markup Language) can be traced back to IBM's Generalized
Markup Language (GML) of the late 1960s and its successor, SGML (Standard Generalized
Markup Language). SGML, developed from the late 1970s and standardized in 1986, served
as the foundation for XML.
Here's a brief timeline of the key events leading to the development of XML:
1. SGML (Standard Generalized Markup Language):
In the late 1960s and early 1970s, the need arose for a standardized way to
define the structure of documents to ensure interoperability and information
exchange across different systems.
Charles F. Goldfarb, Ed Mosher, and Ray Lorie, working at IBM, developed GML
(Generalized Markup Language) in 1969. From the late 1970s, Goldfarb led the
standards work that turned these ideas into SGML.
SGML was designed to be a meta-mark-up language, allowing users to define
their own document types (mark-up languages) through Document Type
Definitions (DTDs).
It was standardized in 1986 as ISO 8879:1986, providing a formal
specification for representing the structure of documents using tags.
2. HTML (Hypertext Markup Language):
In the early 1990s, the World Wide Web was born, and there was a need for a
markup language to structure web content.
Tim Berners-Lee, a British computer scientist, developed HTML as a
simplified and practical application of SGML to create web pages and link
documents together.
HTML allowed the use of predefined tags for structuring text, images, links,
and other elements, making it accessible to non-experts.
3. The Need for More Flexible Data Representation:
As the web and internet technologies advanced, it became evident that HTML
had limitations in representing structured data beyond basic web content.
There was a growing need for a more extensible and versatile markup
language that could represent various types of data and allow users to define
custom document structures.
4. XML's Development:
In 1996, the World Wide Web Consortium (W3C) formed the XML Working
Group to develop a new markup language that addressed the limitations of
HTML and provided a standardized way to represent data.
The XML 1.0 specification was released in February 1998, introducing XML
as a simplified and more flexible version of SGML.
XML allowed users to define their own tags and document structures, making
it ideal for representing and exchanging a wide range of data types.
Unlike SGML, XML was more focused on simplicity and ease of use, which
contributed to its widespread adoption.
5. XML's Adoption and Growth:
XML quickly gained popularity due to its versatility and potential applications
in various domains, including data interchange, web services, configuration
files, and more.
Over time, additional XML-related technologies were developed, such as
XPath, XSLT, and XML Schema, enhancing XML's capabilities and usability.
Today, XML remains an essential part of the web and various industries, particularly where
data self-description and structured data representation are critical. It has influenced the
development of other markup languages, including XHTML (an XML-based version of
HTML) and the domain-specific XML languages used in various sectors.
Applications of XML
XML (eXtensible Markup Language) is a versatile markup language with a wide range of
applications in various industries and domains. Some of the key applications of XML
include:
1. Data Interchange and Integration: XML is commonly used for data interchange
and integration between different systems, applications, and platforms. It provides a
standardized and self-descriptive format for representing structured data, making it
easier to exchange information across different systems.
2. Web Services: XML serves as the backbone for many web services and APIs
(Application Programming Interfaces). Web services use XML to send and receive
data in a format that can be easily understood and processed by different
programming languages.
3. Configuration Files: Many software applications and systems use XML for
configuration files. These files allow users to customize settings, preferences, and
parameters without altering the application's code.
4. RSS Feeds: XML is commonly used for creating RSS (Really Simple Syndication)
feeds, which allow websites to publish regularly updated content in a standardized
format. RSS feeds enable users to subscribe to content updates from their favourite
websites.
5. Document Mark-up and Authoring: XML can be used for structuring and marking
up documents, allowing authors to define the document's hierarchical structure,
headings, paragraphs, lists, and other elements.
6. Database and Data Storage: XML is employed in databases and data storage
systems to represent and store structured data. It provides a flexible way to model
complex data structures and relationships.
7. Metadata and Semantics: XML can be used to define and express metadata and
semantic information about documents, web resources, and data elements. This helps
in enhancing the discoverability and understanding of content.
8. Industry-Specific Standards: Many industries have adopted XML-based standards
to facilitate data exchange and communication.
9. Cross-Platform Compatibility: XML's platform-independent nature makes it ideal
for exchanging data between different operating systems, programming languages,
and devices.
10. Healthcare and Electronic Medical Records (EMR): XML is utilized in the
healthcare industry for creating standardized electronic medical records and
exchanging patient data securely between healthcare providers.
11. Publishing and Content Management: XML is widely used in publishing
workflows, content management systems, and digital publishing to ensure
consistency, reusability, and easy content transformation.
12. Geospatial Data: In GIS (Geographic Information Systems) and geospatial
applications, XML is used for representing and sharing geographic data in a
structured format.
Overall, XML's flexibility, self-descriptiveness, and human-readability make it an
excellent choice for various data representation and interchange scenarios. While newer
formats like JSON have gained popularity for specific use cases, XML continues to be a
fundamental technology in many industries due to its robustness and rich tooling support.
Features and Advantages of XML
1. Extensibility: The "X" in XML stands for "extensible," meaning users can define their own
tags and document structures to represent data in a way that suits their specific needs. This
flexibility allows XML to adapt to diverse data representation requirements.
2. Self-Descriptive: XML documents are self-descriptive, as they contain both the data and the
metadata defining the structure of the data. XML tags provide meaningful names for
elements, making it easier for humans and systems to understand the data's meaning and
relationships.
6. Data Validation: XML documents can be associated with XML Schema or Document Type
Definitions (DTDs) to define the rules and constraints that the data must adhere to.
This validation ensures data consistency and correctness.
7. Data Transformation: XML can be transformed into other formats, such as HTML, using
technologies like XSLT (eXtensible Stylesheet Language Transformations). This feature is
valuable for presenting XML data in different ways for various applications.
8. Interoperability: XML enables seamless data exchange between different systems and
applications, promoting interoperability and integration between distinct software solutions.
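10. Versioning Support: Because XML is extensible, a data format can evolve over time, for
example by adding new optional elements or by publishing a new schema version or namespace,
without breaking existing implementations that ignore elements they do not recognize. This
depends on careful format design rather than on a built-in XML feature.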
11. Industry-Specific Standards: XML has been adopted in many industries to create domain-
specific standards for data exchange. This standardization facilitates efficient communication
and data sharing within specific domains.
12. Metadata Support: XML allows the inclusion of metadata within the document, providing
additional information about the content, its origin, and other relevant details.
Overall, XML's features and advantages make it a versatile and widely used language for
representing structured data in a human-readable and machine-readable format. While newer
formats like JSON have gained popularity for certain use cases, XML remains an essential technology
in various industries due to its robustness, tooling support, and ability to handle complex data
structures.
Disadvantages of XML
While XML (eXtensible Markup Language) offers several benefits, it also has some
disadvantages that should be considered when choosing it as a data representation format.
Here are some of the main disadvantages of XML:
1. Verbose Syntax: XML's syntax can be quite verbose, leading to larger file sizes
compared to more compact formats like JSON. This verbosity can impact data
transfer times and storage requirements, especially for large datasets.
2. Parsing Overhead: Parsing XML documents can be computationally more expensive
than parsing simpler formats like JSON. The need to process nested elements and
attributes can result in increased parsing overhead, affecting performance in resource-
constrained environments.
3. Complexity: XML's extensibility and flexibility come at the cost of increased
complexity. Defining complex document structures with nested elements and
attributes can become harder to manage, especially for users unfamiliar with XML.
4. Redundancy: XML documents can be verbose and include redundant information,
leading to increased data size and inefficiency. Repeating each element name in both
its opening and closing tag contributes to this redundancy.
5. Lack of Native Data Types: XML itself does not have native data types, such as
integers or booleans, unlike some other data formats like JSON. As a result, all data in
an XML document is character data (strings). XML Schema can declare types such as
xs:integer and xs:boolean for validation, but applications still need additional parsing
and conversion when using the data in programming languages.
6. Less Compact than Binary Formats: XML is a text-based format, which means it
may not be as compact as binary formats for representing certain types of data. In
scenarios where data size and transfer speed are critical, binary formats may be more
efficient.
7. Limited Support for Metadata: While XML allows for metadata to be included in
documents, the support for standardized metadata formats is less prevalent compared
to some other data formats like JSON-LD (JSON for Linked Data) or RDF (Resource
Description Framework).
8. Parsing Errors Handling: Handling parsing errors in XML can be more challenging
than in simpler formats, as nested structures and complex document hierarchies can
lead to harder-to-diagnose issues when errors occur.
9. Processing Overhead: XML processing can require significant memory and
processing resources, especially for large documents or when working with XML
documents in real-time streaming scenarios.
10. Alternative Formats: The popularity of other data interchange formats like JSON
has grown significantly due to their simplicity and efficiency in certain use cases. As
a result, some developers and systems may prefer these alternatives over XML for
specific applications.
Differences between HTML and XML
HTML | XML
HTML stands for HyperText Markup Language. | XML stands for eXtensible Markup Language.
HTML can ignore small errors; browsers try to recover from them. | XML does not allow errors: a processor must report a well-formedness error and stop normal processing.
HTML tags are predefined tags. | XML tags are user-defined tags.
HTML does not preserve white space; browsers collapse runs of spaces when displaying text. | White space can be preserved in XML; processors pass it on to the application.
HTML tags are used for displaying the data. | XML tags are used for describing the data, not for displaying it.
HTML is used to display the data. | XML is used to store and transport data.
HTML does not carry data; it just displays it. | XML carries data, for example to and from a database or between applications.
Some of the tools used for HTML are: Visual Studio Code, Atom, Notepad++, Sublime Text, and many more. | Some of the tools used for XML are: Oxygen XML, XML Notepad, Liquid Studio, and many more.
Components of XML with example
XML documents are composed of several components that define the structure and content of
the data being represented. The main components of an XML document are as follows:
Example:
<book>
<title>Sample Book</title>
<author>John Doe</author>
</book>
3. Start Tag and End Tag: A start tag (also known as an opening tag) is used to begin
an element, and an end tag (also known as a closing tag) is used to close the element.
The content between the start and end tags represents the data or nested elements
associated with the element.
Example:
<book>
<!-- Content goes here -->
</book>
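4. Attributes: Attributes provide additional information about an element. They are
written inside the start tag as name="value" pairs.
Example:
<book category="fiction">
Shaktimaan
</book>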
5. Text Content: Text content is the data enclosed within an element. It can include
plain text, numbers, or any other character data.
Example:
<title>Sample Book</title>
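6. Comments: Comments add explanatory notes for human readers; XML processors are
not required to pass them on to applications. A comment starts with <!-- and ends with -->.
C-style // and /* */ comments are not valid in XML.
Example:
<!-- This is a comment -->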
7. CDATA Section: A CDATA section is used to include blocks of text that should be
treated as character data and not be parsed as XML markup.
Example:
<![CDATA[
<text>
Everything here, including <tags>, & characters, and
line breaks, is treated as plain character data.
</text>
]]>
1. Prologue: The prologue is the first line of the XML document, declaring the version
(1.0) and encoding (UTF-8) used in the document.
2. Comments: There are two comment sections in the XML, providing explanatory
notes to readers and developers.
4. Attributes: The <book> and <magazine> elements have attributes category and
lang.
5. Text Content: The elements <title>, <author>, <price>, <issue>, and <editorial>
contain text content representing various data values.
6. CDATA Section: The <description> element contains a CDATA section, preserving
the text as character data, including special characters like &.
This example demonstrates how XML components can be combined to create a well-formed
and structured XML document, allowing for the representation of different data elements in a
self-descriptive and easily readable manner.
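The complete example that the list above describes is not reproduced in these notes. As a rough stand-in, the sketch below builds a small hypothetical document with a prologue, comments, <book> and <magazine> elements carrying category and lang attributes, text content, and a CDATA section, then reads it with Python's xml.etree.ElementTree. All element values are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document combining the components listed above.
doc = """<?xml version="1.0" encoding="UTF-8"?>
<!-- Catalogue of reading material -->
<catalog>
  <book category="fiction">
    <title>Sample Book</title>
    <author>John Doe</author>
    <price>29.99</price>
  </book>
  <!-- Magazines are listed after books -->
  <magazine lang="en">
    <title>Sample Monthly</title>
    <issue>42</issue>
    <description><![CDATA[Tips & tricks for <XML> beginners]]></description>
  </magazine>
</catalog>"""

root = ET.fromstring(doc)

print(root.tag)                             # name of the root element
print(root.find("book").get("category"))    # an attribute value
print(root.find("book/title").text)         # text content
# The CDATA section arrives as ordinary text: & and < were not treated as markup.
print(root.find("magazine/description").text)
```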
If the XML declaration is omitted, a processor will make certain assumptions about your
document. In particular, it will expect it to be encoded in UTF-8 (or in UTF-16, if the document
starts with a byte-order mark), both encodings of the Unicode character set. However, it is best
to use the XML declaration wherever possible, both to avoid confusion over the character
encoding and to indicate to processors which version of XML you're using.
Example:
<?xml version="1.0" encoding="UTF-8"?>
2. Root Element: The root element is the outermost element in the XML document. It
acts as the container for all other elements and serves as the starting point for the
document's hierarchical structure. There can be only one root element in an XML
document.
Example:
<root>
</root>
The second line of the example begins an element, which has been named authors. The
contents of that element include everything between the right angle bracket (>)
in <authors> and the left angle bracket (<) in </authors>. The actual syntactic
constructs <authors> and </authors> are often referred to as the element start tag and end
tag, respectively. Do not confuse tags with elements! Note that elements may include other
elements, as well as text. An XML document must contain exactly one root element, which
contains all other content within the document. The name of the root element defines the type
of the XML document.
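The authors document referred to above is not included in these notes. The sketch below uses a hypothetical authors document (the names and id values are invented) to show, with Python's xml.etree.ElementTree, that a document has exactly one root element and that two top-level elements are rejected.

```python
import xml.etree.ElementTree as ET

def parses(text: str) -> bool:
    """Return True if the text is a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical authors document: one root element containing person elements.
authors = """<authors>
  <person id="doe"><name>John Doe</name></person>
  <person id="smith"><name>Jane Smith</name></person>
</authors>"""

root = ET.fromstring(authors)
print(root.tag, len(root))   # the root element and its number of child elements

# Two top-level elements: not a well-formed document.
print(parses("<person/><person/>"))
```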
Elements that contain both text and other elements simultaneously are classified as mixed
content. The sample authors document uses elements named person to describe the authors
themselves. Each person element has an attribute named id. Unlike elements, attributes can
contain only textual content. Their values must be surrounded by quotes. Either single quotes
(') or double quotes (") may be used, as long as you use the same kind of closing quote as the
opening one.
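A small illustration of the attribute quoting rules and of mixed content, again using Python's xml.etree.ElementTree (an assumed choice of parser). In ElementTree, text before an element's first child is stored in .text, and text after a child is stored in that child's .tail. The person and para documents are hypothetical.

```python
import xml.etree.ElementTree as ET

# Attribute values may use single or double quotes, as long as they match.
p1 = ET.fromstring('<person id="doe"/>')
p2 = ET.fromstring("<person id='doe'/>")
print(p1.get("id") == p2.get("id"))

# Opening the value with " and closing it with ' is not allowed.
try:
    ET.fromstring("<person id=\"doe'/>")
    mismatched_ok = True
except ET.ParseError:
    mismatched_ok = False

# Mixed content: text and a child element inside the same element.
para = ET.fromstring("<para>Written by <person id='doe'>John Doe</person> in 2003.</para>")
child = para.find("person")
print(repr(para.text))                                   # text before the child
print(child.get("id"), repr(child.text), repr(child.tail))  # tail = text after the child
```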
Within XML documents, attributes are frequently used for metadata (i.e., "data about data"),
describing properties of the element's contents.
On the other hand, the information presented to an application by an XML processor upon
reading the following two lines will be different for each animal element, because the
ordering of elements is significant:
<animal><name>dog</name><legs>4</legs></animal>
<animal><legs>4</legs><name>dog</name></animal>
XML treats a set of attributes like a bunch of stuff in a bag (there is no implicit ordering),
while elements are treated like items on a list, where ordering matters.
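The difference between element order and attribute order can be checked directly. In this sketch (Python's xml.etree.ElementTree, an assumed choice of parser), child elements come back as an ordered sequence, while attributes come back as a dictionary whose equality ignores order.

```python
import xml.etree.ElementTree as ET

a = ET.fromstring("<animal><name>dog</name><legs>4</legs></animal>")
b = ET.fromstring("<animal><legs>4</legs><name>dog</name></animal>")

# Child elements are an ordered list, so the two documents differ.
order_a = [child.tag for child in a]   # ['name', 'legs']
order_b = [child.tag for child in b]   # ['legs', 'name']
print(order_a == order_b)

# Attributes are an unordered set of name/value pairs.
x = ET.fromstring('<animal name="dog" legs="4"/>')
y = ET.fromstring('<animal legs="4" name="dog"/>')
print(x.attrib == y.attrib)
```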
4. Well-Formedness
An XML document that conforms to the rules of XML syntax is known as well-formed. At
its most basic level, well-formedness means that elements must be properly nested, and
every opened element must be closed.
Table A-1 shows some XML documents that are not well-formed.
Document | Reason it is not well-formed
<foo><bar></foo></bar> | The elements are not properly nested, because foo is closed while inside its child element bar.
<foo><bar></foo> | The bar element was not closed before its parent, foo, was closed.
<foo baz></foo> | The baz attribute has no value. While this is permissible in HTML, it is not allowed in XML.
<foo baz=23></foo> | The baz attribute value, 23, has no surrounding quotes. Unlike HTML, all attribute values must be quoted in XML.
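A quick way to confirm the entries in Table A-1 is to hand each document to a parser and see whether it is rejected. This sketch assumes Python's xml.etree.ElementTree, which raises ParseError for documents that are not well-formed; the last call checks a corrected document that does parse.

```python
import xml.etree.ElementTree as ET

# The four documents from Table A-1.
not_well_formed = [
    "<foo><bar></foo></bar>",   # improperly nested
    "<foo><bar></foo>",         # bar never closed
    "<foo baz></foo>",          # attribute without a value
    "<foo baz=23></foo>",       # unquoted attribute value
]

def is_well_formed(text: str) -> bool:
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError as err:
        # The error message includes the line and column of the problem.
        print(f"{text!r}: {err}")
        return False

results = [is_well_formed(doc) for doc in not_well_formed]
fixed = is_well_formed('<foo baz="23"><bar></bar></foo>')
```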
5. Comments
The start of a comment is indicated with <!--, and the end of the comment is indicated with -->.
Any sequence of characters, aside from the string --, may appear within a comment.
Comments tend to be used more in XML documents intended for human consumption than
those intended for machine consumption. Comments aren't widely used in RSS.
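A short sketch of how a parser treats comments, assuming Python's xml.etree.ElementTree. Its default parser discards comments, so they do not appear in the element tree, and a comment containing the string -- is rejected as not well-formed.

```python
import xml.etree.ElementTree as ET

# The default parser drops the comment, leaving only <item> as a child.
root = ET.fromstring("<list><!-- a note for humans --><item/></list>")
print([child.tag for child in root])

# "--" inside a comment is not allowed.
try:
    ET.fromstring("<list><!-- bad -- comment --></list>")
    double_dash_ok = True
except ET.ParseError:
    double_dash_ok = False
```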
6. Entity References
Another feature of XML that is occasionally useful when writing RSS documents is the
mechanism for escaping characters.
Because some characters have special significance in XML, there needs to be a way to
represent them. For example, in some cases the < symbol might really be intended to mean
"less than," rather than to signal the start of an element name. Clearly, just inserting the
character without any escaping mechanism would result in a poorly formed document,
because a processing application would assume you were starting another element. Another
instance of this problem is needing to include both double quotes and single quotes
simultaneously in an attribute's value. Here's an example that illustrates both these
difficulties:
<badDoc>
<para>
I'd really like to use the < character
</para>
<note title="On the proper 'use' of the " character"/>
</badDoc>
XML avoids this problem by the use of the predefined entity reference. The word entity in the
context of XML simply means a unit of content. The term entity reference means just that, a
symbolic way of referring to a certain unit of content. XML predefines entities for the
following symbols: left angle bracket (<), right angle bracket (>), apostrophe ('), double quote
("), and ampersand (&).
An entity reference is introduced with an ampersand (&), which is followed by a name (using
the word "name" in its formal sense, as defined by the XML 1.0 specification), and
terminated with a semicolon (;).
Table A-2 shows how the five predefined entities can be used within an XML document.
Character | Entity reference
< | &lt;
> | &gt;
' | &apos;
" | &quot;
& | &amp;
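<badDoc>
<para>
I'd really like to use the &lt; character
</para>
<note title="On the proper &apos;use&apos; of the &quot; character"/>
</badDoc>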
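In program code, escaping is usually done by a library rather than by hand. The sketch below uses Python's standard xml.sax.saxutils.escape and quoteattr functions to escape a text value containing < and an attribute value containing both kinds of quote (the two difficulties described above), then parses the result with xml.etree.ElementTree to show that the original characters come back. The element names are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

raw_text = "I'd really like to use the < character"
title = "On the proper 'use' of the \" character"

# escape() replaces &, < and > with &amp;, &lt; and &gt;.
escaped_text = escape(raw_text)
# quoteattr() escapes the value and returns it wrapped in quotes;
# with both quote kinds present, " becomes &quot; and double quotes are used.
quoted_title = quoteattr(title)

doc = f"<example><para>{escaped_text}</para><note title={quoted_title}/></example>"
root = ET.fromstring(doc)

print(root.find("para").text)
print(root.find("note").get("title"))
```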
7. Character References
Character references allow you to denote a character by its numeric position in Unicode
character set (this position is known as its code point). Table A-3 contains a few examples
that illustrate the syntax.
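Character reference (decimal) | Character reference (hexadecimal) | Character
&#48; | &#x30; | 0
&#65; | &#x41; | A
&#126; | &#x7E; | ~
&#209; | &#xD1; | Ñ
&#174; | &#xAE; | ®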
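Character references are easy to generate from a character's code point. This small Python sketch (the char_refs helper is hypothetical) produces the decimal and hexadecimal forms used in Table A-3, then shows that xml.etree.ElementTree replaces the references with the actual characters when parsing.

```python
import xml.etree.ElementTree as ET

def char_refs(ch: str) -> tuple:
    """Return the decimal and hexadecimal character references for one character."""
    code_point = ord(ch)
    return f"&#{code_point};", f"&#x{code_point:X};"

print(char_refs("Ñ"))

# The parser replaces each character reference with the character it denotes.
root = ET.fromstring("<c>&#48;&#x41;&#209;&#xAE;</c>")
print(root.text)
```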
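8. Character Encoding
Every XML processor must be able to read the two Unicode encodings:
UTF-8
UTF-16
UTF stands for UCS Transformation Format, and UCS itself means Universal
Character Set. The number 8 or 16 is the size, in bits, of the code unit each encoding
is built from: UTF-8 uses one to four 8-bit units (1 to 4 bytes) per character, and
UTF-16 uses one or two 16-bit units (2 or 4 bytes). For documents without encoding
information, UTF-8 is set by default.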
Syntax
Encoding type is included in the prolog section of the XML document. The syntax for
UTF-8 encoding is as follows −
<?xml version = "1.0" encoding = "UTF-8" standalone = "no" ?>
The syntax for UTF-16 encoding is as follows −
<?xml version = "1.0" encoding = "UTF-16" standalone = "no" ?>
Example
Following example shows the declaration of encoding −
<?xml version = "1.0" encoding = "UTF-8" standalone = "no" ?>
<contact-info>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</contact-info>
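In the above example, encoding="UTF-8" specifies that the document is encoded in UTF-8,
which uses 8-bit code units. UTF-16, which uses 16-bit code units, can be used instead.
For text that is mostly ASCII (such as English markup), XML files encoded with UTF-8 tend
to be smaller than the same files encoded with UTF-16; for text dominated by characters
such as Chinese or Japanese, UTF-16 can be more compact.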
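The size difference between the two encodings depends on the text. This Python sketch counts bytes for an ASCII-only string and for a Japanese string (the utf-16-le codec is used for counting so that no byte-order mark is included), then parses a UTF-16 version of the contact-info example with xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

sizes = {}
for label, text in [("ASCII", "Tanmay Patil"), ("Japanese", "日本語")]:
    # (bytes in UTF-8, bytes in UTF-16 without a byte-order mark)
    sizes[label] = (len(text.encode("utf-8")), len(text.encode("utf-16-le")))
print(sizes)

# A UTF-16 document is passed to the parser as bytes; the byte-order mark
# written by Python's "utf-16" codec lets the parser detect the encoding.
doc = ('<?xml version="1.0" encoding="UTF-16"?>'
       "<contact-info><name>Tanmay Patil</name></contact-info>")
root = ET.fromstring(doc.encode("utf-16"))
print(root.find("name").text)
```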
9. Validity
In addition to well-formedness, XML 1.0 offers another level of verification, called validity.
To explain why validity is important, let's take a simple example. Imagine you invented a
simple XML format for your friends' telephone numbers:
<phonebook>
<person>
<name>Albert Smith</name>
<number>123-456-7890</number>
</person>
<person>
<name>Bertrand Jones</name>
<number>456-123-9876</number>
</person>
</phonebook>
Based on your format, you also construct a program to display and search your phone
numbers. This program turns out to be so useful, you share it with your friends. However,
your friends aren't as careful about detail as you are, and they try to feed your program this
phone book file:
<phonebook>
<person>
<name>Melanie Green</name>
<phone>123-456-7893</phone>
</person>
</phonebook>
Note that, although this file is perfectly well-formed, it doesn't fit the format you prescribed
for the phone book, and you find you need to change your program to cope with this
situation. If your friends had used number as you did to denote the phone number, and
not phone, there wouldn't have been a problem. However, as it is, this second file is not a
valid phonebook document.
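Real validation is normally done against a DTD or an XML Schema; Python's standard library does not do that (third-party libraries such as lxml do). As a rough stand-in, the hypothetical check_phonebook function below hard-codes the phone book format: each <person> must contain exactly one <name> and one <number> and nothing else. Both documents from the example above are well-formed, but only the first passes.

```python
import xml.etree.ElementTree as ET

# Hypothetical hand-written rules standing in for a DTD or schema.
REQUIRED_CHILDREN = ("name", "number")

def check_phonebook(text: str) -> list:
    """Return a list of problems; an empty list means the document fits the format."""
    root = ET.fromstring(text)   # raises ParseError if not well-formed
    problems = []
    if root.tag != "phonebook":
        problems.append(f"root element is <{root.tag}>, expected <phonebook>")
    for i, person in enumerate(root.findall("person"), start=1):
        for tag in REQUIRED_CHILDREN:
            count = len(person.findall(tag))
            if count != 1:
                problems.append(f"person {i}: expected one <{tag}>, found {count}")
        for child in person:
            if child.tag not in REQUIRED_CHILDREN:
                problems.append(f"person {i}: unexpected <{child.tag}>")
    return problems

yours = ("<phonebook><person><name>Albert Smith</name>"
         "<number>123-456-7890</number></person></phonebook>")
friends = ("<phonebook><person><name>Melanie Green</name>"
           "<phone>123-456-7893</phone></person></phonebook>")

print(check_phonebook(yours))
print(check_phonebook(friends))
```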
10. XML Namespaces
XML 1.0 lets developers create their own elements and attributes, but it leaves open the
potential for overlapping names. "Title" in one context may mean something entirely
different than "Title" in a different context. The "Namespaces in XML" specification
provides a mechanism developers can use to identify particular vocabularies using Uniform
Resource Identifiers (URIs).
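A minimal sketch of namespaces in practice, using Python's xml.etree.ElementTree. The two namespace URIs below are hypothetical; the point is that two title elements with the same local name stay distinct because each is tied to a different URI.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabularies that both define a <title> element.
doc = """<shelf xmlns:b="http://example.com/ns/books"
               xmlns:f="http://example.com/ns/films">
  <b:title>Learning XML</b:title>
  <f:title>Casablanca</f:title>
</shelf>"""

root = ET.fromstring(doc)

# ElementTree expands each prefixed name to {namespace-URI}local-name.
tags = [child.tag for child in root]
print(tags)

# When searching, prefixes are mapped to URIs; these prefix names are local choices.
ns = {"book": "http://example.com/ns/books", "film": "http://example.com/ns/films"}
print(root.find("book:title", ns).text)
print(root.find("film:title", ns).text)
```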
XML Example
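XML documents form a hierarchical structure that looks like a tree, so it is known as an XML
tree; it starts at "the root" and branches to "the leaves". Consider the following document:
<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>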
The first line is the XML declaration. It defines the XML version (1.0) and the encoding used
(ISO-8859-1 = Latin-1/West European character set).
The next line describes the root element of the document (like saying: "this document is a
note"):
<note>
The next 4 lines describe 4 child elements of the root (to, from, heading, and body).
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
And finally the last line defines the end of the root element.
</note>
XML documents must contain a root element. This element is "the parent" of all other
elements.
The elements in an XML document form a document tree. The tree starts at the root and
branches to the lowest level of the tree.
<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>
The terms parent, child, and sibling are used to describe the relationships between elements.
Parent elements have children. Children on the same level are called siblings (brothers or
sisters).
All elements can have text content and attributes (just like in HTML).
<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="CHILDREN">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="WEB">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>
The root element in the example is <bookstore>. All elements in the document are contained
within <bookstore>.
The <book> element has 4 children: <title>, <author>, <year> and <price>.
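Parent, child, and sibling relationships map directly onto tree navigation in code. The sketch below uses a shortened copy of the bookstore document above (two of the three books) and Python's xml.etree.ElementTree to walk from the root to its children and list each book's child elements.

```python
import xml.etree.ElementTree as ET

doc = """<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="WEB">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>"""

root = ET.fromstring(doc)            # <bookstore>, the root element

for book in root:                    # the children of the root
    # The children of one <book> are siblings of each other.
    siblings = [child.tag for child in book]
    print(book.get("category"), book.find("title").text, siblings)

# Every <title> anywhere below the root, with its lang attribute.
titles = [(t.text, t.get("lang")) for t in root.iter("title")]
print(titles)
```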
XML - Declaration
The XML declaration contains details that prepare an XML processor to parse the XML
document. It is optional, but when used, it must appear at the very start of the XML document.
Syntax
<?xml
version = "version_number"
encoding = "encoding_declaration"
standalone = "standalone_status"
?>
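Each parameter consists of a parameter name, an equals sign (=), and a parameter value in
quotes. The following table shows the above syntax in detail −
Parameter | Parameter value | Parameter description
Version | 1.0 | Specifies the version of the XML standard used.
Encoding | UTF-8, UTF-16, ISO-10646-UCS-2, ISO-10646-UCS-4, ISO-8859-1 to ISO-8859-9, ISO-2022-JP, Shift_JIS, EUC-JP | It defines the character encoding used in the document. UTF-8 is the default encoding used.
Standalone | yes or no | Tells the processor whether the document relies on external markup declarations, such as an external DTD. If omitted, no is assumed.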
Rules
An XML declaration should abide with the following rules −
If the XML declaration is present in the XML, it must be the very first thing in the
XML document; not even white space or a comment may precede it.
If the XML declaration is included, it must contain version number attribute.
Parameter names are case-sensitive and are always written in lowercase.
The standalone value must be a lowercase yes or no; encoding names, however, are matched case-insensitively.
The order of placing the parameters is important. The correct order is: version,
encoding and standalone.
Either single or double quotes may be used.
The XML declaration has no closing tag, i.e. there is no </?xml>.
A minimal valid declaration is therefore:
<?xml version="1.0"?>
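The declaration rules above can be tried against a real parser. This sketch assumes Python's xml.etree.ElementTree (which uses the expat parser) and a hypothetical is_well_formed helper; it checks a correct declaration, a document with no declaration, and three declarations that break the rules about position, the required version, and parameter order.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

checks = {
    # Declaration first; version, then encoding, then standalone.
    "ok": '<?xml version="1.0" encoding="UTF-8" standalone="no"?><note/>',
    # The declaration is optional.
    "no_decl": "<note/>",
    # A space before the declaration.
    "not_first": ' <?xml version="1.0"?><note/>',
    # The required version parameter is missing.
    "no_version": '<?xml encoding="UTF-8"?><note/>',
    # Parameters in the wrong order.
    "wrong_order": '<?xml encoding="UTF-8" version="1.0"?><note/>',
}

results = {name: is_well_formed(text) for name, text in checks.items()}
print(results)
```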
The root element is the only element in the XML document that is not nested inside any other
element. All other elements must be contained within the root element, either directly or indirectly
through other elements.
<rootElement>
</rootElement>
In this example, <rootElement> is the root element. All other elements in the
XML document will be nested within this root element. The root element gives
the XML document its structure and serves as the starting point for traversing
and accessing the data within the document. It defines the context for all the
data elements in the XML document.
<library>
<book>
<title>Sample Book</title>
<author>John Doe</author>
<ISBN>123456789</ISBN>
</book>
<book>
<title>Another Book</title>
<author>Jane Smith</author>
<ISBN>987654321</ISBN>
</book>
</library>
In this example, the root element is <library>. It is the outermost element and acts as the container
for all the other elements in the XML document. All the <book> elements are nested within the
<library> element.
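The following is a small Python sketch using the standard xml.etree.ElementTree module, chosen for illustration. It shows that parsing returns the root element as the starting point for traversal. It also shows that text with two top-level elements is rejected, because it has no single root.

```python
import xml.etree.ElementTree as ET

LIBRARY_XML = """<library>
<book>
<title>Sample Book</title>
<author>John Doe</author>
<ISBN>123456789</ISBN>
</book>
<book>
<title>Another Book</title>
<author>Jane Smith</author>
<ISBN>987654321</ISBN>
</book>
</library>"""

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as a well-formed document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

root = ET.fromstring(LIBRARY_XML)        # fromstring returns the root element
print(root.tag)                          # library
for book in root.findall("book"):        # <book> elements directly under the root
    print(book.findtext("title"), "by", book.findtext("author"))

print(is_well_formed("<book/><book/>"))  # False: two top-level elements, no single root
```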
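An XML element is everything from (and including) the element's start tag to (and including) its end tag, for example:
<price>29.99</price>
An element can contain:
text
attributes
other elements
or a mix of the above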
<bookstore>
<book category="children">
<title>Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="web">
<title>Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>
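In the example above, <title>, <author>, <year>, and <price> have text content because they contain text (like 29.99). <bookstore> and <book> have element content because they contain other elements. <book> also has an attribute (category="children").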
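To make the three kinds of content concrete, this Python sketch (standard-library ElementTree, used for illustration) reads the bookstore example. For each book it pulls out the attribute value, the child element names, and the text content of <price>.

```python
import xml.etree.ElementTree as ET

bookstore = ET.fromstring("""<bookstore>
<book category="children">
<title>Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="web">
<title>Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>""")

for book in bookstore:                           # <bookstore> has element content
    category = book.get("category")              # attribute of <book>
    child_tags = [child.tag for child in book]   # <book> has element content
    price = book.findtext("price")               # <price> has text content
    print(category, child_tags, price)
```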
Empty elements are useful when representing data that doesn't require additional nested elements
or when defining attributes without any content. They are commonly used in XML documents to
indicate the presence of specific data points or properties without providing additional details.
<emptyElement />
In the first example, <emptyElement /> is a standalone empty element. It doesn't contain
any child elements or text content.
In the second example, <book ISBN="123456789" /> is an empty element representing a
book with an ISBN attribute. It doesn't have any nested elements or text content but
includes an attribute named "ISBN" with the value "123456789".
In the third example, the <book> element is used as an empty element within a parent
element <library>. This structure allows multiple empty <book> elements to be included
under the <library> element, each representing a different book with its own set of
attributes.
Empty elements are a convenient way to represent simple data points or attributes in XML
without the need for additional nested elements or content. They help maintain a clear and
concise representation of data, especially when certain elements only require minimal
information.
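An empty element can be written either as a self-closing tag or as a start tag immediately followed by its end tag. The Python sketch below, using the standard ElementTree module for illustration, shows that both forms parse to the same thing. It also shows that ElementTree writes an empty element back out in the self-closing form. The helper is_empty is a hypothetical name.

```python
import xml.etree.ElementTree as ET

def is_empty(elem) -> bool:
    """An empty element has no child elements and no text content."""
    return len(elem) == 0 and elem.text is None

self_closing = ET.fromstring('<book ISBN="123456789" />')
start_end = ET.fromstring('<book ISBN="123456789"></book>')

print(is_empty(self_closing), is_empty(start_end))  # True True
print(self_closing.get("ISBN"))                      # 123456789
print(ET.tostring(start_end))                        # b'<book ISBN="123456789" />'
```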
1. Element Name Start Character: The first character of an element name must be a letter (such as A-Z or a-z; other Unicode letters are also allowed) or an underscore ("_"). It cannot start with a number or any other special character.
2. Element Name Characters: After the first character, the element name can include letters, numbers, underscores, hyphens, and periods. Spaces, commas, and other punctuation marks are not allowed.
3. Element Name Case Sensitivity: XML is case-sensitive. This means that elements with different cases (e.g., "book" and "Book") are treated as distinct elements.
4. Reserved Names: Names that begin with the letters "xml" in any combination of upper and lower case (e.g., <xml> or <XMLdata>) are reserved for XML standards and should not be used for your own elements.
5. Colons: The colon (":") is reserved for namespaces. It should only be used to separate a namespace prefix from the local name (as in <xs:element>), never as an ordinary character in a name.
Examples of valid element names:
<book>
<author_name>
<ISBN_number>
<price_usd>
<element_123>
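The syntax rules (1 and 2) are enforced by the parser itself. This Python sketch, using the standard ElementTree module for illustration, wraps each candidate name in an empty element and asks the parser whether it is accepted. The candidate list is hypothetical. The reserved-name rule (4) is a naming convention that a parser may not enforce, so it is not tested here.

```python
import xml.etree.ElementTree as ET

def parser_accepts(name: str) -> bool:
    """Return True if <name/> is accepted as well-formed by the parser."""
    try:
        ET.fromstring(f"<{name}/>")
        return True
    except ET.ParseError:
        return False

candidates = ["book", "author_name", "ISBN_number", "price_usd", "element_123",
              "first.name",   # periods are allowed after the first character
              "1book",        # starts with a digit
              "-price",       # starts with a hyphen
              "first name"]   # contains a space
for name in candidates:
    print(f"{name!r:15} accepted: {parser_accepts(name)}")
```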
The following are some XML-related technologies:
7) XLink (XML Linking Language): XLink stands for XML Linking Language. It is a language for creating hyperlinks (external and internal links) in XML documents.
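XLink links are expressed as attributes in the XLink namespace (http://www.w3.org/1999/xlink). This hedged Python sketch uses the standard ElementTree module to read such a link. The catalog/book document and the example.com URL are hypothetical.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"   # the XLink namespace URI

catalog = ET.fromstring(f"""<catalog xmlns:xlink="{XLINK_NS}">
<book xlink:type="simple" xlink:href="https://example.com/books/sample">Sample Book</book>
</catalog>""")

for book in catalog.iter("book"):
    # ElementTree stores namespaced attributes as {namespace-uri}local-name
    href = book.get(f"{{{XLINK_NS}}}href")
    print(book.text, "->", href)
```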
XML Attributes
XML elements can have attributes. Attributes add information about the element.
Let us take an example of a book publisher. Here, book is the element and publisher is the attribute.
<book publisher="Tata McGraw Hill"></book>
Or
<book publisher='Tata McGraw Hill'></book>
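Attributes and child elements can be combined in one element. Because XML is case-sensitive, every end tag must match its start tag exactly, and a literal & in text must be written as &amp;:
<book category="computer">
<category>Computer</category>
<category>Science</category>
<author> A &amp; B </author>
</book>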
Data can be stored either in attributes or in child elements. However, attributes have some limitations compared with child elements:
o Attributes are not easily expandable. If you need to change or extend an attribute's values in the future, it can be complicated.
o Attributes cannot describe structure, but child elements can.
o Attribute values are harder to test against a DTD, which is used to define the legal elements of an XML document.
Difference between attribute and sub-element
In the context of documents, attributes are part of the markup, while sub-elements are part of the basic document content.
In the context of data representation, the difference is less clear and can be confusing.
1st way:
<book publisher="Tata McGraw Hill"></book>
2nd way:
<book>
<publisher> Tata McGraw Hill </publisher>
</book>
In the first example, publisher is used as an attribute; in the second example, publisher is an element.
Both examples provide the same information, but it is generally good practice to store data in elements rather than in attributes.
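This sketch uses Python's standard ElementTree module, chosen for illustration. It reads the publisher from both forms: the attribute form is written as <book publisher="Tata McGraw Hill"></book>, and the element form is the second way above. It then confirms that both forms carry the same value. Only the access method differs.

```python
import xml.etree.ElementTree as ET

as_attribute = ET.fromstring('<book publisher="Tata McGraw Hill"></book>')
as_element = ET.fromstring("<book>\n<publisher> Tata McGraw Hill </publisher>\n</book>")

from_attribute = as_attribute.get("publisher")             # read an attribute
from_element = as_element.findtext("publisher").strip()    # read child text, trim spaces

print(from_attribute == from_element)  # True
```

The element form also leaves room to grow, for example by adding a nested <city> under <publisher> later, which an attribute value cannot do.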
XML Comments
XML comments are just like HTML comments. Comments are used to make code more understandable to other developers.
XML comments add notes or lines that explain the purpose of the XML code. Although XML is known as self-describing data, comments are sometimes necessary.
Syntax
An XML comment starts with <!-- and ends with -->:
<!-- Write your comment here -->
o You can use a comment anywhere in an XML document except before the XML declaration or inside a tag (for example, within an attribute value).
o A comment cannot contain the string "--", so comments cannot be nested.
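By default, Python's standard ElementTree parser discards comments. The sketch below assumes Python 3.8 or later, where TreeBuilder accepts insert_comments=True, and uses that option to keep them. It also checks the two placement rules above with hypothetical snippets.

```python
import xml.etree.ElementTree as ET

def parse_keeping_comments(text: str):
    # insert_comments=True keeps comments that appear inside the root element
    builder = ET.TreeBuilder(insert_comments=True)
    return ET.fromstring(text, parser=ET.XMLParser(target=builder))

def is_well_formed(text: str) -> bool:
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

root = parse_keeping_comments(
    "<student><!-- Student record --><firstname>Tamanna</firstname></student>")
comments = [node.text for node in root if node.tag is ET.Comment]
print(comments)                                      # [' Student record ']

print(is_well_formed('<a title="<!-- note -->"/>'))  # False: no comments in attribute values
print(is_well_formed("<a><!-- a -- b --></a>"))      # False: "--" not allowed inside a comment
```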
An XML document forms a tree structure: it contains a root element (as the parent), its child elements, their children, and so on. Starting from the root, it is easy to traverse all the branches, sub-branches, and leaf nodes.
Example of an XML document
<?xml version="1.0"?>
<college>
<student>
<firstname>Tamanna</firstname>
<lastname>Bhatia</lastname>
<contact>09990449935</contact>
<email>[email protected]</email>
<address>
<city>Ghaziabad</city>
<state>Uttar Pradesh</state>
<pin>201007</pin>
</address>
</student>
</college>
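The traversal described above can be sketched as a recursive depth-first walk. This Python version uses the standard ElementTree module for illustration. It starts at the root of the college document and collects the path and text of every leaf node. The function name leaf_paths is hypothetical.

```python
import xml.etree.ElementTree as ET

COLLEGE_XML = """<?xml version="1.0"?>
<college>
<student>
<firstname>Tamanna</firstname>
<lastname>Bhatia</lastname>
<contact>09990449935</contact>
<email>[email protected]</email>
<address>
<city>Ghaziabad</city>
<state>Uttar Pradesh</state>
<pin>201007</pin>
</address>
</student>
</college>"""

def leaf_paths(elem, prefix=""):
    """Depth-first walk from elem; return (path, text) for every leaf element."""
    path = f"{prefix}/{elem.tag}"
    if len(elem) == 0:                      # no children: this is a leaf node
        return [(path, (elem.text or "").strip())]
    leaves = []
    for child in elem:                      # visit branches in document order
        leaves.extend(leaf_paths(child, path))
    return leaves

root = ET.fromstring(COLLEGE_XML)
for path, text in leaf_paths(root):
    print(path, "=", text)
```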
XML – DOM
XML DOM (Document Object Model) is a programming interface that represents the
structure of an XML document as a tree-like object, allowing developers to manipulate and
navigate XML documents using programming languages. It provides a platform-independent,
language-neutral way to access and interact with XML documents dynamically.
The XML DOM exposes the XML document's contents and structure as a set of
interconnected objects, where each node in the tree corresponds to an element, attribute, or
text content in the XML document. This tree-like representation is also known as a "node
tree" or "DOM tree."
Key features and functionalities of XML DOM include:
1. Parsing XML: XML DOM allows developers to parse XML documents, converting
them into a structured tree of nodes that can be easily manipulated and accessed.
2. Node Types: The DOM tree consists of different types of nodes, including elements,
attributes, text, comments, and processing instructions. Each node type is represented
by a specific DOM interface.
3. Traversal: Developers can traverse the DOM tree, moving between nodes, accessing
parent, child, and sibling nodes, and navigating the entire structure.
4. Node Creation and Modification: XML DOM enables the creation of new elements,
attributes, and text nodes and the modification of existing nodes, allowing developers
to update XML documents dynamically.
5. Search and Query: DOM provides methods to search for specific elements or
attributes based on their names, values, or positions within the tree.
6. Validation: XML DOM can validate XML documents against XML Schema or DTD
(Document Type Definition) to ensure their conformity with predefined rules.
7. Platform and Language Independence: XML DOM is available in many programming
languages, including Java, JavaScript, Python, C#, PHP, and more. It is implemented
as a set of APIs that can be used across different platforms.
Here is a simple XML document that a DOM program (for example, one written in JavaScript) can load, access, and modify:
<bookstore>
<book>
<title>Sample Book</title>
<author>John Doe</author>
</book>
</bookstore>
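The same DOM operations can be sketched in Python with the standard xml.dom.minidom module, which implements the W3C DOM interfaces. The sketch uses getElementsByTagName, firstChild, createElement, createTextNode, and appendChild. It reads the title, changes it, and appends a new element. The new <price> element and its value 19.99 are hypothetical.

```python
from xml.dom import minidom

BOOKSTORE_XML = """<bookstore>
<book>
<title>Sample Book</title>
<author>John Doe</author>
</book>
</bookstore>"""

doc = minidom.parseString(BOOKSTORE_XML)

# Access: the text of <title> is stored in a child Text node.
title_text = doc.getElementsByTagName("title")[0].firstChild
print(title_text.data)                         # Sample Book

# Modify: change the text, then add a new child element to <book>.
title_text.data = "Updated Book"
price = doc.createElement("price")             # hypothetical new element
price.appendChild(doc.createTextNode("19.99"))
doc.getElementsByTagName("book")[0].appendChild(price)

print(doc.documentElement.toxml())
```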
XML Validation
A well-formed XML document can be validated against a DTD or an XML Schema.
A well-formed XML document is an XML document with correct syntax. Before learning about XML validation, it is important to understand what a valid XML document is: a document that is well-formed and also follows the rules of its DTD or schema.
In simple words, a DTD defines the document structure with a list of legal elements and attributes.
Both DTD and XML Schema are used to define the rules that a well-formed XML document must follow in order to be valid.
Errors in XML documents should be avoided because an XML parser stops processing a document as soon as it finds a well-formedness error.
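As a hedged illustration of a parser stopping at the first well-formedness error, this Python sketch uses the standard ElementTree module. ParseError exposes the (line, column) where parsing stopped through its position attribute. The test documents are hypothetical. This checks well-formedness only; validity against a DTD or schema is a separate step.

```python
import xml.etree.ElementTree as ET

def first_error(text: str):
    """Return None if text is well-formed, else the (line, column) where parsing stopped."""
    try:
        ET.fromstring(text)
        return None
    except ET.ParseError as err:
        return err.position

print(first_error("<a>\n<b>\n</a>"))                # stops on line 3: </a> does not match <b>
print(first_error("<author> A & B </author>"))      # an unescaped & is not well-formed
print(first_error("<author> A &amp; B </author>"))  # None
```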
XML Schema
It is itself written in XML.
It supports a large number of built-in data types and the definition of derived data types.
XML DTD
What is DTD
DTD stands for Document Type Definition. It defines the legal building blocks of an XML
document. It is used to define document structure with a list of legal elements and attributes.
Purpose of DTD
Its main purpose is to define the structure of an XML document. It contains a list of legal elements and defines the structure with the help of them.
Checking Validation
Before going further with XML DTD, it is important to understand validation. An XML document is called "well-formed" if it has correct syntax.
A well-formed and valid XML document is one that has also been validated against a DTD.
Let's take an example of a well-formed and valid XML document. It follows all the rules of its DTD.
employee.xml
<?xml version="1.0"?>
<!DOCTYPE employee SYSTEM "employee.dtd">
<employee>
<firstname>vimal</firstname>
<lastname>jaiswal</lastname>
<email>[email protected]</email>
</employee>
In the above example, the DOCTYPE declaration refers to an external DTD file. The content of that file is shown below.
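employee.dtd
<!ELEMENT employee (firstname,lastname,email)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT email (#PCDATA)>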
OUTPUT:-
This XML file does not appear to have any style information associated with it. The
document tree is shown below.
<employee>
<firstname>vimal</firstname>
<lastname>jaiswal</lastname>
<email>[email protected]</email>
</employee>
Description of DTD
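<!DOCTYPE employee : It defines that the root element of the document is employee.
<!ELEMENT employee : It defines that the employee element contains three elements, firstname, lastname and email, in that order.
<!ELEMENT firstname : It defines that the firstname element is of type #PCDATA (parsed character data).
<!ELEMENT lastname : It defines that the lastname element is of type #PCDATA (parsed character data).
<!ELEMENT email : It defines that the email element is of type #PCDATA (parsed character data).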
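Python's standard library does not include a DTD validator. The sketch below is therefore only a hand-written stand-in for the rules being described. It assumes the DTD declares employee as the sequence (firstname, lastname, email), with each child declared as #PCDATA. It shows what a sequence content model and #PCDATA mean when checked in code. It is not a general DTD validator, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the DTD rules, assuming:
#   <!ELEMENT employee (firstname,lastname,email)>
#   firstname, lastname and email are each (#PCDATA)
# This is NOT a DTD parser; it only encodes these specific rules.
EXPECTED_CHILDREN = ["firstname", "lastname", "email"]

def check_employee(text: str) -> list:
    """Return a list of rule violations (an empty list means the rules are met)."""
    root = ET.fromstring(text)        # raises ParseError if not well-formed
    problems = []
    if root.tag != "employee":
        problems.append(f"root is <{root.tag}>, expected <employee>")
    tags = [child.tag for child in root]
    if tags != EXPECTED_CHILDREN:     # sequence: exactly these names, in this order
        problems.append(f"children {tags} do not match {EXPECTED_CHILDREN}")
    for child in root:
        if len(child) > 0:            # #PCDATA: text only, no child elements
            problems.append(f"<{child.tag}> must contain text only")
    return problems

VALID = ("<employee><firstname>vimal</firstname><lastname>jaiswal</lastname>"
         "<email>[email protected]</email></employee>")
SWAPPED = ("<employee><lastname>jaiswal</lastname><firstname>vimal</firstname>"
           "<email>[email protected]</email></employee>")

print(check_employee(VALID))    # []
print(check_employee(SWAPPED))  # one problem: the children are in the wrong order
```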
A doctype declaration can also define special strings, called entities, that can be used in the XML file. An entity reference has three parts:
1. An ampersand (&)
2. An entity name
3. A semicolon (;)
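author.xml
<?xml version="1.0" standalone="yes" ?>
<!DOCTYPE author [
<!ELEMENT author (#PCDATA)>
<!ENTITY sj "Sonoo Jaiswal">
]>
<author>&sj;</author>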
OUTPUT:-
This XML file does not appear to have any style information associated with it. The
document tree is shown below.
<author>Sonoo Jaiswal</author>
In the above example, sj is an entity that is used inside the author element. The parser replaces the reference &sj; with the entity's value, "Sonoo Jaiswal".
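Entity expansion can be observed directly. This Python sketch uses the standard ElementTree parser, which expands general entities declared in the document's internal DTD subset. It parses an author document that declares sj in an internal DTD. It also shows a predefined entity (&amp;) and what happens when an entity is used without being declared.

```python
import xml.etree.ElementTree as ET

AUTHOR_XML = """<?xml version="1.0" standalone="yes" ?>
<!DOCTYPE author [
<!ELEMENT author (#PCDATA)>
<!ENTITY sj "Sonoo Jaiswal">
]>
<author>&sj;</author>"""

def expand(text: str):
    """Parse text and return the root element's text, or None if parsing fails."""
    try:
        return ET.fromstring(text).text
    except ET.ParseError:
        return None

print(expand(AUTHOR_XML))                    # Sonoo Jaiswal
print(expand("<author>A &amp; B</author>"))  # A & B (a predefined entity)
print(expand("<author>&sj;</author>"))       # None: sj is not declared in this document
```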
XML CSS
CSS (Cascading Style Sheets) can be used to add style and display information to an XML
document. It can format the whole XML document.
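To link an XML file with a CSS file, add an xml-stylesheet processing instruction after the XML declaration, using the following syntax:
<?xml-stylesheet type="text/css" href="cssemployee.css"?>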
cssemployee.css
employee
{
background-color: pink;
}
firstname,lastname,email
{
font-size: 25px;
display: block;
color: blue;
margin-left: 50px;
}
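employee.dtd
<!ELEMENT employee (firstname,lastname,email)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT email (#PCDATA)>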
employee.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="cssemployee.css"?>
<!DOCTYPE employee SYSTEM "employee.dtd">
<employee>
<firstname>vimal</firstname>
<lastname>jaiswal</lastname>
<email>[email protected]</email>
</employee>
CSS is not generally used to format XML files; the W3C recommends XSLT instead of CSS.
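A program such as a browser finds the stylesheet through the xml-stylesheet processing instruction, not through the elements themselves. This hedged Python sketch uses the standard xml.dom.minidom module for illustration. It locates that instruction in a trimmed copy of employee.xml and extracts its pseudo-attributes with a simple regular expression. The helper name stylesheet_links and the regex approach are assumptions of this sketch.

```python
import re
from xml.dom import minidom

doc = minidom.parseString(
    '<?xml version="1.0"?>\n'
    '<?xml-stylesheet type="text/css" href="cssemployee.css"?>\n'
    "<employee><firstname>vimal</firstname></employee>"
)

def stylesheet_links(document):
    """Return the pseudo-attributes of each xml-stylesheet instruction before the root."""
    links = []
    for node in document.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            # The instruction's data looks like: type="text/css" href="cssemployee.css"
            links.append(dict(re.findall(r'(\w+)="([^"]*)"', node.data)))
    return links

print(stylesheet_links(doc))  # [{'type': 'text/css', 'href': 'cssemployee.css'}]
```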
XML Schema
XML Schema is a language used for expressing constraints on XML documents. Many schema languages are in use today, for example RELAX NG and XSD (XML Schema Definition).
An XML schema is used to define the structure of an XML document. It is like a DTD but provides more control over the XML structure.
Checking Validation
employee.xsd
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.javatpoint.com"
xmlns="http://www.javatpoint.com"
elementFormDefault="qualified">
<xs:element name="employee">
<xs:complexType>
<xs:sequence>
<xs:element name="firstname" type="xs:string"/>
<xs:element name="lastname" type="xs:string"/>
<xs:element name="email" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
Here is an XML file that uses the XML schema (XSD file) above.
employee.xml
<?xml version="1.0"?>
<employee
xmlns="http://www.javatpoint.com"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.javatpoint.com employee.xsd">
<firstname>vimal</firstname>
<lastname>jaiswal</lastname>
<email>[email protected]</email>
</employee>
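Python's standard library cannot validate against an XSD. This sketch therefore shows only a related practical point, using the standard ElementTree module. Because employee.xml declares a default namespace (xmlns="http://www.javatpoint.com"), every element name is namespace-qualified, so lookups must use that namespace. The prefix emp in the mapping is an arbitrary, hypothetical choice.

```python
import xml.etree.ElementTree as ET

NS = {"emp": "http://www.javatpoint.com"}   # emp is an arbitrary prefix for lookups
XSI = "http://www.w3.org/2001/XMLSchema-instance"

employee = ET.fromstring("""<?xml version="1.0"?>
<employee
xmlns="http://www.javatpoint.com"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.javatpoint.com employee.xsd">
<firstname>vimal</firstname>
<lastname>jaiswal</lastname>
</employee>""")

print(employee.tag)                            # {http://www.javatpoint.com}employee
print(employee.find("firstname"))              # None: the plain name does not match
print(employee.find("emp:firstname", NS).text) # vimal
print(employee.get(f"{{{XSI}}}schemaLocation"))
```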
DTD vs XSD
There are many differences between DTD (Document Type Definition) and XSD (XML Schema Definition). In short, DTD provides less control over XML structure, whereas XSD (XML Schema) provides more control.
No. | DTD | XSD
1) | DTD stands for Document Type Definition. | XSD stands for XML Schema Definition.
3) | DTD doesn't support datatypes such as integer or date; element content is just text (#PCDATA). | XSD supports datatypes for elements and attributes.
5) | DTD can fix the order of child elements with a sequence such as (a,b,c), but can express occurrence only as ?, * or +. | XSD defines the order of child elements and exact occurrence counts (minOccurs, maxOccurs).
7) | DTD is not simple to learn, because it uses its own non-XML syntax. | XSD is easier in that respect, because it is written in XML and you don't need to learn a new syntax.
8) | DTD provides less control over XML structure. | XSD provides more control over XML structure.
CDATA vs PCDATA
CDATA (Character Data) and PCDATA (Parsed Character Data) are two different types of
data that can be used in XML documents to represent character content. They are both used
to include textual data within XML elements, but they have different handling and parsing
rules.
1. CDATA (Character Data): CDATA sections are used to include blocks of text that should be treated as character data and not parsed as XML markup. A CDATA section starts with <![CDATA[ and ends with ]]>. Its content is passed to the application as plain text without being interpreted as markup, so special characters (such as <, >, and &) are treated as literal text. CDATA sections are often used to include text that contains many XML-reserved characters, avoiding the need for escaping.
In this example, the content inside the CDATA section is treated as plain text, and the XML
parser will not attempt to interpret the <b> element or the & symbol.
2. PCDATA (Parsed Character Data): PCDATA refers to character data that is parsed by the XML parser. Unlike CDATA, PCDATA is subject to XML parsing rules, so special characters must be escaped using entity references: &lt; for < and &amp; for & (and, by convention, &gt; for >). Because PCDATA is parsed, markup inside it, such as nested elements and entity references, is recognized and processed.
Example of PCDATA:
In this example, the content within the <description> element is treated as PCDATA. The XML parser interprets the escaped entities (&lt;, &gt;, and &amp;) and processes the content accordingly.
The choice between CDATA and PCDATA depends on the requirements of the XML data. If
the text content contains a lot of special characters or XML markup that you want to be
treated as plain text, CDATA is a better choice. However, if the text content is structured and
includes nested elements or attributes, using PCDATA with proper escaping is more
appropriate to maintain the XML's structural integrity.
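The two approaches can be compared directly. This Python sketch uses the standard ElementTree parser, which merges CDATA content into ordinary element text. It parses the same hypothetical description written as a CDATA section and as escaped PCDATA, and shows they yield identical character data. For contrast, it shows that unescaped markup in PCDATA becomes a child element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample text, written as a CDATA section and as escaped PCDATA.
CDATA_DOC = "<description><![CDATA[Use <b>bold</b> & <i>italic</i> tags]]></description>"
PCDATA_DOC = ("<description>Use &lt;b&gt;bold&lt;/b&gt; &amp; "
              "&lt;i&gt;italic&lt;/i&gt; tags</description>")
MARKUP_DOC = "<description>Use <b>bold</b> tags</description>"  # <b> is parsed as an element

cdata = ET.fromstring(CDATA_DOC)
pcdata = ET.fromstring(PCDATA_DOC)
markup = ET.fromstring(MARKUP_DOC)

print(cdata.text == pcdata.text)   # True: both yield the same character data
print(cdata.text)                  # Use <b>bold</b> & <i>italic</i> tags
print(len(cdata), len(markup))     # 0 1: only MARKUP_DOC has a child element
```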
Markup Delimiters
Markup delimiters are special characters or sequences used in markup languages to enclose
or delimit elements, attributes, or other components within the markup. Markup delimiters
define the beginning and ending boundaries of different parts of the markup content. These
delimiters are essential for defining the structure and semantics of the markup language.
The most common markup delimiter is the angle bracket ("<" and ">"), which is used in
languages like HTML, XML, and SGML. Angle brackets enclose element names, attributes,
and other tags within the markup.
<title>Sample Book</title>
In XML, angle brackets ("<" and ">") delimit tags. The opening tag <title> marks the beginning of the "title" element, and the closing tag </title> marks the end of the element.
Element markup refers to the process of creating and defining elements within a markup
language. It involves using special syntax and delimiters to specify the structure and content
of the elements, allowing for the representation of data and its semantics in a structured way.
In markup languages like HTML and XML, elements are the building blocks used to define
the structure and content of a document. Each element represents a specific piece of
information and is typically enclosed within start tags ("<element>") and end tags
("</element>"). The content and attributes of the element are specified between the start and
end tags.
<p>This is a paragraph element.</p>
Here, <p> is the start tag, indicating the beginning of the paragraph element, and </p> is the end tag, indicating the end of the paragraph element. The text "This is a paragraph element." is the content of the paragraph element.
<book>
<title>Sample Book</title>
<author>John Doe</author>
</book>
In this XML example, <book> is the start tag of the "book" element, and </book> is the end
tag. The content of the "book" element includes two nested elements, <title> and <author>,
each representing the title and author of the book, respectively.
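Elements can also have attributes, which provide additional information about the element. Attributes are written inside the start tag as name="value" pairs; for example, in <book category="web">, category is an attribute of the book element.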
XML - Parsers
An XML parser is a software library or package that provides an interface for client applications to work with XML documents. It checks that the XML document is properly formatted (well-formed) and may also validate it. Modern browsers have built-in XML parsers.
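One common parser interface is event-based (SAX). The parser reads the document and calls methods on a handler object supplied by the client application. This Python sketch uses the standard xml.sax module, chosen for illustration. Its hypothetical ElementCounter handler counts how often each element name starts.

```python
import xml.sax

class ElementCounter(xml.sax.ContentHandler):
    """Receives parser events and counts how often each element name starts."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1

def count_elements(data: bytes) -> dict:
    handler = ElementCounter()
    xml.sax.parseString(data, handler)  # raises SAXParseException if not well-formed
    return handler.counts

print(count_elements(b"<library><book><title>A</title></book><book/></library>"))
```

A tree-based (DOM) parser builds the whole document in memory first. The event-based style processes the document as it streams past, which is why it is often preferred for very large files.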
The following diagram shows how an XML parser interacts with an XML document −