XML Notes

Uploaded by

Rajan Sahota

UNIT-1

XML BASICS

What is XML?

XML stands for eXtensible Markup Language. It is a popular markup language used to store
and transport structured data in a human-readable and machine-readable format. XML was
designed to be self-descriptive, which means it allows users to define their own tags,
elements, and document structure, making it highly flexible and customizable for various data
representation purposes.

The syntax of XML is based on a set of rules that define how elements should be structured.
Each XML document consists of a prologue, which includes the XML declaration, and the
document's root element. The content within the document is enclosed within tags, which
come in pairs: an opening tag and a closing tag. The opening tag contains the element name,
while the closing tag contains the same name but is preceded by a forward slash ("/").
For example:

<?xml version="1.0" encoding="UTF-8"?>

<bookstore>
<book>
<title>Web Development using PHP</title>
<author>Kirandeep Kaur</author>
<price>100.00</price>
<publication_date/>
</book>
</bookstore>
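As a rough illustration of how a program reads such a document, the sketch below parses a small bookstore document with Python's standard xml.etree.ElementTree module. The sample string and the helper name list_books are assumptions made for this illustration, not part of the notes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, modelled on the bookstore example above.
BOOKSTORE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book>
    <title>Web Development using PHP</title>
    <author>Kirandeep Kaur</author>
    <price>100.00</price>
  </book>
</bookstore>"""


def list_books(xml_text):
    """Return a (title, author, price) tuple for every <book> element."""
    # Parse from bytes so the encoding named in the declaration is honoured.
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [
        (book.findtext("title"), book.findtext("author"), book.findtext("price"))
        for book in root.findall("book")
    ]


books = list_books(BOOKSTORE_XML)
print(books)
```

Every value comes back as text; turning the price into a number is left to the application.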

XML is widely used in various applications and industries, including web development (e.g.,
RSS feeds, configuration files), data exchange between different platforms and systems, as
well as in representing hierarchical data structures in databases and documents. XML has
been a foundational technology for web services like SOAP (Simple Object Access Protocol),
but newer technologies like JSON have become more popular for certain use cases due to
their simplicity and compactness.

Points to remember:-

o XML (eXtensible Markup Language) is a mark-up language.

o XML is designed to store and transport data.

o XML was released in the late 1990s.

o XML was created to provide an easy way to use and store self-describing data.

o XML became a W3C Recommendation on February 10, 1998.

o XML is not a replacement for HTML.

o XML is designed to be self-descriptive.

o XML is designed to carry data, not to display data.

o XML tags are not predefined. You must define your own tags.

o XML is platform independent and language independent.

Note: Self-describing data is the data that describes both its content and structure.

There are three important characteristics of XML that make it useful in a variety of
systems and solutions −
o XML is extensible − XML allows you to create your own self-descriptive tags,
or language, that suits your application.

o XML carries the data, does not present it − XML allows you to store the
data irrespective of how it will be presented.

o XML is a public standard − XML was developed by an organization called
the World Wide Web Consortium (W3C) and is available as an open standard.

What is Mark-up?

Mark-up refers to the practice of adding special annotations or tags to a text document to
provide additional information about the structure, formatting, or semantics of the content.
The purpose of mark-up is to instruct how the document should be displayed, processed, or
understood by various systems, applications, or users.

In mark-up languages, specific symbols or keywords (mark-up tags) are inserted within the
text to define the elements and their attributes. These tags are typically enclosed within angle
brackets ("<" and ">") and come in pairs: an opening tag and a closing tag. The opening tag
contains the element's name, and the closing tag includes the same name preceded by a
forward slash ("/"). The content that falls between the opening and closing tags is affected by
the markup's instructions.

Markup is commonly used in various contexts, including:

1. Document Structure: Mark-up languages like HTML (HyperText Markup Language) are used to
structure and format web pages. HTML tags define elements like headings, paragraphs, lists,
images, and links.

2. Data Representation: Mark-up languages like XML (eXtensible Markup Language) are used to
represent structured data in a machine-readable and human-readable format. XML allows users
to define their own tags to describe the data's structure and meaning.

3. Text Formatting: Mark-up is often used to format text documents, such as in word
processing applications. For example, Markdown and LaTeX are markup languages used for text
formatting in plain text and academic publishing, respectively.

4. Programming Documentation: Mark-up is used in documenting code and software libraries.
Tools like Javadoc use markup tags to generate API documentation.

5. Rich Media: Mark-up is employed in defining rich media content, such as SVG (Scalable
Vector Graphics), which uses XML-based markup to describe vector graphics.

6. Accessibility: Some mark-up languages allow the inclusion of accessibility information,
such as the alt attribute in HTML, which provides text descriptions of images to assist
visually impaired users.

Mark-up languages play a crucial role in enabling the interoperability of data and content
across different platforms, devices, and applications. They provide a standardized way of
representing information and allow computers to interpret and process the data accurately.
Different markup languages cater to specific use cases, and the choice of markup language
depends on the requirements and the context in which it will be used.

History of XML
XML's history dates back to the late 1960s and early 1970s when the need for a standardized
way of representing and exchanging data across different systems and platforms emerged.
However, the real development of XML as we know it today began in the late 1990s. Here's a
brief history of XML:
1. SGML (Standard Generalized Markup Language): The roots of XML can be
traced back to SGML, a standard for defining markup languages. SGML was
introduced in the 1980s as a standard for defining the structure of documents with
markup tags. SGML allowed the definition of custom tags and was used in various
industries, including publishing and documentation.
2. HTML (Hypertext Markup Language): In the early 1990s, Tim Berners-Lee
developed HTML as an application of SGML to create documents for the World Wide Web.
HTML provided a way to structure web pages using predefined tags, making it easier
to create and display content on the early web browsers.
3. The Need for a More Flexible Standard: As the web evolved, the limitations of
HTML became evident. There was a growing need for a more flexible and extensible
markup language that could represent a wide range of data and be easily parsed by
different systems. This led to the development of XML.
4. XML 1.0 Specification: In 1996, the World Wide Web Consortium (W3C) formed
the XML Working Group to develop a standard for XML. In February 1998, the first
official XML 1.0 specification was released, defining the syntax rules and guidelines
for creating XML documents. XML allowed users to create their own custom tags,
making it suitable for various data representation needs.
5. Adoption and Application: XML quickly gained popularity due to its versatility and
ease of use. It became the preferred format for data interchange and storage, with
applications in web services, configuration files, data exchange between applications,
and more.
6. XPath, XSLT, and Other XML Technologies: Over time, various XML-related
technologies were developed to complement XML's capabilities. XPath was
introduced as a query language for navigating XML documents, while XSLT enabled
the transformation of XML data into different formats. These technologies further
enhanced the usability and power of XML.
7. JSON's Emergence: Despite XML's widespread adoption, in the mid-2000s, a new
data interchange format called JSON (JavaScript Object Notation) gained popularity
due to its simplicity and compactness. JSON became the preferred format for certain
use cases, particularly in web APIs, due to its more straightforward syntax and smaller
data size compared to XML.

Despite the rise of JSON, XML continues to be used extensively, especially in domains that
require more complex data structures and where data self-description is critical. XML's rich
tooling and support for schema validation make it valuable in various industries, and it
remains an essential part of the web and data interchange technologies.
Origins of XML
The origins of XML (eXtensible Markup Language) can be traced back to the 1970s,
with the work that led to SGML (Standard Generalized Markup Language). SGML was
standardized in 1986 as ISO 8879, and it served as the foundation for XML.
Here's a brief timeline of the key events leading to the development of XML:
1. SGML (Standard Generalized Markup Language):
o In the late 1960s and early 1970s, the need arose for a standardized way to
define the structure of documents to ensure interoperability and information
exchange across different systems.
o Charles F. Goldfarb, Ed Mosher, and Ray Lorie, working at IBM, started the
development of SGML in the mid-1970s.
o SGML was designed to be a meta-mark-up language, allowing users to define
their own document types (mark-up languages) through Document Type
Definitions (DTDs).
o It was standardized in 1986 as ISO 8879:1986, providing a formal
specification for representing the structure of documents using tags.
2. HTML (Hypertext Markup Language):
o In the early 1990s, the World Wide Web was born, and there was a need for a
markup language to structure web content.
o Tim Berners-Lee, a British computer scientist, developed HTML as a
simplified and practical application of SGML to create web pages and link
documents together.
o HTML allowed the use of predefined tags for structuring text, images, links,
and other elements, making it accessible to non-experts.
3. The Need for More Flexible Data Representation:
o As the web and internet technologies advanced, it became evident that HTML
had limitations in representing structured data beyond basic web content.
o There was a growing need for a more extensible and versatile markup
language that could represent various types of data and allow users to define
custom document structures.
4. XML's Development:
o In 1996, the World Wide Web Consortium (W3C) formed the XML Working
Group to develop a new markup language that addressed the limitations of
HTML and provided a standardized way to represent data.
o The XML 1.0 specification was released in February 1998, introducing XML
as a simplified and more flexible version of SGML.
o XML allowed users to define their own tags and document structures, making
it ideal for representing and exchanging a wide range of data types.
o Unlike SGML, XML was more focused on simplicity and ease of use, which
contributed to its widespread adoption.
5. XML's Adoption and Growth:
o XML quickly gained popularity due to its versatility and potential applications
in various domains, including data interchange, web services, configuration
files, and more.
o Over time, additional XML-related technologies were developed, such as
XPath, XSLT, and XML Schema, enhancing XML's capabilities and usability.
Today, XML remains an essential part of the web and various industries, particularly where
data self-description and structured data representation are critical. It has influenced the
development of other markup languages, including XHTML (an XML-based version of
HTML) and specific domain-specific XML languages used in various sectors.

Applications of XML
XML (eXtensible Markup Language) is a versatile markup language with a wide range of
applications in various industries and domains. Some of the key applications of XML
include:
1. Data Interchange and Integration: XML is commonly used for data interchange
and integration between different systems, applications, and platforms. It provides a
standardized and self-descriptive format for representing structured data, making it
easier to exchange information across different systems.
2. Web Services: XML serves as the backbone for many web services and APIs
(Application Programming Interfaces). Web services use XML to send and receive
data in a format that can be easily understood and processed by different
programming languages.
3. Configuration Files: Many software applications and systems use XML for
configuration files. These files allow users to customize settings, preferences, and
parameters without altering the application's code.
4. RSS Feeds: XML is commonly used for creating RSS (Really Simple Syndication)
feeds, which allow websites to publish regularly updated content in a standardized
format. RSS feeds enable users to subscribe to content updates from their favourite
websites.
5. Document Mark-up and Authoring: XML can be used for structuring and marking
up documents, allowing authors to define the document's hierarchical structure,
headings, paragraphs, lists, and other elements.
6. Database and Data Storage: XML is employed in databases and data storage
systems to represent and store structured data. It provides a flexible way to model
complex data structures and relationships.
7. Metadata and Semantics: XML can be used to define and express metadata and
semantic information about documents, web resources, and data elements. This helps
in enhancing the discoverability and understanding of content.
8. Industry-Specific Standards: Many industries have adopted XML-based standards
to facilitate data exchange and communication.
9. Cross-Platform Compatibility: XML's platform-independent nature makes it ideal
for exchanging data between different operating systems, programming languages,
and devices.
10. Healthcare and Electronic Medical Records (EMR): XML is utilized in the
healthcare industry for creating standardized electronic medical records and
exchanging patient data securely between healthcare providers.
11. Publishing and Content Management: XML is widely used in publishing
workflows, content management systems, and digital publishing to ensure
consistency, reusability, and easy content transformation.
12. Geospatial Data: In GIS (Geographic Information Systems) and geospatial
applications, XML is used for representing and sharing geographic data in a
structured format.
Overall, XML's flexibility, self-descriptiveness, and human-readability make it an
excellent choice for various data representation and interchange scenarios. While newer
formats like JSON have gained popularity for specific use cases, XML continues to be a
fundamental technology in many industries due to its robustness and rich tooling support.

Features and Advantages of XML

XML (eXtensible Mark-up Language) offers several features and advantages that make it a powerful
and widely used mark-up language for data representation and interchange. Here are some key
features and advantages of XML:

1. Extensibility: The "X" in XML stands for "extensible," meaning users can define their own
tags and document structures to represent data in a way that suits their specific needs. This
flexibility allows XML to adapt to diverse data representation requirements.

2. Self-Descriptive: XML documents are self-descriptive, as they contain both the data and the
metadata defining the structure of the data. XML tags provide meaningful names for
elements, making it easier for humans and systems to understand the data's meaning and
relationships.

3. Platform-Independent: XML is a platform-independent language, meaning XML documents
can be exchanged and processed across different operating systems, programming
languages, and devices without compatibility issues.

4. Human-Readable: XML documents are designed to be easily readable by humans, thanks to
their text-based syntax. This feature enhances readability and simplifies debugging and manual
data editing tasks.

5. Structured Data Representation: XML allows data to be structured hierarchically using
nested elements and attributes. This makes it suitable for representing complex data
structures and relationships.

6. Data Validation: XML documents can be associated with XML Schema or Document Type
Definitions (DTDs) that define the rules and constraints the data must adhere to.
Validating a document against them helps ensure data consistency and correctness.

7. Data Transformation: XML can be transformed into other formats, such as HTML, using
technologies like XSLT (eXtensible Stylesheet Language Transformations). This feature is
valuable for presenting XML data in different ways for various applications.

8. Interoperability: XML enables seamless data exchange between different systems and
applications, which supports interoperability and integration between distinct software solutions.

9. Standardization and Widespread Adoption: XML is a widely adopted standard, backed by
the World Wide Web Consortium (W3C), ensuring consistency in its implementation and
support across various platforms and tools.

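10. Versioning Support: XML has no dedicated mechanism for versioning a data format, but
because applications can ignore elements and attributes they do not recognize, a format can
often evolve over time (for example, by adding optional elements or a version attribute)
without breaking existing implementations.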

11. Industry-Specific Standards: XML has been adopted in many industries to create domain-
specific standards for data exchange. This standardization facilitates efficient communication
and data sharing within specific domains.
12. Metadata Support: XML allows the inclusion of metadata within the document, providing
additional information about the content, its origin, and other relevant details.

Overall, XML's features and advantages make it a versatile and widely used language for
representing structured data in a human-readable and machine-readable format. While newer
formats like JSON have gained popularity for certain use cases, XML remains an essential technology
in various industries due to its robustness, tooling support, and ability to handle complex data
structures.
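To make extensibility and hierarchical structure concrete, here is a minimal sketch that builds a document from tag names chosen by the application. It uses Python's standard xml.etree.ElementTree module. The tag names catalog, item, name and quantity, the unit attribute, and the helper build_catalog are all invented for this illustration.

```python
import xml.etree.ElementTree as ET


def build_catalog(items):
    """Build a <catalog> tree from (name, unit, quantity) tuples."""
    catalog = ET.Element("catalog")
    for name, unit, quantity in items:
        # The attribute carries metadata; child elements carry the data.
        item = ET.SubElement(catalog, "item", {"unit": unit})
        ET.SubElement(item, "name").text = name
        ET.SubElement(item, "quantity").text = str(quantity)
    return catalog


catalog = build_catalog([("rice", "kg", 5), ("milk", "litre", 2)])
xml_text = ET.tostring(catalog, encoding="unicode")
print(xml_text)
```

Nothing in XML itself knows what an item or a quantity is. The meaning comes from the agreement between the programs that write and read the document.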

Disadvantages of XML
While XML (eXtensible Markup Language) offers several benefits, it also has some
disadvantages that should be considered when choosing it as a data representation format.
Here are some of the main disadvantages of XML:

1. Verbose Syntax: XML's syntax can be quite verbose, leading to larger file sizes
compared to more compact formats like JSON. This verbosity can impact data
transfer times and storage requirements, especially for large datasets.
2. Parsing Overhead: Parsing XML documents can be computationally more expensive
than parsing simpler formats like JSON. The need to process nested elements and
attributes can result in increased parsing overhead, affecting performance in resource-
constrained environments.
3. Complexity: XML's extensibility and flexibility come at the cost of increased
complexity. Defining complex document structures with nested elements and
attributes can become harder to manage, especially for users unfamiliar with XML.
4. Redundancy: XML documents can be verbose and include redundant information,
leading to increased data size and inefficiency. The use of opening and closing tags
for every element, even when the content is empty, contributes to this redundancy.
5. Lack of Native Data Types: XML does not have native data types, such as integers
or booleans, unlike some other data formats like JSON. As a result, all data in XML is
represented as strings, requiring additional parsing and conversions when using the
data in programming languages.
6. Less Compact than Binary Formats: XML is a text-based format, which means it
may not be as compact as binary formats for representing certain types of data. In
scenarios where data size and transfer speed are critical, binary formats may be more
efficient.
7. Limited Support for Metadata: While XML allows for metadata to be included in
documents, the support for standardized metadata formats is less prevalent compared
to some other data formats like JSON-LD (JSON for Linked Data) or RDF (Resource
Description Framework).
8. Parsing Errors Handling: Handling parsing errors in XML can be more challenging
than in simpler formats, as nested structures and complex document hierarchies can
lead to harder-to-diagnose issues when errors occur.
9. Processing Overhead: XML processing can require significant memory and
processing resources, especially for large documents or when working with XML
documents in real-time streaming scenarios.
10. Alternative Formats: The popularity of other data interchange formats like JSON
has grown significantly due to their simplicity and efficiency in certain use cases. As
a result, some developers and systems may prefer these alternatives over XML for
specific applications.

Despite these disadvantages, XML continues to be widely used in various domains,
especially when its self-descriptive nature and data structure flexibility are crucial for data
interchange and representation needs. However, for specific use cases where simplicity,
compactness, and efficiency are essential, developers may choose other formats like JSON,
Protocol Buffers, or MessagePack. The choice of data format depends on the specific
requirements and constraints of the application at hand.
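The point about data types can be seen in a short sketch using Python's standard xml.etree.ElementTree module. The order document and the helper read_order are hypothetical. Every value arrives as a string and the application has to convert it.

```python
import xml.etree.ElementTree as ET

# Hypothetical order document used only for this illustration.
ORDER_XML = "<order><quantity>3</quantity><price>19.99</price><paid>true</paid></order>"


def read_order(xml_text):
    """Parse an <order> and convert its text values to Python types."""
    root = ET.fromstring(xml_text)
    return {
        "quantity": int(root.findtext("quantity")),   # text -> integer
        "price": float(root.findtext("price")),       # text -> float
        "paid": root.findtext("paid") == "true",      # text -> boolean
    }


raw_price = ET.fromstring(ORDER_XML).findtext("price")  # still a string
order = read_order(ORDER_XML)
print(type(raw_price).__name__, order)
```

A schema language such as XML Schema can declare the intended types, but the conversion still happens in the processing application or its libraries.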

Difference between HTML and XML

There are many differences between HTML and XML. These important differences are
given below:

| No. | HTML | XML |
|-----|------|-----|
| 1 | It was written in 1993. | XML 1.0 became a W3C Recommendation in 1998 (the first working draft appeared in 1996). |
| 2 | HTML stands for Hyper Text Markup Language. | XML stands for Extensible Markup Language. |
| 3 | HTML is static in nature. | XML is dynamic in nature. |
| 4 | It was created by Tim Berners-Lee and is now maintained by the Web Hypertext Application Technology Working Group (WHATWG). | It was developed by the World Wide Web Consortium (W3C). |
| 5 | It is termed as a presentation language. | It is neither termed as a presentation nor a programming language. |
| 6 | HTML is a markup language. | XML provides a framework to define markup languages. |
| 7 | HTML can ignore small errors. | XML does not allow errors: a parser must reject a document that is not well-formed. |
| 8 | It has an extension of .html and .htm | It has an extension of .xml |
| 9 | HTML is not case sensitive. | XML is case sensitive. |
| 10 | HTML tags are predefined tags. | XML tags are user-defined tags. |
| 11 | There are a limited number of tags in HTML. | XML tags are extensible. |
| 12 | HTML does not preserve white spaces. | White space can be preserved in XML. |
| 13 | HTML tags are used for displaying the data. | XML tags are used for describing the data, not for displaying it. |
| 14 | In HTML, closing tags are not always necessary. | In XML, closing tags are necessary. |
| 15 | HTML is used to display the data. | XML is used to store data. |
| 16 | HTML does not carry data, it just displays it. | XML carries the data to and from the database. |
| 17 | HTML offers native object support. | In XML, the objects are expressed by conventions using attributes. |
| 18 | HTML document size is relatively small. | XML document size is relatively large, as both the formatting approach and the code are lengthy. |
| 19 | No additional application is required to run JavaScript code embedded in an HTML document. | A parser, such as one exposing the DOM (Document Object Model), is required for JavaScript code to read and map the text of an XML document. |
| 20 | Some of the tools used for HTML are: Visual Studio Code, Atom, Notepad++, Sublime Text and many more. | Some of the tools used for XML are: Oxygen XML, XML Notepad, Liquid Studio and many more. |

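The rows on case sensitivity and closing tags can be checked with a short sketch. It uses Python's standard xml.etree.ElementTree parser as one example of a conforming XML parser. The helper is_well_formed and the sample strings are illustrative only.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False on a parse error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Names that differ only in case are different elements in XML.
doc = ET.fromstring("<doc><Title>upper</Title><title>lower</title></doc>")
upper_text = doc.findtext("Title")
lower_text = doc.findtext("title")

# An end tag whose case does not match the start tag is rejected,
# and so is a missing end tag.
case_mismatch_ok = is_well_formed("<note><to>Asha</To></note>")
missing_close_ok = is_well_formed("<note><to>Asha</note>")

print(upper_text, lower_text, case_mismatch_ok, missing_close_ok)
```

An HTML parser in a browser would typically repair both of the rejected inputs silently, which is the behaviour row 7 refers to.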
Components of XML with example

XML documents are composed of several components that define the structure and content of
the data being represented. The main components of an XML document are as follows:

1. Prologue: The prologue is an optional component that appears at the beginning of an
XML document. It typically contains the XML declaration, which specifies the XML
version and encoding used in the document.

Example:

<?xml version="1.0" encoding="UTF-8"?>

2. Element: An element is a fundamental building block of an XML document and
represents a distinct piece of data. Elements are enclosed within start tags and end
tags, and they can contain other elements, text content, or attributes.

Example:

<book>

<title>Sample Book</title>

</book>

3. Start Tag and End Tag: A start tag (also known as an opening tag) is used to begin
an element, and an end tag (also known as a closing tag) is used to close the element.
The content between the start and end tags represents the data or nested elements
associated with the element.

Example:

<book>

</book>

4. Attributes: Attributes provide additional information about an element and are
included within the start tag of the element. They consist of a name and a value,
separated by an equal sign ("="). An element can have multiple attributes.

Example:

<book category="fiction" lang="en">

Shaktimaan

</book>

5. Text Content: Text content is the data enclosed within an element. It can include
plain text, numbers, or any other character data.

Example:

<title>Sample Book</title>

6. Comments: Comments are used to include explanatory or informative notes within
an XML document. They are enclosed within <!-- and -->.

Example:

<!-- This is a comment -->

7. CDATA Section: A CDATA section is used to include blocks of text that should be
treated as character data and not be parsed as XML markup.

Example:

<![CDATA[This is a CDATA section containing <tags> and special characters &.]]>

8. Whitespace: Whitespace refers to spaces, tabs, and line breaks. An XML parser passes
the whitespace inside an element's content on to the application. Whitespace used only to
indent elements is usually ignored by the application, and the xml:space="preserve"
attribute can signal that the whitespace in an element is significant.

Example:

<text xml:space="preserve">

This is some text

with whitespace and

line breaks.

</text>

These components come together to create well-formed XML documents
that adhere to the rules and syntax of XML. XML's self-descriptive nature,
along with its support for nesting and hierarchy, makes it a powerful tool
for representing structured data in various applications and industries.

Here's an example of an XML document that showcases all the
components explained earlier:

<?xml version="1.0" encoding="UTF-8"?>

<!-- Sample bookstore catalogue -->
<bookstore>

<book category="fiction" lang="en">
<title>Sample Book</title>
<author>John Doe</author>
<price>19.99</price>
</book>

<book category="non-fiction" lang="fr">
<title>French Non-Fiction Book</title>
<author>Jane Smith</author>
<price>15.50</price>
</book>

<!-- Magazines are listed after the books -->
<magazine>
<title>Tech Today</title>
<issue>July 2023</issue>
<editorial>
<![CDATA[Check out the latest tech trends!]]>
</editorial>
</magazine>
</bookstore>
In this example, we have an XML document representing a bookstore. It includes the
following components:

1. Prologue: The prologue is the first line of the XML document, declaring the version
(1.0) and encoding (UTF-8) used in the document.

2. Comments: There are two comment sections in the XML, providing explanatory
notes to readers and developers.

3. Elements: The XML document contains elements like <bookstore>, <book>,
<title>, <author>, <price>, <magazine>, <issue>, and <editorial>, each
representing a distinct piece of data.

4. Attributes: The <book> elements have the attributes category and lang.

5. Text Content: The elements <title>, <author>, <price>, and <issue>
contain text content representing various data values.

6. CDATA Section: The <editorial> element contains a CDATA section, which keeps
its text as character data even if it includes special characters like & or <.

7. Whitespace: The line breaks between the elements inside <bookstore> and <magazine>
carry no data; they are there to improve human readability.

This example demonstrates how XML components can be combined to create a well-formed
and structured XML document, allowing for the representation of different data elements in a
self-descriptive and easily readable manner.
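As a rough sketch of how these components look to a program, the code below reads a shortened, hypothetical variant of the bookstore document with Python's standard xml.etree.ElementTree module. The sample string and variable names are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical variant of the bookstore document.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <!-- comments are skipped by the default ElementTree parser -->
  <book category="fiction" lang="en">
    <title>Sample Book</title>
    <price>19.99</price>
  </book>
  <magazine>
    <title>Tech Today</title>
    <editorial><![CDATA[Trends in <tags> & more]]></editorial>
  </magazine>
</bookstore>"""

root = ET.fromstring(SAMPLE.encode("utf-8"))

child_tags = [child.tag for child in root]        # elements under the root
book = root.find("book")
book_attributes = dict(book.attrib)               # attributes as a dictionary
book_title = book.findtext("title")               # text content
editorial = root.find("magazine/editorial").text  # CDATA arrives as plain text

print(child_tags, book_attributes, book_title, editorial)
```

The CDATA markers themselves do not reach the application: the parser hands over the enclosed characters exactly as written, including the < and & that would otherwise need escaping.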

Anatomy of an XML Document

An XML (eXtensible Markup Language) document follows a specific structure known as the
"anatomy of an XML document." This structure defines the required components that make
up a valid XML file. The key components of an XML document are:

1. Prologue: The prologue is an optional component that appears at the beginning of an
XML document. It consists of the XML declaration, which provides information
about the XML version and encoding used in the document.

The first line of the document is known as the XML declaration. This tells a processing
application which version of XML you are using (the version indicator is mandatory whenever
the declaration is present) and which character encoding you have used for the document.

If the XML declaration is omitted, a processor will make certain assumptions about your
document. In particular, it will expect it to be encoded in UTF-8 (or in UTF-16 if the
document begins with a byte order mark), both encodings of the Unicode character set.
However, it is best to use the XML declaration wherever possible, both to avoid
confusion over the character encoding and to indicate to processors which version of XML
you're using.

Example:

<?xml version="1.0" encoding="UTF-8"?>
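The effect of the encoding named in the declaration can be sketched with Python's standard xml.etree.ElementTree module. The sample documents are made up for this illustration, and the xml_declaration argument of ET.tostring assumes Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

# Bytes of a hypothetical document that declares a non-UTF-8 encoding.
latin1_doc = '<?xml version="1.0" encoding="ISO-8859-1"?><name>café</name>'.encode(
    "iso-8859-1"
)
# The parser decodes the bytes using the encoding named in the declaration.
name = ET.fromstring(latin1_doc).text

# Without a declaration, the same parser assumes UTF-8.
utf8_doc = "<name>café</name>".encode("utf-8")
name_utf8 = ET.fromstring(utf8_doc).text

# When writing XML, the declaration can be requested explicitly.
serialized = ET.tostring(ET.Element("root"), encoding="utf-8", xml_declaration=True)

print(name, name_utf8, serialized)
```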

2. Root Element: The root element is the outermost element in the XML document. It
acts as the container for all other elements and serves as the starting point for the
document's hierarchical structure. There can be only one root element in an XML
document.

Example:

<root>

</root>

3. Elements and Attributes

Consider a document whose root element is named authors. The
contents of that element include everything between the right angle bracket (>)
in <authors> and the left angle bracket (<) in </authors>. The actual syntactic
constructs <authors> and </authors> are often referred to as the element start tag and end
tag, respectively. Do not confuse tags with elements! Note that elements may include other
elements, as well as text. An XML document must contain exactly one root element, which
contains all other content within the document. The name of the root element defines the type
of the XML document.
Elements that contain both text and other elements simultaneously are classified as mixed
content. Suppose the authors document uses elements named person to describe the authors
themselves, and each person element has an attribute named id. Unlike elements, attributes can
contain only textual content. Their values must be surrounded by quotes. Either single quotes
(') or double quotes (") may be used, as long as you use the same kind of closing quote as the
opening one.

Within XML documents, attributes are frequently used for metadata (i.e., "data about data"),
describing properties of the element's contents. The order of attributes is not significant:
the information presented to an application by an XML processor upon reading the following
two lines is identical:

<animal name="dog" legs="4"/>

<animal legs="4" name="dog"/>

On the other hand, the information presented to an application by an XML processor upon
reading the following two lines will be different for each animal element, because the
ordering of elements is significant:

<animal><name>dog</name><legs>4</legs></animal>

<animal><legs>4</legs><name>dog</name></animal>

XML treats a set of attributes like a bunch of stuff in a bag (there is no implicit ordering),
while elements are treated like items on a list, where ordering matters.
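The bag-versus-list distinction can be observed with Python's standard xml.etree.ElementTree module, which exposes attributes as a dictionary and child elements as an ordered sequence. This is a small sketch and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Same attributes, written in a different order.
a = ET.fromstring('<animal name="dog" legs="4"/>')
b = ET.fromstring('<animal legs="4" name="dog"/>')
same_attributes = a.attrib == b.attrib  # dictionary comparison ignores order

# Same child elements, written in a different order.
c = ET.fromstring("<animal><name>dog</name><legs>4</legs></animal>")
d = ET.fromstring("<animal><legs>4</legs><name>dog</name></animal>")
same_children = [e.tag for e in c] == [e.tag for e in d]  # list comparison keeps order

print(same_attributes, same_children)
```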

4. Well-Formedness

An XML document that conforms to the rules of XML syntax is known as well-formed. At
its most basic level, well-formedness means that elements should be properly matched, and
all opened elements should be closed.

Table A-1 shows some XML documents that are not well-formed.

Table A-1. Examples of poorly formed XML documents

Document          Reason it's not well-formed

<foo>             The elements are not properly nested, because foo is closed while
<bar>             inside its child element bar.
</foo>
</bar>

<foo>             The bar element was not closed before its parent, foo, was closed.
<bar>
</foo>

<foo baz>         The baz attribute has no value. While this is permissible in HTML
</foo>            (e.g., <table border>), it is forbidden in XML.

<foo baz=23>      The baz attribute value, 23, has no surrounding quotes. Unlike HTML,
</foo>            all attribute values must be quoted in XML.
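
A quick way to test well-formedness is to hand the text to a parser and see whether it raises an error. The sketch below assumes Python's standard xml.etree.ElementTree module; the helper name is_well_formed is made up for this illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text, False on a syntax error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# The four documents from Table A-1, plus one corrected version
samples = [
    "<foo><bar></foo></bar>",       # improper nesting
    "<foo><bar></foo>",             # bar never closed
    "<foo baz></foo>",              # attribute without a value
    "<foo baz=23></foo>",           # attribute value not quoted
    '<foo baz="23"><bar/></foo>',   # well-formed
]
results = [is_well_formed(doc) for doc in samples]
print(results)
```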

5. Comments

As in HTML, it is possible to include comments within XML documents. XML comments
are intended to be read only by people. With HTML, developers have occasionally employed
comments to add application-specific functionality. For example, the server-side include
functionality of most web servers uses instructions embedded in HTML comments. XML
provides other means of indicating application processing instructions. Comments should not
be used for any purpose other than those for which they were intended.

The start of a comment is indicated with <!--, and the end of the comment is indicated with
-->. Any sequence of characters, aside from the string --, may appear within a comment.
Comments tend to be used more in XML documents intended for human consumption than
those intended for machine consumption. Comments aren't widely used in RSS.

6. Entity References

Another feature of XML that is occasionally useful when writing RSS documents is the
mechanism for escaping characters.
Because some characters have special significance in XML, there needs to be a way to
represent them. For example, in some cases the < symbol might really be intended to mean
"less than," rather than to signal the start of an element name. Clearly, just inserting the
character without any escaping mechanism would result in a poorly formed document,
because a processing application would assume you were starting another element. Another
instance of this problem is needing to include both double quotes and single quotes
simultaneously in an attribute's value. Here's an example that illustrates both these
difficulties:

<badDoc>

<para>

I'd really like to use the < character

</para>

<note title="On the proper 'use' of the " character"/>

</badDoc>

XML avoids this problem by the use of the predefined entity reference. The word entity in the
context of XML simply means a unit of content. The term entity reference means just that, a
symbolic way of referring to a certain unit of content. XML predefines entities for the
following symbols: left angle bracket (<), right angle bracket (>), apostrophe ('), double quote
("), and ampersand (&).

An entity reference is introduced with an ampersand (&), which is followed by a name (using
the word "name" in its formal sense, as defined by the XML 1.0 specification), and
terminated with a semicolon (;).

Table A-2 shows how the five predefined entities can be used within an XML document.

Table A-2. Predefined entity references in XML 1.0

Literal character    Entity reference

<                    &lt;

>                    &gt;

'                    &apos;

"                    &quot;

&                    &amp;

Here's our problematic document, revised to use entity references:

<badDoc>

<para>

I'd really like to use the &lt; character

</para>

<note title="On the proper &apos;use&apos; of the &quot; character"/>

</badDoc>
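
When XML is generated by a program, escaping is usually left to a library rather than done by hand. The following is a minimal sketch, assuming Python's standard xml.sax.saxutils helpers escape and quoteattr; the root element name doc is a placeholder chosen for this example.

```python
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

text = "I'd really like to use the < character"
title = "On the proper 'use' of the \" character"

escaped_text = escape(text)      # replaces &, < and > with entity references
quoted_title = quoteattr(title)  # adds surrounding quotes and escapes as needed

document = "<doc><para>%s</para><note title=%s/></doc>" % (escaped_text, quoted_title)

# Parsing the result gives back the original, unescaped strings
root = ET.fromstring(document)
print(document)
```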

7. Character References

Character references allow you to denote a character by its numeric position in the Unicode
character set (this position is known as its code point). A character reference is written
as &#N; with a decimal code point, or as &#xN; with a hexadecimal one. Table A-3 contains
a few examples that illustrate the syntax.

Table A-3. Example character references

Actual character    Character reference

1                   &#49;

A                   &#65;

Ñ                   &#xD1;

®                   &#xAE;
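
The mapping between a character and its character reference is plain arithmetic on the code point. The sketch below assumes Python's built-in ord function and the standard xml.etree.ElementTree parser; the helper name char_ref is made up for this illustration.

```python
import xml.etree.ElementTree as ET

def char_ref(ch, hexadecimal=False):
    """Build the character reference for a single character."""
    code_point = ord(ch)
    if hexadecimal:
        return "&#x%X;" % code_point
    return "&#%d;" % code_point

refs = [
    char_ref("A"),                          # decimal form
    char_ref("\u00d1", hexadecimal=True),   # N with tilde
    char_ref("\u00ae", hexadecimal=True),   # registered sign
]

# A parser turns the references back into the characters themselves
decoded = ET.fromstring("<t>&#65;&#xD1;&#xAE;</t>").text
print(refs, decoded)
```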

8. Encoding

Encoding is the process of converting Unicode characters into their equivalent binary
representation. When the XML processor reads an XML document, it decodes the bytes of
the document according to the document's encoding. Hence, the encoding should be
specified in the XML declaration.
Encoding Types
There are mainly two types of encoding −

o UTF-8
o UTF-16
UTF stands for UCS Transformation Format, and UCS itself means Universal
Character Set. The number 8 or 16 refers to the size in bits of the code unit used to
build up a character: UTF-8 represents a character with one to four 8-bit units (1 to 4
bytes), and UTF-16 with one or two 16-bit units (2 or 4 bytes). For documents
without encoding information, UTF-8 is assumed by default (or UTF-16 when the file
begins with a byte order mark).

Syntax
Encoding type is included in the prolog section of the XML document. The syntax for
UTF-8 encoding is as follows −
<?xml version = "1.0" encoding = "UTF-8" standalone = "no" ?>
The syntax for UTF-16 encoding is as follows −
<?xml version = "1.0" encoding = "UTF-16" standalone = "no" ?>
Example
Following example shows the declaration of encoding −
<?xml version = "1.0" encoding = "UTF-8" standalone = "no" ?>
<contact-info>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</contact-info>
In the above example, encoding="UTF-8" specifies that the document is stored using 8-bit
code units. UTF-16 stores the same characters using 16-bit code units.
XML files made up mostly of ASCII characters tend to be smaller when encoded with UTF-8
than with UTF-16.
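
The size difference can be measured directly. This is a small sketch using Python's built-in str.encode; the sample characters are chosen here only to show the 1-to-4-byte and 2-or-4-byte ranges.

```python
doc = "<contact-info><name>Tanmay Patil</name></contact-info>"

utf8_size = len(doc.encode("utf-8"))
utf16_size = len(doc.encode("utf-16"))  # includes a 2-byte byte order mark

# Bytes needed per character: (UTF-8, UTF-16 without byte order mark)
sizes = {
    ch: (len(ch.encode("utf-8")), len(ch.encode("utf-16-le")))
    for ch in ["A", "\u00d1", "\u20ac", "\U0001F600"]
}
print(utf8_size, utf16_size, sizes)
```

For this all-ASCII document, the UTF-16 form is a little more than twice the size of the UTF-8 form.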

9. Validity

In addition to well-formedness, XML 1.0 offers another level of verification, called validity.
To explain why validity is important, let's take a simple example. Imagine you invented a
simple XML format for your friends' telephone numbers:

<phonebook>

<person>

<name>Albert Smith</name>

<number>...</number>

</person>

<person>

<name>Bertrand Jones</name>

<number>...</number>

</person>

</phonebook>

Based on your format, you also construct a program to display and search your phone
numbers. This program turns out to be so useful, you share it with your friends. However,
your friends aren't so accurate on detail as you are, and they try to feed your program this
phone book file:

<phonebook>

<person>

<name>Melanie Green</name>

<phone>...</phone>

</person>

</phonebook>

Note that, although this file is perfectly well-formed, it doesn't fit the format you prescribed
for the phone book, and you find you need to change your program to cope with this
situation. If your friends had used number as you did to denote the phone number, and
not phone, there wouldn't have been a problem. However, as it is, this second file is not a
valid phonebook document.
10. XML Namespaces

XML 1.0 lets developers create their own elements and attributes, but it leaves open the
potential for overlapping names. "Title" in one context may mean something entirely
different than "Title" in a different context. The "Namespaces in XML" specification
provides a mechanism developers can use to identify particular vocabularies using Uniform
Resource Identifiers (URIs).
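
To make the idea concrete, here is a minimal sketch of two vocabularies that both define a title element. It assumes Python's standard xml.etree.ElementTree module; the namespace URIs, the prefixes bk and job, and the catalog element are hypothetical names invented for this example.

```python
import xml.etree.ElementTree as ET

doc = """<catalog xmlns:bk="http://example.com/ns/book"
         xmlns:job="http://example.com/ns/job">
  <bk:title>Learning XML</bk:title>
  <job:title>Editor</job:title>
</catalog>"""

root = ET.fromstring(doc)

# Map prefixes to namespace URIs for lookups (the prefixes themselves are arbitrary)
ns = {"bk": "http://example.com/ns/book", "job": "http://example.com/ns/job"}
book_title = root.find("bk:title", ns).text
job_title = root.find("job:title", ns).text

# Internally, each tag is identified by its URI plus its local name
qualified_tags = [child.tag for child in root]
print(book_title, job_title, qualified_tags)
```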

XML Example
XML documents form a hierarchical structure that looks like a tree, known as the XML tree,
which starts at "the root" and branches to "the leaves".

Example of Sample XML Document

XML documents use a self-describing and simple syntax:

<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>

The first line is the XML declaration. It defines the XML version (1.0) and the encoding used
(ISO-8859-1 = Latin-1/West European character set).

The next line describes the root element of the document (like saying: "this document is a
note"):

<note>

The next 4 lines describe 4 child elements of the root (to, from, heading, and body).

<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>

And finally the last line defines the end of the root element.

</note>

XML documents must contain a root element. This element is "the parent" of all other
elements.

The elements in an XML document form a document tree. The tree starts at the root and
branches to the lowest level of the tree.

All elements can have sub elements (child elements).

<root>
<child>
<subchild>.....</subchild>
</child>
</root>

The terms parent, child, and sibling are used to describe the relationships between elements.
Parent elements have children. Children on the same level are called siblings (brothers or
sisters).

All elements can have text content and attributes (just like in HTML).

Another Example of XML: Books

File: books.xml

<bookstore>
<book category="COOKING">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="CHILDREN">
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="WEB">
<title lang="en">Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>

The root element in the example is <bookstore>. All elements in the document are contained
within <bookstore>.

The <book> element has 4 children: <title>, <author>, <year> and <price>.

XML - Declaration
The XML declaration contains details that prepare an XML processor to parse the XML
document. It is optional, but when used, it must appear at the very beginning of the XML
document, with nothing (not even whitespace or a comment) before it.

Syntax

Following syntax shows XML declaration −

<?xml
version = "version_number"
encoding = "encoding_declaration"
standalone = "standalone_status"
?>

Each parameter consists of a parameter name, an equals sign (=), and a parameter value inside
quotes. The following table shows the above syntax in detail −

Parameter: version
Parameter_value: 1.0
Parameter_description: Specifies the version of the XML standard used.

Parameter: encoding
Parameter_value: UTF-8, UTF-16, ISO-10646-UCS-2, ISO-10646-UCS-4, ISO-8859-1 to
ISO-8859-9, ISO-2022-JP, Shift_JIS, EUC-JP
Parameter_description: It defines the character encoding used in the document. UTF-8 is
the default encoding used.

Parameter: standalone
Parameter_value: yes or no
Parameter_description: It informs the parser whether the document relies on information
from an external source, such as an external document type definition (DTD), for its
content. The default value is no. Setting it to yes tells the processor that no external
declarations are required for parsing the document.

Rules
An XML declaration should abide by the following rules −

o If the XML declaration is present, it must be placed at the very start of the XML
document.
o If the XML declaration is included, it must contain the version number attribute.
o The parameter names are case-sensitive and are always in lower case. The version and
standalone values are case-sensitive too, while encoding names are matched without
regard to case.
o The order of placing the parameters is important. The correct order is: version,
encoding and standalone.
o Either single or double quotes may be used.
o The XML declaration has no closing tag, i.e. there is no </?xml>.

XML Declaration Examples

Following are few examples of XML declarations −

An XML declaration with no parameters, such as <?xml ?>, is not allowed, because the
version parameter is mandatory.

XML declaration with version definition −

<?xml version = "1.0" ?>

XML declaration with all parameters defined −

<?xml version = "1.0" encoding = "UTF-8" standalone = "no" ?>

XML declaration with all parameters defined in single quotes −

<?xml version = '1.0' encoding = 'iso-8859-1' standalone = 'no' ?>
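
A parser exposes the three declaration parameters after reading a document. The sketch below assumes Python's standard xml.dom.minidom module, whose Document object carries version, encoding and standalone attributes; the contact-info element is reused from the encoding example.

```python
from xml.dom import minidom

data = b'<?xml version="1.0" encoding="UTF-8" standalone="no"?><contact-info/>'
doc = minidom.parseString(data)

version = doc.version        # value of the version parameter
encoding = doc.encoding      # value of the encoding parameter
standalone = doc.standalone  # False for "no", True for "yes"
root_name = doc.documentElement.tagName

print(version, encoding, standalone, root_name)
```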

The Root Element in XML

In XML (eXtensible Markup Language), the root element is the outermost element in an XML
document. It acts as the container for all other elements in the document and is the starting point of
the document's hierarchical structure.

The root element is the only element in the XML document that is not nested inside any other
element. All other elements must be contained within the root element, either directly or indirectly
through other elements.

Here is an example of an XML document with the root element:

<?xml version="1.0" encoding="UTF-8"?>

<rootElement>

<!-- other elements go here -->

</rootElement>
In this example, <rootElement> is the root element. All other elements in the
XML document will be nested within this root element. The root element gives
the XML document its structure and serves as the starting point for traversing
and accessing the data within the document. It defines the context for all the
data elements in the XML document.

Here is a fuller example, with several elements nested inside the root element:

<?xml version="1.0" encoding="UTF-8"?>

<library>

<book>

<title>Sample Book</title>

</book>

<book>

<title>Another Book</title>

<author>Jane Smith</author>

</book>

</library>

In this example, the root element is <library>. It is the outermost element and acts as the container
for all the other elements in the XML document. All the <book> elements are nested within the
<library> element.

What is an XML Element?

An XML element is everything from (including) the element's start tag to
(including) the element's end tag.

An element can contain:

o text
o attributes
o other elements
o or a mix of the above

<bookstore>
<book category="children">
<title>Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="web">
<title>Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>

In the example above:

<title>, <author>, <year>, and <price> have text content because they
contain text (like 29.99).

<bookstore> and <book> have element contents, because they contain elements.

<book> has an attribute (category="children").

Empty XML Elements

In XML (eXtensible Markup Language), an empty element is an element that doesn't contain
any child elements or text content. An empty element can be written as a start tag immediately
followed by an end tag, or more compactly as a single self-closing tag, which ends with a
forward slash ("/") before the closing angle bracket.

Empty elements are useful when representing data that doesn't require additional nested elements
or when defining attributes without any content. They are commonly used in XML documents to
indicate the presence of specific data points or properties without providing additional details.

Here are some examples of empty XML elements:

1. Empty Element without Attributes:

<emptyElement />

2. Empty Element with Attributes:

<book ISBN="123456789" />
3. Empty Element within a Parent Element:
<library>
<book ISBN="987654321" />
<book ISBN="543210987" />
</library>

In the first example, <emptyElement /> is a standalone empty element. It doesn't contain
any child elements or text content.
In the second example, <book ISBN="123456789" /> is an empty element representing a
book with an ISBN attribute. It doesn't have any nested elements or text content but
includes an attribute named "ISBN" with the value "123456789."
In the third example, the <book> element is used as an empty element within a parent
element <library>. This structure allows multiple empty <book> elements to be included
under the <library> element, each representing a different book with its own set of
attributes.
Empty elements are a convenient way to represent simple data points or attributes in XML
without the need for additional nested elements or content. They help maintain a clear and
concise representation of data, especially when certain elements only require minimal
information.
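
The two ways of writing an empty element are equivalent to a parser. The following sketch assumes Python's standard xml.etree.ElementTree module; the helper name is_empty is made up for this illustration.

```python
import xml.etree.ElementTree as ET

self_closing = ET.fromstring('<book ISBN="123456789" />')
start_and_end = ET.fromstring('<book ISBN="123456789"></book>')

def is_empty(element):
    """True when the element has no child elements and no text content."""
    return len(element) == 0 and not (element.text or "").strip()

both_empty = is_empty(self_closing) and is_empty(start_and_end)
same_attributes = self_closing.attrib == start_and_end.attrib

# When writing, ElementTree uses the short self-closing form by default
serialized = ET.tostring(start_and_end, encoding="unicode")
print(both_empty, same_attributes, serialized)
```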

XML Naming Rules

In XML (eXtensible Markup Language), elements are fundamental building blocks
used to represent data. When naming elements in XML documents, certain rules must
be followed to ensure valid and well-formed XML. Here are the naming rules for
XML elements:

1. Element Name Start Character: The first character of an element name must be
a letter (A-Z or a-z) or an underscore ("_"). It cannot start with a number or any
other special character.
2. Element Name Characters: After the first character, the element name can
include letters, numbers, underscores, hyphens, and periods. Special characters
like spaces, commas, and other punctuation marks are not allowed.
3. Element Name Case Sensitivity: XML is case-sensitive. This means that
elements with different cases (e.g., "book" and "Book") are treated as distinct
elements.
4. Reserved Names: Names beginning with the letters "xml" (in any combination of
upper and lower case) are reserved for use by XML-related standards and
should not be used as element names (e.g., <xml>).
5. Validity of Element Names: Element names must be valid XML names. In
particular, they should not start with "xml" (case-insensitive), and colons
(":") should be used only to separate a namespace prefix from the local
name.

Examples of valid XML element names:

<book>
<_note>
<first-name>
<chapter.1>

Examples of invalid XML element names:

<123element> (starts with a number)
<first name> (contains a space)
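
The naming rules listed above can be turned into a small checker. This is a simplified sketch in Python that covers only the ASCII rules as stated in these notes; the full XML specification also permits many non-ASCII letters, and the function name is_valid_element_name is made up for this illustration.

```python
import re

# First character: letter or underscore; then letters, digits, _, - and .
NAME_PATTERN = re.compile(r"^[A-Za-z_][A-Za-z0-9_.\-]*$")

def is_valid_element_name(name):
    """Check a name against the simplified rules from these notes."""
    if not NAME_PATTERN.match(name):
        return False
    # Names starting with "xml" in any case are reserved
    return not name.lower().startswith("xml")

checks = {
    name: is_valid_element_name(name)
    for name in ["book", "_note", "first-name", "123element", "first name", "xmlData"]
}
print(checks)
```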

XML Related Technologies

The following technologies are related to XML:

No. Technology (Meaning): Description

1) XHTML (Extensible HTML): It is a cleaner and stricter version of HTML, reformulated as
an XML markup language. It was developed to make HTML more extensible and increase
interoperability with other data formats.

2) XML DOM (XML Document Object Model): It is a standard document model that is used to
access and manipulate XML. It represents the XML file as a tree structure.

3) XSL (Extensible Stylesheet Language): It contains three parts:
i) XSLT (XSL Transformations): It transforms XML into other formats, like HTML.
ii) XSL: It is used for formatting XML for screen, paper etc.
iii) XPath: It is a language to navigate XML documents.

4) XQuery (XML Query Language): It is a language which is used to query XML based data.

5) DTD (Document Type Definition): It is a standard which is used to define the legal
elements in an XML document.

6) XSD (XML Schema Definition): It is an XML based alternative to DTD. It is used to
describe the structure of an XML document.

7) XLink (XML Linking Language): This is a language for creating hyperlinks (external and
internal links) in XML documents.

8) XPointer (XML Pointer Language): It is a system for addressing components of XML based
internet media. It allows XLink hyperlinks to point to more specific parts of an XML
document.

XML Attributes
XML elements can have attributes. By the use of attributes we can add the information about
the element.

XML attributes enhance the properties of the elements.

Note: XML attributes must always be quoted. We can use single or double quote.

Let us take an example of a book publisher. Here, book is the element and publisher is the
attribute.

<book publisher="Tata McGraw Hill"></book>

<book publisher='Tata McGraw Hill'></book>

Metadata should be stored as attributes, and data should be stored as elements.

<book category="computer">
<author> A &amp; B </author>
</book>

Element and attribute names are case-sensitive, so a start tag and its end tag must use the
same case. A value that can occur more than once is better stored in child elements:

<book>
<category>Computer</category>
<category>Science</category>
<author> A &amp; B </author>
</book>

Data can be stored in attributes or in child elements. But there are some limitations in using
attributes, over child elements.

Why should we avoid XML attributes

o Attributes cannot contain multiple values but child elements can have multiple values.

o Attributes cannot contain tree structure but child element can.

o Attributes are not easily expandable. If you want to change an attribute's values in the
future, it may be complicated.
o Attributes cannot describe structure but child elements can.

o Attributes are more difficult to manipulate with program code.

o Attribute values are not easy to test against a DTD, which is used to define the legal
elements of an XML document.
Difference between attribute and sub-element

In the context of documents, attributes are part of markup, while sub elements are part of the
basic document contents.

In the context of data representation, the difference is unclear and may be confusing.

Same information can be represented in two ways:

1st way:

<book publisher="Tata McGraw Hill"> </book>

2nd way:

<book>
<publisher> Tata McGraw Hill </publisher>
</book>

In the first example publisher is used as an attribute and in the second example publisher is an
element.

Both examples provide the same information but it is good practice to avoid attribute in XML
and use elements instead of attributes.
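
From a program's point of view, the two forms are read with different calls, and only the element form extends naturally to several values. The sketch below assumes Python's standard xml.etree.ElementTree module; the category elements are an illustration added here.

```python
import xml.etree.ElementTree as ET

as_attribute = ET.fromstring('<book publisher="Tata McGraw Hill"></book>')
as_element = ET.fromstring("<book><publisher>Tata McGraw Hill</publisher></book>")

publisher_from_attribute = as_attribute.get("publisher")    # attribute lookup
publisher_from_element = as_element.findtext("publisher")   # child element lookup

# An attribute name may appear only once per element, but child elements can repeat
multi = ET.fromstring(
    "<book><category>Computer</category><category>Science</category></book>"
)
categories = [c.text for c in multi.findall("category")]
print(publisher_from_attribute, publisher_from_element, categories)
```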

XML Comments

XML comments are just like HTML comments. Comments are used to make code more
understandable to other developers.

XML comments add notes or lines for understanding the purpose of XML code. Although
XML is known as self-describing data, XML comments are sometimes necessary.

Syntax

An XML comment should be written as:

<!-- Write your comment -->

You cannot nest one XML comment inside another.
XML Comments Example

Let's take an example to show the use of comment in an XML example:

<?xml version="1.0" encoding="UTF-8" ?>
<!-- List of students and their marks -->
<students>
<student>
<name>Ratan</name>
<marks>70</marks>
</student>
<student>
<name>Aryan</name>
<marks>60</marks>
</student>
</students>

Rules for adding XML comments

o Don't use a comment before an XML declaration.

o You can use a comment anywhere in XML document except within attribute value.

o Don't nest a comment inside the other comment.

XML Tree Structure

An XML document has a self-descriptive structure. It forms a tree structure which is referred
to as an XML tree. The tree structure makes it easy to describe an XML document.

A tree structure contains root element (as parent), child element and so on. It is very easy to
traverse all succeeding branches and sub-branches and leaf nodes starting from the root.
Example of an XML document
<?xml version="1.0"?>
<college>
<student>
<firstname>Tamanna</firstname>
<lastname>Bhatia</lastname>
<contact>09990449935</contact>
<email>[email protected]</email>
<address>
<city>Ghaziabad</city>
<state>Uttar Pradesh</state>
<pin>201007</pin>
</address>
</student>
</college>

Let's see the tree-structure representation of the above example.

In the above example, the first line is the XML declaration. It defines the XML version 1.0.
The next line shows the root element (college) of the document. Inside that there is one more
element (student). The student element contains five branches named <firstname>, <lastname>,
<contact>, <email> and <address>.

<address> branch contains 3 sub-branches named <city>, <state> and <pin>.

Note: DOM parser represents the XML document in Tree structure.
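
Walking the tree from the root to the leaves takes only a short recursive function. This is a minimal sketch assuming Python's standard xml.etree.ElementTree module; the function name walk is made up here, and the contact and email elements are left out of the sample to keep it short.

```python
import xml.etree.ElementTree as ET

doc = """<college>
  <student>
    <firstname>Tamanna</firstname>
    <lastname>Bhatia</lastname>
    <address>
      <city>Ghaziabad</city>
      <state>Uttar Pradesh</state>
      <pin>201007</pin>
    </address>
  </student>
</college>"""

def walk(element, depth=0):
    """Yield (depth, tag) for the element and all of its descendants."""
    yield depth, element.tag
    for child in element:
        yield from walk(child, depth + 1)

# Indent each tag name by its depth in the tree
tree_lines = ["  " * depth + tag for depth, tag in walk(ET.fromstring(doc))]
print("\n".join(tree_lines))
```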

XML – DOM
XML DOM (Document Object Model) is a programming interface that represents the
structure of an XML document as a tree-like object, allowing developers to manipulate and
navigate XML documents using programming languages. It provides a platform-independent,
language-neutral way to access and interact with XML documents dynamically.
The XML DOM exposes the XML document's contents and structure as a set of
interconnected objects, where each node in the tree corresponds to an element, attribute, or
text content in the XML document. This tree-like representation is also known as a "node
tree" or "DOM tree."
Key features and functionalities of XML DOM include:

1. Parsing XML: XML DOM allows developers to parse XML documents, converting
them into a structured tree of nodes that can be easily manipulated and accessed.
2. Node Types: The DOM tree consists of different types of nodes, including elements,
attributes, text, comments, and processing instructions. Each node type is represented
by a specific DOM interface.
3. Traversal: Developers can traverse the DOM tree, moving between nodes, accessing
parent, child, and sibling nodes, and navigating the entire structure.
4. Node Creation and Modification: XML DOM enables the creation of new elements,
attributes, and text nodes and the modification of existing nodes, allowing developers
to update XML documents dynamically.
5. Search and Query: DOM provides methods to search for specific elements or
attributes based on their names, values, or positions within the tree.
6. Validation: Many DOM parser implementations can optionally validate XML documents
against an XML Schema or DTD (Document Type Definition) while building the tree,
to ensure their conformity with predefined rules.
7. Platform and Language Independence: XML DOM is available in many programming
languages, including Java, JavaScript, Python, C#, PHP, and more. It is implemented
as a set of APIs that can be used across different platforms.

Here's a simple example of how XML DOM can be used in JavaScript to access and modify
XML data:

<?xml version="1.0" encoding="UTF-8"?>

<bookstore>

<book>

<title>Sample Book</title>

</book>

</bookstore>

// JavaScript code to access XML using DOM (runs in a browser, which provides DOMParser)

// The XML document above, held in a string

var xmlString = '<bookstore><book><title>Sample Book</title></book></bookstore>';

var xmlDoc = new DOMParser().parseFromString(xmlString, 'text/xml');

var titleNode = xmlDoc.querySelector('title');

console.log(titleNode.textContent); // Output: "Sample Book"

// Modify the title

titleNode.textContent = "Updated Book Title";

console.log(titleNode.textContent); // Output: "Updated Book Title"
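
Because the DOM is language-neutral, the same steps carry over to other languages. Here is a comparable sketch in Python, assuming the standard xml.dom.minidom module; the variable names are chosen for this example.

```python
from xml.dom import minidom

xml_string = "<bookstore><book><title>Sample Book</title></book></bookstore>"
dom = minidom.parseString(xml_string)

# Locate the first <title> element and read its text node
title_node = dom.getElementsByTagName("title")[0]
original_title = title_node.firstChild.data

# Modify the text node in place, then serialize the tree again
title_node.firstChild.data = "Updated Book Title"
updated_xml = dom.documentElement.toxml()

print(original_title)
print(updated_xml)
```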

XML Validation
A well-formed XML document can be validated against DTD or Schema.

A well-formed XML document is an XML document with correct syntax. It is very necessary
to know about valid XML document before knowing XML validation.

Valid XML document

It must be well formed (satisfy all the basic syntax conditions)

It should conform to a predefined DTD or XML schema

Rules for well-formed XML
o If an XML declaration is present, it must be at the very start of the document.

o It must have one unique root element.

o All start tags of XML documents must match end tags.

o XML tags are case sensitive.

o All elements must be closed.

o All elements must be properly nested.

o All attributes values must be quoted.

o XML entities must be used for special characters.

XML DTD

A DTD defines the legal elements of an XML document

In simple words we can say that a DTD defines the document structure with a list of legal
elements and attributes.

XML schema is a XML based alternative to DTD.

DTD and XML schema are both used to check that a well-formed XML document is also valid.

We should avoid errors in XML documents, because a conforming XML processor stops
processing when it meets a well-formedness error.

XML schema
It is defined as an XML language

Uses namespaces to allow for reuses of existing definitions

It supports a large number of built in data types and definition of derived data types

XML DTD
What is DTD

DTD stands for Document Type Definition. It defines the legal building blocks of an XML
document. It is used to define document structure with a list of legal elements and attributes.

Purpose of DTD

Its main purpose is to define the structure of an XML document. It contains a list of legal
elements and define the structure with the help of them.
Checking Validation

Before proceeding with XML DTD, recall the two levels of checking. An XML document is
called "well-formed" if it has correct syntax.

A valid XML document is a well-formed document that has also been validated against a DTD.

Valid and well-formed XML document with DTD

Let's take an example of well-formed and valid XML document. It follows all the rules of
DTD.

employee.xml

<?xml version="1.0"?>
<!DOCTYPE employee SYSTEM "employee.dtd">
<employee>
<firstname>vimal</firstname>
<lastname>jaiswal</lastname>
<email>[email protected]</email>
</employee>

In the above example, the DOCTYPE declaration refers to an external DTD file. The content
of the file is shown below.

employee.dtd

<!ELEMENT employee (firstname,lastname,email)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT email (#PCDATA)>

OUTPUT:-

This XML file does not appear to have any style information associated with it. The
document tree is shown below.

<employee>
<firstname>vimal</firstname>
<lastname>jaiswal</lastname>
<email>[email protected]</email>
</employee>

Description of DTD

<!DOCTYPE employee : It defines that the root element of the document is employee.

<!ELEMENT employee: It defines that the employee element contains 3 elements

"firstname, lastname and email".
<!ELEMENT firstname: It defines that the firstname element is #PCDATA typed. (parse-
able data type).

<!ELEMENT lastname: It defines that the lastname element is #PCDATA typed. (parse-
able data type).

<!ELEMENT email: It defines that the email element is #PCDATA typed. (parse-able data
type).
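
Python's standard library parses XML but does not validate it against a DTD (third-party libraries exist for that). To show what the employee DTD actually demands, the sketch below checks the same constraints by hand. It is only an illustration of the content model, not a DTD validator; the function name and the email value are hypothetical.

```python
import xml.etree.ElementTree as ET

# From <!ELEMENT employee (firstname,lastname,email)>: exactly these children, in this order
EXPECTED_CHILDREN = ["firstname", "lastname", "email"]

def matches_employee_dtd(xml_text):
    """Hand-written check mirroring the employee DTD's content model."""
    root = ET.fromstring(xml_text)
    if root.tag != "employee":
        return False
    if [child.tag for child in root] != EXPECTED_CHILDREN:
        return False
    # (#PCDATA) means text only: the children may not contain elements of their own
    return all(len(child) == 0 for child in root)

good = ("<employee><firstname>vimal</firstname><lastname>jaiswal</lastname>"
        "<email>vimal@example.com</email></employee>")
wrong_order = ("<employee><lastname>jaiswal</lastname><firstname>vimal</firstname>"
               "<email>vimal@example.com</email></employee>")
missing_email = "<employee><firstname>vimal</firstname><lastname>jaiswal</lastname></employee>"

results = [matches_employee_dtd(d) for d in (good, wrong_order, missing_email)]
print(results)
```

All three sample documents are well-formed; only the first one fits the structure the DTD prescribes.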

XML DTD with entity declaration

A doctype declaration can also define special strings that can be used in the XML file.

An entity reference has three parts:

1. An ampersand (&)
2. An entity name
3. A semicolon (;)

Syntax to declare entity:

<!ENTITY entity-name "entity-value">

Let's see a code to define the ENTITY in doctype declaration.

author.xml

<?xml version="1.0" standalone="yes" ?>
<!DOCTYPE author [
<!ELEMENT author (#PCDATA)>
<!ENTITY sj "Sonoo Jaiswal">
]>
<author>&sj;</author>

OUTPUT:-
This XML file does not appear to have any style information associated with it. The
document tree is shown below.

<author>Sonoo Jaiswal</author>

In the above example, sj is an entity that is used inside the author element. In such case, it
will print the value of sj entity that is "Sonoo Jaiswal".

Note: A single DTD can be used in many XML files.
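
The expansion of an internally declared entity can be observed with a parser. This is a small sketch assuming Python's standard xml.etree.ElementTree module, which replaces entities declared in the internal DTD subset while parsing.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0" standalone="yes" ?>
<!DOCTYPE author [
<!ELEMENT author (#PCDATA)>
<!ENTITY sj "Sonoo Jaiswal">
]>
<author>&sj;</author>"""

author = ET.fromstring(doc)
expanded_text = author.text  # the parser has replaced &sj; with its declared value
print(expanded_text)
```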

XML CSS

Purpose of CSS in XML

CSS (Cascading Style Sheets) can be used to add style and display information to an XML
document. It can format the whole XML document.

How to link XML file with CSS

To link XML files with CSS, you should use the following syntax:

<?xml-stylesheet type="text/css" href="cssemployee.css"?>

XML CSS Example

Let's see the css file.

cssemployee.css

employee
{
background-color: pink;
}
firstname,lastname,email
{
font-size:25px;
display:block;
color: blue;
margin-left: 50px;
}

Let's create the DTD file.

employee.dtd

<!ELEMENT employee (firstname,lastname,email)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT email (#PCDATA)>

Let's see the xml file using CSS and DTD.

employee.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="cssemployee.css"?>
<!DOCTYPE employee SYSTEM "employee.dtd">
<employee>
<firstname>vimal</firstname>
<lastname>jaiswal</lastname>
<email>[email protected]</email>
</employee>

output

vimal jaiswal [email protected]

CSS is not generally used to format XML file. W3C recommends XSLT instead of CSS.
XML Schema

What is XML schema

XML schema is a language used for expressing constraints on XML documents.
Several schema languages are in use nowadays, for example RELAX NG and XSD (XML Schema
Definition).

An XML schema is used to define the structure of an XML document. It is like DTD but
provides more control on XML structure.

Checking Validation

An XML document is called "well-formed" if it has correct syntax. A valid XML document is
a well-formed document that has also been validated against a schema.

Visit http://www.xmlvalidation.com to validate the XML file against schema or DTD.

XML Schema Example

Let's create a schema file.

employee.xsd

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.javatpoint.com"
xmlns="http://www.javatpoint.com"
elementFormDefault="qualified">
<xs:element name="employee">
<xs:complexType>
<xs:sequence>
<xs:element name="firstname" type="xs:string"/>
<xs:element name="lastname" type="xs:string"/>
<xs:element name="email" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>

Let's see the xml file using XML schema or XSD file.

employee.xml

<?xml version="1.0"?>
<employee
xmlns="http://www.javatpoint.com"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.javatpoint.com employee.xsd">

<firstname>vimal</firstname>
<lastname>jaiswal</lastname>
<email>[email protected]</email>
</employee>

Description of XML Schema

<xs:element name="employee"> : It defines the element name employee.

<xs:complexType> : It defines that the element 'employee' is complex type.

<xs:sequence> : It defines that the complex type is a sequence of elements.

<xs:element name="firstname" type="xs:string"/> : It defines that the element 'firstname'
is of string/text type.

<xs:element name="lastname" type="xs:string"/> : It defines that the element 'lastname'
is of string/text type.

<xs:element name="email" type="xs:string"/> : It defines that the element 'email' is of
string/text type.
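
Since an XSD file is itself an XML document, it can be read with an ordinary XML parser. The sketch below assumes Python's standard xml.etree.ElementTree module and lists the elements that the employee schema declares. It only inspects the schema; it does not validate a document against it, which the standard library cannot do.

```python
import xml.etree.ElementTree as ET

# Namespace of the XML Schema vocabulary, in ElementTree's {uri}name notation
XS = "{http://www.w3.org/2001/XMLSchema}"

schema = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="employee">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="firstname" type="xs:string"/>
        <xs:element name="lastname" type="xs:string"/>
        <xs:element name="email" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

root = ET.fromstring(schema)

# Every xs:element declaration, in document order, with its name and declared type
declared = [(e.get("name"), e.get("type")) for e in root.iter(XS + "element")]
print(declared)
```

The employee element has no type attribute because its type is given inline by the nested xs:complexType.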

DTD vs XSD

There are many differences between DTD (Document Type Definition) and XSD (XML
Schema Definition). In short, DTD provides less control on XML structure whereas XSD
(XML schema) provides more control.

The important differences are given below:

No.  DTD / XSD

1) DTD: DTD stands for Document Type Definition.
   XSD: XSD stands for XML Schema Definition.

2) DTD: DTDs are derived from SGML syntax.
   XSD: XSDs are written in XML.

3) DTD: DTD doesn't support datatypes.
   XSD: XSD supports datatypes for elements and attributes.

4) DTD: DTD doesn't support namespaces.
   XSD: XSD supports namespaces.

5) DTD: A DTD content model can fix the order of child elements, but gives only coarse
   control over how often they occur (?, * and +).
   XSD: XSD defines both the order of child elements and the exact number of times they
   may occur (minOccurs, maxOccurs).

6) DTD: DTD is not extensible.
   XSD: XSD is extensible.

7) DTD: DTD requires learning a separate syntax.
   XSD: XSD uses XML syntax, so no new syntax has to be learned.

8) DTD: DTD provides less control on XML structure.
   XSD: XSD provides more control on XML structure.

CDATA vs PCDATA

CDATA (Character Data) and PCDATA (Parsed Character Data) are two different types of
data that can be used in XML documents to represent character content. They are both used
to include textual data within XML elements, but they have different handling and parsing
rules.

1. CDATA (Character Data): CDATA sections are used to include blocks of text that
should be treated as character data and not be parsed as XML markup. The XML parser
does not interpret the content of a CDATA section as markup; it passes the text through
unchanged, so special characters (such as <, >, and &) are treated as literal text.
CDATA sections are often used to include text that contains a lot of XML-reserved
characters, avoiding the need for escaping. The only sequence that cannot appear inside
a CDATA section is ]]>, because it marks the end of the section.

Example of a CDATA section:

<description><![CDATA[This is a <b>bold</b> statement & more!]]></description>

In this example, the content inside the CDATA section is treated as plain text, and the XML
parser will not attempt to interpret the <b> element or the & symbol.

2. PCDATA (Parsed Character Data): PCDATA refers to character data that is parsed
by the XML parser. Unlike CDATA, PCDATA is subject to XML parsing rules, and
special characters need to be escaped using entity references (e.g., &lt; for <, &gt;
for >, and &amp; for &). Because the parser scans PCDATA for markup, the text can be
mixed with nested elements and entity references.

Example of PCDATA:

<description>This is a &lt;b&gt;bold&lt;/b&gt; statement &amp; more!</description>

In this example, the content within the <description> element is treated as PCDATA. The
XML parser interprets the entity references (&lt;, &gt;, and &amp;) and delivers the text
"This is a <b>bold</b> statement & more!" to the application.

The choice between CDATA and PCDATA depends on the requirements of the XML data. If
the text content contains a lot of special characters or XML markup that you want to be
treated as plain text, CDATA is a better choice. However, if the text content is structured and
includes nested elements or attributes, using PCDATA with proper escaping is more
appropriate to maintain the XML's structural integrity.
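
The difference can be seen with a minimal Python sketch using the standard xml.etree.ElementTree module. It parses a description that contains a <b> tag and an ampersand, once inside a CDATA section and once as escaped PCDATA, and then tries the same text with a raw ampersand. The variable names and the helper is_well_formed are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Same text, written once as a CDATA section and once as escaped PCDATA.
cdata_doc = "<description><![CDATA[This is a <b>bold</b> statement & more!]]></description>"
pcdata_doc = "<description>This is a &lt;b&gt;bold&lt;/b&gt; statement &amp; more!</description>"

cdata_text = ET.fromstring(cdata_doc).text
pcdata_text = ET.fromstring(pcdata_doc).text


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False if it raises ParseError."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# A raw ampersand in PCDATA is not allowed, so this document is rejected.
unescaped_ok = is_well_formed("<description>This is a bold statement & more!</description>")

print(cdata_text)
print(cdata_text == pcdata_text)
print(unescaped_ok)
```

Both forms give the application the same string; CDATA only changes how the text is written in the file, not what the parser hands over.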

Markup Delimiters

Markup delimiters are special characters or sequences used in markup languages to enclose
or delimit elements, attributes, or other components within the markup. Markup delimiters
define the beginning and ending boundaries of different parts of the markup content. These
delimiters are essential for defining the structure and semantics of the markup language.

The most common markup delimiter is the angle bracket ("<" and ">"), which is used in
languages like HTML, XML, and SGML. Angle brackets enclose element names, attributes,
and other tags within the markup.

Here are examples of markup delimiters in XML:

1. XML Element Delimiters:

<book> ... </book>

<title>Sample Book</title>
In XML, angle brackets ("<" and ">") are used to delimit element names. The opening tag
<book> marks the beginning of the "book" element, and the closing tag </book> marks the
end of the element.

2. XML Attribute Delimiters:

<book category="fiction123456789" lang="en"> ... </book>
In XML, attribute values are delimited using double quotes ("") or single quotes ('')
after the attribute name and an equal sign (=).
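
As a small illustration of the attribute delimiters, the following Python sketch (standard xml.etree.ElementTree; the sample element and variable names are assumptions) parses the same element written with double quotes and with single quotes, and shows that an unquoted attribute value is rejected.

```python
import xml.etree.ElementTree as ET

# The same element, with attribute values delimited by double and by single quotes.
double_quoted = ET.fromstring('<book category="fiction" lang="en"><title>Sample Book</title></book>')
single_quoted = ET.fromstring("<book category='fiction' lang='en'><title>Sample Book</title></book>")

# Both quote styles produce the same attribute dictionary.
same_attributes = double_quoted.attrib == single_quoted.attrib

# An attribute value without quote delimiters is not well-formed XML.
try:
    ET.fromstring("<book category=fiction></book>")
    unquoted_accepted = True
except ET.ParseError:
    unquoted_accepted = False

print(double_quoted.attrib)
print(same_attributes, unquoted_accepted)
```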

Element Markup and Attribute Markup

Element markup refers to the process of creating and defining elements within a markup
language. It involves using special syntax and delimiters to specify the structure and content
of the elements, allowing for the representation of data and its semantics in a structured way.

In markup languages like HTML and XML, elements are the building blocks used to define
the structure and content of a document. Each element represents a specific piece of
information and is typically enclosed within start tags ("<element>") and end tags
("</element>"). The content and attributes of the element are specified between the start and
end tags.

For example, in HTML, an element can be used to represent a paragraph as follows:

<p>This is a paragraph element.</p>

Here, <p> is the start tag, indicating the beginning of the paragraph element, and </p> is the
end tag, indicating the end of the paragraph element. The text "This is a paragraph element."
is the content of the paragraph element.

In XML, elements can be used to represent structured data. For example:

<book>
  <title>Sample Book</title>
</book>

In this XML example, <book> is the start tag of the "book" element, and </book> is the end
tag. The content of the "book" element is a nested <title> element, which holds the title
of the book. Further nested elements, such as <author>, can be added in the same way.

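Elements can also have attributes, which provide additional information about the element.
For example, in HTML:

<img src="image.jpg" alt="Image">

Here, src and alt are attributes of the img element. In XML, an empty element like this
must be closed explicitly, for example <img src="image.jpg" alt="Image"/>.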

XML - Parsers
An XML parser is a software library or package that provides an interface for client
applications to work with XML documents. It checks that the XML document is
well-formed and may also validate it against a DTD or schema. Modern browsers have
built-in XML parsers.

[Diagram: how an XML parser interacts with an XML document]

The goal of a parser is to transform XML text into a form that program code can read and work with.

XML parsers are software components or libraries that read XML documents and
interpret their structure, allowing developers to access and manipulate the data
within the documents programmatically. XML parsers are essential for processing
XML data in various programming languages and environments. They facilitate the
extraction and handling of XML data, making it easier to work with structured
information.
There are two main types of XML parsers:
1. DOM (Document Object Model) Parsers: DOM parsers construct an in-
memory tree-like representation of the XML document, known as the DOM
tree. The tree structure allows easy navigation and manipulation of the XML
data using standard programming interfaces. Developers can traverse the
tree, access nodes (elements, attributes, text), add or modify nodes, and save
the modified XML back to a file. DOM parsers load the entire XML document
into memory, making them suitable for relatively small XML files.
2. SAX (Simple API for XML) Parsers: SAX parsers, on the other hand, work
differently. Instead of building an in-memory tree, they process the XML
document sequentially as a stream. As the parser reads the document, it
sends events to an application or event handler. Developers can then handle
these events to extract the necessary data from the XML. SAX parsers do not
load the entire XML into memory, making them suitable for large XML
documents or in situations where memory resources are limited.
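
The DOM approach can be sketched in Python with the standard xml.dom.minidom module. The sample bookstore document, the author name and all variable names are assumptions made for illustration; the sketch loads the whole document into a tree, reads nodes, modifies them, and serializes the result.

```python
from xml.dom.minidom import parseString

# Small sample document (made up for this sketch).
BOOKSTORE_XML = """<bookstore>
  <book category="fiction"><title>Sample Book</title><price>100.00</price></book>
  <book category="web"><title>Another Book</title><price>250.00</price></book>
</bookstore>"""

# The whole document is parsed into an in-memory tree.
dom = parseString(BOOKSTORE_XML)

# Navigate: collect the text of every <title> element.
titles = [node.firstChild.data for node in dom.getElementsByTagName("title")]

# Modify: add an attribute and change the text of a node.
first_book = dom.getElementsByTagName("book")[0]
first_book.setAttribute("lang", "en")
first_book.getElementsByTagName("price")[0].firstChild.data = "120.00"

# Add: create a new element with a text node and attach it to the tree.
author = dom.createElement("author")
author.appendChild(dom.createTextNode("Example Author"))
first_book.appendChild(author)

# Serialize the modified tree back to XML text.
updated_xml = dom.documentElement.toxml()

print(titles)
print(updated_xml)
```

Because every node stays in memory, the program can move around the tree freely, at the cost of holding the whole document at once.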
The choice between DOM and SAX parsers depends on the specific requirements of
the XML processing task:
o DOM Parsers are best suited for tasks that involve frequent navigation and
manipulation of XML data. They provide a comprehensive view of the XML
document, making it easy to work with the entire structure. However, they can
be memory-intensive for large XML files.
o SAX Parsers are ideal for scenarios where memory efficiency and
performance are critical. Since SAX parsers process XML documents
sequentially, they can handle large files more efficiently. However, they are
less convenient for complex XML data manipulations compared to DOM
parsers.
These XML parsers simplify XML data processing, enabling developers to work with
XML documents efficiently and effectively.
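
The event-driven SAX style can be sketched in Python with the standard xml.sax module. The sample document and the handler class TitleHandler are assumptions made for illustration; the handler receives start, character and end events as the parser reads the input and keeps only the title text, without building a tree.

```python
import xml.sax

# Small sample document (made up for this sketch).
BOOKSTORE_XML = """<bookstore>
  <book category="fiction"><title>Sample Book</title><price>100.00</price></book>
  <book category="web"><title>Another Book</title><price>250.00</price></book>
</bookstore>"""


class TitleHandler(xml.sax.ContentHandler):
    """Collects the text of every <title> element from the event stream."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        # Called when the parser reads a start tag.
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so it is buffered.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        # Called when the parser reads an end tag.
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


handler = TitleHandler()
xml.sax.parseString(BOOKSTORE_XML.encode("utf-8"), handler)

print(handler.titles)
```

The handler stores only what it needs, which is why this style suits large documents; the price is that the program has to track its own position in the document, as the _in_title flag does here.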

