XML Notes

Uploaded by

Rajan Sahota

UNIT-1

XML BASICS

What is XML?

XML stands for eXtensible Markup Language. It is a popular markup language used to store
and transport structured data in a human-readable and machine-readable format. XML was
designed to be self-descriptive, which means it allows users to define their own tags,
elements, and document structure, making it highly flexible and customizable for various data
representation purposes.

The syntax of XML is based on a set of rules that define how elements should be structured.
Each XML document consists of a prologue, which includes the XML declaration, and the
document's root element. The content within the document is enclosed within tags, which
come in pairs: an opening tag and a closing tag. The opening tag contains the element name,
while the closing tag contains the same name but is preceded by a forward slash ("/").
For example:

<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book>
    <title>Web Development using PHP</title>
    <author>Kirandeep Kaur</author>
    <price>100.00</price>
    <publication_date/>
  </book>
</bookstore>
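
As a minimal sketch of how a program might read a document like this, the snippet below parses a small bookstore document with Python's standard-library xml.etree.ElementTree module. The document string and the helper name read_books are assumptions made for illustration; they are not part of the original notes.

```python
import xml.etree.ElementTree as ET

# Illustrative document modelled on the bookstore example (assumed data).
BOOKSTORE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book>
    <title>Web Development using PHP</title>
    <author>Kirandeep Kaur</author>
    <price>100.00</price>
  </book>
</bookstore>"""


def read_books(xml_text):
    """Return one (title, author, price) tuple per <book> element."""
    # Encode to bytes so the declared UTF-8 encoding matches the input.
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [
        (book.findtext("title"), book.findtext("author"), book.findtext("price"))
        for book in root.findall("book")
    ]


books = read_books(BOOKSTORE_XML)
print(books)
```

Each opening tag is matched by a closing tag with the same name, which is what lets the parser rebuild the element hierarchy.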

XML is widely used in various applications and industries, including web development (e.g.,
RSS feeds, configuration files), data exchange between different platforms and systems, as
well as in representing hierarchical data structures in databases and documents. XML has
been a foundational technology for web services like SOAP (Simple Object Access Protocol),
but newer technologies like JSON have become more popular for certain use cases due to
their simplicity and compactness.

Points to remember:-

o XML (eXtensible Markup Language) is a mark-up language.

o XML is designed to store and transport data.

o XML was released in the late 1990s.

o XML was created to provide an easy way to store and use self-describing data.

o XML became a W3C Recommendation on February 10, 1998.

o XML is not a replacement for HTML.

o XML is designed to be self-descriptive.

o XML is designed to carry data, not to display data.

o XML tags are not predefined. You must define your own tags.

o XML is platform independent and language independent.


Note: Self-describing data is the data that describes both its content and structure.

There are three important characteristics of XML that make it useful in a variety of
systems and solutions:

o XML is extensible: XML allows you to create your own self-descriptive tags,
or language, that suits your application.

o XML carries the data, does not present it: XML allows you to store the
data irrespective of how it will be presented.

o XML is a public standard: XML was developed by an organization called
the World Wide Web Consortium (W3C) and is available as an open standard.

What is Mark-up?

Mark-up refers to the practice of adding special annotations or tags to a text document to
provide additional information about the structure, formatting, or semantics of the content.
The purpose of mark-up is to instruct how the document should be displayed, processed, or
understood by various systems, applications, or users.

In mark-up languages, specific symbols or keywords (mark-up tags) are inserted within the
text to define the elements and their attributes. These tags are typically enclosed within angle
brackets ("<" and ">") and come in pairs: an opening tag and a closing tag. The opening tag
contains the element's name, and the closing tag includes the same name preceded by a
forward slash ("/"). The content that falls between the opening and closing tags is affected by
the markup's instructions.

Markup is commonly used in various contexts, including:

1. Document Structure: Mark-up languages like HTML (HyperText Markup Language) are used
to structure and format web pages. HTML tags define elements like headings, paragraphs,
lists, images, and links.

2. Data Representation: Mark-up languages like XML (eXtensible Markup Language) are used
to represent structured data in a machine-readable and human-readable format. XML allows
users to define their own tags to describe the data's structure and meaning.

3. Text Formatting: Mark-up is often used to format text documents, such as in word
processing applications. For example, Markdown and LaTeX are markup languages
used for text formatting in plain text and academic publishing, respectively.

4. Programming Documentation: Mark-up is used in documenting code and software
libraries. Tools like Javadoc use markup tags to generate API documentation.

5. Rich Media: Mark-up is employed in defining rich media content, such as SVG
(Scalable Vector Graphics), which uses XML-based markup to describe vector
graphics.

6. Accessibility: Some mark-up languages allow the inclusion of accessibility
information, such as the alt attribute in HTML, which provides text descriptions of
images to assist visually impaired users.

Mark-up languages play a crucial role in enabling the interoperability of data and content
across different platforms, devices, and applications. They provide a standardized way of
representing information and allow computers to interpret and process the data accurately.
Different markup languages cater to specific use cases, and the choice of markup language
depends on the requirements and the context in which it will be used.

History of XML
XML's history dates back to the late 1960s and early 1970s when the need for a standardized
way of representing and exchanging data across different systems and platforms emerged.
However, the real development of XML as we know it today began in the late 1990s. Here's a
brief history of XML:
1. SGML (Standard Generalized Markup Language): The roots of XML can be
traced back to SGML, a standard for defining markup languages. SGML was
introduced in the 1980s as a standard for defining the structure of documents with
markup tags. SGML allowed the definition of custom tags and was used in various
industries, including publishing and documentation.
2. HTML (Hypertext Markup Language): In the early 1990s, Tim Berners-Lee
developed HTML as an application of SGML to create documents for the World Wide Web.
HTML provided a way to structure web pages using predefined tags, making it easier
to create and display content in early web browsers.
3. The Need for a More Flexible Standard: As the web evolved, the limitations of
HTML became evident. There was a growing need for a more flexible and extensible
markup language that could represent a wide range of data and be easily parsed by
different systems. This led to the development of XML.
4. XML 1.0 Specification: In 1996, the World Wide Web Consortium (W3C) formed
the XML Working Group to develop a standard for XML. In February 1998, the first
official XML 1.0 specification was released, defining the syntax rules and guidelines
for creating XML documents. XML allowed users to create their own custom tags,
making it suitable for various data representation needs.
5. Adoption and Application: XML quickly gained popularity due to its versatility and
ease of use. It became the preferred format for data interchange and storage, with
applications in web services, configuration files, data exchange between applications,
and more.
6. XPath, XSLT, and Other XML Technologies: Over time, various XML-related
technologies were developed to complement XML's capabilities. XPath was
introduced as a query language for navigating XML documents, while XSLT enabled
the transformation of XML data into different formats. These technologies further
enhanced the usability and power of XML.
7. JSON's Emergence: Despite XML's widespread adoption, in the mid-2000s, a new
data interchange format called JSON (JavaScript Object Notation) gained popularity
due to its simplicity and compactness. JSON became the preferred format for certain
use cases, particularly in web APIs, due to its more straightforward syntax and smaller
data size compared to XML.

Despite the rise of JSON, XML continues to be used extensively, especially in domains that
require more complex data structures and where data self-description is critical. XML's rich
tooling and support for schema validation make it valuable in various industries, and it
remains an essential part of the web and data interchange technologies.
Origins of XML
The origins of XML (eXtensible Markup Language) can be traced back to the 1970s,
with the development of SGML (Standard Generalized Markup Language). SGML was
standardized as ISO 8879 in 1986, and it served as the foundation for XML.
Here's a brief timeline of the key events leading to the development of XML:
1. SGML (Standard Generalized Markup Language):
o In the late 1960s and early 1970s, the need arose for a standardized way to
define the structure of documents to ensure interoperability and information
exchange across different systems.
o Charles F. Goldfarb, Ed Mosher, and Ray Lorie, working at IBM, created GML
(Generalized Markup Language) in 1969, and Goldfarb went on to lead the
development of SGML in the 1970s.
o SGML was designed to be a meta-mark-up language, allowing users to define
their own document types (mark-up languages) through Document Type
Definitions (DTDs).
o It was standardized in 1986 as ISO 8879:1986, providing a formal
specification for representing the structure of documents using tags.
2. HTML (Hypertext Markup Language):
o In the early 1990s, the World Wide Web was born, and there was a need for a
markup language to structure web content.
o Tim Berners-Lee, a British computer scientist, developed HTML as a
simplified and practical application of SGML to create web pages and link
documents together.
o HTML allowed the use of predefined tags for structuring text, images, links,
and other elements, making it accessible to non-experts.
3. The Need for More Flexible Data Representation:
o As the web and internet technologies advanced, it became evident that HTML
had limitations in representing structured data beyond basic web content.
o There was a growing need for a more extensible and versatile markup
language that could represent various types of data and allow users to define
custom document structures.
4. XML's Development:
o In 1996, the World Wide Web Consortium (W3C) formed the XML Working
Group to develop a new markup language that addressed the limitations of
HTML and provided a standardized way to represent data.
o The XML 1.0 specification was released in February 1998, introducing XML
as a simplified and more flexible version of SGML.
o XML allowed users to define their own tags and document structures, making
it ideal for representing and exchanging a wide range of data types.
o Unlike SGML, XML was more focused on simplicity and ease of use, which
contributed to its widespread adoption.
5. XML's Adoption and Growth:
o XML quickly gained popularity due to its versatility and potential applications
in various domains, including data interchange, web services, configuration
files, and more.
o Over time, additional XML-related technologies were developed, such as
XPath, XSLT, and XML Schema, enhancing XML's capabilities and usability.

Today, XML remains an essential part of the web and various industries, particularly where
data self-description and structured data representation are critical. It has influenced the
development of other markup languages, including XHTML (an XML-based version of
HTML) and domain-specific XML languages used in various sectors.

Applications of XML
XML (eXtensible Markup Language) is a versatile markup language with a wide range of
applications in various industries and domains. Some of the key applications of XML
include:
1. Data Interchange and Integration: XML is commonly used for data interchange
and integration between different systems, applications, and platforms. It provides a
standardized and self-descriptive format for representing structured data, making it
easier to exchange information across different systems.
2. Web Services: XML serves as the backbone for many web services and APIs
(Application Programming Interfaces). Web services use XML to send and receive
data in a format that can be easily understood and processed by different
programming languages.
3. Configuration Files: Many software applications and systems use XML for
configuration files. These files allow users to customize settings, preferences, and
parameters without altering the application's code.
4. RSS Feeds: XML is commonly used for creating RSS (Really Simple Syndication)
feeds, which allow websites to publish regularly updated content in a standardized
format. RSS feeds enable users to subscribe to content updates from their favourite
websites.
5. Document Mark-up and Authoring: XML can be used for structuring and marking
up documents, allowing authors to define the document's hierarchical structure,
headings, paragraphs, lists, and other elements.
6. Database and Data Storage: XML is employed in databases and data storage
systems to represent and store structured data. It provides a flexible way to model
complex data structures and relationships.
7. Metadata and Semantics: XML can be used to define and express metadata and
semantic information about documents, web resources, and data elements. This helps
in enhancing the discoverability and understanding of content.
8. Industry-Specific Standards: Many industries have adopted XML-based standards
to facilitate data exchange and communication.
9. Cross-Platform Compatibility: XML's platform-independent nature makes it ideal
for exchanging data between different operating systems, programming languages,
and devices.
10. Healthcare and Electronic Medical Records (EMR): XML is utilized in the
healthcare industry for creating standardized electronic medical records and
exchanging patient data securely between healthcare providers.
11. Publishing and Content Management: XML is widely used in publishing
workflows, content management systems, and digital publishing to ensure
consistency, reusability, and easy content transformation.
12. Geospatial Data: In GIS (Geographic Information Systems) and geospatial
applications, XML is used for representing and sharing geographic data in a
structured format.
Overall, XML's flexibility, self-descriptiveness, and human-readability make it an
excellent choice for various data representation and interchange scenarios. While newer
formats like JSON have gained popularity for specific use cases, XML continues to be a
fundamental technology in many industries due to its robustness and rich tooling support.
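
To make the configuration-file use case concrete, here is a small sketch that loads settings from an XML document with Python's xml.etree.ElementTree. The <config> and <setting> element names, the attribute names, and the values are all hypothetical; real applications define their own vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration document; every name and value here is invented.
CONFIG_XML = """<config>
  <setting name="theme" value="dark"/>
  <setting name="page_size" value="25"/>
</config>"""


def load_settings(xml_text):
    """Collect every <setting> element into a name -> value dictionary."""
    root = ET.fromstring(xml_text)
    return {s.get("name"): s.get("value") for s in root.iter("setting")}


settings = load_settings(CONFIG_XML)
print(settings)
```

Changing a value in the XML changes the program's behaviour without touching its code, which is the point made in the configuration-files item above.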

Features and Advantages of XML


XML (eXtensible Mark-up Language) offers several features and advantages that make it a powerful
and widely used mark-up language for data representation and interchange. Here are some key
features and advantages of XML:

1. Extensibility: The "X" in XML stands for "extensible," meaning users can define their own
tags and document structures to represent data in a way that suits their specific needs. This
flexibility allows XML to adapt to diverse data representation requirements.

2. Self-Descriptive: XML documents are self-descriptive, as they contain both the data and the
metadata defining the structure of the data. XML tags provide meaningful names for
elements, making it easier for humans and systems to understand the data's meaning and
relationships.

3. Platform-Independent: XML is a platform-independent language, meaning XML documents can be
exchanged and processed across different operating systems, programming languages, and devices
without compatibility issues.

4. Human-Readable: XML documents are designed to be easily readable by humans, thanks to their
text-based syntax. This simplifies debugging and manual data editing tasks.

5. Structured Data Representation: XML allows data to be structured hierarchically using nested
elements and attributes. This makes it suitable for representing complex data structures and
relationships.

6. Data Validation: XML documents can be associated with an XML Schema or a Document Type
Definition (DTD) that defines the rules and constraints the data must adhere to. This
validation helps ensure data consistency and correctness.

7. Data Transformation: XML can be transformed into other formats, such as HTML, using
technologies like XSLT (eXtensible Stylesheet Language Transformations). This feature is
valuable for presenting XML data in different ways for various applications.

8. Interoperability: XML enables seamless data exchange between different systems and
applications, supporting interoperability and integration between distinct software solutions.

9. Standardization and Widespread Adoption: XML is a widely adopted standard, backed by the
World Wide Web Consortium (W3C), ensuring consistency in its implementation and support across
various platforms and tools.

10. Versioning Support: The XML declaration records the XML version in use, and because tags
are user-defined, a vocabulary can evolve by adding new elements and attributes, often without
breaking existing implementations that ignore what they do not recognize.

11. Industry-Specific Standards: XML has been adopted in many industries to create domain-
specific standards for data exchange. This standardization facilitates efficient communication
and data sharing within specific domains.
12. Metadata Support: XML allows the inclusion of metadata within the document, providing
additional information about the content, its origin, and other relevant details.

Overall, XML's features and advantages make it a versatile and widely used language for
representing structured data in a human-readable and machine-readable format. While newer
formats like JSON have gained popularity for certain use cases, XML remains an essential technology
in various industries due to its robustness, tooling support, and ability to handle complex data
structures.
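
The extensibility and hierarchical structure described above can be sketched in a few lines of Python. The example builds a small document from user-defined tags with the standard-library xml.etree.ElementTree module; the tag names (library, book, title, author) are illustrative assumptions, not a standard vocabulary.

```python
import xml.etree.ElementTree as ET


def build_library():
    """Build a tiny document from invented, user-defined tags."""
    library = ET.Element("library")
    # Attributes are passed as a dictionary; child elements nest under the parent.
    book = ET.SubElement(library, "book", {"category": "fiction"})
    ET.SubElement(book, "title").text = "Sample Book"
    ET.SubElement(book, "author").text = "John Doe"
    return library


# encoding="unicode" returns a str instead of bytes.
xml_text = ET.tostring(build_library(), encoding="unicode")
print(xml_text)
```

Because no tag is predefined, the same calls would work for any other vocabulary; only the names change.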

Disadvantages of XML
While XML (eXtensible Markup Language) offers several benefits, it also has some
disadvantages that should be considered when choosing it as a data representation format.
Here are some of the main disadvantages of XML:

1. Verbose Syntax: XML's syntax can be quite verbose, leading to larger file sizes
compared to more compact formats like JSON. This verbosity can impact data
transfer times and storage requirements, especially for large datasets.
2. Parsing Overhead: Parsing XML documents can be computationally more expensive
than parsing simpler formats like JSON. The need to process nested elements and
attributes can result in increased parsing overhead, affecting performance in resource-
constrained environments.
3. Complexity: XML's extensibility and flexibility come at the cost of increased
complexity. Defining complex document structures with nested elements and
attributes can become harder to manage, especially for users unfamiliar with XML.
4. Redundancy: XML documents can be verbose and include redundant information,
leading to increased data size and inefficiency. The use of opening and closing tags
for every element, even when the content is empty, contributes to this redundancy.
5. Lack of Native Data Types: XML does not have native data types, such as integers
or booleans, unlike some other data formats like JSON. As a result, all data in XML is
represented as strings, requiring additional parsing and conversions when using the
data in programming languages.
6. Less Compact than Binary Formats: XML is a text-based format, which means it
may not be as compact as binary formats for representing certain types of data. In
scenarios where data size and transfer speed are critical, binary formats may be more
efficient.
7. Limited Support for Metadata: While XML allows for metadata to be included in
documents, the support for standardized metadata formats is less prevalent compared
to some other data formats like JSON-LD (JSON for Linked Data) or RDF (Resource
Description Framework).
8. Parsing Error Handling: Handling parsing errors in XML can be more challenging
than in simpler formats, as nested structures and complex document hierarchies can
lead to harder-to-diagnose issues when errors occur.
9. Processing Overhead: XML processing can require significant memory and
processing resources, especially for large documents or when working with XML
documents in real-time streaming scenarios.
10. Alternative Formats: The popularity of other data interchange formats like JSON
has grown significantly due to their simplicity and efficiency in certain use cases. As
a result, some developers and systems may prefer these alternatives over XML for
specific applications.

Despite these disadvantages, XML continues to be widely used in various domains,
especially when its self-descriptive nature and data structure flexibility are crucial for data
interchange and representation needs. However, for specific use cases where simplicity,
compactness, and efficiency are essential, developers may choose other formats like JSON,
Protocol Buffers, or MessagePack. The choice of data format depends on the specific
requirements and constraints of the application at hand.
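
Two of the points above, verbosity and the lack of native data types, can be illustrated with a short Python sketch. The record below is invented for the comparison, and the conversion rules (Decimal for the price, the literal string "true" for the flag) are assumptions an application would have to choose for itself.

```python
import json
import xml.etree.ElementTree as ET
from decimal import Decimal

# The same invented record in both formats.
XML_RECORD = "<book><title>Sample Book</title><price>19.99</price><in_stock>true</in_stock></book>"
JSON_RECORD = '{"title": "Sample Book", "price": 19.99, "in_stock": true}'


def parse_xml_record(xml_text):
    """Every XML value arrives as text, so each field is converted explicitly."""
    book = ET.fromstring(xml_text)
    return {
        "title": book.findtext("title"),
        "price": Decimal(book.findtext("price")),
        "in_stock": book.findtext("in_stock") == "true",
    }


xml_book = parse_xml_record(XML_RECORD)
json_book = json.loads(JSON_RECORD)  # numbers and booleans are typed by the parser

print(xml_book)
print(json_book)
print(len(XML_RECORD), len(JSON_RECORD))  # character counts of the two encodings
```

An XML Schema can declare types for elements, but plain XML parsing still hands the application strings.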

Difference between HTML and XML


There are many differences between HTML and XML. These important differences are
given below:

No. | HTML | XML
--- | --- | ---
1 | It was first published in 1993. | Its first draft appeared in 1996; XML 1.0 became a W3C Recommendation in 1998.
2 | HTML stands for Hyper Text Markup Language. | XML stands for Extensible Markup Language.
3 | HTML is static in nature. | XML is dynamic in nature.
4 | It was created by Tim Berners-Lee and is now maintained by the Web Hypertext Application Technology Working Group (WHATWG). | It was developed by the World Wide Web Consortium (W3C).
5 | It is termed as a presentation language. | It is neither termed as a presentation nor a programming language.
6 | HTML is a markup language. | XML provides a framework to define markup languages.
7 | Browsers can ignore small errors in HTML. | XML parsers reject documents that are not well-formed.
8 | It has an extension of .html and .htm | It has an extension of .xml
9 | HTML is not case sensitive. | XML is case sensitive.
10 | HTML tags are predefined tags. | XML tags are user-defined tags.
11 | There are a limited number of tags in HTML. | XML tags are extensible.
12 | HTML does not preserve white spaces. | White space can be preserved in XML.
13 | HTML tags are used for displaying the data. | XML tags are used for describing the data, not for displaying it.
14 | In HTML, some closing tags are optional. | In XML, closing tags are necessary.
15 | HTML is used to display the data. | XML is used to store data.
16 | HTML does not carry data; it just displays it. | XML carries the data to and from the database.
17 | HTML offers native object support. | In XML, the objects are expressed by conventions using attributes.
18 | HTML document size is relatively small. | XML document size is relatively large, as both the formatting approach and the codes are lengthy.
19 | An additional application is not required for parsing of JavaScript code into the HTML document. | DOM (Document Object Model) is required for parsing JavaScript code and mapping of text.
20 | Some of the tools used for HTML are Visual Studio Code, Atom, Notepad++, Sublime Text, and many more. | Some of the tools used for XML are Oxygen XML, XML Notepad, Liquid Studio, and many more.

Components of XML with example

XML documents are composed of several components that define the structure and content of
the data being represented. The main components of an XML document are as follows:

1. Prologue: The prologue is an optional component that appears at the beginning of an
XML document. It typically contains the XML declaration, which specifies the XML
version and encoding used in the document.

Example:

<?xml version="1.0" encoding="UTF-8"?>

2. Element: An element is a fundamental building block of an XML document and
represents a distinct piece of data. Elements are enclosed within start tags and end
tags, and they can contain other elements, text content, or attributes.

Example:

<book>

<title>Sample Book</title>

<author>John Doe</author>

</book>

3. Start Tag and End Tag: A start tag (also known as an opening tag) is used to begin
an element, and an end tag (also known as a closing tag) is used to close the element.
The content between the start and end tags represents the data or nested elements
associated with the element.

Example:

<book>
<!-- Content goes here -->

</book>

4. Attributes: Attributes provide additional information about an element and are
included within the start tag of the element. They consist of a name and a value,
separated by an equal sign ("="). An element can have multiple attributes.

Example:

<book category="fiction" lang="en">

Shaktimaan

</book>

5. Text Content: Text content is the data enclosed within an element. It can include
plain text, numbers, or any other character data.

Example:

<title>Sample Book</title>

6. Comments: Comments are used to include explanatory or informative notes within
an XML document. They are enclosed within <!-- and -->.

Example:

<!-- This is a comment in XML -->

7. CDATA Section: A CDATA section is used to include blocks of text that should be
treated as character data and not be parsed as XML markup.

Example:

<![CDATA[This is a CDATA section containing <tags> and special characters &.]]>


8. Whitespace: Whitespace refers to spaces, tabs, and line breaks. An XML parser passes
all whitespace in element content on to the application, which commonly ignores whitespace
used only for indentation. Whitespace inside a CDATA section is kept exactly as written,
and the xml:space="preserve" attribute signals that whitespace in an element is significant.

Example:

<text><![CDATA[
This is some text
with whitespace and
line breaks.
]]></text>

These components come together to create well-formed XML documents that adhere to the
rules and syntax of XML. XML's self-descriptive nature, along with its support for nesting
and hierarchy, makes it a powerful tool for representing structured data in various
applications and industries.

Here's an example of an XML document that showcases all the components explained earlier:

<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <!-- This is a comment -->
  <book category="fiction" lang="en">
    <title>Sample Book</title>
    <author>John Doe</author>
    <price>19.99</price>
  </book>

  <book category="non-fiction" lang="fr">
    <title>French Non-Fiction Book</title>
    <author>Jane Smith</author>
    <price>15.50</price>
  </book>

  <!-- This is a CDATA section -->
  <description><![CDATA[This book is a great read & highly recommended!]]></description>

  <!-- Element with nested elements -->
  <magazine>
    <title>Tech Today</title>
    <issue>July 2023</issue>
    <editorial>
      <![CDATA[Check out the latest tech trends!]]>
    </editorial>
  </magazine>
</bookstore>

In this example, we have an XML document representing a bookstore. It includes the
following components:

1. Prologue: The prologue is the first line of the XML document, declaring the version
(1.0) and encoding (UTF-8) used in the document.

2. Comments: There are three comments in the XML, providing explanatory
notes to readers and developers.

3. Elements: The XML document contains elements like <bookstore>, <book>,
<title>, <author>, <price>, <description>, <magazine>, <issue>, and <editorial>, each
representing a distinct piece of data.

4. Attributes: The <book> elements have the attributes category and lang.

5. Text Content: The elements <title>, <author>, <price>, <issue>, and <editorial>
contain text content representing various data values.

6. CDATA Section: The <description> and <editorial> elements contain CDATA sections,
preserving the text as character data, including special characters like &.

7. Whitespace: The whitespace between the child elements of <bookstore> and <magazine> is
used only for indentation. Applications typically ignore it, but it helps improve human
readability.

This example demonstrates how XML components can be combined to create a well-formed
and structured XML document, allowing for the representation of different data elements in a
self-descriptive and easily readable manner.
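
As a rough sketch of how these components look to a program, the following Python snippet parses a shortened version of the bookstore document with the standard-library xml.etree.ElementTree module and reads back the root element, attributes, text content, and CDATA content. The shortened document is an assumption made to keep the example small.

```python
import xml.etree.ElementTree as ET

# Shortened variant of the bookstore document (one book, one description).
BOOKSTORE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <!-- This is a comment -->
  <book category="fiction" lang="en">
    <title>Sample Book</title>
    <author>John Doe</author>
    <price>19.99</price>
  </book>
  <description><![CDATA[This book is a great read & highly recommended!]]></description>
</bookstore>"""

root = ET.fromstring(BOOKSTORE_XML.encode("utf-8"))
book = root.find("book")

print(root.tag)                      # name of the root element
print(book.attrib)                   # attributes of <book> as a dictionary
print(book.findtext("title"))        # text content of <title>
print(root.findtext("description"))  # CDATA content arrives as ordinary text

# The default parser discards comments, so only elements appear as children.
child_tags = [child.tag for child in root]
print(child_tags)
```

The CDATA section lets the & character appear unescaped in the source, yet the application receives it as plain character data.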

Anatomy of an XML Document


An XML (eXtensible Markup Language) document follows a specific structure known as the
"anatomy of an XML document." This structure defines the required components that make
up a valid XML file. The key components of an XML document are:

1. Prologue: The prologue is an optional component that appears at the beginning of an
XML document. It consists of the XML declaration, which provides information
about the XML version and encoding used in the document.
The first line of the document is known as the XML declaration. This tells a processing
application which version of XML you are using (the version indicator is mandatory) and
which character encoding you have used for the document.

If the XML declaration is omitted, a processor will make certain assumptions about your
document. In particular, it will expect it to be encoded in UTF-8, an encoding of the Unicode
character set. However, it is best to use the XML declaration wherever possible, both to avoid
confusion over the character encoding and to indicate to processors which version of XML
you're using.

Example:

<?xml version="1.0" encoding="UTF-8"?>
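
The effect of the encoding declaration can be sketched with Python's standard-library parser. In this illustrative example (the <name> element and its content are assumptions), the same text is stored as ISO-8859-1 bytes twice: once labelled with a declaration and once without.

```python
import xml.etree.ElementTree as ET

# ISO-8859-1 bytes, correctly labelled in the XML declaration.
labelled_doc = '<?xml version="1.0" encoding="ISO-8859-1"?><name>Jos\u00e9</name>'.encode("iso-8859-1")
name = ET.fromstring(labelled_doc)
print(name.text)

# The same bytes without a declaration: the parser assumes UTF-8,
# and the lone 0xE9 byte is not valid UTF-8.
unlabelled_doc = "<name>Jos\u00e9</name>".encode("iso-8859-1")
try:
    ET.fromstring(unlabelled_doc)
    rejected = False
except ET.ParseError:
    rejected = True
print(rejected)
```

This is why declaring the encoding is recommended whenever a document is not stored as UTF-8.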

2. Root Element: The root element is the outermost element in the XML document. It
acts as the container for all other elements and serves as the starting point for the
document's hierarchical structure. There can be only one root element in an XML
document.

Example:

<root>

<!-- Other elements go here -->

</root>

3. Elements and Attributes

The second line of the example begins an element, which has been named authors. The
contents of that element include everything between the right angle bracket (>)
in <authors> and the left angle bracket (<) in </authors>. The actual syntactic
constructs <authors> and </authors> are often referred to as the element start tag and end
tag, respectively. Do not confuse tags with elements! Note that elements may include other
elements, as well as text. An XML document must contain exactly one root element, which
contains all other content within the document. The name of the root element defines the type
of the XML document.
Elements that contain both text and other elements simultaneously are classified as mixed
content. The sample authors document uses elements named person to describe the authors
themselves. Each person element has an attribute named id. Unlike elements, attributes can
contain only textual content. Their values must be surrounded by quotes. Either single quotes
(') or double quotes (") may be used, as long as you use the same kind of closing quote as the
opening one.

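Within XML documents, attributes are frequently used for metadata (i.e., "data about data"),
describing properties of the element's contents. The order of attributes within a start tag
is not significant, so an XML processor presents the same information to an application for
both of the following lines: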

<animal name="dog" legs="4"/>

<animal legs="4" name="dog"/>

On the other hand, the information presented to an application by an XML processor upon
reading the following two lines will be different for each animal element, because the
ordering of elements is significant:

<animal><name>dog</name><legs>4</legs></animal>

<animal><legs>4</legs><name>dog</name></animal>

XML treats a set of attributes like a bunch of stuff in a bag (there is no implicit ordering),
while elements are treated like items on a list, where ordering matters.
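
The difference between unordered attributes and ordered child elements can be checked with a short Python sketch using the standard-library xml.etree.ElementTree module and the animal examples above.

```python
import xml.etree.ElementTree as ET

# Attributes: the two spellings differ only in attribute order.
a1 = ET.fromstring('<animal name="dog" legs="4"/>')
a2 = ET.fromstring('<animal legs="4" name="dog"/>')
same_attributes = a1.attrib == a2.attrib  # dictionaries compare without regard to order

# Child elements: document order is preserved by the parser.
e1 = ET.fromstring("<animal><name>dog</name><legs>4</legs></animal>")
e2 = ET.fromstring("<animal><legs>4</legs><name>dog</name></animal>")
order1 = [child.tag for child in e1]
order2 = [child.tag for child in e2]

print(same_attributes)
print(order1, order2)
```

An application that depends on element order will treat the last two documents differently, whereas the first two are interchangeable.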

4. Well-Formedness

An XML document that conforms to the rules of XML syntax is known as well-formed. At
its most basic level, well-formedness means that elements should be properly matched, and
all opened elements should be closed.

Table A-1 shows some XML documents that are not well-formed.

Table A-1. Examples of poorly formed XML documents

Document:
<foo>
<bar>
</foo>
</bar>
Reason it's not well-formed: The elements are not properly nested, because foo is closed
while inside its child element bar.

Document:
<foo>
<bar>
</foo>
Reason it's not well-formed: The bar element was not closed before its parent, foo, was closed.

Document:
<foo baz>
</foo>
Reason it's not well-formed: The baz attribute has no value. While this is permissible in
HTML (e.g., <table border>), it is forbidden in XML.

Document:
<foo baz=23>
</foo>
Reason it's not well-formed: The baz attribute value, 23, has no surrounding quotes. Unlike
HTML, all attribute values must be quoted in XML.
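A parser reports each of these problems as an error instead of guessing what was meant. As a rough sketch (assuming Python's standard xml.etree.ElementTree parser; the helper name is_well_formed and the "corrected" sample are invented for this illustration), the four documents from Table A-1 can be checked like this:

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = {
    "improper nesting": "<foo><bar></foo></bar>",
    "unclosed child": "<foo><bar></foo>",
    "attribute without value": "<foo baz></foo>",
    "unquoted attribute value": "<foo baz=23></foo>",
    "corrected": '<foo baz="23"><bar/></foo>',
}
results = {label: is_well_formed(doc) for label, doc in samples.items()}
for label, ok in results.items():
    print(label, "->", "well-formed" if ok else "not well-formed")
```

Only the last sample, which quotes the attribute value and closes bar before foo, is accepted.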

5. Comments

As in HTML, it is possible to include comments within XML documents. XML comments
are intended to be read only by people. With HTML, developers have occasionally employed
comments to add application-specific functionality. For example, the server-side include
functionality of most web servers uses instructions embedded in HTML comments. XML
provides other means of indicating application processing instructions. Comments should not
be used for any purpose other than those for which they were intended.

The start of a comment is indicated with <!--, and the end of the comment is indicated with
-->. Any sequence of characters, aside from the string --, may appear within a comment.
Comments tend to be used more in XML documents intended for human consumption than
those intended for machine consumption. Comments aren't widely used in RSS.

6. Entity References

Another feature of XML that is occasionally useful when writing RSS documents is the
mechanism for escaping characters.
Because some characters have special significance in XML, there needs to be a way to
represent them. For example, in some cases the < symbol might really be intended to mean
"less than," rather than to signal the start of an element name. Clearly, just inserting the
character without any escaping mechanism would result in a poorly formed document,
because a processing application would assume you were starting another element. Another
instance of this problem is needing to include both double quotes and single quotes
simultaneously in an attribute's value. Here's an example that illustrates both these
difficulties:

<badDoc>

<para>

I'd really like to use the < character

</para>

<note title="On the proper 'use' of the "character"/>

</badDoc>

XML avoids this problem by the use of the predefined entity reference. The word entity in the
context of XML simply means a unit of content. The term entity reference means just that, a
symbolic way of referring to a certain unit of content. XML predefines entities for the
following symbols: left angle bracket (<), right angle bracket (>), apostrophe ('), double quote
("), and ampersand (&).

An entity reference is introduced with an ampersand (&), which is followed by a name (using
the word "name" in its formal sense, as defined by the XML 1.0 specification), and
terminated with a semicolon (;).

Table A-2 shows how the five predefined entities can be used within an XML document.

Table A-2. Predefined entity references in XML 1.0

Literal character Entity reference

< &lt;

> &gt;

' &apos;

" &quot;

& &amp;

Here's our problematic document, revised to use entity references:

<badDoc>

<para>

I'd really like to use the &lt; character

</para>

<note title="On the proper &apos;use&apos; of the &quot;character"/>

</badDoc>
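To see that the entity references are turned back into the literal characters, the revised document can be fed to a parser. This is a small sketch that assumes Python's standard library (xml.etree.ElementTree for parsing, and the escape and quoteattr helpers from xml.sax.saxutils for going in the other direction); the sample string "1 < 2 & 3 > 2" is made up for the illustration:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

revised = """<badDoc>
  <para>I'd really like to use the &lt; character</para>
  <note title="On the proper &apos;use&apos; of the &quot;character"/>
</badDoc>"""

root = ET.fromstring(revised)
para_text = root.find("para").text          # contains a literal <
note_title = root.find("note").get("title")  # contains literal quote characters
print(para_text)
print(note_title)

# Going the other way: escape raw text before placing it in a document.
raw_title = "On the proper 'use' of the \"character"
escaped_text = escape("1 < 2 & 3 > 2")
escaped_attr = quoteattr(raw_title)   # returns the value including its quotes
print(escaped_text)
print(escaped_attr)
```

The application never sees the references themselves, only the characters they stand for, so escaping is purely a matter of how the document is written down.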

7. Character References

Character references allow you to denote a character by its numeric position in the Unicode
character set (this position is known as its code point). The number may be written in decimal
(as in &#65;) or, with an x prefix, in hexadecimal (as in &#xD1;). Table A-3 contains a few
examples that illustrate the syntax.

Table A-3. Example character references

Actual character Character reference

0 &#48;

A &#65;

Ñ &#xD1;

® &#xAE;
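The mapping between characters and their references is simple arithmetic on code points. The sketch below (assuming Python and its standard xml.etree.ElementTree parser; the element name chars is invented) decodes four references and then rebuilds them in both decimal and hexadecimal form:

```python
import xml.etree.ElementTree as ET

doc = "<chars>&#48;&#65;&#xD1;&#xAE;</chars>"
decoded = ET.fromstring(doc).text
print(decoded)

# Build the references back from the code points.
decimal_refs = ["&#%d;" % ord(ch) for ch in decoded]
hex_refs = ["&#x%X;" % ord(ch) for ch in decoded]
print(decimal_refs)
print(hex_refs)
```

Decimal 48 and hexadecimal 30 name the same character, so either notation may be used.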

8. Encoding

Encoding is the process of converting Unicode characters into their equivalent binary
representation. When the XML processor reads an XML document, it decodes the bytes of the
document according to that encoding. Hence, we need to specify the type of encoding in the
XML declaration.
Encoding Types
There are mainly two types of encoding:

• UTF-8
• UTF-16
UTF stands for UCS Transformation Format, and UCS itself means Universal
Character Set. The number 8 or 16 refers to the size in bits of the units from which each
character is built: UTF-8 uses one to four 8-bit units (1 to 4 bytes) per character, and UTF-16
uses one or two 16-bit units (2 or 4 bytes). For documents without encoding information,
UTF-8 is assumed by default.
Syntax
Encoding type is included in the prolog section of the XML document. The syntax for
UTF-8 encoding is as follows −
<?xml version = "1.0" encoding = "UTF-8" standalone = "no" ?>
The syntax for UTF-16 encoding is as follows −
<?xml version = "1.0" encoding = "UTF-16" standalone = "no" ?>
Example
Following example shows the declaration of encoding −
<?xml version = "1.0" encoding = "UTF-8" standalone = "no" ?>
<contact-info>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</contact-info>
In the above example, encoding="UTF-8" specifies that the document is stored as UTF-8, which
represents each character with one to four 8-bit bytes. UTF-16, which represents each
character with one or two 16-bit units, can be declared instead.
XML files encoded with UTF-8 tend to be smaller than those encoded with UTF-16 when the
text is mostly ASCII, as in this example.
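The size difference is easy to measure. The following sketch uses only Python's built-in str.encode (Python is an assumption here, as is the choice of the name element from the example as sample text):

```python
text = "<name>Tanmay Patil</name>"
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16")   # Python prepends a 2-byte byte order mark

print(len(text), len(utf8_bytes), len(utf16_bytes))

# A character outside ASCII needs more than one byte in UTF-8.
n_tilde = "\u00D1"
n_tilde_utf8 = len(n_tilde.encode("utf-8"))
n_tilde_utf16 = len(n_tilde.encode("utf-16-le"))   # no byte order mark
print(n_tilde_utf8, n_tilde_utf16)
```

For pure ASCII text, UTF-8 needs one byte per character and UTF-16 needs two, so the UTF-16 version is roughly twice as large. For text dominated by characters that take three bytes in UTF-8, the comparison can go the other way.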

9. Validity

In addition to well-formedness, XML 1.0 offers another level of verification, called validity.
To explain why validity is important, let's take a simple example. Imagine you invented a
simple XML format for your friends' telephone numbers:

<phonebook>

<person>
<name>Albert Smith</name>

<number>123-456-7890</number>

</person>

<person>

<name>Bertrand Jones</name>

<number>456-123-9876</number>

</person>

</phonebook>

Based on your format, you also construct a program to display and search your phone
numbers. This program turns out to be so useful, you share it with your friends. However,
your friends aren't so accurate on detail as you are, and they try to feed your program this
phone book file:

<phonebook>

<person>

<name>Melanie Green</name>

<phone>123-456-7893</phone>

</person>

</phonebook>

Note that, although this file is perfectly well-formed, it doesn't fit the format you prescribed
for the phone book, and you find you need to change your program to cope with this
situation. If your friends had used number as you did to denote the phone number, and
not phone, there wouldn't have been a problem. However, as it is, this second file is not a
valid phonebook document.
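In practice this kind of check is delegated to a validating parser together with a DTD or schema. Purely to make the idea concrete, here is a hand-written sketch in Python (an assumption; the function name check_phonebook and the rule "each person contains exactly name followed by number" are this sketch's reading of the format, not a real validator):

```python
import xml.etree.ElementTree as ET

REQUIRED_CHILDREN = ["name", "number"]  # the format described in the text

def check_phonebook(xml_text):
    """Return a list of problems; an empty list means the file fits the format."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    problems = []
    if root.tag != "phonebook":
        problems.append("root element is %r, expected 'phonebook'" % root.tag)
    for index, person in enumerate(root.findall("person"), start=1):
        tags = [child.tag for child in person]
        if tags != REQUIRED_CHILDREN:
            problems.append("person %d has children %r" % (index, tags))
    return problems

good = ("<phonebook><person><name>Albert Smith</name>"
        "<number>123-456-7890</number></person></phonebook>")
bad = ("<phonebook><person><name>Melanie Green</name>"
       "<phone>123-456-7893</phone></person></phonebook>")
good_problems = check_phonebook(good)
bad_problems = check_phonebook(bad)
print(good_problems)
print(bad_problems)
```

Both files parse without error, which is the well-formedness level; only the second one produces a complaint, which is the validity level.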
10. XML Namespaces

XML 1.0 lets developers create their own elements and attributes, but it leaves open the
potential for overlapping names. "Title" in one context may mean something entirely
different than "Title" in a different context. The "Namespaces in XML" specification
provides a mechanism developers can use to identify particular vocabularies using Uniform
Resource Identifiers (URIs).
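As an illustration of how a URI keeps two "Title" elements apart, consider the sketch below. It assumes Python's standard xml.etree.ElementTree module, and both namespace URIs as well as the prefixes bk and job are hypothetical identifiers invented for this example:

```python
import xml.etree.ElementTree as ET

# Both namespace URIs below are hypothetical identifiers chosen for this sketch.
doc = """<record xmlns:bk="http://example.com/ns/book"
        xmlns:job="http://example.com/ns/job">
  <bk:Title>Learning XML</bk:Title>
  <job:Title>Technical Editor</job:Title>
</record>"""

ns = {"bk": "http://example.com/ns/book", "job": "http://example.com/ns/job"}
root = ET.fromstring(doc)
book_title = root.find("bk:Title", ns)
job_title = root.find("job:Title", ns)

print(book_title.tag, book_title.text)
print(job_title.tag, job_title.text)
```

ElementTree reports each tag as the URI in braces followed by the local name, so the two Title elements are distinct names even though their local parts are identical.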

XML Example
An XML document forms a hierarchical structure that looks like a tree, known as the XML tree.
It starts at "the root" and branches to "the leaves".

Example of Sample XML Document

XML documents use a simple, self-describing syntax:

<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

The first line is the XML declaration. It defines the XML version (1.0) and the encoding used
(ISO-8859-1 = Latin-1/West European character set).

The next line describes the root element of the document (like saying: "this document is a
note"):

<note>

The next 4 lines describe 4 child elements of the root (to, from, heading, and body).

<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>

And finally the last line defines the end of the root element.

</note>

XML documents must contain a root element. This element is "the parent" of all other
elements.

The elements in an XML document form a document tree. The tree starts at the root and
branches to the lowest level of the tree.

All elements can have sub elements (child elements).

<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>

The terms parent, child, and sibling are used to describe the relationships between elements.
Parent elements have children. Children on the same level are called siblings (brothers or
sisters).

All elements can have text content and attributes (just like in HTML).

Another Example of XML: Books


File: books.xml

<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="CHILDREN">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="WEB">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>

The root element in the example is <bookstore>. All elements in the document are contained
within <bookstore>.

Each <book> element has 4 children: <title>, <author>, <year> and <price>.

XML - Declaration
XML declaration contains details that prepare an XML processor to parse the XML
document. It is optional, but when used, it must appear in the first line of the XML document.

Syntax

Following syntax shows XML declaration −

<?xml
version = "version_number"
encoding = "encoding_declaration"
standalone = "standalone_status"
?>

Each parameter consists of a parameter name, an equals sign (=), and a parameter value inside
quotes. The following table shows the above syntax in detail:

Parameter: version
Parameter value: 1.0
Description: Specifies the version of the XML standard used.

Parameter: encoding
Parameter value: UTF-8, UTF-16, ISO-10646-UCS-2, ISO-10646-UCS-4, ISO-8859-1 to
ISO-8859-9, ISO-2022-JP, Shift_JIS, EUC-JP
Description: Defines the character encoding used in the document. UTF-8 is the default
encoding.

Parameter: standalone
Parameter value: yes or no
Description: Informs the parser whether the document relies on information from an external
source, such as an external document type definition (DTD), for its content. The default
value is no. Setting it to yes tells the processor that no external declarations are
required for parsing the document.

Rules
An XML declaration should abide by the following rules:

• If the XML declaration is present, it must be placed on the first line of the XML
document, with nothing (not even whitespace) before it.
• If the XML declaration is included, it must contain the version number attribute.
• The parameter names are case-sensitive, as are the standalone values yes and no.
Encoding names are matched without regard to case.
• The names are always in lower case.
• The order of the parameters is important. The correct order is: version,
encoding and standalone.
• Either single or double quotes may be used.
• The XML declaration has no closing tag, i.e. there is no </?xml>.

XML Declaration Examples

Following are few examples of XML declarations −

XML declaration with no parameters (not allowed, because version is mandatory) −

<?xml ?>

XML declaration with version definition −

<?xml version = "1.0" ?>

XML declaration with all parameters defined −

<?xml version = "1.0" encoding = "UTF-8" standalone = "no" ?>

XML declaration with all parameters defined in single quotes −

<?xml version = '1.0' encoding = 'iso-8859-1' standalone = 'no' ?>
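The placement and ordering rules can be tried out against a real parser. The following is a small sketch, assuming Python's standard xml.etree.ElementTree module; the helper name accepts and the one-element document <a/> are invented for the test:

```python
import xml.etree.ElementTree as ET

def accepts(document_bytes):
    """Return True if ElementTree parses the document without a ParseError."""
    try:
        ET.fromstring(document_bytes)
        return True
    except ET.ParseError:
        return False

checks = {
    "all parameters": b'<?xml version = "1.0" encoding = "UTF-8" standalone = "no" ?><a/>',
    "single quotes": b"<?xml version = '1.0' encoding = 'iso-8859-1' standalone = 'no' ?><a/>",
    "no declaration": b"<a/>",
    "wrong parameter order": b'<?xml encoding = "UTF-8" version = "1.0" ?><a/>',
    "not on the first line": b'\n<?xml version = "1.0" ?><a/>',
}
outcome = {label: accepts(doc) for label, doc in checks.items()}
for label, ok in outcome.items():
    print(label, "->", ok)
```

The first three documents are accepted, including the one with no declaration at all, since the declaration is optional. The last two are rejected because the parameters are out of order or the declaration is preceded by a line break.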

The Root Element in XML


In XML (eXtensible Markup Language), the root element is the outermost element in an XML
document. It acts as the container for all other elements in the document and is the starting point of
the document's hierarchical structure.

The root element is the only element in the XML document that is not nested inside any other
element. All other elements must be contained within the root element, either directly or indirectly
through other elements.

Here is an example of an XML document with the root element:

<?xml version="1.0" encoding="UTF-8"?>

<rootElement>

<!-- Other elements go here -->

</rootElement>
In this example, <rootElement> is the root element. All other elements in the
XML document will be nested within this root element. The root element gives
the XML document its structure and serves as the starting point for traversing
and accessing the data within the document. It defines the context for all the
data elements in the XML document.

Here's a second example, in which the root element contains several child elements:

<?xml version="1.0" encoding="UTF-8"?>

<library>

<book>

<title>Sample Book</title>

<author>John Doe</author>

<ISBN>123456789</ISBN>

</book>

<book>

<title>Another Book</title>

<author>Jane Smith</author>

<ISBN>987654321</ISBN>

</book>

</library>

In this example, the root element is <library>. It is the outermost element and acts as the container
for all the other elements in the XML document. All the <book> elements are nested within the
<library> element.

What is an XML Element?


An XML element is everything from (including) the element's start tag to
(including) the element's end tag.

<price>29.99</price>

An element can contain:

• text
• attributes
• other elements
• or a mix of the above

<bookstore>
<book category="children">
<title>Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="web">
<title>Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>

In the example above:

<title>, <author>, <year>, and <price> have text content because they
contain text (like 29.99).

<bookstore> and <book> have element contents, because they contain
elements.

<book> has an attribute (category="children").

Empty XML Elements


In XML (eXtensible Markup Language), an empty element refers to an element that doesn't contain
any child elements or text content. An empty element can be written as a start tag immediately
followed by its end tag, or more compactly as a single self-closing tag, which has no separate
closing tag and instead ends with a forward slash before the closing angle bracket ("/>").

Empty elements are useful when representing data that doesn't require additional nested elements
or when defining attributes without any content. They are commonly used in XML documents to
indicate the presence of specific data points or properties without providing additional details.

Here are some examples of empty XML elements:

1. Empty Element without Attributes:

<emptyElement />

2. Empty Element with Attributes:

<book ISBN="123456789" />

3. Empty Element within a Parent Element:

<library>
<book ISBN="987654321" />
<book ISBN="543210987" />
</library>

In the first example, <emptyElement /> is a standalone empty element. It doesn't contain
any child elements or text content.
In the second example, <book ISBN="123456789" /> is an empty element representing a
book with an ISBN attribute. It doesn't have any nested elements or text content but
includes an attribute named "ISBN" with the value "123456789."
In the third example, the <book> element is used as an empty element within a parent
element <library>. This structure allows multiple empty <book> elements to be included
under the <library> element, each representing a different book with its own set of
attributes.
Empty elements are a convenient way to represent simple data points or attributes in XML
without the need for additional nested elements or content. They help maintain a clear and
concise representation of data, especially when certain elements only require minimal
information.
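A parser treats the self-closing form and an immediately closed start tag and end tag pair as the same thing. A brief sketch, assuming Python's standard xml.etree.ElementTree module, shows this with the book element from the examples:

```python
import xml.etree.ElementTree as ET

short_form = ET.fromstring('<book ISBN="123456789" />')
long_form = ET.fromstring('<book ISBN="123456789"></book>')

# Both spellings produce an element with no text and no children.
print(short_form.text, len(short_form), short_form.attrib)
print(long_form.text, len(long_form), long_form.attrib)

# When serializing, ElementTree writes empty elements in the short form by default.
serialized = ET.tostring(long_form, encoding="unicode")
print(serialized)
```

Which spelling to use is therefore a matter of style; the data model is identical.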

XML Naming Rules


In XML (eXtensible Markup Language), elements are fundamental building blocks
used to represent data. When naming elements in XML documents, certain rules must
be followed to ensure valid and well-formed XML. Here are the naming rules for
XML elements:

1. Element Name Start Character: The first character of an element name must be
a letter (A-Z or a-z) or an underscore ("_"). It cannot start with a number or any
other special character.
2. Element Name Characters: After the first character, the element name can
include letters, numbers, underscores, hyphens, and periods. Special characters
like spaces, commas, and other punctuation marks are not allowed.
3. Element Name Case Sensitivity: XML is case-sensitive. This means that
elements with different cases (e.g., "book" and "Book") are treated as distinct
elements.
4. Reserved Names: Names beginning with the letters "xml" (in any combination of
upper and lower case) are reserved for use by XML-related standards, so you
should not use names such as <xml> or <xmlElement> for your own elements.
5. Validity of Element Names: Element names must be valid XML names. Colons
(":") are syntactically allowed, but they are reserved for separating a
namespace prefix from the local name and should not be used for any other
purpose.

Examples of valid XML element names:

<book>
<author_name>
<ISBN_number>
<price_usd>
<element_123>

Examples of invalid XML element names:

<123element> <!-- Cannot start with a number -->
<book title> <!-- Cannot contain spaces -->
<XML> <!-- Reserved name -->
<xmlElement> <!-- Starts with the reserved letters "xml" (in any case) -->

Note that a name such as <element.name> is valid: periods are allowed after the first
character.
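Rules 1, 2 and 4 above can be approximated with a regular expression. The sketch below is written in Python and is deliberately simplified: it covers ASCII names only (real XML names may also contain many non-ASCII letters), it ignores colons, and the function name check_name is invented for the illustration:

```python
import re

# Simplified, ASCII-only pattern: a letter or underscore first, then
# letters, digits, underscores, hyphens, or periods.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*\Z")

def check_name(name):
    """Classify a candidate element name under the simplified rules above."""
    if not NAME_PATTERN.match(name):
        return "invalid"
    if name.lower().startswith("xml"):
        return "reserved"
    return "ok"

candidates = ["book", "author_name", "element.name", "123element",
              "book title", "XML", "xmlElement"]
verdicts = {name: check_name(name) for name in candidates}
for name, verdict in verdicts.items():
    print(name, "->", verdict)
```

Consistent with rule 2, a name containing a period passes the check, while names that start with a digit or contain a space do not.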

XML Related Technologies

The following technologies are closely related to XML:

1) XHTML (Extensible HTML): It is a cleaner and stricter version of HTML. It belongs
to the family of XML markup languages. It was developed to make HTML more extensible
and increase interoperability with other data formats.

2) XML DOM (XML Document Object Model): It is a standard document model that is used to
access and manipulate XML. It represents the XML file as a tree structure.

3) XSL (Extensible Stylesheet Language): It contains three parts:
i) XSLT (XSL Transformations): It transforms XML into other formats, like HTML.
ii) XSL-FO (XSL Formatting Objects): It is used for formatting XML for screen, paper, etc.
iii) XPath: It is a language to navigate XML documents.

4) XQuery (XML Query Language): It is a language which is used to query XML based data.

5) DTD (Document Type Definition): It is a standard which is used to define the legal
elements in an XML document.

6) XSD (XML Schema Definition): It is an XML based alternative to DTD. It is used to
describe the structure of an XML document.

7) XLink (XML Linking Language): This is a language for creating hyperlinks (external and
internal links) in XML documents.

8) XPointer (XML Pointer Language): It is a system for addressing components of XML
based internet media. It allows XLink hyperlinks to point to more specific parts of an
XML document.

XML Attributes
XML elements can have attributes. Attributes add extra information about an element.

Note: XML attribute values must always be quoted. Either single or double quotes may be used.

Let us take an example of a book publisher. Here, book is the element and publisher is the
attribute.

<book publisher="Tata McGraw Hill"></book>

Or

<book publisher='Tata McGraw Hill'></book>

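Metadata should be stored as attributes and data should be stored as elements. For example, the
category of a book can be stored as an attribute:

<book category="computer">
<author>A &amp; B</author>
</book>

or as child elements, which allows more than one value:

<book>
<category>Computer</category>
<category>Science</category>
<author>A &amp; B</author>
</book>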

Data can be stored in attributes or in child elements. But there are some limitations in using
attributes, over child elements.

Why should we avoid XML attributes


o Attributes cannot contain multiple values but child elements can have multiple values.

o Attributes cannot contain tree structure but child element can.

o Attributes are not easily expandable. If you want to change an attribute's values in the
future, it may be complicated.
o Attributes cannot describe structure but child elements can.

o Attributes are more difficult to be manipulated by program code.

o Attributes values are not easy to test against a DTD, which is used to define the legal
elements of an XML document.
Difference between attribute and sub-element

In the context of documents, attributes are part of markup, while sub elements are part of the
basic document contents.

In the context of data representation, the difference is unclear and may be confusing.

Same information can be represented in two ways:

1st way:

<book publisher="Tata McGraw Hill"> </book>

2nd way:

<book>
<publisher> Tata McGraw Hill </publisher>
</book>

In the first example publisher is used as an attribute and in the second example publisher is an
element.

Both examples provide the same information, but because of the limitations listed above it is
common practice to prefer elements for data and to keep attributes for metadata.
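From the point of view of a program, the two representations are read with different calls. A short sketch, assuming Python's standard xml.etree.ElementTree module (the category elements in the last part are an invented illustration of repeated values):

```python
import xml.etree.ElementTree as ET

as_attribute = ET.fromstring('<book publisher="Tata McGraw Hill"> </book>')
as_element = ET.fromstring(
    "<book><publisher> Tata McGraw Hill </publisher></book>"
)

publisher_1 = as_attribute.get("publisher")
publisher_2 = as_element.findtext("publisher").strip()
print(publisher_1 == publisher_2)

# Only the element form can repeat, which is how multiple values are usually modeled.
multi = ET.fromstring(
    "<book><category>Computer</category><category>Science</category></book>"
)
categories = [c.text for c in multi.findall("category")]
print(categories)
```

An attribute name may appear only once per element, so a second publisher or category would have to be squeezed into the same string, whereas child elements can simply be repeated.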

XML Comments

XML comments are just like HTML comments. Comments are used to make code more
understandable to other developers.

XML comments add notes that explain the purpose of the XML code. Although
XML is known as self-describing data, comments are sometimes still necessary.

Syntax

An XML comment should be written as:

<!-- Write your comment-->

You cannot nest one XML comment inside another.
XML Comments Example

Let's take an example to show the use of comment in an XML example:

<?xml version="1.0" encoding="UTF-8" ?>
<!--Students marks are uploaded by months-->
<students>
  <student>
    <name>Ratan</name>
    <marks>70</marks>
  </student>
  <student>
    <name>Aryan</name>
    <marks>60</marks>
  </student>
</students>

Rules for adding XML comments


o Don't use a comment before an XML declaration.

o You can use a comment anywhere in an XML document except inside a tag or an attribute value.

o Don't nest a comment inside the other comment.
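These rules can be checked against a parser. The sketch below assumes Python's standard xml.etree.ElementTree module; the helper name parses and the abbreviated students documents are invented for the illustration:

```python
import xml.etree.ElementTree as ET

def parses(document):
    """Return True if the document is accepted by the parser."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

ok_doc = ("<students><!-- marks by month -->"
          "<student><name>Ratan</name></student></students>")
nested = "<students><!-- outer <!-- inner --> --><student/></students>"
before_declaration = '<!-- too early --><?xml version="1.0"?><students/>'

root = ET.fromstring(ok_doc)
child_tags = [child.tag for child in root]   # the comment is not kept in the tree
nested_ok = parses(nested)
early_ok = parses(before_declaration)
print(child_tags)
print(nested_ok, early_ok)
```

With its default settings ElementTree drops the comment while building the tree, which matches the idea that comments are meant for people. The nested comment and the comment placed before the XML declaration are both rejected.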

XML Tree Structure


An XML document has a self-descriptive structure. It forms a tree structure which is referred
to as an XML tree. The tree structure makes it easy to describe an XML document.

A tree structure contains a root element (as parent), child elements and so on. It is very easy to
traverse all succeeding branches, sub-branches and leaf nodes starting from the root.
Example of an XML document
<?xml version="1.0"?>
<college>
  <student>
    <firstname>Tamanna</firstname>
    <lastname>Bhatia</lastname>
    <contact>09990449935</contact>
    <email>[email protected]</email>
    <address>
      <city>Ghaziabad</city>
      <state>Uttar Pradesh</state>
      <pin>201007</pin>
    </address>
  </student>
</college>

Let's see the tree-structure representation of the above example.

college
  student
    firstname
    lastname
    contact
    email
    address
      city
      state
      pin

In the above example, the first line is the XML declaration. It defines the XML version 1.0. The next
line shows the root element (college) of the document. Inside it there is one more element
(student). The student element contains five branches named <firstname>, <lastname>,
<contact>, <email> and <address>.

<address> branch contains 3 sub-branches named <city>, <state> and <pin>.

Note: DOM parser represents the XML document in Tree structure.
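Walking the tree from the root down to the leaves takes only a short recursive function. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the function name outline is invented, and the document is an abridged copy of the college example with the contact and email elements left out:

```python
import xml.etree.ElementTree as ET

# Abridged copy of the college example (contact and email omitted).
doc = """<college>
  <student>
    <firstname>Tamanna</firstname>
    <lastname>Bhatia</lastname>
    <address>
      <city>Ghaziabad</city>
      <state>Uttar Pradesh</state>
      <pin>201007</pin>
    </address>
  </student>
</college>"""

def outline(element, depth=0):
    """Return one indented line per element, visiting the tree depth-first."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

tree_lines = outline(ET.fromstring(doc))
print("\n".join(tree_lines))
```

Each level of indentation in the printed outline corresponds to one step from a parent to its children.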

XML – DOM
XML DOM (Document Object Model) is a programming interface that represents the
structure of an XML document as a tree-like object, allowing developers to manipulate and
navigate XML documents using programming languages. It provides a platform-independent,
language-neutral way to access and interact with XML documents dynamically.
The XML DOM exposes the XML document's contents and structure as a set of
interconnected objects, where each node in the tree corresponds to an element, attribute, or
text content in the XML document. This tree-like representation is also known as a "node
tree" or "DOM tree."
Key features and functionalities of XML DOM include:

1. Parsing XML: XML DOM allows developers to parse XML documents, converting
them into a structured tree of nodes that can be easily manipulated and accessed.
2. Node Types: The DOM tree consists of different types of nodes, including elements,
attributes, text, comments, and processing instructions. Each node type is represented
by a specific DOM interface.
3. Traversal: Developers can traverse the DOM tree, moving between nodes, accessing
parent, child, and sibling nodes, and navigating the entire structure.
4. Node Creation and Modification: XML DOM enables the creation of new elements,
attributes, and text nodes and the modification of existing nodes, allowing developers
to update XML documents dynamically.
5. Search and Query: DOM provides methods to search for specific elements or
attributes based on their names, values, or positions within the tree.
6. Validation: Many DOM implementations can validate XML documents against an XML Schema or
DTD (Document Type Definition) while loading them, to ensure their conformity with
predefined rules.
7. Platform and Language Independence: XML DOM is available in many programming
languages, including Java, JavaScript, Python, C#, PHP, and more. It is implemented
as a set of APIs that can be used across different platforms.

Here's a simple example of how XML DOM can be used in JavaScript to access and modify
XML data:

<?xml version="1.0" encoding="UTF-8"?>

<bookstore>
<book>

<title>Sample Book</title>

<author>John Doe</author>

</book>
</bookstore>

// JavaScript code to access XML using DOM
// xmlString holds the bookstore document shown above as a string

var xmlString = '<bookstore><book><title>Sample Book</title><author>John Doe</author></book></bookstore>';

var xmlDoc = new DOMParser().parseFromString(xmlString, 'text/xml');

var titleNode = xmlDoc.querySelector('title');

console.log(titleNode.textContent); // Output: "Sample Book"

// Modify the title

titleNode.textContent = "Updated Book Title";

console.log(titleNode.textContent); // Output: "Updated Book Title"
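Because the DOM is language-neutral, the same steps look very similar in other languages. As a sketch, here is a comparable sequence using Python's standard xml.dom.minidom module (Python is an assumption here; the year element and its value 2005 are invented to illustrate node creation):

```python
from xml.dom import minidom

xml_string = (
    "<bookstore><book><title>Sample Book</title>"
    "<author>John Doe</author></book></bookstore>"
)

dom = minidom.parseString(xml_string)
title_node = dom.getElementsByTagName("title")[0]
original_title = title_node.firstChild.data
print(original_title)

# Modify the text node that holds the title.
title_node.firstChild.data = "Updated Book Title"
updated_title = title_node.firstChild.data
print(updated_title)

# Create a new element and attach it to the book.
year = dom.createElement("year")
year.appendChild(dom.createTextNode("2005"))
title_node.parentNode.appendChild(year)
book_children = [n.tagName for n in title_node.parentNode.childNodes]
print(book_children)
```

Note that in the DOM the text of an element is itself a child node, which is why the title is read through firstChild instead of directly from the element.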

XML Validation
A well-formed XML document can be validated against DTD or Schema.

A well-formed XML document is an XML document with correct syntax. Validation goes one step
further and checks the document against a set of rules for its structure.

Valid XML document

It must be well formed (satisfy all the basic syntax conditions).

It must conform to a predefined DTD or XML schema.


Rules for well-formed XML
o If an XML declaration is present, it must come first in the document.

o It must have one unique root element.

o All start tags of XML documents must match end tags.

o XML tags are case sensitive.

o All elements must be closed.

o All elements must be properly nested.

o All attributes values must be quoted.

o XML entities must be used for special characters.


XML DTD

A DTD defines the legal elements of an XML document

In simple words we can say that a DTD defines the document structure with a list of legal
elements and attributes.

XML schema is an XML based alternative to DTD.

Both DTD and XML schema are used to check whether a well-formed XML document is also valid.

We should avoid errors in XML documents, because an XML parser must stop processing when it
meets a well-formedness error.

XML schema
It is defined as an XML language

Uses namespaces to allow for reuses of existing definitions

It supports a large number of built in data types and definition of derived data types

XML DTD
What is DTD

DTD stands for Document Type Definition. It defines the legal building blocks of an XML
document. It is used to define document structure with a list of legal elements and attributes.

Purpose of DTD

Its main purpose is to define the structure of an XML document. It contains a list of legal
elements and defines the structure with the help of them.
Checking Validation

Before proceeding with XML DTD, recall the two levels of checking. An XML document is
called "well-formed" if it contains the correct syntax.

A well-formed and valid XML document is one which has been validated against a DTD.

Valid and well-formed XML document with DTD

Let's take an example of well-formed and valid XML document. It follows all the rules of
DTD.

employee.xml

<?xml version="1.0"?>
<!DOCTYPE employee SYSTEM "employee.dtd">
<employee>
  <firstname>vimal</firstname>
  <lastname>jaiswal</lastname>
  <email>[email protected]</email>
</employee>

In the above example, the DOCTYPE declaration refers to an external DTD file. The content
of that file is shown below.

employee.dtd

<!ELEMENT employee (firstname,lastname,email)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT email (#PCDATA)>

OUTPUT:-

This XML file does not appear to have any style information associated with it. The
document tree is shown below.

<employee>
<firstname>vimal</firstname>
<lastname>jaiswal</lastname>
<email>[email protected]</email>
</employee>

Description of DTD

<!DOCTYPE employee : It defines that the root element of the document is employee.

<!ELEMENT employee : It defines that the employee element contains 3 elements,
"firstname, lastname and email", in that order.

<!ELEMENT firstname : It defines that the firstname element is of type #PCDATA (parsed
character data, i.e. text that the parser examines for markup and entity references).

<!ELEMENT lastname : It defines that the lastname element is of type #PCDATA (parsed
character data).

<!ELEMENT email : It defines that the email element is of type #PCDATA (parsed character
data).

XML DTD with entity declaration

A doctype declaration can also define special strings that can be used in the XML file.

An entity reference has three parts:

1. An ampersand (&)
2. An entity name
3. A semicolon (;)

Syntax to declare entity:

<!ENTITY entity-name "entity-value">

Let's see a code to define the ENTITY in doctype declaration.

author.xml

<?xml version="1.0" standalone="yes" ?>
<!DOCTYPE author [
<!ELEMENT author (#PCDATA)>
<!ENTITY sj "Sonoo Jaiswal">
]>
<author>&sj;</author>

OUTPUT:-
This XML file does not appear to have any style information associated with it. The
document tree is shown below.

<author>Sonoo Jaiswal</author>

In the above example, sj is an entity that is used inside the author element. In such case, it
will print the value of sj entity that is "Sonoo Jaiswal".

Note: A single DTD can be used in many XML files.
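The expansion of the sj entity is done by the parser, so an application only ever sees the replacement text. A small sketch, assuming Python's standard xml.etree.ElementTree module (which expands entities declared in the internal DTD subset but does not validate against the DTD):

```python
import xml.etree.ElementTree as ET

author_xml = """<?xml version="1.0" standalone="yes" ?>
<!DOCTYPE author [
<!ELEMENT author (#PCDATA)>
<!ENTITY sj "Sonoo Jaiswal">
]>
<author>&sj;</author>"""

author = ET.fromstring(author_xml)
print(author.tag, author.text)

# Without the declaration, the same reference is an error.
try:
    ET.fromstring("<author>&sj;</author>")
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False
print(undeclared_ok)
```

Only the five predefined entities may be used without a declaration; any other entity name has to be declared before it is referenced.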

XML CSS

Purpose of CSS in XML

CSS (Cascading Style Sheets) can be used to add style and display information to an XML
document. It can format the whole XML document.

How to link XML file with CSS

To link XML files with CSS, you should use the following syntax:

<?xml-stylesheet type="text/css" href="cssemployee.css"?>

XML CSS Example

Let's see the css file.

cssemployee.css

employee
{
  background-color: pink;
}
firstname,lastname,email
{
  font-size:25px;
  display:block;
  color: blue;
  margin-left: 50px;
}

Let's create the DTD file.

employee.dtd

<!ELEMENT employee (firstname,lastname,email)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT email (#PCDATA)>

Let's see the xml file using CSS and DTD.

employee.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="cssemployee.css"?>
<!DOCTYPE employee SYSTEM "employee.dtd">
<employee>
  <firstname>vimal</firstname>
  <lastname>jaiswal</lastname>
  <email>[email protected]</email>
</employee>

output

vimal jaiswal [email protected]

CSS is not generally used to format XML files. W3C recommends XSLT instead of CSS.
XML Schema

What is XML schema

XML Schema is a language used for expressing constraints on XML documents.
Several schema languages are in use nowadays; the one described here is XSD (XML Schema
Definition).

An XML schema is used to define the structure of an XML document. It is like a DTD but
provides more control over the XML structure.

Checking Validation

An XML document is called "well-formed" if it contains the correct syntax. A well-formed
and valid XML document is one which has been validated against a schema.

Visit http://www.xmlvalidation.com to validate the XML file against schema or DTD.

XML Schema Example

Let's create a schema file.

employee.xsd

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.javatpoint.com"
xmlns="http://www.javatpoint.com"
elementFormDefault="qualified">
  <xs:element name="employee">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="firstname" type="xs:string"/>
        <xs:element name="lastname" type="xs:string"/>
        <xs:element name="email" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

Let's see the XML file that refers to the XML schema (XSD file).

employee.xml

<?xml version="1.0"?>
<employee
    xmlns="http://www.javatpoint.com"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.javatpoint.com employee.xsd">

  <firstname>vimal</firstname>
  <lastname>jaiswal</lastname>
  <email>[email protected]</email>
</employee>
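
Because this document declares a default namespace, code that reads it has to use namespace-qualified names. The sketch below assumes Python's standard-library xml.etree.ElementTree; the prefix "e" and the variable names are illustrative choices, not part of the original files, and the email element is left out for brevity.

```python
import xml.etree.ElementTree as ET

EMPLOYEE_XML = """<?xml version="1.0"?>
<employee
    xmlns="http://www.javatpoint.com"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.javatpoint.com employee.xsd">
  <firstname>vimal</firstname>
  <lastname>jaiswal</lastname>
</employee>"""

# "e" is an arbitrary local alias for the document's default namespace.
NS = {"e": "http://www.javatpoint.com"}
XSI = "http://www.w3.org/2001/XMLSchema-instance"

root = ET.fromstring(EMPLOYEE_XML)

# ElementTree stores names as {namespace-uri}local-name.
print(root.tag)

# A namespace-qualified lookup finds the child element.
first = root.find("e:firstname", NS)
print(first.text)

# An unqualified lookup finds nothing, because <firstname> is in a namespace.
unqualified = root.find("firstname")
print(unqualified)

# The schema hint is an ordinary attribute in the xsi namespace.
schema_location = root.get("{" + XSI + "}schemaLocation")
print(schema_location)
```

Note that ElementTree only reads the xsi:schemaLocation attribute; it does not fetch employee.xsd or validate against it.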

Description of XML Schema

<xs:element name="employee"> : It defines an element named 'employee'.

<xs:complexType> : It defines that the element 'employee' is of complex type.

<xs:sequence> : It defines that the complex type is a sequence of elements.

<xs:element name="firstname" type="xs:string"/> : It defines that the element 'firstname'
is of string/text type.

<xs:element name="lastname" type="xs:string"/> : It defines that the element 'lastname'
is of string/text type.

<xs:element name="email" type="xs:string"/> : It defines that the element 'email' is of
string/text type.
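
Since an XSD is itself an XML document, the declarations described above can be read with any XML parser. The following sketch, assuming Python's standard-library xml.etree.ElementTree, lists the child elements declared inside the employee sequence. It only inspects the schema; it is not a schema validator, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

EMPLOYEE_XSD = """<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.javatpoint.com"
           xmlns="http://www.javatpoint.com"
           elementFormDefault="qualified">
  <xs:element name="employee">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="firstname" type="xs:string"/>
        <xs:element name="lastname" type="xs:string"/>
        <xs:element name="email" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

# Map the xs prefix used in the search paths to the XML Schema namespace.
XS = {"xs": "http://www.w3.org/2001/XMLSchema"}

schema = ET.fromstring(EMPLOYEE_XSD)

# The top-level declaration: <xs:element name="employee">.
top = schema.find("xs:element", XS)

# The child declarations, in the order given by <xs:sequence>.
children = top.findall("xs:complexType/xs:sequence/xs:element", XS)
declared = [(child.get("name"), child.get("type")) for child in children]

print(top.get("name"))
for name, type_name in declared:
    print(name, type_name)
```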

DTD vs XSD

There are many differences between DTD (Document Type Definition) and XSD (XML
Schema Definition). In short, DTD provides less control over XML structure whereas XSD
(XML schema) provides more control.

The important differences are given below:

No.  DTD                                                   XSD

1)   DTD stands for Document Type Definition.              XSD stands for XML Schema Definition.
2)   DTDs are derived from SGML syntax.                    XSDs are written in XML.
3)   DTD doesn't support datatypes.                        XSD supports datatypes for elements and attributes.
4)   DTD doesn't support namespaces.                       XSD supports namespaces.
5)   DTD controls child order only via its content model.  XSD gives finer control over child order and occurrence counts.
6)   DTD is not extensible.                                XSD is extensible.
7)   DTD has its own syntax that must be learned.          XSD is written in XML, so there is no separate syntax to learn.
8)   DTD provides less control over XML structure.         XSD provides more control over XML structure.

CDATA vs PCDATA

CDATA (Character Data) and PCDATA (Parsed Character Data) are two different types of
data that can be used in XML documents to represent character content. They are both used
to include textual data within XML elements, but they have different handling and parsing
rules.

1. CDATA (Character Data): CDATA sections are used to include blocks of text that
should be treated as character data and not be parsed as XML markup. The XML
parser does not interpret the content of a CDATA section as markup; it passes the
text through unchanged, so special characters (such as <, >, and &) are treated as
literal text. CDATA sections are often used for text that contains many
XML-reserved characters, avoiding the need for escaping. The only sequence a
CDATA section cannot contain is its own terminator, ]]>.

Example of a CDATA section:

<description><![CDATA[This is a <b>bold</b> statement & more!]]></description>

In this example, the content inside the CDATA section is treated as plain text, and the XML
parser will not attempt to interpret the <b> element or the & symbol.

2. PCDATA (Parsed Character Data): PCDATA refers to character data that is parsed
by the XML parser. Unlike CDATA, PCDATA is subject to XML parsing rules, so the
special characters < and & must be escaped using entity references (&lt; for < and
&amp; for &); > is commonly escaped as &gt; as well. Because the parser scans
PCDATA for markup, any tags and entity references it contains are recognized
rather than treated as literal text.

Example of PCDATA:

<description>This is a &lt;b&gt;bold&lt;/b&gt; statement &amp; more!</description>

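After parsing, the text value of the element is:

This is a <b>bold</b> statement & more!

In this example, the content within the <description> element is treated as PCDATA. The
XML parser replaces the entity references (&lt;, &gt;, and &amp;) with the characters they
stand for, so the application receives the same text as in the CDATA example.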

The choice between CDATA and PCDATA depends on the requirements of the XML data. If
the text content contains a lot of special characters or XML markup that you want to be
treated as plain text, CDATA is a better choice. However, if the text content is structured and
includes nested elements or attributes, using PCDATA with proper escaping is more
appropriate to maintain the XML's structural integrity.
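
The equivalence of the two forms can be checked with a short sketch. It assumes Python's standard-library xml.etree.ElementTree for parsing and xml.sax.saxutils.escape for producing the escaped form; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_text = "This is a <b>bold</b> statement & more!"

# Form 1: wrap the raw text in a CDATA section, with no escaping.
cdata_doc = "<description><![CDATA[" + raw_text + "]]></description>"

# Form 2: escape &, < and > so the text is safe as PCDATA.
escaped_body = escape(raw_text)
escaped_doc = "<description>" + escaped_body + "</description>"

cdata_text = ET.fromstring(cdata_doc).text
escaped_text = ET.fromstring(escaped_doc).text

print(escaped_body)
print(cdata_text == escaped_text == raw_text)
```

Both documents give the application the same character data; the difference is only in how the text is written in the XML source.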

Markup Delimiters

Markup delimiters are special characters or sequences used in markup languages to enclose
or delimit elements, attributes, or other components within the markup. Markup delimiters
define the beginning and ending boundaries of different parts of the markup content. These
delimiters are essential for defining the structure and semantics of the markup language.

The most common markup delimiter is the angle bracket ("<" and ">"), which is used in
languages like HTML, XML, and SGML. Angle brackets enclose element names, attributes,
and other tags within the markup.

Here are examples of markup delimiters in XML:

1. XML Element Delimiters:

<book> ... </book>

<title>Sample Book</title>

In XML, angle brackets ("<" and ">") delimit tags. The opening tag <book> marks the
beginning of the "book" element, and the closing tag </book>, in which a forward slash
follows the opening bracket, marks the end of the element.

2. XML Attribute Delimiters:


<book category="fiction123456789" lang="en"> ... </book>

In XML, attribute values are delimited using double quotes ("") or single quotes ('')
after the attribute name and an equal sign (=).
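
The quoting rule can be tried out with a small sketch, assuming Python's standard-library xml.etree.ElementTree. The attribute values and the helper name parses are illustrative and not taken from the notes.

```python
import xml.etree.ElementTree as ET

double_quoted = ET.fromstring('<book category="fiction" lang="en"/>')
single_quoted = ET.fromstring("<book category='fiction' lang='en'/>")

# Both quoting styles produce the same attribute dictionary.
same_attributes = double_quoted.attrib == single_quoted.attrib
print(same_attributes)

# One quote style may appear literally inside a value delimited by the other.
mixed = ET.fromstring('<book title="Author\'s Notes"/>')
print(mixed.get("title"))


def parses(xml_text: str) -> bool:
    """Return True if xml_text is accepted by the parser."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Unlike HTML, XML does not allow unquoted attribute values.
unquoted_ok = parses("<book lang=en/>")
print(unquoted_ok)
```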

Element Markup and Attribute Markup

Element markup refers to the process of creating and defining elements within a markup
language. It involves using special syntax and delimiters to specify the structure and content
of the elements, allowing for the representation of data and its semantics in a structured way.

In markup languages like HTML and XML, elements are the building blocks used to define
the structure and content of a document. Each element represents a specific piece of
information and is typically delimited by a start tag ("<element>") and an end tag
("</element>"). The content of the element goes between the start and end tags, while its
attributes are written inside the start tag.

For example, in HTML, an element can be used to represent a paragraph as follows:

<p>This is a paragraph element.</p>

Here, <p> is the start tag, indicating the beginning of the paragraph element, and </p> is the
end tag, indicating the end of the paragraph element. The text "This is a paragraph element."
is the content of the paragraph element.

In XML, elements can be used to represent structured data. For example:

<book>
  <title>Sample Book</title>
  <author>John Doe</author>
</book>

In this XML example, <book> is the start tag of the "book" element, and </book> is the end
tag. The content of the "book" element includes two nested elements, <title> and <author>,
which represent the title and author of the book, respectively.

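Elements can also have attributes, which provide additional information about the element.
For example, in HTML:

<img src="image.jpg" alt="Image">

Here, src and alt are attributes of the img element, written as name="value" pairs inside
the start tag. In XML, an element with no content must still be closed explicitly, for
example <img src="image.jpg" alt="Image"/>.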
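
Element and attribute markup can also be generated from code instead of being typed by hand. The sketch below assumes Python's standard-library xml.etree.ElementTree and rebuilds the book example; the category and lang attribute values are illustrative.

```python
import xml.etree.ElementTree as ET

# Create the root element with two attributes in its start tag.
book = ET.Element("book", {"category": "fiction", "lang": "en"})

# Add nested elements and set their text content.
title = ET.SubElement(book, "title")
title.text = "Sample Book"
author = ET.SubElement(book, "author")
author.text = "John Doe"

# Serialize the tree back to markup: start tags, content, end tags.
xml_text = ET.tostring(book, encoding="unicode")
print(xml_text)

# Parsing the generated markup recovers the same structure.
reparsed = ET.fromstring(xml_text)
print(reparsed.get("category"), reparsed.find("title").text)
```

Generating markup through a library has the advantage that delimiters, quoting and escaping are handled consistently.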

XML - Parsers
An XML parser is a software library or package that provides an interface for client
applications to work with XML documents. It checks that the XML document is
well-formed and may also validate it. Modern browsers have built-in XML parsers.
The following diagram shows how an XML parser interacts with an XML document −

The goal of a parser is to transform XML text into a form that program code can read.


XML parsers are software components or libraries that read XML documents and
interpret their structure, allowing developers to access and manipulate the data
within the documents programmatically. XML parsers are essential for processing
XML data in various programming languages and environments. They facilitate the
extraction and handling of XML data, making it easier to work with structured
information.
There are two main types of XML parsers:
1. DOM (Document Object Model) Parsers: DOM parsers construct an in-
memory tree-like representation of the XML document, known as the DOM
tree. The tree structure allows easy navigation and manipulation of the XML
data using standard programming interfaces. Developers can traverse the
tree, access nodes (elements, attributes, text), add or modify nodes, and save
the modified XML back to a file. DOM parsers load the entire XML document
into memory, making them suitable for relatively small XML files.
2. SAX (Simple API for XML) Parsers: SAX parsers, on the other hand, work
differently. Instead of building an in-memory tree, they process the XML
document sequentially as a stream. As the parser reads the document, it
sends events to an application or event handler. Developers can then handle
these events to extract the necessary data from the XML. SAX parsers do not
load the entire XML into memory, making them suitable for large XML
documents or in situations where memory resources are limited.
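
The DOM approach from item 1 can be sketched as follows, assuming Python's standard-library xml.dom.minidom as the DOM implementation. The document content, the new price element and the variable names are illustrative.

```python
from xml.dom.minidom import parseString

# The whole document is loaded into an in-memory tree.
doc = parseString(
    "<book><title>Sample Book</title><author>John Doe</author></book>"
)

# Navigate: find the first <title> element and read its text node.
title = doc.getElementsByTagName("title")[0]
original_title = title.firstChild.data

# Modify: change the text of an existing node.
title.firstChild.data = "Revised Book"

# Add: create a new element with a text node and attach it to the root.
price = doc.createElement("price")
price.appendChild(doc.createTextNode("100.00"))
doc.documentElement.appendChild(price)

# Serialize the modified tree back to XML text.
result = doc.documentElement.toxml()
print(original_title)
print(result)
```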
The choice between DOM and SAX parsers depends on the specific requirements of
the XML processing task:
• DOM Parsers are best suited for tasks that involve frequent navigation and
manipulation of XML data. They provide a comprehensive view of the XML
document, making it easy to work with the entire structure. However, they can
be memory-intensive for large XML files.
• SAX Parsers are ideal for scenarios where memory efficiency and
performance are critical. Since SAX parsers process XML documents
sequentially, they can handle large files more efficiently. However, they are
less convenient for complex XML data manipulations compared to DOM
parsers.
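
For comparison, the event-driven SAX approach can be sketched with Python's standard-library xml.sax module. The handler class TitleCollector and the sample bookstore data are hypothetical names chosen for illustration.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Collect the text of every <title> element as the parser streams by."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        # Event: the parser has read a start tag.
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Event: the parser has read text; it may arrive in several chunks.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        # Event: the parser has read an end tag.
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


BOOKSTORE_XML = b"""<bookstore>
  <book><title>Sample Book</title><author>John Doe</author></book>
  <book><title>Another Book</title><author>Jane Roe</author></book>
</bookstore>"""

handler = TitleCollector()
xml.sax.parseString(BOOKSTORE_XML, handler)
print(handler.titles)
```

The handler keeps only the data it needs, so memory use does not grow with the size of the document, but the program has to track its own position in the document, which is why this style is less convenient for complex manipulation.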
These XML parsers simplify XML data processing, enabling developers to work with
XML documents efficiently and effectively.
