Chapter-4
XML
What is XML?
• The eXtensible Markup Language (XML) is a text-based
markup language used mainly for exchanging data
between different applications over the internet.
• An XML document is a text file, usually saved with
the extension .xml
• It is a format for storing and transporting data.
• It is a markup language similar to HTML.
• In XML, users define their own tags, and these
tags describe the data.
• It is a markup language, not a scripting language:
it describes data and is not executed.
Advantages
• XML documents are easy to create.
• XML is self-describing: the tags describe the data
they contain.
• XML can be processed from most programming
languages, such as Java.
• It is portable.
• It is platform independent.
*Difference between XML and HTML
XML syntax:
XML declaration:
The XML declaration indicates that the
document is written in XML and specifies
which version of XML is used.
The XML declaration can also specify the character
encoding of the document.
Ex: <?xml version="1.0" encoding="UTF-8"?>
• Comments: notes for human readers that the
parser ignores.
• XML comments begin with <!-- and end with -->.
• XML comments allow us to write notes
within the document.
• Ex: <!-- This file is related to book information -->
• Root element:
• The outermost element in the XML document is
called the root element. It is the parent of all
other elements in the document.
• Ex: <books>
----------
-----------
</books>
• Child elements:
The elements that are contained within
the root element are called child elements.
• Empty elements:
An empty element holds no content. It is
written as a single self-closing tag instead of
a separate start tag and end tag.
• Ex: <br/>, <hr/>, <img/>
• Closing tag: the end tag of the root element
(for example </books>), which ends the document.
Elements
• An XML document consists of 3 main building blocks
– Elements
– Attributes
– Entities
Elements
• Element:
The content between the start tag <..> and the
end tag </..>, including the tags, is called an element.
Ex: <title>Web programming</title>
<title>System programming</title>
Here each <title>...</title> is an element, and
"Web programming" and "System programming" are
the contents of those elements.
*Attributes
• An attribute is a name/value pair that we place
within an opening tag. It allows us to provide
extra information about an element.
• A property that describes an element is called
an attribute.
• Ex: <img src="myimage.gif"/>
• <input type="text"/>
• Here src and type are the attributes.
• An element can have several attributes, each with a
different name.
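The attribute idea above can be tried out with a short Python sketch. It uses the standard xml.etree.ElementTree module; the alt attribute is added here only for illustration and is not part of the notes.

```python
import xml.etree.ElementTree as ET

# An empty element carrying two attributes (alt is an illustrative addition).
element = ET.fromstring('<img src="myimage.gif" alt="logo"/>')

print(element.tag)           # the element name
print(element.attrib)        # all attributes as a dictionary
print(element.get("src"))    # the value of a single attribute
print(element.get("width"))  # an attribute that is absent yields None
```

Attributes are read as plain strings; the parser does not interpret their values.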
Entities
• An entity is a named piece of content that is
inserted into the document with an entity
reference of the form &name;
• Eg: &lt; stands for <
• &amp; stands for &
*XML syntax Rules:
• All XML documents must have exactly one root element.
• XML is case sensitive.
• All XML elements must have closing tags.
• All XML elements must be properly nested.
• Attribute values must be quoted.
eg: <input type="text"/>
• The first character of each tag name must be a letter
or the "_" character, not a digit or other
punctuation.
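The rules above can be checked mechanically, because a parser rejects any document that breaks them. The following is a minimal Python sketch using the standard xml.etree.ElementTree module; the helper name is_well_formed and the sample strings are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text, False if it rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# One correct sample followed by one sample per broken rule.
samples = {
    "properly nested": "<book><title>Web programming</title></book>",
    "two root elements": "<a/><b/>",
    "case mismatch": "<Title>Web programming</title>",
    "missing closing tag": "<book><title>Web programming</book>",
    "bad nesting": "<b><i>text</b></i>",
    "unquoted attribute": "<input type=text/>",
    "name starts with digit": "<1title>text</1title>",
}

for label, text in samples.items():
    print(label, "->", is_well_formed(text))
```

Only the first sample is accepted; each of the others violates exactly one rule from the list.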
*XML CDATA:
• CDATA stands for character data.
• A CDATA section holds text that the parser does not
interpret as markup.
• The characters < and & are illegal when written
directly inside XML elements (> is allowed, but is
often escaped as well).
• Using them directly generates an error.
• To avoid such errors, text such as script
code can be placed inside a CDATA section.
syntax
<![CDATA[ contents ]]>
A CDATA section starts with <![CDATA[ and ends with ]]>
• "<" will generate an error because the parser
interprets it as the start of a new element.
• "&" will generate an error because the parser
interprets it as the start of an entity reference.
• To avoid that error, script code can be defined as CDATA as
follows: Example
<script type="text/javascript">
<![CDATA[
function greatest(a, b)
{
    if (a > b)
        return a;
    else
        return b;
}
]]>
</script>
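The effect of a CDATA section can be observed from a program. This Python sketch (standard xml.etree.ElementTree module; the script text is a made-up one-liner) parses the same content three ways: inside CDATA, with escaped characters, and unescaped.

```python
import xml.etree.ElementTree as ET

with_cdata = "<script><![CDATA[if (a < b && b > 0) return a;]]></script>"
escaped = "<script>if (a &lt; b &amp;&amp; b &gt; 0) return a;</script>"
unescaped = "<script>if (a < b && b > 0) return a;</script>"

# Both accepted forms give the program the same text content.
cdata_text = ET.fromstring(with_cdata).text
escaped_text = ET.fromstring(escaped).text
print(cdata_text)
print(escaped_text)

# The unescaped form is rejected by the parser.
try:
    ET.fromstring(unescaped)
    rejected = False
except ET.ParseError as error:
    rejected = True
    print("rejected:", error)
```

CDATA is a convenience for the author of the document; the application reading it sees ordinary text either way.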
*Types of XML Documents
• There are two types
– Well Formed document
– Valid document
Well Formed document
• An XML document with correct syntax is called "Well
Formed". Well-formedness refers to syntax.
• A Well Formed document is an XML document that conforms
to all the syntax rules of XML.
• A well-formed XML document must have a corresponding end
tag for all of its start tags.
• Elements in an XML document must be properly nested
within each other.
• Eg:- <?xml version="1.0" encoding="UTF-8"?>
<!-- Sample xml document -->
<person>
  <name>Manoj</name>
  <age>34</age>
  <address>Hebbal</address>
</person>
Valid document
• An XML document is said to be valid when it is not only well-formed but
also conforms to a DTD that specifies which tags it may use and
what attributes those tags can contain.
• Well-formedness refers to syntax; validity refers to semantics, that is,
whether the document follows the structure its DTD declares.
• Syntax defines the rules for writing a statement
in a language, while semantics refers to the meaning of
that statement.
• Eg: <?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE person SYSTEM "strict.dtd">
<!-- Sample xml document -->
<person>
  <name>Manoj</name>
  <age>34</age>
  <address>Hebbal</address>
</person>
**DTD- Document Type Definition
• A DTD (Document Type Definition) consists of
a list of syntax definitions and rules for each
element in the XML document.
• The purpose of a DTD is to define the
structure and the legal elements and
attributes of an XML document.
• A DTD specifies which element names can be
included in the document, the attributes that
each element can have, whether these
are required or optional, and more.
DTD
• A DTD is attached to a document with the <!DOCTYPE> declaration.
• The <!DOCTYPE> declaration appears near the top of the
XML document, before the root element.
• To use a DTD within an XML document, we need
to declare it.
• Syntax:
<!DOCTYPE rootname [DTD]>
• Eg (DTD kept in the file note.dtd):
<!DOCTYPE books SYSTEM "note.dtd">
Rules for DTD
• The document type declaration must be written
between the XML declaration and the root
element (i.e., DOCTYPE usually comes on the second line).
• The keyword DOCTYPE must be followed by the
name of the root element.
• The keyword DOCTYPE must be in uppercase.
**Types of DTD
• Internal DTD
• External DTD
Internal DTD
• A DTD is referred to as an internal DTD if
the elements are declared within the XML file itself.
• If the DTD is declared inside the XML file, it
must be wrapped inside the <!DOCTYPE>
definition.
• An internal DTD is written between the square
brackets of the <!DOCTYPE> definition.
Syntax
<!DOCTYPE root-element [element-declarations]>
Example
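As a minimal sketch of an internal DTD, the Python snippet below embeds a small document whose declarations sit between the square brackets of <!DOCTYPE>. The note document, its element names and the sample values are assumptions chosen for illustration. Note that the standard xml.dom.minidom parser used here reads the DTD but does not validate the document against it.

```python
from xml.dom import minidom

# A document with an internal DTD; all names and values are illustrative.
document_text = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to, from, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note>
  <to>Manoj</to>
  <from>Asha</from>
  <body>Meeting at ten.</body>
</note>"""

document = minidom.parseString(document_text)

doctype_name = document.doctype.name          # name written after DOCTYPE
root_name = document.documentElement.tagName  # name of the root element
print(doctype_name, root_name)
```

The name after DOCTYPE and the root element name are the same, as the rules for DTD require.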
External DTD
• In an external DTD, elements are declared outside the
XML file.
• If the DTD is declared in an external file, the
<!DOCTYPE> definition must contain a reference to the
DTD file.
• The declarations are the same as in an internal DTD; only
their location, a separate file, differs.
• Because an external DTD is defined in its own file, it
can be used with more than one XML document.
<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "note.dtd">
Syntax
• <!DOCTYPE root-element SYSTEM "file-name">
• where file-name is the file with .dtd extension.
**XML NAMESPACE:
• In XML, a namespace is used to prevent conflicts
between element names.
• Because XML allows us to create our own tag names,
there is always the possibility of naming a tag exactly
the same as one in another XML document.
• The XML namespace identifies the set of tags used
by the XML document.
• It is used to ensure that names used by one DTD
don't conflict with user-defined tags or tags defined
in other documents.
Eg. For name conflicts
• If these XML fragments were added together,
there would be a name conflict.
• Both contain a <table> element, but the
elements have different content and meaning.
Solving the Name Conflict Using a Prefix
In the example above, there will be no conflict because the
two <table> elements have different names.
XML Namespaces - The xmlns Attribute
• When using prefixes in XML, a namespace for the prefix
must be defined.
• The namespace can be defined by an xmlns attribute in
the start tag of an element.
• The namespace declaration has the following syntax.
xmlns:prefix="URI".
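A minimal sketch of prefixes and xmlns declarations, written as a Python script around an embedded document. The prefixes h and f, the two example.com URIs and the table contents are assumptions made for illustration. The standard xml.etree.ElementTree module replaces each prefix by its URI, so the two table elements end up with different names.

```python
import xml.etree.ElementTree as ET

# Two <table> elements with different meanings, kept apart by prefixes.
document_text = """
<root xmlns:h="http://example.com/html"
      xmlns:f="http://example.com/furniture">
  <h:table>
    <h:tr><h:td>Apples</h:td><h:td>Bananas</h:td></h:tr>
  </h:table>
  <f:table>
    <f:name>Coffee table</f:name>
    <f:width>80</f:width>
  </f:table>
</root>
"""

root = ET.fromstring(document_text)
tags = [child.tag for child in root]
print(tags)

# Searching needs a prefix-to-URI mapping supplied by the caller.
namespaces = {"f": "http://example.com/furniture"}
furniture_name = root.find("f:table/f:name", namespaces).text
print(furniture_name)
```

The URI only has to be unique; the parser does not fetch anything from it.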
**XML SCHEMAS
• An XML Schema describes the structure of an XML
document and can be used in place of a DTD.
– XML Schema is itself written in XML.
– The XML Schema language is known as XML Schema
Definition (XSD).
– The purpose of an XML Schema is to define the
legal building blocks of an XML document, just like a
DTD.
• An XML Schema:
– defines elements that can appear in a document.
– defines attributes that can appear in a document.
– defines which elements are child elements.
– defines the order of child elements.
– defines the number of child elements.
– defines whether an element is empty or can
include text.
– defines data types for elements and attributes.
– defines default and fixed values for elements and
attributes.
(TYPES OF ELEMENTS IN XML)
• Simple type
• Complex type
• "SIMPLE" TYPE ELEMENTS
– A simple element is an XML element that can contain
only text. It cannot contain any other elements or
attributes.
– Eg: <xs:element name="hai" type="xs:string"/>
• "COMPLEX" TYPE ELEMENTS
– A complex element is an XML element that contains
other elements and/or attributes.
– A complex element may be empty, or it may
contain text, other elements, or both text and other
elements.
– Eg: <product pid="1345"/>
• There are four kinds of complex elements:
• Empty elements
Ex: A complex XML element, "product", which has an
attribute but no content and no child element:
<product pid="1345"/>
• Elements that contain only other elements (child elements)
Ex: A complex XML element, "employee", which
contains only other elements:
<employee>
  <firstname>John</firstname>
  <lastname>Smith</lastname>
</employee>
• Elements that contain only text
Ex: A complex XML element, "food", which
contains only text (it is complex because it has an attribute):
<food type="dessert">Ice cream</food>
• Elements that contain both other elements and
text
Ex: A complex XML element, "description",
which contains both elements and text:
<description>
  It happened on <date>03.03.99</date>
  ....
</description>
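Since an XML Schema is itself written in XML, it can be read with an ordinary XML parser. The sketch below shows one possible schema for the "employee" element and lists the elements it declares. The schema text is an assumption written for illustration, and the standard xml.etree.ElementTree module only reads the schema; it does not validate documents against it.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"  # namespace URI of XML Schema

# An illustrative schema: employee is complex, its two children are simple.
schema_text = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="employee">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="firstname" type="xs:string"/>
        <xs:element name="lastname" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

schema = ET.fromstring(schema_text)

# Collect every xs:element declaration in document order.
declared = [(e.get("name"), e.get("type"))
            for e in schema.iter("{%s}element" % XS)]
print(declared)
```

The xs:sequence fixes the order of the child elements, and the type attribute gives each simple element its data type.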
**XSL( Extensible Style sheet Language)
• It is a styling language for XML just like CSS is a
styling language for HTML.
• XSL is a language to format xml documents.
• XSL has two main parts
- XSLT
- XSL-FO
XSLT
• XSLT stands for XSL Transformations.
• XSLT (Extensible Stylesheet Language
Transformations) is a language for
transforming XML documents into other
documents, such as other XML documents,
HTML for web pages, or plain text.
XSLT Transformation Process
• The process of transforming an XML
document into another format is called XSL
transformation.
• XSLT Processor is responsible for
transforming the xml document.
• The XSLT processor reads the XML document and
the XSLT stylesheet and produces the output in the
form of HTML, XHTML, XML or plain text.
Advantages
• XSLT provides an easy way to merge XML data
to produce output.
• By using XML and XSLT, the application will
look clean and will be easier to maintain.
• XSLT can be used as a validation language .
XSL-FO
• XSL-FO (XSL- Formatting Objects) is a markup
language for XML document formatting , that
is most often used to generate PDF files.
• A markup language is a text-encoding system that uses
tags to describe the structure and formatting of a document.
• XSL-FO is part of XSL (Extensible Stylesheet
Language), a set of W3C technologies
designed for the transformation and
formatting of XML data.
Parser
• A parser is a compiler or interpreter
component that breaks data into smaller
elements for easy translation into another
language. A parser takes input in the form of a
sequence of tokens or program instructions.
*XML PARSER or Processors
• An XML parser is a software library or package that provides
interfaces for client applications to work with an XML
document. The XML Parser is designed to read the XML and
create a way for programs to use XML.
• An XML parser checks that the document is well formed,
and a validating parser also checks it against its DTD or schema.
• It reads in XML data and checks the syntactic constraints.
• There are two types of parser APIs (an API is a set of functions and
procedures allowing the creation of applications)
– SAX Simple API for XML (event-based)
– DOM Document Object Model (object/tree based)
SAX(Simple API for XML)
• An event-based parsing technique.
(In event-based programming, the flow of the program is
determined by events, such as user actions like mouse clicks.)
• The parser generates an event
whenever it encounters an element or data in the
document being parsed.
• Because it is an event-based parser, it works like an event
handler in Java.
• The programmer attaches "event handlers" to handle
each event. Eg: a click event is handled by onclick.
• Advantages
• 1) It is simple and memory efficient.
• 2) It is very fast and works for huge
documents.
• Disadvantages
• 1) It is event-based, so its API is less intuitive.
• 2) Clients never see the whole document at once
because the data is delivered in pieces.
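The event-based style described above can be sketched with Python's standard xml.sax module. The handler class name PersonHandler and the events list are assumptions made for illustration; the document is the "person" sample used earlier in these notes.

```python
import xml.sax

class PersonHandler(xml.sax.ContentHandler):
    """Records each parsing event in the order the parser reports it."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def characters(self, content):
        # Ignore chunks that contain only whitespace.
        if content.strip():
            self.events.append(("text", content.strip()))

    def endElement(self, name):
        self.events.append(("end", name))

document_text = ("<person><name>Manoj</name><age>34</age>"
                 "<address>Hebbal</address></person>")

handler = PersonHandler()
xml.sax.parseString(document_text.encode("utf-8"), handler)

for event in handler.events:
    print(event)
```

The parser keeps no tree in memory; anything the program needs later must be saved by the handler itself, which is why SAX suits very large documents.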
DOM
• Refer to Chapter 2.