XML, XPath, XSLT, and XQuery Notes
Compilation (Detailed Version)
Part 1: XML and XML Schema
What is XML?
XML (eXtensible Markup Language) is a standard for storing, structuring, and distributing data
over the Web. Unlike HTML, which is designed to display data, XML is designed to carry data.
XML documents are self-descriptive, meaning they provide information on both the structure
and value of the data.
Key Characteristics of XML:
Standard for Web Data Exchange: XML allows for automatic interpretation of content
across different platforms and systems.
Simple and Text-based: Because XML is plain text, it is easy to read and survives upgrades
to operating systems, software, and browsers.
Extensible: Users can define their own elements and attributes.
Separation of Content and Presentation: XML separates the data content from its
presentation, making it easy to reuse data across multiple platforms.
XML Data Model - Tree Structure
XML documents are structured as trees with a root element and sub-elements. Each element can
have attributes and child elements. Elements, tags, and attributes are the building blocks of an
XML document.
Example of XML Document:
<STAFFLIST>
  <STAFF branchNo="B005">
    <STAFFNO>SL21</STAFFNO>
    <NAME>
      <FNAME>John</FNAME>
      <LNAME>White</LNAME>
    </NAME>
    <POSITION>Manager</POSITION>
  </STAFF>
</STAFFLIST>
Tree Structure Representation:
Root Node: STAFFLIST
Child Node: STAFF
Attributes: branchNo in <STAFF branchNo="B005">
Sub-elements: STAFFNO, NAME, POSITION
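The tree view above can be checked by parsing the document and walking it. This is a minimal sketch using Python's standard xml.etree.ElementTree module; the variable names (STAFF_XML, root, staff, sub_elements) are chosen for illustration and are not part of the original notes.

```python
import xml.etree.ElementTree as ET

# The staff list from the notes, with the branchNo attribute on STAFF.
STAFF_XML = """
<STAFFLIST>
  <STAFF branchNo="B005">
    <STAFFNO>SL21</STAFFNO>
    <NAME>
      <FNAME>John</FNAME>
      <LNAME>White</LNAME>
    </NAME>
    <POSITION>Manager</POSITION>
  </STAFF>
</STAFFLIST>
""".strip()

root = ET.fromstring(STAFF_XML)   # root element of the tree
staff = root[0]                   # first (and only) child of the root
sub_elements = [child.tag for child in staff]

print(root.tag)        # STAFFLIST
print(staff.tag)       # STAFF
print(staff.attrib)    # {'branchNo': 'B005'}
print(sub_elements)    # ['STAFFNO', 'NAME', 'POSITION']
```

Indexing an element returns its children in document order, which is why root[0] is the STAFF element.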
Part 2: XPath - XML Path Language
What is XPath?
XPath is a language used to address and navigate through elements and attributes in an XML
document. It uses a path-like syntax to identify nodes in an XML document.
Key Terminology:
Parent Node: A node that contains another node within it.
Child Node: A node directly under another node.
Ancestor Node: A node's parent, its parent's parent, and so on up to the root.
Descendant Node: A node's children, their children, and so on down the tree.
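These relationships can be made concrete on a small tree. The sketch below uses Python's standard xml.etree.ElementTree; because that module stores no parent pointers, it first builds a child-to-parent map. The helper name ancestors and the trimmed-down document are assumptions made for this example only.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<STAFFLIST><STAFF><NAME><FNAME>John</FNAME></NAME></STAFF></STAFFLIST>"
)

# ElementTree keeps no parent pointers, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestors(node):
    """Return the tags of node's parent, grandparent, ... up to the root."""
    tags = []
    while node in parent_of:
        node = parent_of[node]
        tags.append(node.tag)
    return tags


fname = root.find(".//FNAME")
staff = root.find("STAFF")

fname_ancestors = ancestors(fname)
# iter() yields the element itself first, so skip it to get descendants only.
staff_descendants = [n.tag for n in staff.iter() if n is not staff]

print(fname_ancestors)     # ['NAME', 'STAFF', 'STAFFLIST']
print(staff_descendants)   # ['NAME', 'FNAME']
```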
Basic XPath Syntax:
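/ at the start of a path selects from the root node; between steps it separates a parent
from its child.
// selects matching nodes at any depth below the current node (anywhere in the document
when it starts a path).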
. refers to the current node.
.. refers to the parent node.
Example XPath Queries:
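//STAFF[1]: Selects every STAFF element that is the first STAFF child of its parent. Use
(//STAFF)[1] to select only the first STAFF element in the whole document.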
//STAFF[@branchNo="B003"]: Selects STAFF elements with a branchNo attribute of
B003.
//STAFF[POSITION="Manager"]: Selects STAFF elements where the POSITION is
Manager.
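The three queries can be tried with Python's standard xml.etree.ElementTree, which supports a limited subset of XPath. Paths are written relative to the root element, so //STAFF becomes .//STAFF. This is a sketch only: the second STAFF entry (SG37) and the helper staff_numbers are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Two-entry sample document; the second STAFF element is illustrative data.
doc = ET.fromstring("""
<STAFFLIST>
  <STAFF branchNo="B005">
    <STAFFNO>SL21</STAFFNO>
    <POSITION>Manager</POSITION>
  </STAFF>
  <STAFF branchNo="B003">
    <STAFFNO>SG37</STAFFNO>
    <POSITION>Assistant</POSITION>
  </STAFF>
</STAFFLIST>
""".strip())


def staff_numbers(path):
    """Run an ElementTree path and return the STAFFNO of each match."""
    return [s.findtext("STAFFNO") for s in doc.findall(path)]


first = staff_numbers(".//STAFF[1]")                      # position predicate
in_b003 = staff_numbers(".//STAFF[@branchNo='B003']")     # attribute predicate
managers = staff_numbers(".//STAFF[POSITION='Manager']")  # child-text predicate

print(first)      # ['SL21']
print(in_b003)    # ['SG37']
print(managers)   # ['SL21']
```

A full XPath engine accepts more than this subset (for example functions and axes), so treat the snippet as a way to experiment with the basic predicates rather than a complete XPath implementation.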
Part 3: XSLT - Extensible Stylesheet Language Transformations
What is XSLT?
XSLT is a language for transforming XML documents into other formats, such as HTML, plain
text, or other XML documents (including XSL-FO, which a separate formatter can render to
PDF). It uses templates to match parts of an XML document and apply specific transformations.
Key Features of XSLT:
Independent of Programming: Transformations are declared as rules in a separate XSL
stylesheet rather than hard-coded in an application.
Reusable Templates: Templates can be reused for different XML documents.
Flexible Output: XSLT can generate various output formats depending on the
transformation rules.
Example XSLT Code:
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <body>
        <h2>Staff List</h2>
        <xsl:for-each select="STAFFLIST/STAFF">
          <p>
            <xsl:value-of select="NAME/FNAME"/>
            <!-- emit a space so the two names do not run together -->
            <xsl:text> </xsl:text>
            <xsl:value-of select="NAME/LNAME"/>
          </p>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
Output Example:
HTML Page displaying the names of staff members from the XML document.
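To see what the stylesheet computes, the same traversal can be written by hand. The sketch below is not an XSLT processor: it uses Python's standard xml.etree.ElementTree to visit each STAFF element and build the HTML tree directly, joining first and last name with a space for readability. The variable names (source, html, page) are assumptions for this example.

```python
import xml.etree.ElementTree as ET

source = ET.fromstring("""
<STAFFLIST>
  <STAFF branchNo="B005">
    <STAFFNO>SL21</STAFFNO>
    <NAME><FNAME>John</FNAME><LNAME>White</LNAME></NAME>
    <POSITION>Manager</POSITION>
  </STAFF>
</STAFFLIST>
""".strip())

# Build the output tree that the template for "/" describes.
html = ET.Element("html")
body = ET.SubElement(html, "body")
ET.SubElement(body, "h2").text = "Staff List"

# Mirrors <xsl:for-each select="STAFFLIST/STAFF">; source is already STAFFLIST.
for staff in source.findall("STAFF"):
    p = ET.SubElement(body, "p")
    # Mirrors the two <xsl:value-of> calls on NAME/FNAME and NAME/LNAME.
    p.text = f"{staff.findtext('NAME/FNAME')} {staff.findtext('NAME/LNAME')}"

page = ET.tostring(html, encoding="unicode")
print(page)
# <html><body><h2>Staff List</h2><p>John White</p></body></html>
```

A real XSLT engine does the matching and output for you; writing it out by hand only shows which nodes are visited and what text is emitted for each.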
Part 4: XQuery - Querying XML Documents
What is XQuery?
XQuery is a query language designed for extracting and manipulating data from XML
documents. It is analogous to SQL for databases and is widely used in XML-based databases.
Benefits of XQuery:
Can query both hierarchical and tabular data.
Useful for transforming XML documents into other formats.
Supports building web pages from XML data.
Basic XQuery Syntax:
for $staff in doc("dreamhome_stafflist.xml")/STAFFLIST/STAFF
where $staff/POSITION="Manager"
return $staff/NAME
FLWOR Expressions: FLWOR stands for FOR, LET, WHERE, ORDER BY, RETURN. It is a
powerful feature of XQuery for querying and transforming XML data.
Example:
for $i in 1 to 3
return <value>{$i}</value>
Output:
<value>1</value>
<value>2</value>
<value>3</value>
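For readers more familiar with general-purpose languages, the for, where, and return clauses map loosely onto a list comprehension. The sketch below is a Python analogue, not XQuery: it uses the standard xml.etree.ElementTree module and an inline stand-in for dreamhome_stafflist.xml whose second STAFF entry is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Inline stand-in for dreamhome_stafflist.xml (second entry is illustrative).
doc = ET.fromstring("""
<STAFFLIST>
  <STAFF branchNo="B005">
    <NAME><FNAME>John</FNAME><LNAME>White</LNAME></NAME>
    <POSITION>Manager</POSITION>
  </STAFF>
  <STAFF branchNo="B003">
    <NAME><FNAME>Ann</FNAME><LNAME>Beech</LNAME></NAME>
    <POSITION>Assistant</POSITION>
  </STAFF>
</STAFFLIST>
""".strip())

# Analogue of:
#   for $staff in doc(...)/STAFFLIST/STAFF
#   where $staff/POSITION="Manager"
#   return $staff/NAME
manager_names = [
    staff.find("NAME")                           # return
    for staff in doc.findall("STAFF")            # for
    if staff.findtext("POSITION") == "Manager"   # where
]

# Analogue of: for $i in 1 to 3 return <value>{$i}</value>
values = [f"<value>{i}</value>" for i in range(1, 4)]

print([n.findtext("FNAME") for n in manager_names])   # ['John']
print(values)   # ['<value>1</value>', '<value>2</value>', '<value>3</value>']
```

The comprehension reads return-first while FLWOR reads for-first, but the filtering logic is the same.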
Using Predicates:
doc("dreamhome_stafflist.xml")//STAFF[POSITION="Manager"]: Retrieves
STAFF elements with the POSITION "Manager".
doc("dreamhome_stafflist.xml")//STAFF[@branchNo="B005"]//FNAME: Retrieves
first names of staff members in branch B005.
Conclusion:
Understanding XML, XPath, XSLT, and XQuery is crucial for working with semi-structured
data and integrating data across various applications. XML provides a flexible way to represent
data, XPath and XSLT enable navigation and transformation of XML documents, and XQuery
allows for powerful querying and manipulation of XML data.