Module 5 Enhanced Data Models

The document discusses enhanced data models, focusing on object-oriented and object-relational databases, which address limitations of traditional relational databases in handling complex data types and relationships. It highlights the advantages and disadvantages of these models, including their ability to represent real-world scenarios and support for complex objects, while also noting challenges such as complexity and maintenance issues. Additionally, it outlines the need for object-oriented databases in managing interrelated information and integrating with object-oriented applications.

Uploaded by

aparnasr64

Enhanced Data Models

Object-oriented databases and object-relational systems provide features that allow users
to extend their systems by specifying additional abstract data types for each application.
However, it is quite useful to identify certain common features for some of these advanced
applications and to create models that can represent them. Additionally, specialized storage
structures and indexing methods can be implemented to improve the performance of these
common features. The features can then be implemented as abstract data types or class
libraries and purchased separately from the basic DBMS software package. The term data
blade has been used in Informix, and cartridge in Oracle, to refer to such optional submodules
that can be included in a DBMS package. Users can utilize these features directly if they are
suitable for their applications, without having to reinvent, reimplement, and reprogram such
common features.

Active databases provide additional functionality for specifying active rules. These rules can
be automatically triggered by events that occur, such as database updates or certain times
being reached, and can initiate certain actions that have been specified in the rule declaration
to occur if certain conditions are met. Many commercial packages include some of the
functionality provided by active databases in the form of triggers.

Temporal databases permit the database system to store a history of changes and allow users
to query both current and past states of the database. Some temporal database models also
allow users to store future expected information, such as planned schedules. It is important to
note that many database applications are temporal, but they are often implemented without
much temporal support from the DBMS package; that is, the temporal concepts are
implemented in the application programs that access the database.

Spatial database concepts include types of spatial data, different kinds of spatial analyses,
operations on spatial data, types of spatial queries, spatial data indexing, spatial data mining,
and applications of spatial databases.

Multimedia databases provide features that allow users to store and query different types of
multimedia information, which includes images (such as pictures and drawings), video
clips (such as movies, newsreels, and home videos), audio clips (such as songs, phone
messages, and speeches), and documents (such as books and articles).

A deductive database system includes capabilities to define (deductive) rules, which can
deduce or infer additional information from the facts that are stored in a database. Because
part of the theoretical foundation for some deductive database systems is mathematical logic,
such systems are often referred to as logic databases. Other types of systems, referred to as
expert database systems or knowledge-based systems, also incorporate reasoning and
inferencing capabilities; such systems use techniques that were developed in the field of
artificial intelligence, including semantic networks, frames, production systems, or rules for
capturing domain-specific knowledge.
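
The event, condition and action parts of an active rule, as described above, can be sketched with a trigger. The following is only an illustrative sketch using Python's built-in sqlite3 module; the account and audit_log tables and the trigger name log_overdraft are hypothetical and do not come from any particular DBMS product discussed here.

```python
import sqlite3

# Hypothetical schema for illustration: an account table and an audit log.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL);
CREATE TABLE audit_log (account_id INTEGER, old_balance INTEGER, new_balance INTEGER);

-- Event: an UPDATE of balance. Condition: the new balance is negative.
-- Action: record the change in audit_log.
CREATE TRIGGER log_overdraft
AFTER UPDATE OF balance ON account
WHEN NEW.balance < 0
BEGIN
    INSERT INTO audit_log VALUES (NEW.id, OLD.balance, NEW.balance);
END;
""")

conn.execute("INSERT INTO account VALUES (1, 100)")
conn.execute("UPDATE account SET balance = 50 WHERE id = 1")   # condition false: no action
conn.execute("UPDATE account SET balance = -20 WHERE id = 1")  # condition true: rule fires

logged = conn.execute("SELECT * FROM audit_log").fetchall()
print(logged)
```

Only the second update satisfies the condition, so only that change is written to the log.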
Object-Oriented Databases
An object-oriented database is a combination of object-oriented programming and relational
database concepts. Items created using object-oriented programming languages such as C++
and Java can be stored in relational databases, but object-oriented databases are better
suited to such items.
An object-oriented database is organized around objects rather than actions, and data rather
than logic. For example, a multimedia record in a relational database can be a definable data
object, as opposed to an alphanumeric value.

Limitation of Relational Databases

Relational database technology was not able to handle complex application systems such as
Computer Aided Design (CAD), Computer Aided Manufacturing (CAM), Computer
Integrated Manufacturing (CIM), and Computer Aided Software Engineering (CASE). The
limitation of relational databases is that they have been designed to represent entities and
relationships in the form of two-dimensional tables. Any complex attribute, such as a
multi-valued or composite attribute, may result in the decomposition of a table into several
tables. Similarly, complex interrelationships result in a number of tables being created. Thus
the main asset of relational databases, namely their simplicity, is also one of their
weaknesses in the case of complex applications. The data domains in a relational system are
represented as the standard data types defined in SQL. However, the relational model does
not allow extending these data types or creating the user’s own data types, which limits the
types of data that may be represented using relational databases.
Another major weakness of the RDBMS is that concepts like inheritance/hierarchy need to be
represented with a series of tables with the required referential constraints. Thus relational
databases are not very natural for objects requiring inheritance or hierarchy. However, one
must remember that relational databases have proved to be commercially successful for text
based applications and have lots of standard features, including security, reliability and easy
access. Many commercial DBMS products are basically relational but also support object
oriented concepts.

In short, the limitations or disadvantages of using a relational database are as follows.

1 – Maintenance Problem

The maintenance of the relational database becomes difficult over time due to the increase in
the data. Developers and programmers have to spend a lot of time maintaining the database.

2 – Cost

The relational database system is costly to set up and maintain. The initial cost of the software
alone can be quite pricey for smaller businesses, but it gets worse when you factor in hiring a
professional technician who must also have expertise with that specific kind of program.

3 – Physical Storage

A relational database is made up of rows and columns, which requires a lot of physical
storage because each operation performed depends on separate storage. The storage
requirement grows as the amount of data grows.

4 – Lack of Scalability

When a relational database is used over multiple servers, its structure changes and becomes
difficult to handle, especially when the quantity of data is large. As a result, the data does not
scale well across different physical storage servers. As the database becomes larger or more
distributed over a greater number of servers, negative effects such as latency and availability
issues affect overall performance.

5 – Complexity in Structure

Relational databases can only store data in tabular form which makes it difficult to represent
complex relationships between objects. This is an issue because many applications require
more than one table to store all the necessary data required by their application logic.

6 – Decrease in performance over time

A relational database can become slower over time, and not just because of its reliance on
multiple tables. A large number of tables and a large amount of data in the system increase
complexity. This can lead to slow response times for queries, or even to their complete
failure, depending on how many people are logged into the server at a given time.

Object Oriented Data model


The object-oriented data model represents real-world problems closely. In this type of
model, both the data and the relationships are represented in a single structure called an object. Audio,
video, images and similar data can be stored in such a database, whereas storing them in a relational
database is generally not advised. In this model, the attributes describe the properties of an object.

Objects that share similar characteristics are grouped into classes. A class is therefore a collection of
similar objects with shared attributes and methods. In this model, two or more objects are connected
by links, which are used to relate the objects to one another. Consider an example with two objects −


• Employee
• Department
The data and relationships of each object are contained in a single unit. The attributes are,
for example, Name and job_title. Methods perform operations using these attributes. The two
objects are connected through a common attribute, department_id, and communication
between the two takes place through this id.
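
The Employee and Department objects described above can be sketched in Python as follows. This is a minimal illustration rather than the model of any specific OODBMS: the describe method and the sample values are assumptions added for the example, while Name, job_title and department_id come from the text.

```python
from dataclasses import dataclass


@dataclass
class Department:
    department_id: int
    name: str


@dataclass
class Employee:
    name: str
    job_title: str
    department: Department  # the link that relates the two objects

    def describe(self) -> str:
        """A method: an operation performed with the help of the attributes."""
        return f"{self.name} ({self.job_title}) works in {self.department.name}"


research = Department(department_id=10, name="Research")
emp = Employee(name="Asha", job_title="Analyst", department=research)
print(emp.describe())
print(emp.department.department_id)  # the link is followed through the id
```

Here the link is held as an object reference, so no separate join on department_id is needed to reach the department of an employee.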
Advantages
The advantages of the object-oriented model are as follows −
• Semantic content is added.
• Support for complex objects.
• Inheritance promotes data integrity.
• Visual representation includes semantic content.
Disadvantages
The disadvantages of the object-oriented model are as follows −
• It is a complex navigational system.
• Slow development of standards.
• High system overheads.
• Slow transactions.

Comparison between E-R Model and Object Oriented Model

E-R Model
The E-R model is used to represent real life scenarios as entities. The properties of these
entities are their attributes in the E-R diagram, and their connections are shown in the form of
relationships. An E-R model is generally considered a top-down approach to data design.
An example of ER model is −

Advantages of E-R model

• The data requirements are easily understandable using an E-R model as it utilises clear
diagrams.
• The E-R model can be easily converted into a relational database.
• The E-R diagram is very easy to understand as it has clearly defined entities and the
relations between them.
Disadvantages of E-R model

• There is no data manipulation language available for an E-R model as it is a largely
abstract concept.
• There are no standard notations for an E-R model. It depends on each individual
designer how they design it.
Object Oriented Model
Object oriented data model is based on using real life scenarios. In this model, the scenarios are
represented as objects. The objects with similar functionalities are grouped together and linked
to different other objects.
An Example of the Object Oriented data model is −

Advantages of Object Oriented Model

• Due to inheritance, the data types can be reused in different objects. This reduces the
cost of maintaining the same data in multiple locations.
• The object oriented model is quite flexible in most cases.
• It is easier to extend the design in the Object Oriented Model.
Disadvantages of Object Oriented Model

• It is not as widely implemented in database systems and is often treated as a mostly
theoretical approach.
• This model can be quite complicated to create and understand.
The Need for Object Oriented Databases

The objects may be complex, or they may consist of low-level objects (for example, a window
object may consist of many simpler objects like menu bars, scroll bars, etc.). However,
representing the data of such complex objects through relational database models requires
many tables: at least one for each inherited class and a table for the base class. To ensure that
these tables operate correctly, referential integrity constraints need to be set up as well.
Object oriented models, on the other hand, represent such a system very naturally through an
inheritance hierarchy. Consider the design of a class, say a Date class. The advantage of
object oriented database management in such situations is that it allows the representation
of not only the structure but also the operations on a new user-defined type, such as finding
the difference of two dates. Thus, object oriented database technologies are ideal for
implementing systems that support complex inherited objects and user-defined data types
(which require operations in addition to the standard operations, including operations that
support polymorphism).

Another major reason for the need of object oriented database systems is the seamless
integration of this database technology with object-oriented applications. Software design is
mostly based on object oriented technologies; thus, an object oriented database may provide
a seamless interface for combining the two technologies. Object oriented databases are also
required to manage complex, highly interrelated information. They provide a solution in a
natural and easy way that is close to our understanding of the system.

The concept of object oriented databases was introduced in the late 1970s, and it became
significant only in the early 1980s. The initial commercial product offerings appeared in the
late 1980s. Today, many object oriented database products are available, such as
Objectivity/DB (developed by Objectivity, Inc.), ONTOS DB (developed by ONTOS, Inc.),
VERSANT (developed by Versant Object Technology Corp.), ObjectStore (developed by
Object Design, Inc.), GemStone (developed by Servio Corp.) and ObjectStore PSE Pro
(developed by Object Design, Inc.). Object oriented databases are presently being used for
various applications in areas such as e-commerce and engineering product data
management, and for special purpose databases in areas such as securities and medicine.

OBJECT RELATIONAL DATABASE SYSTEMS

Object Relational Database Systems are relational database systems that have been
enhanced to include features of the object oriented paradigm.

Complex Data Types

Consider, as an example, a composite attribute − Address. The address of a person in an
RDBMS can be represented as: House-no, Apartment, Locality, City, State, Pin code. When
using an RDBMS, such information needs to be represented either as a set of attributes or as
just one string separated by commas or semicolons. The second approach is very inflexible,
as it would require complex string operations for extracting information. It also hides the
details of an address and is thus not suitable. If we represent the components of the address
as separate attributes, then the problem is with writing queries. For example, if we need to
find the address of a person, we need to specify all the attributes that we have created for
the address, i.e., House-no, Locality, etc. Defining the address as a user-defined type avoids
this. The following may be one such possible attempt:

CREATE TYPE Address AS ( House Char(20), Locality Char(20), City Char(12), State Char(15),
Pincode Char(6) ) ;

Thus, Address is now a new type that can be used while showing a database system scheme as:

CREATE TABLE STUDENT ( name Char(25), address Address, phone Char(12), programme
Char(5), dob ??? ) ;
Similarly, complex data types may be extended by including the date of birth field (dob),
whose type is left open (???) in the scheme above. This complex data type should comprise
associated fields such as day, month and year. It should also permit computing the
difference between two dates, the day, and the year of birth.

Consider the following queries:

Find the name and address of the students who are enrolled in MCA programme.

SELECT name, address FROM student WHERE programme = 'MCA' ;

Note that the attribute ‘address’ although composite, is put only once in the query.

Find the name and address of all the MCA students of Mumbai.

SELECT name, address FROM student WHERE programme = 'MCA' AND address.city = 'Mumbai';

Such type definitions allow us to handle a composite attribute as a single attribute with a
user defined type. A reference to any component of this attribute is carried out without any
problems, so the data definition of the attribute components remains intact. Complex data
types also allow us to model a table with multi-valued attributes, which would require a new
table in a relational database design.
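
The idea of treating a composite attribute as a single value whose components remain accessible can be sketched in Python. This is only an illustration of the concept, not SQL: the dataclasses mirror the Address type and the student table above, and the sample rows are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    house: str
    locality: str
    city: str
    state: str
    pincode: str


@dataclass(frozen=True)
class Student:
    name: str
    address: Address  # composite attribute handled as one value
    programme: str


# Invented sample data for illustration only.
students = [
    Student("Ravi", Address("12", "Andheri", "Mumbai", "Maharashtra", "400053"), "MCA"),
    Student("Meena", Address("7", "Saket", "Delhi", "Delhi", "110017"), "MCA"),
    Student("Kiran", Address("3", "Bandra", "Mumbai", "Maharashtra", "400050"), "BCA"),
]

# Counterpart of:
#   SELECT name, address FROM student
#   WHERE programme = 'MCA' AND address.city = 'Mumbai';
mca_in_mumbai = [
    (s.name, s.address)
    for s in students
    if s.programme == "MCA" and s.address.city == "Mumbai"
]
print(mca_in_mumbai)
```

The address is selected once as a whole, while the filter still reaches into its city component.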

For example, a library database system would require the representation of the following
information for a book.

Book table:
• ISBN number
• Book title
• Authors
• Published by
• Subject areas of the book.

Clearly, in the table above, authors and subject areas are multi-valued attributes. In a
relational design they would be defined using the tables (ISBN number, author) and (ISBN
number, subject area). (Please note that our database does not consider the author position
in the list of authors.) Although this design solves the immediate problem, it is complex. The
problem may be represented most naturally using an object oriented database system.
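
The contrast between the two designs can be sketched as follows, using Python's built-in sqlite3 module for the relational side and a plain dictionary for the object-style side. The table names book_author and book_subject and all sample values are assumptions made for this illustration.

```python
import sqlite3

# Relational design: each multi-valued attribute needs its own table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT, publisher TEXT);
CREATE TABLE book_author (isbn TEXT REFERENCES book(isbn), author TEXT,
                          PRIMARY KEY (isbn, author));
CREATE TABLE book_subject (isbn TEXT REFERENCES book(isbn), subject TEXT,
                           PRIMARY KEY (isbn, subject));
""")
conn.execute("INSERT INTO book VALUES ('111', 'Database Basics', 'Example Press')")
conn.executemany("INSERT INTO book_author VALUES (?, ?)",
                 [("111", "A. Rao"), ("111", "B. Sen")])
conn.executemany("INSERT INTO book_subject VALUES (?, ?)",
                 [("111", "Databases"), ("111", "Computer Science")])

# Collecting the authors of one book requires a query on a second table.
authors = [row[0] for row in conn.execute(
    "SELECT author FROM book_author WHERE isbn = ? ORDER BY author", ("111",))]

# Object-style design: the multi-valued attributes live inside the object itself.
book = {
    "isbn": "111",
    "title": "Database Basics",
    "publisher": "Example Press",
    "authors": ["A. Rao", "B. Sen"],
    "subjects": ["Computer Science", "Databases"],
}
print(authors, book["authors"])
```

Both forms hold the same information; the object-style form simply keeps it in one unit instead of three tables.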

Types and Inheritances in SQL

Consider the following attributes:

• Name – that includes given name, middle name and surname

• Address – that includes address details, city, state and pincode.

• Date – that includes day, month and year

and also a method for finding the difference between one date and another.


SQL uses the Persistent Stored Modules (PSM)/PSM-96 standard for defining functions and
procedures. According to this standard, functions need to be declared both within the
definition of a type and in a CREATE METHOD statement.

Thus, types such as those given above can be represented as:

CREATE TYPE Name AS ( given-name Char(20), middle-name Char(15), sur-name Char(20) )
FINAL

CREATE TYPE Address AS ( add-det Char(20), city Char(20), state Char(20), pincode
Char(6) ) NOT FINAL

CREATE TYPE Date AS ( dd Number(2), mm Number(2), yy Number(4) ) FINAL
METHOD difference (present Date) RETURNS INTERVAL days ;

This method can be defined separately as:

CREATE INSTANCE METHOD difference (present Date) RETURNS INTERVAL days FOR Date

BEGIN
-- Code to calculate the difference between the present date and the date stored in the object.
-- The data of the object is accessed with the prefix SELF, for example SELF.yy, SELF.mm.
-- The last statement will be RETURN days, which returns the number of days.
END
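
The body of the difference method is left as comments above. As a rough illustration of what such a method might compute, the following Python sketch defines a Date class with the same three fields and a difference method; it relies on Python's datetime module and is an assumption about the intended behaviour, not the SQL/PSM implementation.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Date:
    dd: int
    mm: int
    yy: int

    def difference(self, present: "Date") -> int:
        """Return the number of days from the stored date (self) to `present`."""
        stored = date(self.yy, self.mm, self.dd)       # plays the role of SELF.yy, SELF.mm, ...
        other = date(present.yy, present.mm, present.dd)
        return (other - stored).days


dob = Date(dd=15, mm=8, yy=2000)
days = dob.difference(Date(dd=15, mm=8, yy=2001))
print(days)
```

The structure (dd, mm, yy) and the operation (difference) are kept together in one type, which is the point made in the text.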

These types can now be used to represent a class as:

CREATE TYPE Student AS ( name Name, address Address, dob Date )

The keywords ‘FINAL’ and ‘NOT FINAL’ have a meaning similar to final in Java: a type
declared FINAL cannot be inherited further. There also exists the possibility of using
constructors.

Type Inheritance

In the present standard of SQL one can define inheritance. Let us explain this with the help of
an example.

Consider a type University-person defined as:

CREATE TYPE University-person AS ( name Name, address Address )

Now, this type can be inherited by the Staff type or the Student type.

For example, the Student type if inherited from the class given above would be:

CREATE TYPE Student UNDER University-person ( programme Char(10), dob Number(7) )


Similarly, you can create a sub-class for the staff of the University as:

CREATE TYPE Staff UNDER University-person ( designation Char(10), basic-salary Number(7) )

Both the inherited types shown above inherit the name and address attributes from the type
University-person. Methods can also be inherited in a similar way; however, they can be
overridden if the need arises.
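
The same hierarchy can be sketched with Python classes to show attribute inheritance and method overriding. The label method and the sample values are hypothetical additions for the illustration; only the type and attribute names follow the SQL definitions above.

```python
from dataclasses import dataclass


@dataclass
class UniversityPerson:
    name: str
    address: str

    def label(self) -> str:
        return f"{self.name}, {self.address}"


@dataclass
class Student(UniversityPerson):
    # Inherits name and address, and inherits label() unchanged.
    programme: str
    dob: str


@dataclass
class Staff(UniversityPerson):
    designation: str
    basic_salary: int

    def label(self) -> str:
        # Overrides the inherited method.
        return f"{self.designation} {super().label()}"


student = Student("Anand", "Pune", "MCA", "2001-05-04")
staff = Staff("Lata", "Mumbai", "Lecturer", 50000)
print(student.label())
print(staff.label())
```

Both sub-types can be used wherever a UniversityPerson is expected, which is what the UNDER clause expresses in SQL.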

Table Inheritance

The concept of table inheritance has evolved to incorporate the implementation of the
generalisation/specialisation hierarchy of an E-R diagram. SQL allows inheritance of tables.
Once a new type is declared, it can be used to create new tables with the keyword “OF”.

Let us explain this with the help of an example.

Consider the University-person, Staff and Student.

The table for the type University-person can be created as:

CREATE TABLE University-members OF University-person ;

Now the table inheritance would allow us to create sub-tables for such tables as:

CREATE TABLE student-list OF Student UNDER University-members ;

Similarly, we can create a table for the University staff as:

CREATE TABLE staff OF Staff UNDER University-members ;

Please note the following points for table inheritance:

• The type associated with the sub-table must be a sub-type of the type of the parent
table. This is a major requirement for table inheritance.

• All the attributes of the parent table (University-members in our case) should be present in
the inherited tables.

• Also, the three tables may be handled separately; however, any record present in the
inherited tables is also implicitly present in the base table. For example, any record inserted in
the student-list table will be implicitly present in the University-members table.
• A query on the parent table (such as University-members) would find the records from the
parent table and all the inherited tables (in our case all three tables); however, the
attributes of the result table would be the same as the attributes of the parent table.

• One can restrict a query to only the parent table by using the keyword ONLY. For
example, SELECT name FROM ONLY (University-members)
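
The query behaviour listed in the points above can be imitated in plain Python. This is only a simulation of the semantics, with each table held as a list of dictionaries; the function query_parent and the sample rows are hypothetical.

```python
# Each "table" is a list of rows; sub-table rows carry the extra attributes.
PARENT_COLUMNS = ("name", "address")

university_members = [{"name": "Dr. Rao", "address": "Delhi"}]
student_list = [{"name": "Anand", "address": "Pune", "programme": "MCA"}]
staff = [{"name": "Lata", "address": "Mumbai", "designation": "Lecturer"}]


def query_parent(only: bool = False):
    """Rows visible through the parent table, projected to the parent's attributes.

    With only=True the sub-tables are skipped, imitating the ONLY keyword.
    """
    rows = list(university_members)
    if not only:
        rows += student_list + staff  # sub-table rows are implicitly present
    return [{col: row[col] for col in PARENT_COLUMNS} for row in rows]


all_names = [row["name"] for row in query_parent()]
only_names = [row["name"] for row in query_parent(only=True)]
print(all_names)
print(only_names)
```

A query through the parent sees rows of all three tables but only the parent's attributes, while the restricted form sees only the rows stored in the parent itself.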

OBJECT ORIENTED DATABASE SYSTEMS

Object oriented database systems apply object oriented concepts to the database system
model to create an object oriented database model.

Object Model: The Object Database Management Group (ODMG) has designed the object
model for object oriented database management systems. The Object Definition Language
(ODL) and Object Manipulation Language (OML) are based on this object model. Let us briefly
define the concepts and terminology related to the object model.

Objects and Literals: These are the basic building elements of the object model. An object has
the following four characteristics:

• A unique identifier
• A name
• A lifetime defining whether it is persistent or not, and
• A structure that may be created using a type constructor. The structure in an OODBMS can
be classified as atomic or collection objects (like Set, List, Array, etc.).

A literal does not have an identifier but has a value that may be constant. The structure of a
literal does not change. Literals can be atomic, such that they correspond to basic data types
like int, short, long, float etc., or structured literals (for example, current date, time etc.), or
collection literals defining values for some collection object.

Interface: An interface defines the operations that can be inherited by a user-defined object.
Interfaces are non-instantiable. All objects inherit basic operations (like copy object, delete
object) from the interface Object. A collection object inherits operations, such as an
operation to determine whether the collection is empty, from the basic Collection interface.

Atomic Objects: An atomic object is an object that is not of a collection type. Atomic objects
are user-defined objects that are specified using the class keyword. The properties of an
atomic object can be defined by its attributes and relationships.

Inheritance: Interfaces specify the abstract operations that can be inherited by classes. This
is called behavioural inheritance and is represented using the “:” symbol. Sub-classes can
inherit the state and behaviour of their super-class(es) using the keyword EXTENDS.

Extents: The extent of a class contains all the persistent objects of that class. A class
having an extent can have a key.

Object Definition Language


Object Definition Language (ODL) is a standard language, along the same lines as the DDL of
SQL, that is used to represent the structure of an object-oriented database. It uses a unique
object identity (OID) for each object, such as a library item, student, account, fee or
inventory item. In this language objects are treated as records. Any class in the design
process has three kinds of properties: attributes, relationships and methods. A class in ODL
is described using the following syntax:
class <name>
{
<list of properties>
};
Here, class is a keyword, and the properties may be attributes, methods or relationships. The
attributes defined in ODL specify the features of an object. An attribute could be of a simple,
enumerated, structure or complex type.
class Book
{
attribute string ISBNNO;
attribute string TITLE;
attribute enum CATEGORY {text, reference, journal} BOOKTYPE;
attribute struct AUTHORS {string fauthor, string sauthor, string tauthor} AUTHORLIST;
};
Please note that, in this case, the authors are defined as a structure, and the book type is a
new field defined as an enum.
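
For readers more familiar with a programming language than with ODL, the Book class above corresponds roughly to the following Python sketch. The mapping of enum to Enum and of struct to NamedTuple is an assumption made for illustration, and the sample values are placeholders.

```python
from dataclasses import dataclass
from enum import Enum
from typing import NamedTuple


class Category(Enum):
    """Counterpart of: enum CATEGORY {text, reference, journal}."""
    TEXT = "text"
    REFERENCE = "reference"
    JOURNAL = "journal"


class Authors(NamedTuple):
    """Counterpart of: struct AUTHORS {string fauthor, string sauthor, string tauthor}."""
    fauthor: str
    sauthor: str
    tauthor: str


@dataclass
class Book:
    isbnno: str
    title: str
    booktype: Category   # enumerated attribute
    authorlist: Authors  # structured attribute


# Placeholder values for illustration only.
book = Book("ISBN-1", "Sample Title", Category.TEXT,
            Authors("First Author", "Second Author", "Third Author"))
print(book.booktype.value, book.authorlist.fauthor)
```

The enumerated attribute can only take one of the three listed values, and the structured attribute groups the three author fields under one name.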
These books need to be issued to the students. For that we need to specify a relationship.
The relationship defined in ODL specifies the method of connecting one object to another.
We specify the relationship by using the keyword “relationship”. Thus, to connect a
student object with a book object, we need to specify the relationship in the student class
as:

relationship set <Book> receives

Here, for each object of the class Student there is a set of references to Book objects, and
this set of references is called receives.

But if it is required to access the students based on the book, then the “inverse relationship”
could be specified in the Book class as
relationship set <Student> receivedby
We may need to specify the connection between the relationships receives and receivedby
by using the keyword “inverse” in each declaration. If the inverse relationship is in a
different class, it is referred to by the name of that class, followed by a double colon (::) and
the name of the relationship in that class.
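
What an inverse pair means in practice is that the two sides must always agree. The following Python sketch maintains receives and receivedby together; the issue and return_book methods are hypothetical helpers, since an ODMG-style system would keep the pair consistent on the programmer's behalf.

```python
class Book:
    def __init__(self, title: str):
        self.title = title
        self.receivedby = set()  # inverse of Student.receives


class Student:
    def __init__(self, name: str):
        self.name = name
        self.receives = set()    # inverse of Book.receivedby

    def issue(self, book: Book) -> None:
        """Update both sides so that the two relationships stay consistent."""
        self.receives.add(book)
        book.receivedby.add(self)

    def return_book(self, book: Book) -> None:
        self.receives.discard(book)
        book.receivedby.discard(self)


anand = Student("Anand")
novel = Book("The suitable boy")
anand.issue(novel)

# Navigation works in both directions.
titles_for_anand = {b.title for b in anand.receives}
readers_of_novel = {s.name for s in novel.receivedby}
print(titles_for_anand, readers_of_novel)
```

Starting from a student we reach the books, and starting from a book we reach the students, without any separate lookup table.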

Methods could be specified with the classes along with their input/output types. These
declarations are called “signatures”. The method parameters could be in, out or inout.
An in parameter is passed by value, whereas out and inout parameters are passed
by reference. Exceptions could also be associated with these methods.

The types in ODL could be atomic types or class names. More complex types are built from
these basic types using type constructors such as set, bag, list, array, dictionary and structure.
Inheritance is implemented in ODL using subclasses with the keyword “extends”.
Multiple inheritance is implemented by listing the superclasses after extends, separated by
colons (:).
Analogous to the difference between a relation schema and a relation instance, ODL
distinguishes between a class and its extent (the set of existing objects of the class). The
extent is declared with the keyword “extent”.
The major considerations while converting ODL designs into relational designs are as follows:

a) It is not essential to declare keys for a class in ODL, but in relational design new
attributes may have to be created in order to work as a key.

b) Attributes in ODL could be declared as non-atomic, whereas in relational design they
have to be converted into atomic attributes.

c) Methods could be part of a design in ODL, but they cannot be directly converted into a
relational schema, since they are not a property of a relational schema, although SQL
supports them.

d) Relationships are defined in inverse pairs in ODL but, in the case of relational design,
only one relationship of the pair is represented.

Object Query Language

Object Query Language (OQL) is a standard query language that combines the high-level,
declarative style of SQL with the object-oriented features of object-oriented programming.

Find the list of authors for the book titled “The suitable boy”

SELECT b.AUTHORLIST FROM Book b WHERE b.TITLE = "The suitable boy"

A more complex query, to display the titles of the books that have been issued to the
student whose name is Anand, could be
SELECT b.TITLE FROM Book b, Student s WHERE s.NAME = "Anand" AND b IN s.receives

This query is also written in the form of relationship as

SELECT b.TITLE FROM Book b WHERE b.receivedby.NAME = "Anand"

In the previous case, the query creates a bag of strings, but when the keyword DISTINCT is
used, the query returns a set.

SELECT DISTINCT b.TITLE FROM Book b WHERE b.receivedby.NAME = "Anand"

When we add an ORDER BY clause, it returns a list.

SELECT b.TITLE FROM Book b WHERE b.receivedby.NAME = "Anand" ORDER BY b.BOOKTYPE
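
The three result kinds (bag, set and list) differ only in how duplicates and order are treated. The following Python sketch mirrors them with a Counter, a set and a sorted list; the sample data, in which one title has been issued twice, is invented for the illustration.

```python
from collections import Counter

# (title, book type) pairs for the books issued to one student.
issued = [("Databases", "text"), ("Algorithms", "reference"), ("Databases", "text")]

# SELECT ...            -> bag: duplicates are kept, order is irrelevant
bag_result = Counter(title for title, _ in issued)

# SELECT DISTINCT ...   -> set: duplicates are removed
set_result = {title for title, _ in issued}

# SELECT ... ORDER BY   -> list: duplicates are kept and order is significant
list_result = [title for title, _ in sorted(issued, key=lambda pair: pair[1])]

print(bag_result, set_result, list_result)
```

The bag remembers that "Databases" occurs twice, the set keeps it once, and the list keeps both occurrences in a defined order.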

For complex output, the keyword “Struct” is used. If we want to display pairs of
titles from the same publisher, then the proposed query is:

SELECT DISTINCT Struct(book1: b1, book2: b2) FROM Book b1, Book b2 WHERE b1.PUBLISHER
= b2.PUBLISHER AND b1.ISBNNO < b2.ISBNNO

Aggregate operators like SUM, AVG, COUNT, MAX, MIN could be used in OQL. If we want to
calculate the maximum marks obtained by any student then the OQL command is

Max(SELECT s.MARKS FROM Student s)

GROUP BY is used with a set of structures that is called the “intermediate collection”.

SELECT cour, publ, AVG(SELECT p.b.PRICE FROM partition p) FROM Book b GROUP BY
cour: b.receivedby.COURSE, publ: b.PUBLISHER;

HAVING is used to eliminate some of the groups created by the GROUP BY clause.
SELECT cour, publ, AVG(SELECT p.b.PRICE FROM partition p) FROM Book b GROUP BY
cour: b.receivedby.COURSE, publ: b.PUBLISHER HAVING AVG(SELECT p.b.PRICE FROM partition
p) >= 60
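
The role of partition in the two queries above can be imitated in Python: rows are first collected into one partition per (course, publisher) group, an average is computed over each partition, and groups below the threshold are then dropped. The rows are invented sample data.

```python
from collections import defaultdict
from statistics import mean

# Invented rows for illustration: (course, publisher, price)
books = [
    ("MCA", "Alpha", 80),
    ("MCA", "Alpha", 60),
    ("MCA", "Beta", 40),
    ("BCA", "Alpha", 50),
]

# GROUP BY cour, publ: each group keeps a partition holding its members.
partitions = defaultdict(list)
for course, publisher, price in books:
    partitions[(course, publisher)].append(price)

# AVG(SELECT p.b.PRICE FROM partition p), evaluated once per group
averages = {key: mean(prices) for key, prices in partitions.items()}

# HAVING AVG(...) >= 60 removes the groups that do not qualify
kept = {key: avg for key, avg in averages.items() if avg >= 60}
print(kept)
```

Only the (MCA, Alpha) group survives, since its average price is the only one that reaches 60.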
Union, intersection and difference operators are applied to set or bag types with the
keywords UNION, INTERSECT and EXCEPT. If we want to display the details of suppliers from
PATNA and SURAT, then the OQL is

(SELECT DISTINCT su FROM Supplier su WHERE su.SUPPLIER_CITY = "PATNA") UNION (SELECT
DISTINCT su FROM Supplier su WHERE su.SUPPLIER_CITY = "SURAT")

The result of an OQL expression could be assigned to host language variables. If
costlyBooks is a set <Book> variable to store the books whose price is above Rs. 200,
then
costlyBooks = SELECT DISTINCT b FROM Book b WHERE b.PRICE > 200

In order to find a single element of the collection, the keyword “ELEMENT” is used.
If costlySBook is a variable, then

costlySBook = ELEMENT (SELECT DISTINCT b FROM Book b WHERE b.PRICE > 200)

OODBMS VERSUS OBJECT RELATIONAL DATABASE

An object oriented database management system is created on the basis of the persistent programming
paradigm, whereas an object relational system is built by creating object oriented extensions of a relational
system. In fact, both products have clearly defined objectives.

The following table shows the difference among them:

Object Relational DBMS: The features of these DBMS include:
• Support for complex data types
• Powerful query language support through SQL
• Good protection of data against programming errors
Object Oriented DBMS: The features of these DBMS include:
• Support for complex data types
• Very high integration of the database with the programming language
• Very good performance
• But not as powerful at querying as relational systems

Object Relational DBMS: One of the major assets here is SQL. Although SQL is not as powerful as a
programming language, it is nonetheless essentially a fourth generation language; thus, it provides
excellent protection of data from programming errors.
Object Oriented DBMS: These systems are based on object oriented programming languages and are
thus very strong in programming; however, any error of a data type made by a programmer may affect
many users.

Object Relational DBMS: The relational model has a very rich foundation for query optimisation, which
helps in reducing the time taken to execute a query.
Object Oriented DBMS: These databases are still evolving in this direction. They have reasonable
systems in place.

Object Relational DBMS: These databases make querying as simple as in relational systems, even for
complex data types and multimedia data.
Object Oriented DBMS: Querying is possible but somewhat difficult.

Object Relational DBMS: Although the strength of these DBMS is SQL, it is also one of the major
weaknesses from the performance point of view in in-memory applications.
Object Oriented DBMS: Some applications that are primarily run in RAM and require a large number of
database accesses with high performance may find such DBMS more suitable. This is because of the rich
programming interface provided by such DBMS. However, such applications may not support very
strong query capabilities. A typical example of one such application is the databases required for
CAD.
XML - Databases
An XML database is used to store large amounts of information in XML format. As the use of
XML increases in every field, a secure place is needed to store XML
documents. The data stored in the database can be queried using XQuery, serialized, and
exported into a desired format.

XML Database Types

There are two major types of XML databases −

• XML-enabled
• Native XML (NXD)

XML - Enabled Database

An XML-enabled database is an extension provided for the conversion of XML
documents. It is a relational database, where data is stored in tables consisting of rows and
columns. The tables contain sets of records, which in turn consist of fields.

Native XML Database

A native XML database is based on containers rather than a table format. It can store large
amounts of XML documents and data. A native XML database is queried with XPath expressions.
It has an advantage over the XML-enabled database: it is more capable of
storing, querying and maintaining XML documents.
Difference between Structured, Semi-structured and Unstructured data

Big Data is characterised by huge volume, high velocity, and a wide variety of data. The data falls into 3 types:
Structured data, Semi-structured data, and Unstructured data.
1. Structured data – Structured data is data whose elements are addressable for effective
analysis. It has been organized into a formatted repository that is typically a database. It
covers all data that can be stored in an SQL database in tables with rows and columns.
Such data has relational keys and can easily be mapped into pre-designed fields. Today, it is
the most commonly processed kind of data and the simplest to manage.
Example: Relational data.
2. Semi-Structured data – Semi-structured data is information that does not reside in a
relational database but that has some organizational properties that make it easier to
analyze. With some processing, it can be stored in a relational database (which can be very
hard for some kinds of semi-structured data), but the semi-structured form exists to ease
storage. Example: XML data.
3. Unstructured data – Unstructured data is data that is not organized in a predefined
manner or does not have a predefined data model, so it is not a good fit for a mainstream
relational database. Alternative platforms exist for storing and managing unstructured data.
It is increasingly prevalent in IT systems and is used by organizations in a variety
of business intelligence and analytics applications. Example: Word, PDF, Text, Media logs.

Differences between Structured, Semi-structured and Unstructured data:

Properties | Structured data | Semi-structured data | Unstructured data
--- | --- | --- | ---
Technology | It is based on relational database tables | It is based on XML/RDF (Resource Description Framework) | It is based on character and binary data
Transaction management | Matured transaction and various concurrency techniques | Transaction is adapted from DBMS, not matured | No transaction management and no concurrency
Version management | Versioning over tuples, rows, tables | Versioning over tuples or graph is possible | Versioned as a whole
Flexibility | It is schema dependent and less flexible | It is more flexible than structured data but less flexible than unstructured data | It is more flexible and there is absence of schema
Scalability | It is very difficult to scale DB schema | Its scaling is simpler than structured data | It is more scalable
Robustness | Very robust | New technology, not very widespread | -
Query performance | Structured queries allow complex joining | Queries over anonymous nodes are possible | Only textual queries are possible

XML - Tree Structure


An XML document is always descriptive. The tree structure is often referred to as the XML
Tree and plays an important role in describing any XML document easily.
The tree structure contains root (parent) elements, child elements and so on. By using the tree
structure, you can get to know all succeeding branches and sub-branches starting from the
root. Parsing starts at the root, then moves down the first branch to an element, takes the
first branch from there, and so on to the leaf nodes.
The following tree structure represents the above XML document –

In the above diagram, there is a root element named <company>. Inside that, there is one
more element, <Employee>. Inside the employee element, there are five branches named
<FirstName>, <LastName>, <ContactNo>, <Email>, and <Address>. Inside the <Address>
element, there are three sub-branches named <City>, <State> and <Zip>.
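
The traversal order described here (root first, then down the first branch to the leaves) can be sketched with Python's standard xml.etree.ElementTree module. The element names follow the description; the text values and the helper name walk are placeholders chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Element names follow the description; the text values are placeholders.
xml_text = """
<company>
  <Employee>
    <FirstName>First</FirstName>
    <LastName>Last</LastName>
    <ContactNo>0000000000</ContactNo>
    <Email>user@example.com</Email>
    <Address>
      <City>Example City</City>
      <State>Example State</State>
      <Zip>000000</Zip>
    </Address>
  </Employee>
</company>
"""

def walk(element, depth=0, lines=None):
    """Depth-first traversal that records an indented outline of tag names."""
    if lines is None:
        lines = []
    lines.append("  " * depth + element.tag)
    for child in element:
        walk(child, depth + 1, lines)
    return lines

root = ET.fromstring(xml_text)
outline = walk(root)
print("\n".join(outline))
```

The indentation of each printed tag equals its depth in the tree, so the outline reproduces the branch and sub-branch structure in text form.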
XML - DTDs
The XML Document Type Definition, commonly known as DTD, is a way to describe an XML
language precisely. DTDs check the vocabulary and the validity of the structure of XML documents
against the grammatical rules of the appropriate XML language.
An XML DTD can either be specified inside the document, or it can be kept in a separate
document and then linked to it.

Syntax
Basic syntax of a DTD is as follows −
<!DOCTYPE element DTD identifier
[
declaration1
declaration2
........
]>
In the above syntax,
• The DTD starts with the <!DOCTYPE delimiter.
• The element tells the parser to parse the document from the specified root element.
• The DTD identifier is an identifier for the document type definition, which may be the path
to a file on the system or the URL of a file on the internet. If the DTD points to an external
path, it is called the External Subset.
• The square brackets [ ] enclose an optional list of entity declarations called the Internal
Subset.

Internal DTD
A DTD is referred to as an internal DTD if the elements are declared within the XML file. To mark it
as an internal DTD, the standalone attribute in the XML declaration must be set to yes. This means the
declaration works independently of an external source.

Syntax
Following is the syntax of internal DTD −
<!DOCTYPE root-element [element-declarations]>
where root-element is the name of root element and element-declarations is where you
declare the elements.

Rules
• The document type declaration must appear at the start of the document (preceded
only by the XML header) − it is not permitted anywhere else within the document.
• Similar to the DOCTYPE declaration, the element declarations must start with an
exclamation mark.
• The Name in the document type declaration must match the element type of the root
element.
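
The last rule can be checked mechanically. The sketch below uses Python's standard xml.dom.minidom, which reads the DOCTYPE but does not validate a document against its DTD, so it demonstrates only the name rule. The sample document and the function name doctype_matches_root are assumptions made for illustration.

```python
from xml.dom import minidom

# Hypothetical document with an internal DTD; element names are illustrative.
xml_bytes = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE address [
  <!ELEMENT address (name, city)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT city (#PCDATA)>
]>
<address>
  <name>Example Name</name>
  <city>Example City</city>
</address>"""

def doctype_matches_root(data):
    """Check the rule that the DOCTYPE name equals the root element's name."""
    doc = minidom.parseString(data)
    if doc.doctype is None:
        # No document type declaration at all, so there is nothing to match.
        return False
    return doc.doctype.name == doc.documentElement.tagName

print(doctype_matches_root(xml_bytes))
```

A validating parser would additionally check the element declarations inside the square brackets; that step is outside what the standard library offers.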

External DTD
In an external DTD, elements are declared outside the XML file. They are accessed by specifying
the system attribute, which may be either a legal .dtd file or a valid URL. To mark it as an
external DTD, the standalone attribute in the XML declaration must be set to no. This means the
declaration includes information from an external source.

Syntax
Following is the syntax for external DTD −
<!DOCTYPE root-element SYSTEM "file-name">
where file-name is the file with .dtd extension.

Types
One can refer to an external DTD by using either system identifiers or public identifiers.

System Identifiers
A system identifier specifies the location of an external file containing DTD
declarations. The syntax is as follows −
<!DOCTYPE name SYSTEM "address.dtd" [...]>
It contains the keyword SYSTEM and a URI reference pointing to the location of the document.

Public Identifiers
Public identifiers provide a mechanism to locate DTD resources and are written as follows −
<!DOCTYPE name PUBLIC "-//Beginning XML//DTD Address Example//EN" "address.dtd">
As one can see, it begins with the keyword PUBLIC, followed by a specialized identifier. In an XML
document the public identifier must itself be followed by a system identifier, which serves as a
fallback location. Public identifiers are used to identify an entry in a catalog. Public identifiers
can follow any format; however, a commonly used format is called Formal Public Identifiers, or FPIs.
XML – Schemas

XML Schema is commonly known as XML Schema Definition (XSD). It is used to describe and
validate the structure and the content of XML data. XML schema defines the elements,
attributes and data types. Schema element supports Namespaces. It is similar to a database
schema that describes the data in a database.
The basic idea behind XML Schemas is that they describe the legitimate format that an XML
document can take.

Elements
Elements are the building blocks of an XML document. An element can be defined within an XSD
as follows −
<xs:element name = "x" type = "y"/>

Definition Types
One can define XML schema elements in the following ways −

Simple Type
A simple type element can contain only text. Some of the predefined simple
types are: xs:integer, xs:boolean, xs:string, xs:date. For example −
<xs:element name = "phone_number" type = "xs:int" />

Complex Type
A complex type is a container for other element definitions. This allows one to specify which child
elements an element can contain and to provide some structure within XML documents.

Global Types
With the global type, one can define a single type in the document, which can be used by all
other references. For example, suppose one wants to generalize the person and company for
different addresses of the company. A general address type can then be defined once and
referenced wherever an address is needed.
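
Because an XSD is itself an XML document, its declarations can be inspected with an ordinary XML parser. The sketch below uses Python's standard xml.etree.ElementTree to list the declarations in a small schema. The schema, including the names AddressType, person_address and company_address, is hypothetical, and the standard library only reads the schema here; it does not validate documents against it.

```python
import xml.etree.ElementTree as ET

# Namespace of XML Schema, in the {uri}tag form ElementTree uses for tag names.
XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema: AddressType is a global complex type reused by two elements.
xsd_text = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="AddressType">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="phone_number" type="xs:int"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="person_address" type="AddressType"/>
  <xs:element name="company_address" type="AddressType"/>
</xs:schema>
"""

schema = ET.fromstring(xsd_text)

# Top-level (global) element declarations and the type each one refers to.
global_elements = {e.get("name"): e.get("type") for e in schema.findall(XS + "element")}

# Child elements declared inside the global complex type (simple types only).
address_type = schema.find(XS + "complexType[@name='AddressType']")
children = [(e.get("name"), e.get("type")) for e in address_type.iter(XS + "element")]

print(global_elements)
print(children)
```

The two top-level elements share one type definition, which is the point of a global type: the structure is written once and referenced by name.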
XML - Document
An XML document is a basic unit of XML information composed of elements and other markup
in an orderly package. An XML document can contain a wide variety of data, for example, a
database of numbers, numbers representing a molecular structure, or a mathematical equation.
The following image depicts the parts of an XML document.

Document Prolog Section


The document prolog comes at the top of the document, before the root element. This section
contains −

• XML declaration
• Document type declaration

XML declaration: contains details that prepare an XML processor to parse the XML document.
It is optional, but when used, it must appear on the first line of the XML document.

Syntax
The following syntax shows the XML declaration −

<?xml
   version = "version_number"
   encoding = "encoding_declaration"
   standalone = "standalone_status"
?>
Each parameter consists of a parameter name, an equals sign (=), and a parameter value
inside quotes.
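
As a small illustration, the three declaration parameters can be read back after parsing. The sketch assumes Python's standard xml.dom.minidom, whose Document objects carry version, encoding and standalone attributes; the sample document is hypothetical.

```python
from xml.dom import minidom

# Hypothetical document; only the XML declaration on the first line matters here.
xml_bytes = b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><note>hello</note>'

doc = minidom.parseString(xml_bytes)

# minidom exposes the three declaration parameters as document attributes.
declaration = {
    "version": doc.version,
    "encoding": doc.encoding,
    "standalone": doc.standalone,
}
print(declaration)
```

The standalone value is reported as a boolean rather than the literal text yes or no.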
Document Elements Section
Document Elements are the building blocks of XML. These divide the document into a
hierarchy of sections, each serving a specific purpose. You can separate a document into
multiple sections so that they can be rendered differently, or used by a search engine. The
elements can be containers, with a combination of text and other elements.
Querying and Transforming XML Data
1. Translation of information from one XML schema to another.
2. Querying of XML data.
3. The above two are closely related, and are handled by the same tools.
4. Standard XML querying/translation languages:
   XPath: a simple language consisting of path expressions.
   XSLT: a simple language designed for translation from XML to XML and from XML to HTML.
   XQuery: an XML query language with a rich set of features.

XML Querying & Transformation

XPath is used to address (select) parts of documents using path expressions. A path expression
is a sequence of steps separated by “/”. The result of a path expression is the set of values that,
along with their containing elements/attributes, match the specified path. The initial “/”
denotes the root of the document. Path expressions are evaluated left to right. Each step operates
on the set of instances produced by the previous step. Selection predicates may follow any step
in a path, in [ ]. Attributes are accessed using “@”.
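
These rules can be tried out with Python's standard xml.etree.ElementTree, which implements only a subset of XPath and evaluates paths relative to an element rather than from an absolute “/”. The bank document below is hypothetical sample data.

```python
import xml.etree.ElementTree as ET

# Hypothetical bank data; the element and attribute names are illustrative.
xml_text = """
<bank>
  <account number="A-101"><branch>Downtown</branch><balance>500</balance></account>
  <account number="A-102"><branch>Perryridge</branch><balance>400</balance></account>
  <customer><name>Joe</name></customer>
</bank>
"""
root = ET.fromstring(xml_text)

# Steps separated by "/": every balance under an account under the root.
balances = [e.text for e in root.findall("./account/balance")]

# A predicate in [ ] after a step, with "@" reaching an attribute.
branch = root.find("./account[@number='A-102']/branch").text

# A predicate that tests the text of a child element.
numbers = [a.get("number") for a in root.findall("./account[branch='Downtown']")]

print(balances, branch, numbers)
```

Each step narrows the set produced by the previous one: the account step selects two elements, and the predicate or the following step then filters or descends from exactly those two.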

XML data can be stored in:

1. Non-relational data stores
   Flat files: natural for storing XML, but have all the problems of file storage (no concurrency, no recovery, …).
   XML database: a database built specifically for storing XML data, supporting the DOM model and
   declarative querying. Currently no commercial-grade systems.

2. Relational databases
   Data must be translated into relational form.
   Advantage: mature database systems. Disadvantage: overhead of translating data and queries.

Storage of XML Data in relational databases

1. String Representation
2. Tree Representation
3. Map to relations

String Representation: Store each top-level element as a string field of a tuple in a relational
database. Use a single relation to store all elements, or use a separate relation for each
top-level element type, e.g. account, customer, depositor relations, each with a string-valued
attribute to store the element.

Indexing: Store the values of subelements/attributes to be indexed as extra fields of the relation,
and build indices on these fields, e.g. customer_name or account_number. Some database
systems support function indices, which use the result of a function as the key value. The
function should return the value of the required subelement/attribute.
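
The string representation with an extra indexed field can be sketched using Python's standard sqlite3 and xml.etree.ElementTree modules. This is a minimal illustration under assumptions: the table layout, the index name idx_account_number and the sample elements are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical account elements; each top-level element is stored as one string.
elements = [
    "<account><account_number>A-101</account_number><balance>500</balance></account>",
    "<account><account_number>A-102</account_number><balance>400</balance></account>",
]

conn = sqlite3.connect(":memory:")
# One relation for the account element type, plus an extra field for the indexed subelement.
conn.execute("CREATE TABLE account (account_number TEXT, element TEXT)")
conn.execute("CREATE INDEX idx_account_number ON account (account_number)")

for text in elements:
    # Extract the subelement value once, at insert time, so lookups can use the index.
    number = ET.fromstring(text).findtext("account_number")
    conn.execute("INSERT INTO account VALUES (?, ?)", (number, text))

# Look up by the extra field, then parse the stored string to read other subelements.
row = conn.execute(
    "SELECT element FROM account WHERE account_number = ?", ("A-102",)
).fetchone()
balance = ET.fromstring(row[0]).findtext("balance")
print(balance)
```

The trade-off is visible in the last step: any subelement that was not copied into its own field, such as balance here, can only be reached by parsing the stored string again.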
