
Unit 11

The document outlines the importance of managing data resources within organizations, focusing on the need for effective data management, challenges faced, and database concepts. It includes a case study on India's Unique Identification Number (UID) scheme, highlighting its role in unifying identification processes and improving access to services. The document also discusses the historical context of data use and the evolution of database systems, emphasizing the significance of data independence, reduced redundancy, and consistency.

Uploaded by

Akanksha singh
Copyright
© © All Rights Reserved

DMBA210 : Management Information System

MASTERS OF BUSINESS ADMINISTRATION


SEMESTER 2

DMBA210
MANAGEMENT INFORMATION SYSTEM

Unit – 11
Managing Data Resources


TABLE OF CONTENTS

1 Introduction
1.1 Objectives
2 Case Study
3 The Need For Data Management
3.1 History of data use (SAQ 1)
4 Challenges Of Data Management
4.1 Data independence
4.2 Reduced data redundancy
4.3 Data consistency
4.4 Data access
4.5 Data administration
4.6 Managing concurrency
4.7 Managing security
4.8 Recovery from crashes
4.9 Application development (SAQ 2)
5 Database Concepts
5.1 Fields, Records, And Files (Table 1)
5.2 Basic architecture (Figures 1, 2, 3, 4; SAQ 3)
6 Data Warehouses
6.1 Data mining uses (Figure 5; SAQ 4)
7 Summary
8 Glossary
9 Terminal Questions
10 Answers
11 References


1. INTRODUCTION
Data as a resource of any organisation has gained critical importance in recent years. It has to be
managed carefully with the objective of ensuring access, reliability, and security. Databases consist of
simple structures such as fields, records, and tables. When assembled within an architecture, these
simple structures provide an immensely useful yet manageable resource for organisations. There
are many types of database designs, the most popular being that of tables related to each other, called
a relational database. Commercial implementations of such databases are called database
management systems (DBMS), and they have many features to easily create, update, query, and manage
data. Nearly all such systems rely on SQL, a computer language that allows easy definition and
manipulation of tables and relations.

When organisations accumulate large masses of data, the focus shifts from simply using the data for
transactions to using the data to support decision making. Data is separated out into special
tables called warehouses that are then used for analysis.

1.1 Objectives
After studying this unit, you should be able to:
• describe the need for data management

• discuss the challenges of data management

• explain database concepts

• discuss data warehouses


2. CASE STUDY

Unique Identification Number in India

For citizens of India, mobility from one state to another is a problem. If one moves, say, from Uttar
Pradesh to Karnataka, then in the new state of residence, one will have to open a new bank account,
obtain a new permit for cooking gas cylinders, get a new electricity connection, re-register an old
vehicle in the new state, and, if needed, get a new ration card. This is because these documents cannot
be transferred easily from Uttar Pradesh to Karnataka, as there are no legal provisions to do so. As
these documents require considerable time and effort to obtain the first time, applying and waiting
to get them a second time is a huge waste of effort.

It is partially to address this problem of transfer of documents that the Government of India initiated
the Unique Identification Number (UID) scheme. Under this scheme, every citizen of India will be
provided a unique number that will be backed by an infrastructure to verify and authenticate the
number. A special organisation, called Unique Identification Authority of India (UIDAI), was created
in 2009 for this purpose, and was charged with issuing the UIDs to citizens. The UIDAI will eventually
provide a unique 12-digit number to all citizens of India with the assurance that the number is unique
(each number is associated with a unique citizen), is verifiable, and is valid across India.

Many citizens of India already have several documents that provide them with unique numbers:

1. Most have a Voter Registration Card that has a unique ID.
2. Many citizens have an official driver’s licence issued by their state, which has a unique number.
3. Many citizens also have a passport, issued by the Government of India, which has a unique
number.
4. Many citizens have a card for income tax payment (called the PAN card) that uses a unique
number.
5. Many citizens also have a ration card that also uses a unique number.

However, not all citizens have a need for or use all these cards. For instance, the number of income
tax payers in India is a small fraction of the population (as agricultural income is not taxed and a bulk
of India’s population relies on agriculture). Furthermore, most citizens do not have a passport, as they
don’t need to travel across borders, and many do not have a driver’s licence either, as they do not own
a motorised vehicle. The ration card is meant for people below the poverty line, but can be issued to
any citizen of India. Thus, most or all of these cards that provide a unique number are either not
available to every citizen of India or of little real use to them. It is in this context that the UID
becomes important.

A UID number can provide a basis for uniting these disparate identification projects under a
common umbrella. Thus, a citizen who has a PAN card and also a driver’s licence can be seen, through
the unique number, to be the same person. This will reduce redundancy in the issuing of unique
numbers as well as control fraud and misuse of the numbers.

As envisaged, the UIDAI will issue a unique number to citizens based on biometric authentication.
This number is called Aadhaar. With a scan of ten fingerprints and the iris, each citizen will receive a
unique 12-digit number. Aadhaar can be used by banks, or the tax authorities, or schools, or the ration
card agencies to issue cards or validation documents to citizens. The idea is that if a citizen presents
a bank card to a merchant for some commercial transaction, the merchant can verify that the card
belongs to the particular individual by checking against the UIDAI database. This verification service
will be made available at a reasonable cost by the UIDAI.

Aadhaar, in this sense, becomes what in database terminology is called a primary key, a unique
identifier for a record that can be used across the database and across applications without worry
of duplication. Various agencies like banks, the tax authority, the passport agency, the motor vehicles
department, and the public distribution system can then use Aadhaar to issue their own verification
and authentication documents. A citizen can move from one part of the country to another, and with
Aadhaar he/she can retrieve or use his/her card anywhere and be assured that his/her identity is
authenticated.
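
As a rough illustration of how a shared key lets separate agencies refer to the same person, the sketch below uses Python's built-in sqlite3 module. The table names (tax_records, driving_licences), the column names, and the sample document numbers are all hypothetical and are not drawn from any UIDAI system.

```python
import sqlite3

# In-memory database; all table names, columns and values are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tax_records (aadhaar TEXT PRIMARY KEY, pan TEXT);
    CREATE TABLE driving_licences (aadhaar TEXT PRIMARY KEY, licence_no TEXT);
    INSERT INTO tax_records VALUES ('234577643239', 'PAN-0001');
    INSERT INTO driving_licences VALUES ('234577643239', 'KA-01-0001');
""")

# The shared key shows that both documents belong to the same person.
linked = conn.execute("""
    SELECT t.aadhaar, t.pan, d.licence_no
    FROM tax_records AS t
    JOIN driving_licences AS d ON d.aadhaar = t.aadhaar
""").fetchall()
print(linked)
```

The join succeeds only because both tables carry the same identifier; without a common key, matching the two records would depend on names or addresses, which can be duplicated or misspelt.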

The unique number is used for many transactions other than social security, including those of credit
card purchases, property purchases and college enrolment.

The Aadhaar scheme has come in for a fair measure of criticism. For a country as diverse and
complex as India, critics argue, such a scheme is not suitable. Some argue that the Aadhaar scheme
will link many vital sources of information about individuals under a common source and thus
compromise individual privacy. Those with dubious intentions can snoop into online and
computerised records of individuals and have access to a vast store of information, something that is
not possible without a primary key. Others contend that, in the case of poor and marginal citizens,
obtaining and maintaining such a number will become an additional burden, and instead of helping
them, it will further impede their ability to make a living and function effectively. Still others argue
that Aadhaar will become another tool in the hands of a corrupt and power-hungry bureaucracy,
which will extract further rents from those unable to understand the value of this scheme and how it
can be used effectively.

Aadhaar in India is similar to a unique number given to citizens in other countries. In the USA, all
citizens are required to have a Social Security Number (SSN) that was originally designed to provide
them with social security – such as pension, medical care, job-loss compensation, and so on – but it is
now used for many different purposes such as for opening a bank account, obtaining a driver’s licence,
getting a credit card, being registered for medical insurance, enrolling in school or college and so on.
Such schemes for providing social security, along with a unique number, are prevalent in European
countries too, such as Spain and France. In all these countries, the unique number is used for many
transactions other than social security.

The role envisaged for Aadhaar is best captured by the Chairman of the UIDAI, Mr. Nandan Nilekani:
‘The name Aadhaar communicates the fundamental role of the number issued by the UIDAI: the
number as a universal identity infrastructure, a foundation over which public and private agencies
can build services and applications that benefit residents across India.’

1. Aadhaar’s guarantee of uniqueness and centralised, online identity verification would be the
basis for building these multiple services and applications, and facilitating greater connectivity
to markets.
2. Aadhaar would also give any resident the ability to access these services and resources,
anytime, anywhere in the country.
3. Aadhaar can, for example, provide the identity infrastructure for ensuring financial inclusion
across the country – banks can link the unique number to a bank account for every resident,
and use the online identity authentication to allow residents to access the account from
anywhere in the country.
4. Aadhaar would also be a foundation for the effective enforcement of individual rights. A clear
registration and recognition of the individual’s identity with the state is necessary to
implement their rights – to employment, education, food, etc. The number, by ensuring such
registration and recognition of individuals, would help the state deliver these rights.

Source: uidai.gov.in (accessed in June 2011).


3. THE NEED FOR DATA MANAGEMENT

3.1 History of data use


In the early years of computing, when programs were written on large mainframe computers, the
practice was to include the data required for a computation within the program. For example, if a
program computed the income tax deductions for personnel in an organisation, the data regarding
the personnel and their income was maintained within the program itself. If changes were required,
say, when a new employee joined the organisation, the entire program for income tax calculations
would have to be modified, not just the data alone. Changes to data were difficult as the entire
program had to be changed, and further, the data was not available to other programs.

With advances in programming languages, this situation changed and data was maintained in
separate files that different programs could use. This improved the ability to share data, but it
introduced problems of data updating and integrity. If one program changed the data, other
programs had to be informed of this development and their logic had to be altered accordingly.

A start in organising data came with the idea of the relational data storage model, put forward by
British scientist E.F. Codd in 1970. Codd, then working with IBM in the USA, showed how data could
be stored in structured form in files that were linked to each other, and could be used by many
programs with simple rules of modification. This idea was taken up by commercial database software,
like Oracle, and became the standard for data storage and use.

SELF-ASSESSMENT QUESTIONS – 1
Multiple Choice Questions:
1 Before database systems were invented, the problem with storing data was that
a) data could not be found in the program
b) data and programs were not easily located on the computer
c) data was easily lost or corrupted
d) data and programs were together, and changing data required changing the program
True or False Questions:
2 E.F. Codd invented the idea of holding data in separate, linked files. (True/False)


4. CHALLENGES OF DATA MANAGEMENT


Consider the following facts:

1. Researchers estimate that the total amount of data stored in the world is of the order of 295
exabytes or 295 billion gigabytes. This estimate is based on an assessment of analogue and
digital storage technologies from 1986 to 2007. The report (by M. Hilbert and P. Lopez, which
appeared in Science Express in February 2011) states that paper-based storage of data,
which was about 33% in 1986, had shrunk to only about 0.07% in 2007, as now most of the
data is stored in digital form. Data is mostly stored on computer hard discs or on optical
storage devices.
2. The consulting firm IDC estimated (in 2008) that the annual growth in data takes place in
two forms:
(a) Structured: Here data is created and maintained in databases and follows a certain data
model. The growth in structured data is about 22% annually (compounded).
(b) Unstructured: Here data remains in an informal manner. The growth in unstructured
data is about 62% annually.
3. The large online auction firm eBay has a data warehouse of more than 6 petabytes (6 million
gigabytes), and adds about 150 billion rows per day to its database tables.
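
To get a feel for what the compound growth rates quoted in point 2 imply, the short calculation below may help. It is only an arithmetic sketch: the function names are made up for this illustration, and it assumes that both rates compound annually and stay constant, which the source states only for structured data.

```python
import math

def growth_factor(annual_rate, years):
    """Factor by which data grows after `years` of compound growth."""
    return (1 + annual_rate) ** years

def doubling_time(annual_rate):
    """Years needed for the volume to double at a compound annual rate."""
    return math.log(2) / math.log(1 + annual_rate)

for label, rate in [("structured", 0.22), ("unstructured", 0.62)]:
    print(f"{label}: x{growth_factor(rate, 5):.1f} in 5 years, "
          f"doubles in about {doubling_time(rate):.1f} years")
```

Under these assumptions, structured data grows about 2.7 times in five years and doubles roughly every three and a half years, while unstructured data grows about 11 times over the same period and doubles in under a year and a half.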

The above examples highlight the incredible amounts of data that are being created and stored
around the world. Managing this data so that it can be used effectively presents a strong
challenge to database systems: the systems not only have to store the data but also have to make
it available almost instantly whenever needed, allow users to search through the data efficiently,
and also ensure that the data is safe and uncorrupted. Different aspects of the need for database
systems are discussed in the following sections.

4.1 Data independence


Databases allow data pertaining to an activity or a domain to be maintained independently. This
independence means that the data is stored in separate files in a structured manner, and the
creation and updating of the data is done independent of its uses. For instance, in a college, a
database of students is updated when a student joins or leaves the college, changes address,
changes phone number, and so on. This is independent of how the data is used by programs for
course registration or for the library. Furthermore, the programs and applications that use the data
are not aware of where and how the data is maintained; they only need to know how to make
simple calls to access the data.
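
A minimal sketch of this idea in Python follows. The class StudentStore and the two functions standing in for the registration and library programs are hypothetical names invented for the illustration; the point is only that the programs make simple calls and never touch the storage format.

```python
class StudentStore:
    """Hypothetical data-access layer; callers never see the storage format."""

    def __init__(self):
        # The storage could be a file or a database server; here it is
        # simply a dictionary keyed by roll number.
        self._rows = {}

    def add_student(self, roll_no, name, phone):
        self._rows[roll_no] = {"name": name, "phone": phone}

    def update_phone(self, roll_no, phone):
        self._rows[roll_no]["phone"] = phone

    def get_student(self, roll_no):
        return dict(self._rows[roll_no])  # return a copy, not the stored row


def registration_label(store, roll_no):
    """Course-registration program: only uses the student's name."""
    return f"Registered: {store.get_student(roll_no)['name']}"


def library_contact(store, roll_no):
    """Library program: only uses the phone number."""
    return store.get_student(roll_no)["phone"]


store = StudentStore()
store.add_student(101, "Aamir Khan", "98450-00000")
store.update_phone(101, "98450-11111")   # maintained independently of use
print(registration_label(store, 101))
print(library_contact(store, 101))
```

If the dictionary inside StudentStore were later replaced by a real database, neither of the two programs would need to change, since they depend only on get_student.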

4.2 Reduced data redundancy


One goal of databases is to reduce data redundancy. Data redundancy refers to the duplication of
data in different tables. If data on students is maintained in two or three different databases in the
college, then for one change, say in a student’s mobile phone number, all the databases have to be
changed. Reduced data redundancy ensures that minimal storage is used for the data. With the
rapid increase in data over time, conserving space is an important management challenge.
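
The following sketch, using Python's built-in sqlite3 module, shows one way the college example could avoid duplication: the phone number is stored once and other tables refer to the student by a key. The table and column names are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT, phone TEXT);
    CREATE TABLE registrations (student_id INTEGER, course TEXT);
    CREATE TABLE library_loans (student_id INTEGER, book TEXT);
    INSERT INTO students VALUES (1, 'Aamir Khan', '98450-00000');
    INSERT INTO registrations VALUES (1, 'Physics 101');
    INSERT INTO library_loans VALUES (1, 'Optics');
""")

# One update, in one place.
conn.execute("UPDATE students SET phone = ? WHERE student_id = ?", ("98450-11111", 1))

# Both applications see the new number because neither keeps its own copy.
reg_phone = conn.execute("""
    SELECT s.phone FROM registrations r JOIN students s USING (student_id)
""").fetchone()[0]
lib_phone = conn.execute("""
    SELECT s.phone FROM library_loans l JOIN students s USING (student_id)
""").fetchone()[0]
print(reg_phone, lib_phone)
```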

4.3 Data consistency


It is important that data users have access to consistent data, that is, the data is the same regardless
of the application through which the user accesses it. Consistency implies that the integrity of the
data is maintained (the data has not been changed or corrupted in a manner unknown to the
system); the data is valid, which means that the data is the correct one to use for the moment; and
the data is accurate, which means that the data being used is the one that was designed to be used.
Consistency requires careful management of data updating, deletion, copying, and security.

4.4 Data access


Data stored in databases must be accessible efficiently. Very large databases, such as those
maintained by eBay, have to be managed in a way that when users search within them, their results
should be available within a matter of seconds. A search in eBay results in a response within a few
seconds, even though the system has to search through billions of records. Furthermore, the
response from the database has to be presented to the user in a manner that is easy to read and
understand, which requires further processing.
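
One common technique for keeping searches fast is an index. The sketch below uses SQLite through Python's sqlite3 module and asks the database how it plans to run a query before and after an index is created. The listings table and its contents are invented for the illustration and are not a description of how eBay's systems work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listings (id INTEGER PRIMARY KEY, seller TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO listings (seller, title) VALUES (?, ?)",
    [(f"seller{i % 1000}", f"item {i}") for i in range(10_000)],
)

QUERY = "SELECT title FROM listings WHERE seller = ?"

def plan(sql):
    """Return SQLite's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("seller42",)).fetchall()
    return " ".join(row[-1] for row in rows)

plan_before = plan(QUERY)          # no index yet: the whole table is scanned
conn.execute("CREATE INDEX idx_listings_seller ON listings (seller)")
plan_after = plan(QUERY)           # the index is used to find matching rows

print(plan_before)
print(plan_after)
matches = conn.execute(QUERY, ("seller42",)).fetchall()
```

The exact wording of the plan differs between SQLite versions, but the first plan reports a scan of the table and the second names the index.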

4.5 Data administration


Data administration entails deciding who can create, read, update, or delete data. Many
organisations have strict controls over who can create or delete data fields or tables. This is
determined by the needs of the organisation and the roles defined for database administrators and
users. Read access is usually provided to those who need to only see and use the data, but not
modify or change it in any way. Update access is also carefully restricted to those who have the
rights and privileges to do so. Modern database systems enable sophisticated ways in which these
four functions can be enabled or disabled for users and administrators.
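
The four functions (create, read, update, delete) can be pictured as a table of permissions per role. The sketch below is a simplified model in Python; the role names and the rights granted to them are hypothetical, and real database systems provide their own mechanisms for this.

```python
# Hypothetical roles and the operations each may perform on a table.
PERMISSIONS = {
    "administrator": {"create", "read", "update", "delete"},
    "clerk": {"read", "update"},
    "viewer": {"read"},
}

def is_allowed(role, operation):
    """True if the role has been granted the given operation."""
    return operation in PERMISSIONS.get(role, set())

def perform(role, operation):
    """Refuse any operation the role has not been granted."""
    if not is_allowed(role, operation):
        raise PermissionError(f"{role} may not {operation} data")
    return f"{operation} permitted for {role}"

print(perform("clerk", "update"))
print(is_allowed("viewer", "delete"))
```

An unknown role receives an empty set of rights, so anything not explicitly granted is refused.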

4.6 Managing concurrency


A serious challenge for modern databases, especially those used for e-commerce applications, is
that of managing concurrency. Data is often maintained on many servers, distributed across a
wide geography. Concurrency entails ensuring that changes or updates to a particular element in
a table are reflected across all the distributed servers where users access the data. This is an
element of managing consistency, particularly for distributed databases.
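
A toy model of this requirement is sketched below in Python. The class ReplicatedTable and the server names are invented for the illustration, and the sketch deliberately ignores the hard parts of real systems, such as network failures and two users writing at the same moment.

```python
class ReplicatedTable:
    """Toy model: one logical table kept as copies on several servers."""

    def __init__(self, server_names):
        self.replicas = {name: {} for name in server_names}

    def write(self, key, value):
        # Apply the change to every copy before reporting success.
        for rows in self.replicas.values():
            rows[key] = value

    def read(self, server_name, key):
        return self.replicas[server_name].get(key)


stock = ReplicatedTable(["mumbai", "delhi", "bengaluru"])
stock.write("item-17", 5)
stock.write("item-17", 4)   # one unit sold; all copies must reflect it
print([stock.read(name, "item-17") for name in stock.replicas])
```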

4.7 Managing security


A substantial threat to modern databases is from crackers and unauthorised users. Database
systems have to provide a layer of security, over and above the security systems in place at the
organisation, which ensures protection across transactions and all administration tasks. This also
means that internal tampering and mischief with data is carefully monitored and controlled.

4.8 Recovery from crashes


Databases are crucial to the internal working of an organisation – they are both a resource and an
asset. With the high levels of transactions happening within the IS of organisations, it is imperative
that the data is secured against failure. Modern database systems provide a sophisticated system
of backup, mirroring and recovery that allows rapid recovery from crashed servers.
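
The idea of backup and recovery can be sketched with SQLite's online backup facility, which Python exposes as Connection.backup (available from Python 3.7). The orders table is hypothetical, and in-memory databases stand in for the live and backup servers; a production system would use separate machines and far more elaborate procedures.

```python
import sqlite3

# "Live" database holding transactions (in memory for the illustration).
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
live.executemany("INSERT INTO orders (amount) VALUES (?)", [(250.0,), (99.5,)])
live.commit()

# Take a backup copy using sqlite3's Connection.backup.
backup = sqlite3.connect(":memory:")
live.backup(backup)

# Simulate a crash by discarding the live database.
live.close()

# Recover: copy the backup into a fresh database and carry on.
recovered = sqlite3.connect(":memory:")
backup.backup(recovered)
restored_rows = recovered.execute("SELECT id, amount FROM orders ORDER BY id").fetchall()
print(restored_rows)
```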

4.9 Application development


Databases enable applications to be developed using the facilities of data management provided
by them. E-commerce sites, for example, create a web presence that includes search, display,
selection, sale, and payment for products, which rely on databases that provide all the relevant
data, and store data, for the transactions. Applications may be local to a function or department or
shared across many departments, and they may share data from the databases. Database systems
provide special languages by which their data can be manipulated, and hence can be used by
application developers.


SELF-ASSESSMENT QUESTIONS – 2
Multiple Choice Questions:
3 Serious challenges for modern databases include
a) managing concurrency, security and recovery from crashes
b) size
c) distributed access
d) support for heterogeneous databases


5. DATABASE CONCEPTS
A database is a collection of files that have stored data. The files and the data within them are related
in some manner – either they are from the same domain, the same function, the same firm, or some
other category. The files in the database are created according to the needs of the function or
department and are maintained with data that is relevant for the applications the department runs.

An example of a database in an organisation is an ‘Employee’ database. This will correspond to the
human resources function of the organisation. The ‘Employee’ database may contain files related to
employee details, their employment history, their benefits details, their leave history, their family
details, their medical compensation details, and so on. The files are related to the employee concept,
although they contain different data, depending on the applications that will need the data.
Computations regarding medical reimbursements, for instance, will read data from the files related
to the employee’s status, benefits, and medical history.

Consider another example of a ‘Product’ database. This may contain files related to the types of
products, the details about product features, the prices and history of prices of products, the regional
and seasonal sales figures of products, and the details of product designs. Such files could be used by
the manufacturing department to determine production schedules, by the marketing department to
determine sales campaigns, or by the finance department to determine overhead allocations.

5.1 Fields, records, and files


A file in a database consists of particular spaces, called structures, in which data is maintained. The
basic structure of a file is called a field. A field is a defined space, of specific dimensions, in which data
is placed. Data is written to, read from, and can be deleted from fields. When defining or designing a
field, the contents of the field have to be specified exactly. For instance, if the field has to hold the
date of birth of an employee, it has to be specified how the data will be stored (in dd-mm-yyyy format
or mm-dd-yy format) and what kind of characters (numbers, in this case) will be permitted. There are
several other dimensions that specify a field and are held in a metadata file. (A metadata file contains
details about how data is stored in files, and provides information on how the data can be used and
managed.)
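
As an illustration of what a field specification might look like, the Python sketch below describes a date-of-birth field and checks values against it. The FieldSpec class and its attributes are assumptions made for this example; actual database systems keep much richer metadata.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical metadata entry describing one field."""
    name: str
    max_length: int
    date_format: str = ""   # empty string means "not a date field"

def validate(spec, value):
    """Check a value against the field's specification."""
    if len(value) > spec.max_length:
        return False
    if spec.date_format:
        try:
            datetime.strptime(value, spec.date_format)
        except ValueError:
            return False
    return True

# dd-mm-yyyy, as in the text; "%d-%m-%Y" is the strptime equivalent.
date_of_birth = FieldSpec(name="date_of_birth", max_length=10, date_format="%d-%m-%Y")

print(validate(date_of_birth, "14-03-1968"))
print(validate(date_of_birth, "03-14-1968"))   # month 14 does not exist
```

The second value is rejected because, read as dd-mm-yyyy, it names a fourteenth month; this is the kind of ambiguity that an exact field specification is meant to prevent.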

A collection of fields is called a record. Each record is like a row in a spreadsheet; it consists of a
predefined number of fields. In most cases, the sizes of the fields are fixed; this ensures that the total
size of a record is also fixed. Records are a collection of fields that have meaning in the context of the
application.
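
One consequence of fixed field sizes is that the position of any record in a file can be computed directly. The sketch below uses Python's struct module to show this; the field widths chosen here, and the second sample record, are made up for the illustration.

```python
import struct

# Hypothetical layout: 12-byte ID, 20-byte first name, 20-byte last name,
# 2-byte year, 10-byte major. "<" requests no padding between fields.
RECORD = struct.Struct("<12s20s20sH10s")

def pack_record(uid, first, last, year, major):
    """Encode one record; struct pads short text fields with zero bytes."""
    return RECORD.pack(uid.encode(), first.encode(), last.encode(), year, major.encode())

def read_record(data, index):
    """Jump straight to record number `index` using the fixed size."""
    uid, first, last, year, major = RECORD.unpack_from(data, index * RECORD.size)
    text = lambda b: b.rstrip(b"\x00").decode()
    return text(uid), text(first), text(last), year, text(major)

file_bytes = (
    pack_record("234577643239", "Aamir", "Khan", 1968, "Physics")
    + pack_record("000000000001", "Example", "Student", 1970, "History")
)
print(RECORD.size)
print(read_record(file_bytes, 1))
```

With this layout every record occupies 64 bytes, so record number n starts at byte 64 × n and can be read without scanning the records before it.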

Table 11.1 shows five fields that define a record. Each field contains data pertaining to some aspect
of a student – roll number (Aadhaar number in this case), first name, last name, year of birth, and the
subject in which the student is majoring. The data in each field is written in characters or numbers.
For each record, there should be at least one field that uniquely identifies the record. This ensures
that even if there are two students with exactly the same name (say Aamir Khan), with the same year
of birth, and the same major, then there is at least one field that will distinguish the records of the two
students. In Table 11.1, Aadhaar number is the unique identifier. In other cases this could be an
employee number, a tax number, or even a random number generated by the system. This unique
field is called a primary key.

Table 1: Two records of a student file

Aadhaar Number   First Name   Last Name   Year of Birth   Major
234577643239     Aamir        Khan        1968            Physics
–                –            –           –               –
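
The role of the primary key can be demonstrated with a small sketch in Python's sqlite3 module. The table mirrors the five fields described above; the second Aadhaar number and the rejected third row are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE students (
        aadhaar_number TEXT PRIMARY KEY,
        first_name TEXT, last_name TEXT, year_of_birth INTEGER, major TEXT
    )
""")
# Two students identical in every field except the key (second key is made up).
conn.execute("INSERT INTO students VALUES ('234577643239', 'Aamir', 'Khan', 1968, 'Physics')")
conn.execute("INSERT INTO students VALUES ('000000000001', 'Aamir', 'Khan', 1968, 'Physics')")

try:
    # Reusing an existing key is rejected by the database.
    conn.execute("INSERT INTO students VALUES ('234577643239', 'Other', 'Person', 1990, 'Maths')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

count = conn.execute("SELECT COUNT(*) FROM students").fetchone()[0]
print(count, duplicate_rejected)
```

The two students with identical names, years of birth, and majors are stored as separate records, while the attempt to reuse an existing key fails.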

A table is contained in a file. Each table may contain a few records or a very large number of records.
A database consists of many files. Modern database systems allow table sizes to include billions of
records. Furthermore, very large tables may be split and stored on different servers. Figure 11.1
shows the basic elements of a database.

In relational databases, the tables are related to each other. These relations allow data to be linked
according to some logic and then extracted from the tables. A detailed example of this is provided in
a later section.

5.2 Basic architecture


Databases may be organised and used in many different ways. The most basic use is as a personal
database. Individual users create databases for their personal use in organisations or at home. A
personal database may be on a personal computer at office, on a mobile phone, or on a tablet
computer. The data in these databases is fed and updated by the user, and is principally used by
him/her. For instance, a contacts database on a mobile phone is a personal database. All the data is
entered and used by the mobile phone user. The design of the database is not created by the user
(such databases are often provided as off-the-shelf applications), but the use and maintenance is only
by the user.

Personal databases are highly tuned to the needs of the user. They are not meant to be shared. These
databases also cannot be shared, as they reside on personal devices; and this is a limitation of these
systems.

Workgroup databases or function databases are designed to be shared across employees in an
organisation, either belonging to a group or to a functional department. Such a database is maintained
on a central computer, along with applications relevant for the group or department. Users access
and update the data on the central database from their local computers.

Figure 1: The basic elements of a database – fields, records, and files

Figure 2: Client–server architecture of a database


Enterprise or organisational databases are accessed by all members of the organisation. Figure 11.2
shows how these are typically organised in the client–server mode. A central database server
provides database capabilities to different applications that reside on other computers. These client
applications interact with the database server to draw on data services, whereas the database server
is managed independently. An advantage of these database servers is that they can be made highly
secure, with strong access restrictions, and can also be backed up carefully to recover from crashes.

Figure 3: Three-tier database architecture

While designing client–server databases, a prime issue to be addressed is where the processing
should take place. If data processing has to be done on the client from, say, three tables, then these
tables have to be moved across the network to the client, which should have enough computing
capacity to do the processing. If, on the other hand, the computing is done on the server then the
clients have to send processing requests to the server and await the results, and this puts a lot of load
on the server. Clients such as mobile phones or personal computers often do not have the processing
capacity to deal with large amounts of data, so the processing is invariably left to the server.

The architecture often used in enterprises is referred to as three-tier architecture. Here the clients
interact with application servers, which then call upon database servers for their data needs. Here the
load of processing for applications and for data is spread across two sets of servers, thus enabling
greater efficiency. Figure 11.3 depicts this architecture.
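
The division of work between the three tiers can be sketched in a few lines of Python. All class names, the products table, and the tax rate below are hypothetical; in practice each tier would run on its own machine and communicate over a network rather than through direct function calls.

```python
import sqlite3

class DatabaseServer:
    """Data tier: owns the tables and answers queries."""
    def __init__(self):
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute("CREATE TABLE products (name TEXT, price REAL)")
        self._conn.executemany("INSERT INTO products VALUES (?, ?)",
                               [("pen", 10.0), ("notebook", 45.0)])

    def query(self, sql, params=()):
        return self._conn.execute(sql, params).fetchall()


class ApplicationServer:
    """Middle tier: business logic; calls the database server for data."""
    def __init__(self, db):
        self._db = db

    def price_with_tax(self, product, tax_rate=0.18):   # tax rate is made up
        rows = self._db.query("SELECT price FROM products WHERE name = ?", (product,))
        if not rows:
            return None
        return round(rows[0][0] * (1 + tax_rate), 2)


def client_request(app, product):
    """Client tier: sends a request and displays the result."""
    price = app.price_with_tax(product)
    return f"{product}: not found" if price is None else f"{product}: {price:.2f}"


app = ApplicationServer(DatabaseServer())
print(client_request(app, "notebook"))
```

The client never issues a query itself, and the database server knows nothing about tax; each tier can therefore be scaled or replaced separately.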

Databases may be centralised or decentralised within organisations. Centralised databases are
designed on the client–server model, with a two-tier or three-tier architecture. Decentralised or
distributed databases have tables distributed across many servers on a network. The servers may
be geographically distributed, but for the applications they appear as a single entity. One type of
distributed database has the entire database replicated across many servers. This is called a
homogeneous database. Figure 11.4 shows this database. Those users who are close to a particular
server are able to access data from that particular one, whereas others access data from other,
physically closer servers. When data is changed on any one server, it is also changed on the others.

Figure 4: Distributed databases with homogeneous and heterogeneous architectures

Distributed databases can also be federated in nature. This means the databases across the network
are not the same; they are heterogeneous. In such an architecture, when application servers draw on
the databases, special algorithms pull together the required data from diverse servers and present a
consolidated picture. This architecture is useful where the data entry and management of servers
is particular to a region. For example, multinational banks use federated databases as their databases
in different countries operate on different currencies and exchange criteria, and rely on local data.
For applications requiring global data, the applications use special logic for analysing the disparate
data.

A special class of software, known as middleware, is used to connect disparate databases.
As databases can have different data structures for the same kind of data, the middleware software
allows the databases to read and write data to and from each other. For example, the data field for
‘student name’ may have a space for 30 characters in one database and 40 characters in another. The
fact that they are referring to the same concept is captured by the middleware that enables the
translation from one to the other. The middleware is also used by the Application Layer to read and
use data from many databases. In modern web-centric applications, the middleware plays a major
role in allowing the use of distributed databases by application servers.
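
The translation role of middleware can be pictured with the Python sketch below. The two databases ("registrar" and "library"), their field names, and the mapping tables are all assumptions made for the illustration; only the 30-character and 40-character name limits come from the example in the text.

```python
# Two hypothetical databases that store the same concept differently.
REGISTRAR_ROW = {"student_name": "Aamir Khan", "yob": 1968}      # name up to 30 chars
LIBRARY_ROW = {"full_name": "Aamir Khan", "birth_year": "1968"}  # name up to 40 chars

# Middleware mapping: common field -> (source field, converter) per database.
MAPPINGS = {
    "registrar": {"name": ("student_name", str), "year_of_birth": ("yob", int)},
    "library": {"name": ("full_name", str), "year_of_birth": ("birth_year", int)},
}
NAME_LIMITS = {"registrar": 30, "library": 40}

def to_common(source, row):
    """Translate a database-specific row into the common representation."""
    return {field: convert(row[column])
            for field, (column, convert) in MAPPINGS[source].items()}

def from_common(target, record):
    """Translate a common record into the target database's layout."""
    row = {column: record[field] for field, (column, _) in MAPPINGS[target].items()}
    name_column = MAPPINGS[target]["name"][0]
    row[name_column] = row[name_column][:NAME_LIMITS[target]]  # respect field size
    return row

common = to_common("library", LIBRARY_ROW)
print(common)
print(from_common("registrar", common))
```

Truncating a long name to fit the smaller field is only one possible policy; a real middleware layer might instead reject the record or flag it for review.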


SELF-ASSESSMENT QUESTIONS – 3
Fill in the blanks:
4 A __________________ database is one whose tables are maintained on various different servers.
Multiple Choice Questions:
5 _________________ connects distributed databases to different client devices.
a) Firmware
b) Software
c) Dataware
d) Middleware
6 Federated databases imply databases across a network that are
a) the same
b) heterogeneous
c) centralised
d) highly secure
7 Personal databases are databases that
a) are created by individual users for their personal use in organisations or at home
b) contain the personal data of every employee of an organisation
c) are highly tuned to the needs of the user and therefore not meant to be shared
d) both (a) and (b)

6. DATA WAREHOUSES
Since the inception of desktop computing in the mid-1980s, data use and the need for data storage
have proliferated around the world. Almost all employees of organisations above a
certain size now use computers and produce, modify, or read data. For very large organisations, the
amount of data used on a day-to-day basis can run into petabytes. With this
explosion in data, organisations felt the need for:

1. Consolidating much of the data from various databases into a whole that could be understood
clearly.
2. Focusing on the use of data for decision making, as opposed to simply for running transactions.

The need for creating data warehouses arose from the above two needs. The technology of data
warehouses draws on enterprise databases to create a separate set of tables and relations that can be
used to run particular kinds of queries and analytical tools. Warehouses differ from transaction
databases in that users can run complex queries on them, related to the functions of the
enterprise, without affecting transaction processing.

To create a data warehouse, data is extracted from transactional tables, pre-processed to remove
unwanted data types, and then loaded into tables in the warehouse. The extraction process requires
querying transactional databases that are currently in use. This is a challenge, as the
data tables may be distributed across various servers and the data may be changing rapidly. The data
obtained from these tables is held in a staging area, a temporary storage area, where the data
is scrubbed. The idea of data scrubbing is to remove clearly recognisable erroneous data. This task is
often difficult, as errors (say, a misspelt name or a wrong address) are not always obvious and
require careful examination to detect. At the scrubbing stage, data is not corrected in any manner;
erroneous data is invariably removed from the collection of raw data.

Once the data is scrubbed or cleaned, it is loaded onto the tables that constitute the warehouse. When
an organisation is creating a warehouse for the first time, the entire data is loaded into a database,
using a particular design. Subsequent data that is obtained from the transaction databases is then
extracted, cleaned, and loaded incrementally to the earlier tables.
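
The extract, scrub, and load steps described above can be sketched as follows. This is only an
illustration under stated assumptions: the sample transactions, the table name warehouse_sales, the
two scrubbing rules, and the use of an in-memory SQLite database are all hypothetical and stand in
for whatever databases and rules an organisation actually uses.

```python
import sqlite3

# Hypothetical transactional rows: (order_id, customer, amount).
transactions = [
    (1, "Asha", 250.0),
    (2, "", 120.0),        # missing customer name
    (3, "Ravi", -40.0),    # negative amount
    (4, "Meena", 310.0),
]


def extract(rows, last_loaded_id):
    """Pull only rows newer than the last one already in the warehouse."""
    return [row for row in rows if row[0] > last_loaded_id]


def scrub(rows):
    """Drop clearly erroneous rows; nothing is corrected at this stage."""
    return [row for row in rows if row[1] and row[2] >= 0]


def load(conn, rows):
    """Append the cleaned rows to the warehouse table."""
    conn.executemany("INSERT INTO warehouse_sales VALUES (?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE warehouse_sales (order_id INTEGER, customer TEXT, amount REAL)"
)

# Initial full load: everything passes through the staging list first.
staging = extract(transactions, last_loaded_id=0)
load(warehouse, scrub(staging))

# Later, a new transaction arrives and is loaded incrementally.
transactions.append((5, "Kiran", 90.0))
staging = extract(transactions, last_loaded_id=4)
load(warehouse, scrub(staging))

loaded_ids = [row[0] for row in warehouse.execute(
    "SELECT order_id FROM warehouse_sales ORDER BY order_id")]
print(loaded_ids)  # [1, 4, 5]
```

Orders 2 and 3 never reach the warehouse because the scrubbing step removes them rather than trying
to repair them.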

Data pertaining to a particular domain or a problem to be analysed is maintained in data marts. For
example, a mart may be created to examine sales data alone. This mart will collect data related to the
sales activities across the organisation and store them in the warehouse. However, it will exclude the
data related to production, finance, employees, and so on. The mart can then be analysed for
particular problems related to the sales trends, sales predictions and so on. Furthermore, the mart
may be updated on a periodic basis to include the fresh data available.

Data in warehouses can be stored in tables with timestamps. This is the dimensional method of
creating warehouses. The idea here is to store data in a single table, or a few unrelated tables,
each given one additional attribute: a timestamp that indicates when the data was collected or
created. For example, one table in a dimensional warehouse may include data on customers, sales,
products, orders, shipping, and a timestamp of each transaction. Each timestamp will pertain to one
particular event in the life of the organisation when a transaction occurred and the data was
created. Such a table can be analysed to examine trends in sales, fluctuations in orders across
seasons, and so on. Another method of storing data is in the regular tables-and-relations format of
relational databases. Here too an additional attribute of a timestamp is included within the tables.
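
A minimal sketch of a timestamped warehouse table is given below. The table name sales_facts, its
columns, and the sample rows are hypothetical; SQLite is used only because it needs no separate
server. The query groups sales by month, which is one simple way to look for seasonal fluctuations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One wide table; the last column records when each transaction occurred.
conn.execute(
    """CREATE TABLE sales_facts (
           customer TEXT, product TEXT, quantity INTEGER,
           amount REAL, recorded_at TEXT)"""
)
rows = [
    ("Asha", "Umbrella", 2, 600.0, "2023-07-04 10:15:00"),
    ("Ravi", "Umbrella", 1, 300.0, "2023-07-19 17:40:00"),
    ("Meena", "Sweater", 1, 900.0, "2023-12-02 12:05:00"),
    ("Asha", "Sweater", 3, 2700.0, "2023-12-21 09:30:00"),
]
conn.executemany("INSERT INTO sales_facts VALUES (?, ?, ?, ?, ?)", rows)

# Group by the year-month part of the timestamp to see the trend over time.
monthly_totals = conn.execute(
    """SELECT strftime('%Y-%m', recorded_at) AS month, SUM(amount)
       FROM sales_facts
       GROUP BY month
       ORDER BY month"""
).fetchall()
print(monthly_totals)  # [('2023-07', 900.0), ('2023-12', 3600.0)]
```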

Various kinds of analysis can be conducted on data available in warehouses including data mining,
online analytical processing, and data visualisation. These different methods are designed to extract
patterns and useful information from very large data sets. Online analytical processing (OLAP) is used
to analyse and report data on sales trends, forecasts, and other time-series-based analyses. Such
analyses allow managers to see and visualise the data in a form that shows interesting and unusual
patterns that would not be easily visible from the analysis of transaction data alone.

In modern organisations, ones that have a strong online presence and collect data from customer
visits to websites and transaction data from different types of e-commerce sites, the extent and size
of the data is such that analysing it for patterns is almost impossible, unless a warehouse is used. For
example, one firm analyses data, using OLAP, from millions of visitors to different pages of its website
to dynamically place advertisements that would conform to the visitors’ interests, as determined by
the regions on the page the visitor hovers over or clicks on.

Data warehouses are an active area of development and have strong commercial potential for
database vendors. Almost all major commercial vendors of DBMS have products that can be used to
create and manage warehouses.

6.1 Data mining uses


Data mining means extracting patterns and knowledge from historical data that is typically housed in
data warehouses. Data mining is a loose term that means many things at the same time – data storing,
data analysis, use of artificial intelligence techniques, statistical pattern recognition and others. The
original idea of data mining came from the field known as ‘knowledge discovery in databases’ (KDD).
KDD is a sub-field of artificial intelligence and is concerned with finding useful, human-
understandable knowledge from data. Several other terms are now used to describe the same ideas –
business intelligence, data analytics, and web analytics.

Data mining is used with data accumulated in data warehouses. Following are some examples of data
stored in warehouses that are used for mining:

1. Click-stream data: This data is collected from website pages as users click on links or other
items on the web page. Data is collected on where a user clicks, after what interval, what page the
user goes to, whether the user returns and visits other links, and so on. The data is mined to
identify which links are most frequently visited, for how long, and by what kind of users. The
online search firm Google has initiated an entire field of mining click-stream data that is
known as web analytics.
2. Point-of-sale purchase data: Data obtained from retail sales counters is the classic data set
to which mining software was applied. The data pertains to the item quantities, price values,
date and time of purchase, and details about customers that are obtained from point-of-sale
terminals. The data is used to perform ‘market basket’ analysis, which essentially shows what
kinds of items are typically purchased together. In a famous case, a large retailer found
from a market basket analysis that men in a certain part of the USA were likely to buy beer and
diapers on Thursday evenings. This was an unusual finding and the retailer sought to explain
why. It was later learned that many families with young children planned their weekend trips
on Thursday evening, at which point the women would ask men to go and buy diapers from
the store. The men would take this opportunity to buy beer, also for the weekend. The retailer
used this information to announce promotions and increase sales of both these products.
3. Online search data: This data is about the search terms that users type in search boxes on web
pages. Many organisations collect the text typed in by users while they are searching for some
information. This text data reveals what users are interested in and is mined for patterns. The
data collected pertains to the search texts typed in, the time at which they are typed, and the
number of times different searches are done. Many online retailers, such as Flipkart, mine this
data to identify what users are interested in and then make product suggestions based on
association rules. For example, users searching for books on Java programming may be offered
what others have seen and purchased, including associated books on programming they have
not considered.
4. Text data: This is text data that is posted by users on web pages, blogs, e-mails, wikis, Twitter
feeds, and others. Many organisations have found that by mining this data they can glean
interesting insights and trends. Many tools and software programs have been created recently
to mine text data. One example is provided by the online site called Wordle.
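
The market basket analysis and association rules mentioned in the list above can be sketched in a
few lines. The baskets below are invented for illustration, and the confidence function is a
hypothetical helper, not part of any mining product; commercial mining software works on far larger
data sets with more refined measures.

```python
from collections import Counter
from itertools import combinations

# Hypothetical point-of-sale baskets, one set of items per purchase.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"diapers", "milk"},
    {"beer", "diapers", "bread"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    # Sort so that each pair is always counted under the same key.
    pair_counts.update(combinations(sorted(basket), 2))


def confidence(antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    pair = tuple(sorted((antecedent, consequent)))
    return pair_counts[pair] / item_counts[antecedent]


most_common_pair, times_seen = pair_counts.most_common(1)[0]
print(most_common_pair, times_seen)      # ('beer', 'diapers') 3
print(confidence("beer", "diapers"))     # 1.0
print(confidence("diapers", "beer"))     # 0.75
```

In this toy data every basket with beer also has diapers, but only three of the four baskets with
diapers have beer, so the rule is stronger in one direction than the other.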

The www.wordle.net site hosts an application that mines text submitted to it. The application counts
the frequency of words appearing in the submitted text and then creates a word ‘cloud’ with the most
frequent words appearing as the largest. For example, Figure 11.5 has a word cloud of the text of a
portion of the Constitution of India. Text from Part III of the Constitution, comprising the
Fundamental Rights, was submitted to wordle.net. This part consists of about 13 pages of printed text.

Figure 11.5: Wordle cloud for text from Part III of the Constitution of India pertaining to
Fundamental Rights
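
The word-counting step behind such a cloud can be sketched as below. This is not Wordle's own
program: the stop-word list, the sample sentence (which is invented and is not a quotation from the
Constitution), and the proportional font-size rule are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical list of very common words to leave out of the cloud.
STOP_WORDS = {"the", "of", "to", "and", "a", "in", "has"}


def word_frequencies(text, top_n=3):
    """Return the top_n most frequent words, ignoring case and stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(word for word in words if word not in STOP_WORDS)
    return counts.most_common(top_n)


def font_sizes(frequencies, largest=48):
    """Scale each word's size in proportion to the most frequent word."""
    top_count = frequencies[0][1]
    return {word: round(largest * count / top_count)
            for word, count in frequencies}


sample = (
    "Every citizen has the right to equality. The right to freedom and "
    "the right to education belong to every citizen."
)
frequencies = word_frequencies(sample)
sizes = font_sizes(frequencies)
print(frequencies)
print(sizes)
```

Here the word ‘right’ appears three times and would be drawn largest, while ‘every’ and ‘citizen’
appear twice each and would be drawn at two-thirds of that size.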

SELF-ASSESSMENT QUESTIONS – 4
Fill in the blanks:
8 A _______________ is a data warehouse containing data pertaining to a particular domain.
9 _________________ is the process of removing erroneous data from data warehouses.

7. SUMMARY

Let us recapitulate the important concepts discussed in this unit:

• Different aspects of the need for database systems are data independence, reduced data
redundancy, data consistency, data access, data administration, managing concurrency,
managing security, recovery from crashes, and application development.
• A field in a database whose values uniquely identify each record is called the primary key.
• In homogeneous databases, those users who are close to a particular server are able to access
data from that particular one, whereas others access data from other, physically closer servers.
• In heterogeneous databases, when application servers draw on the databases, special
algorithms pull together the required data from diverse servers and present a consolidated
picture.
• Middleware is a special class of software which is used to connect disparate databases.
• Different types of database designs are relational model, hierarchical model, object-oriented
model, network model, and object-relational model.
• Elements of DBMS are tables, queries, forms, and reports.
• Information regarding the data in tables – for what purpose it was created, by whom, who
maintains it, and so on – is referred to as metadata.
• The technology of data warehouses draws on enterprise databases to create a separate set of
tables and relations that can be used to run particular kinds of queries and analytical tools.
• Data mining means extracting patterns and knowledge from historical data that are typically
housed in data warehouses.

8. GLOSSARY

Database - A software program that enables storage, access, and use of data by other
software applications.

Field - A defined space of given size, which stores the basic elements of data.

Record - A collection of fields.

File - A collection of records, also called a table.

Metadata - Contains details about how data is stored in files; provides information
on how the data can be used and managed.

Primary key - A field that contains data that uniquely identifies a record.

Personal database - Databases created and used primarily by single users.

Database server - A database software that runs on an independent computer and provides
services to client applications.

Distributed database - A database whose tables are maintained on various different servers.

Middleware - Software that is used to connect distributed databases to different client
devices.

Data warehouse - A massive database that is created from organisational data to enable
analysis and to assist with decision making.

Data scrubbing - Removing erroneous data from data warehouses.

Data mart - A data warehouse containing data pertaining to a particular domain.

9. TERMINAL QUESTIONS
1. What is the need for data management? Why is it difficult to manage data?
2. Describe some of the challenges of modern database management.
3. What is the difference between fields, records and files?
4. Why is a primary key needed?
5. What is the difference between a personal database and an organisational database?
6. What is the advantage of three-tier architecture?
7. Why is middleware important?
8. Describe briefly how to create a data warehouse.

10. ANSWERS
Self-Assessment Questions
1. (d) data and programs were together, and changing data required changing the program
2. True
3. (a) managing concurrency, security and recovery from crashes
4. Distributed
5. (d) Middleware
6. (b) heterogeneous
7. (d) both (a) and (b)
8. Data mart
9. Data scrubbing

Terminal Questions Answers


Answer 1: Refer Section 11.3
Answer 2: Refer Section 11.4
Answer 3: Refer Section 11.5.1
Answer 4: Refer Section 11.5.1
Answer 5: Refer Section 11.5.2
Answer 6: Refer Section 11.5.2
Answer 7: Refer Section 11.5.2
Answer 8: Refer Section 11.6

11. REFERENCES

• Hoffer, J.A., Prescott, M.B. and McFadden, F.R. (2007) Modern Database Management, 8th edn,
Prentice Hall, NJ.

E-References

• An article ‘How much Information is there in the World?’ in USC News, February 2011 is
available at: http://uscnews.usc.edu (accessed on June 2011).
• An article ‘eBay’s Two Enormous Data Warehouses’, in DBMS2, 2009 is available at:
http://www.dbms2.com/2009/04/30/ebays-two-enormous-data-warehouses/ (accessed on
June 2011).
