By now, you have learned that database management systems, or DBMS, form an integral part of the IT infrastructure.
In
this segment, you will understand the significance of DBMS and how it differs from traditional file organization
systems. Let's do this by taking a look at the example of Virgin Mobile Australia.
Virgin Mobile Australia has always been a leading player in the Australian mobile communications industry. The
company was always known for innovation in the products and services it offered its customers. As other Australian
telecom players entered the market with their own attractive offerings, Virgin Mobile was compelled to devise a
strategy to maintain its competitive edge.
To do so, the senior management at Virgin Mobile first needed a clear and comprehensive view of the company's
customers, offerings, and various operational metrics.
This proved to be a challenge because the company had multiple systems, each generating different and siloed pieces
of data. All of this data needed to be integrated.
The company then created a data repository which routed the data from all the systems into a centralized database,
allowing for an integrated and holistic view of Virgin Mobile's customers and offerings.
With the warehouse in place, the company's executives could analyze metrics such as the number of new registered
connections, the demographics of new customers, and the category of handsets they purchased.
They could also track the performance of each marketing campaign, gauge public reaction to new products, and see
how many customers moved to other telecom providers. Along with this, the data warehouse also monitored the
revenue generated per customer.
This kind of consolidated information empowered Virgin Mobile with the insights it needed to design and adopt business
strategies and remain profitable in a dynamic market. The experience of Virgin Mobile illustrates the effectiveness of
having database management software in place. The company created new processes to store, organize, and
access data, all in one place.
Let's now look at the definition of a DBMS. A DBMS is, in essence, software that centralizes an organization's data,
manages it effectively, and makes it easy for stakeholders to access that data. Let's understand the workings of a
DBMS with a simple example.
Look at the figure on your screen. On the left, you can see the human resources database, which stores multiple
details about employee compensation. The DBMS, in this case, divides the data into two categories, benefits and
payroll.
Suppose the company hires a specialist to analyze the workings of the benefits program. She can now do so by
accessing the data in the benefits category, which includes an employee's name, social security number, and health
insurance coverage.
Similarly, the company's payroll department looking for data regarding the gross pay and net pay for certain
employees can now do so easily.
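To make this concrete, here is a minimal sketch of how such role-specific access could be set up, using SQLite from Python. The table and view names (employee_compensation, benefits, payroll) and the sample row are purely illustrative, not taken from the figure.

    import sqlite3

    # Illustrative only: one shared compensation table, exposed through
    # two role-specific views (benefits vs. payroll).
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE employee_compensation (
        emp_id      INTEGER PRIMARY KEY,
        name        TEXT,
        ssn         TEXT,
        health_plan TEXT,
        gross_pay   REAL,
        net_pay     REAL
    );

    -- What the benefits specialist needs to see
    CREATE VIEW benefits AS
    SELECT name, ssn, health_plan FROM employee_compensation;

    -- What the payroll department needs to see
    CREATE VIEW payroll AS
    SELECT name, gross_pay, net_pay FROM employee_compensation;
    """)

    conn.execute("INSERT INTO employee_compensation VALUES "
                 "(1, 'A. Rivera', '123-45-6789', 'Standard PPO', 5200.0, 3900.0)")
    print(conn.execute("SELECT * FROM benefits").fetchall())
    print(conn.execute("SELECT * FROM payroll").fetchall())

Both departments read from the same centralized data; the DBMS simply presents each group with the slice it needs.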
In the context of this example, before the existence of a DBMS, the organization had a file for personnel, another file
for payroll, a file for medical insurance, a mailing list file, and so on, until hundreds of files existed for each
category. Five to ten years down the line, the organization would have thousands of files for programs,
categories, and applications.
Some critical problems with organizations that don't have a DBMS in place are data redundancy, inconsistency, and
the inability to share data. With a DBMS, data sharing throughout the organization becomes much easier because the
data is now present in a single location rather than in thousands of fragmented files.
To create a database management system, it is imperative to understand the relationship between the type of data in
the database, the use of that data, and the changes that might be made to that data. A database is made up of a
conceptual design and a physical design.
In the conceptual design, data is grouped as per distinct data categories or elements. This design method identifies
the most efficient way to group data elements together to avoid any data redundancy or duplication. Data is
organized, refined, and streamlined until appropriate categories are formed, and the relationship between those
categories is defined. This process of creating small structures of data from complex data groupings is called
normalization.
Let's understand this with the help of an example. Look at the figure on the screen. The data that you see has not
been normalized.
A normalized design would look something like this. How was the design normalized? If you look at the order
number and order date pair, each combination is unique. This allows us to define order by its order number and
order date. The combination of part number, part name, unit price, and supplier number forms the fields that define part.
Similarly, each part has a supplier, the details of which are stored in a separate table called supplier. To keep track of
the various parts that have been used in various orders, there is a separate table called line item, which captures the
relationship between order, part, and part quantity.
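As a rough sketch, and assuming a relational database, the normalized design described above might translate into table definitions like the following (shown here with SQLite from Python; the order table is named orders only because ORDER is a reserved word in SQL, and the column names are paraphrased from the example).

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    -- One row per order: the order number identifies the order date.
    CREATE TABLE orders (
        order_number INTEGER PRIMARY KEY,
        order_date   TEXT
    );

    -- One row per supplier.
    CREATE TABLE supplier (
        supplier_number INTEGER PRIMARY KEY,
        supplier_name   TEXT
    );

    -- One row per part; each part references exactly one supplier.
    CREATE TABLE part (
        part_number     INTEGER PRIMARY KEY,
        part_name       TEXT,
        unit_price      REAL,
        supplier_number INTEGER REFERENCES supplier(supplier_number)
    );

    -- line_item records which parts, and in what quantity, appear on which orders.
    CREATE TABLE line_item (
        order_number  INTEGER REFERENCES orders(order_number),
        part_number   INTEGER REFERENCES part(part_number),
        part_quantity INTEGER,
        PRIMARY KEY (order_number, part_number)
    );
    """)

Each fact is now stored exactly once, which is what removing redundancy through normalization means in practice.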
Apart from normalization, another concept you should have a firm grasp of is the entity-relationship model,
represented by entity-relationship diagrams.
The diagram on the screen signifies the relationship between supplier, part, line item, and order. They're all entities.
The lines that connect one box with another represent the relationships. Look at the image again. A line that
connects two boxes, or entities, and ends in two short vertical marks denotes a one-to-one relationship. A line that
ends in one short vertical mark along with an arrow denotes a one-to-many relationship.
Let's understand this with only two boxes now. Here, if you read the line from left to right, that is, from supplier to
part, it ends with a single vertical mark and an arrow, denoting a one-to-many relationship. It simply means
that one supplier can provide many parts.
Now, if you read the same line from right to left, that is, from part to supplier, the line ends with two
short vertical marks. This denotes a one-to-one relationship. It simply means that each part can have only one
supplier.
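Continuing the illustrative schema from the sketch above (same connection and tables), the one-to-many relationship shows up as soon as you join supplier and part: each part row carries exactly one supplier_number, while a single supplier can be referenced by many parts. The sample rows below are invented.

    # Invented sample rows, reusing the conn and tables created above.
    conn.executescript("""
    INSERT INTO supplier VALUES (1, 'Acme Components'), (2, 'Globex Parts');
    INSERT INTO part VALUES (101, 'Bolt', 0.10, 1),
                            (102, 'Nut',  0.05, 1),
                            (103, 'Gear', 2.50, 2);
    """)

    # One supplier -> many parts; each part -> exactly one supplier.
    for row in conn.execute("""
        SELECT s.supplier_name, COUNT(*) AS parts_supplied
        FROM supplier s JOIN part p ON p.supplier_number = s.supplier_number
        GROUP BY s.supplier_name
        ORDER BY s.supplier_name
    """):
        print(row)   # ('Acme Components', 2) then ('Globex Parts', 1)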
Entity relationship diagram mapping is also important for identifying the various business processes and their
dependencies on each other. It groups orders in terms of their vendors, parts, suppliers, other raw materials, and so on.
Both normalization and entity relationship diagrams help in grouping chunks of data into small structures by
categorizing them either by their characteristics or their functions. All of this is necessary when it comes to creating a
database management system.
Databases are generally used by businesses to keep track of basic transactions such as timely payment to suppliers,
vendors and employees, processing of orders, tracking existing customers, and attracting new customer bases. But,
that's not all. Businesses need databases to provide information that will help ease business operations.
However, with the quantum of big data comes a big responsibility to manage it. Take the example of web traffic
during Black Friday sales or the content on social media platforms such as Twitter and Instagram, or even the data
from stock trading platforms. There is an explosion of data all around, with billions and even trillions of records being
produced extremely rapidly. Today, this quantum of data is testing the capabilities of many database management
systems.
For example, according to a post on Twitter's official blog dated October 22nd, 2021, the platform processes nearly
400 billion events in real time and generates petabyte-scale data every day. To put this scale of data in
perspective, a petabyte is equal to 1,000 terabytes, and one terabyte is equal to 1,000 gigabytes. To extract value from this
data, businesses are now investing in tools that have the capability to analyze traditional as well as non-traditional
data.
Some of these tools are data marts and data warehouses, in-memory computing, tools like Hadoop or even cloud
services. Let's look at these tools one by one.
Data warehouses and data marts. A data warehouse is a database that stores current and past data that would be of
potential interest to stakeholders all across the organization. It manages data that originates from multiple
departments such as accounts, marketing, finance, supply chain, and many more.
What a data warehouse does is extract both current and past data from these departments and combine it with
data from sources external to the organization. It can also correct inaccurate and incomplete data and
reconstruct it before it is used by management and employees for analysis.
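As a loose sketch of that extract, correct, and combine idea, here is a small Python example; the source names, fields, and cleaning rules are entirely hypothetical.

    from datetime import date

    def extract():
        # Hypothetical stand-ins for a departmental source and an external source.
        internal = [{"customer": "C1", "revenue": "120.5", "date": "2024-05-01"},
                    {"customer": "C2", "revenue": None,    "date": "2024-05-02"}]
        external = [{"customer": "C1", "segment": "prepaid"}]
        return internal, external

    def transform(internal, external):
        segments = {row["customer"]: row["segment"] for row in external}
        cleaned = []
        for row in internal:
            cleaned.append({
                "customer": row["customer"],
                # Correct incomplete data: a missing revenue figure becomes 0.0.
                "revenue": float(row["revenue"] or 0.0),
                "date": date.fromisoformat(row["date"]),
                # Combine internal data with the external source where available.
                "segment": segments.get(row["customer"], "unknown"),
            })
        return cleaned

    # The cleaned, combined rows are what would be loaded into the warehouse.
    warehouse_rows = transform(*extract())
    print(warehouse_rows)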
While companies generally use centralized and enterprise-wide data warehouses that benefit the entire organization,
they also create much smaller and decentralized data warehouses, which are called data marts. A data mart is usually
reserved for highly confidential and focused information that is accessible to a specific set of people in the
organization.
Big data is either structured, such as bank transactions; unstructured, like audio and video data; or semi-structured,
like Facebook and Twitter feeds.
Hadoop is a tool that has the capability to handle all these types of big data. It is used by many organizations today.
Hadoop is open-source software managed by the Apache Software Foundation. It works by breaking a big data
problem into many micro problems and distributing them across thousands of computers for analysis. It then
combines the results into smaller sets of data that are easy for the organization's management and employees to
access. Companies often use Hadoop to analyze large amounts of data before it gets loaded into a data mart or a
data warehouse.
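To give a feel for the split-then-combine idea, here is a toy Python sketch of the map and reduce pattern Hadoop is built around. This is not Hadoop's actual API; the word-count task and the two documents are invented.

    from collections import Counter
    from functools import reduce

    documents = ["big data needs big tools", "hadoop splits big problems"]

    def map_phase(doc):
        # The "micro problem": count words in one document (in Hadoop this
        # would run on one of many machines).
        return Counter(doc.split())

    def reduce_phase(counts_a, counts_b):
        # Combine partial results into one smaller, merged result.
        return counts_a + counts_b

    partial_counts = [map_phase(d) for d in documents]   # distributed step
    total_counts = reduce(reduce_phase, partial_counts)  # combining step
    print(total_counts.most_common(3))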
In-memory computing and analytical platforms: In-memory computing refers to the storage of data in a computer's
main memory, or RAM (random access memory). Users can access the data stored in their computer system's
memory within a matter of seconds.
With in-memory computing, it is possible for large quantities of data, perhaps the volume of a small data
warehouse, to be housed completely in RAM. Database vendors have also shifted gears from data warehouses to
analytical platforms. These platforms are optimized for the storage and analysis of big data sets.
For example, platforms such as IBM's PureData System for Analytics integrate database, server, and storage
components to handle complicated analytical queries 10 to 100 times faster than traditional methods of
analysis.
But suppose you are the owner of a company that sells three kinds of products: washing machines,
refrigerators, and air conditioners. If you wanted to find out, say, how many refrigerators were sold in
the past quarter, you could easily look at your sales database to find that data.
But say, if you wanted to analyze the number of refrigerators sold and compare it with the sales that were projected
for that quarter, what would you do?
Here is where online analytical processing, or OLAP, comes in. OLAP is capable of analyzing data across multiple
dimensions. Each aspect of your refrigerator, such as its price, the regions where it is sold, and the time period of
its manufacture, represents a separate dimension.
You can use OLAP to find out how many refrigerators were sold in the east region in the month of May last year,
compare it with this year's data, and also compare it with the sales forecast.
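As a rough illustration of what such a multidimensional query involves, here is a plain-Python sketch that slices refrigerator sales by the region and time dimensions and compares them with a forecast. Real OLAP tools work on far larger data organized into cubes; the figures below are invented.

    # Invented sample data: each record is one sales fact with its dimensions.
    sales = [
        {"product": "refrigerator", "region": "east", "month": "2023-05", "units": 120},
        {"product": "refrigerator", "region": "east", "month": "2024-05", "units": 150},
        {"product": "refrigerator", "region": "west", "month": "2024-05", "units": 90},
    ]
    forecast = {("refrigerator", "east", "2024-05"): 140}  # hypothetical plan figure

    def slice_units(product, region, month):
        # "Slice" the data along the product, region, and time dimensions.
        return sum(r["units"] for r in sales
                   if (r["product"], r["region"], r["month"]) == (product, region, month))

    this_year = slice_units("refrigerator", "east", "2024-05")
    last_year = slice_units("refrigerator", "east", "2023-05")
    planned = forecast[("refrigerator", "east", "2024-05")]
    print(f"this year: {this_year}, last year: {last_year}, forecast: {planned}")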
However, if you are looking to take things a step further and derive insights beyond OLAP's reach, you should
understand data mining.
Data mining provides insights into data that cannot be fetched by OLAP. It does so by uncovering hidden patterns in
extensive databases and using those insights to forecast future behavior.
For example, a study of supermarket purchasing patterns using data mining may reveal that when a big packet of
Cheetos is purchased, a Coca-Cola drink is purchased along with it at least 70% of the time. However, during
promotions, the same drink is bought 90% of the time.
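As a rough sketch of where a figure like that 70% comes from, here is how the rule's confidence could be computed from transaction data in Python; the baskets below are made up.

    # Made-up market-basket data; each set is one customer's purchase.
    baskets = [
        {"cheetos", "coca-cola"},
        {"cheetos", "coca-cola", "bread"},
        {"cheetos", "milk"},
        {"bread", "milk"},
    ]

    def confidence(antecedent, consequent):
        # Of the baskets containing the antecedent, what fraction also
        # contain the consequent?
        with_antecedent = [b for b in baskets if antecedent in b]
        if not with_antecedent:
            return 0.0
        return sum(consequent in b for b in with_antecedent) / len(with_antecedent)

    print(confidence("cheetos", "coca-cola"))  # 2 of the 3 cheetos baskets -> about 0.67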
This aids business managers in understanding the profitability that comes with promotions. Data mining also helps in
analyzing time-bound events.
For example, a company selling household appliances may find that if a new home is purchased, a new refrigerator
and a washing machine will be bought within one to two weeks at least 75% of the time, and an oven will be bought
within four weeks of the house purchase, 60% of the time.
Database setup and analysis of data barely scratch the surface. To ensure that your business data stays reliable,
accurate, and easily accessible to those who require it, a business needs special procedures to manage its data. That's
where information policies, data administration, and data governance come into play.
Each business, no matter how big or small, requires an information policy without fail. Your business data is an
important resource, and its misuse and mishandling should be prevented at all costs.
An information policy is a rulebook for sharing, transferring, cultivating, formalizing, and categorizing sensitive
business information. It states which users and departments can disseminate and distribute information, and
which stakeholders are responsible for updating and maintaining that information.
For instance, a standard information policy may specify that only a few designated members of the payroll
department will have administrative access to modify and view sensitive data, such as an employee's salary or social
security number. It would also specify guidelines for these departments to ensure that such employee data is
accurate and updated periodically.
Developing an information policy is part of administering an organization's data. Along with this, data planning,
overseeing the creation and design of databases, revamping existing databases, monitoring the development of your
enterprise-wide data dictionary, and supervising how IT specialists use data are all part of an organization's data
administration.
Data governance, on the other hand, is concerned with the processes that govern data usability, data availability, and
data security in an organization. Here, special emphasis is on privacy, data quality, and ensuring maximum
compliance with regulations set by the government.
Disclaimer: All content and material on the upGrad website is copyrighted, either belonging to upGrad or its bonafide
contributors and is purely for the dissemination of education. You are permitted to access, print, and download
extracts from this site purely for your own education only and on the following basis:
● You can download this document from the website for self-use only.
● Any copies of this document, in part or full, saved to disk or to any other storage medium, may only be used
for subsequent, self-viewing purposes or to print an individual extract or copy for non-commercial personal
use only.
● Any further dissemination, distribution, reproduction, copying of the content of the document herein or the
uploading thereof on other websites, or use of the content for any other commercial/unauthorized purposes
in any way which could infringe the intellectual property rights of upGrad or its contributors, is strictly
prohibited.
● No graphics, images, or photographs from any accompanying text in this document will be used separately
for unauthorized purposes.
● No material in this document will be modified, adapted, or altered in any way.
● No part of this document or upGrad content may be reproduced or stored in any other website or included in
any public or private electronic retrieval system or service without upGrad’s prior written permission.
● Any right not expressly granted in these terms is reserved.