Chapter 5
Data and Knowledge
Management
1. Managing Data
2. Big Data
3. The Database Approach
4. Database Management Systems
5. Data Warehouses and Data Marts
6. Knowledge Management
5.1 Managing Data
• The Difficulties of Managing
Data
• Data Governance
Difficulties in Managing
Data
• Data increases exponentially with
time
• Multiple sources of data
• Data rot, or data degradation
• Data security, quality, and integrity
• Government Regulation
The Difficulties of Managing Data
1. Managing data in organizations is difficult for
many reasons.
2. First, the amount of data increases exponentially
with time.
3. Much historical data must be kept for a long
time, and new data are added rapidly.
4. For example, to support millions of customers,
large retailers such as Walmart have to manage
petabytes of data.
The Difficulties of Managing Data
(continued)
• In addition, data are also scattered throughout
organizations, and they are collected by many
individuals using various methods and
devices.
• These data are frequently stored in numerous
servers and locations and in different
computing systems, databases, formats, and
human and computer languages.
The Difficulties of Managing Data
(continued)
• Another problem is that data are generated from
multiple sources:
• internal sources (e.g., corporate databases and
company documents),
• personal sources (e.g., personal thoughts,
opinions, and experiences), and
• external sources (e.g., commercial databases,
government reports, and corporate Web sites).
• Data also come from the Web, in the form of
clickstream data.
Clickstream data
• Clickstream data are those data that
visitors and customers produce when they
visit a Web site and click on hyperlinks.
• Clickstream data provide a trail of the users’
activities in the Web site, including user
behaviour and browsing patterns.
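As a minimal sketch of the idea above, the following groups raw click events by visitor to recover each visitor's trail through a site. The event layout (visitor ID, timestamp, page) and all sample values are assumptions made for illustration; they are not taken from any particular Web analytics product.

```python
from collections import defaultdict

# Hypothetical clickstream events: (visitor_id, seconds_since_start, page_clicked).
events = [
    ("v1", 0, "/home"),
    ("v2", 3, "/home"),
    ("v1", 12, "/televisions"),
    ("v1", 40, "/televisions/1234"),
    ("v2", 45, "/contact"),
]


def build_trails(events):
    """Group clicks by visitor, in time order, to recover each browsing trail."""
    trails = defaultdict(list)
    for visitor, _timestamp, page in sorted(events, key=lambda e: e[1]):
        trails[visitor].append(page)
    return dict(trails)


trails = build_trails(events)
for visitor, pages in trails.items():
    print(visitor, " -> ".join(pages))
```

Each trail is simply the ordered list of pages one visitor clicked, which is the raw material for studying browsing patterns.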
Other Sources of Data
• Adding to these problems is the fact that new
sources of data, such as blogs, podcasts,
videocasts, and radio frequency identification
(RFID) tags and other wireless sensors, are
constantly being developed.
Data Degradation
• In addition, data degrade over time. For
example, customers move to new addresses
or change their names, companies go out of
business or are bought, new products are
developed, employees are hired or fired, and
companies expand into new countries.
Data Rot
• Data are also subject to data rot. Data rot refers
primarily to problems with the media on which
the data are stored.
• Over time, temperature, humidity, and exposure
to light can cause physical problems with storage
media and thus make it difficult to access the
data.
• The second aspect of data rot is that finding the
machines needed to access the data can be
difficult.
Data Security and Legal
Requirements
• Data security, quality, and integrity are
critical, yet they are easily jeopardized.
• In addition, legal requirements relating to
data differ among countries as well as among
industries, and they change frequently.
Data Repetition
• Another problem arises from the fact that, over
time, organizations have developed information
systems for specific business processes, such as
transaction processing, supply chain management,
and customer relationship management.
• Information systems that specifically support these
processes impose unique requirements on data,
which results in repetition and conflicts across the
organization.
Data Repetition (continued)
• For example, the marketing function might
maintain information on customers, sales
territories, and markets.
• These data might be duplicated within the
billing or customer service functions.
• This situation can produce inconsistent data
within the enterprise.
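The inconsistency described above can be sketched in a few lines. Here two functional areas each keep their own copy of the same customers, and a small helper reports the fields on which the copies disagree. The record layout, customer IDs, and city values are hypothetical.

```python
# Hypothetical customer records kept separately by two functional areas.
marketing = {
    "C001": {"name": "Mary Jones", "city": "Austin"},
    "C002": {"name": "Bill Roberts", "city": "Denver"},
}
billing = {
    "C001": {"name": "Mary Jones", "city": "Dallas"},  # address updated here only
    "C002": {"name": "Bill Roberts", "city": "Denver"},
}


def find_conflicts(system_a, system_b):
    """Return {customer_id: {field: (value_a, value_b)}} for fields that disagree."""
    conflicts = {}
    for customer_id in system_a.keys() & system_b.keys():
        record_a, record_b = system_a[customer_id], system_b[customer_id]
        differing = {
            field: (record_a[field], record_b[field])
            for field in record_a.keys() & record_b.keys()
            if record_a[field] != record_b[field]
        }
        if differing:
            conflicts[customer_id] = differing
    return conflicts


conflicts = find_conflicts(marketing, billing)
print(conflicts)
```

Because the customer's move was recorded in only one system, the two copies now give different answers to the same question.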
Federal Regulation
• Federal regulations (e.g., Sarbanes-Oxley) have made it a top priority for
companies to better account for how they are managing information.
Sarbanes-Oxley requires that
• (1) public companies evaluate and disclose the effectiveness of their
internal financial controls, and
• (2) independent auditors for these companies agree to this disclosure.
• The law also holds CEOs and CFOs personally responsible for such
disclosures. If their companies lack satisfactory data management policies
and fraud or a security breach occurs, the company officers could be held
liable and face prosecution.
• Examples: WorldCom, Enron, Satyam Computers
• In addition, companies are drowning in data, much of which is unstructured,
and the amount of data is increasing exponentially. To be profitable, companies
must develop a strategy for managing these data effectively.
Data Hoarding
• There are two additional problems with data
management: Big Data and data hoarding.
• Data hoarding is the excessive acquisition of
data and a reluctance to delete old, historical
data that are no longer valuable to the user
or to the organization.
• Big Data is covered in Section 5.2.
Data Governance
• An approach to managing
information across an entire
organization.
• Master Data
• Master Data Management
Data Governance
• To address the numerous problems associated
with managing data, organizations are turning to
data governance.
• Data governance is an approach to managing
information across an entire organization.
• It involves a formal set of business processes and
policies that are designed to ensure that data are
handled in a certain, well-defined fashion.
• That is, the organization follows unambiguous
rules for creating, collecting, handling, and
protecting its information.
Data Governance (continued)
• The objective is to make information
available, transparent, and useful for the
people who are authorized to access it, from
the moment it enters an organization until it is
outdated and deleted.
Data Governance (continued)
• One strategy for implementing data
governance is master data management.
• Master data management is a process that
spans all organizational business processes
and applications.
Master Data
• Master data management provides companies
with the ability to store, maintain, exchange,
and synchronize a consistent, accurate, and
timely “single version of the truth” for the
company’s master data.
• Master data are a set of core data, such as
customer, product, employee, vendor,
geographic location, and so on, that span the
enterprise information systems.
Master data vs Transaction
data
• Transaction data, which are generated and
captured by operational systems, describe the
business’s activities, or transactions.
• In contrast, master data are applied to
multiple transactions and are used to
categorize, aggregate, and evaluate the
transaction data.
• Let’s look at an example of a transaction: You
(Mary Jones) purchase one Samsung 42-inch
plasma television, part number 1234, from Bill
Roberts at Best Buy, for $2,000, on April 20,
2013. In this example, the master data are
“product sold,” “vendor,” “salesperson,”
“store,” “part number,” “purchase price,” and
“date.”
• When specific values are applied to the master
data, then a transaction is represented. Therefore,
transaction data would be, respectively, “42-inch
plasma television,” “Samsung,” “Bill Roberts,”
“Best Buy,” “1234,” “$2,000,” and “April 20,
2013.”
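One way to picture the distinction is the sketch below, in which a single master record for the product is reused by several transactions and then used to aggregate them. The class names, field names, and the second sale are hypothetical; only the first sale uses the values from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    """Master data: a core record shared by many transactions."""
    part_number: str
    description: str
    vendor: str


@dataclass(frozen=True)
class Sale:
    """Transaction data: one business event that points at master data."""
    product: Product
    customer: str
    salesperson: str
    store: str
    price: float
    date: str


tv = Product("1234", "42-inch plasma television", "Samsung")

sales = [
    Sale(tv, "Mary Jones", "Bill Roberts", "Best Buy", 2000.00, "2013-04-20"),
    # Hypothetical second sale, added only to show aggregation.
    Sale(tv, "Another Customer", "Bill Roberts", "Best Buy", 2000.00, "2013-04-21"),
]


def revenue_by_part(transactions):
    """Use the master data (part number) to categorize and total the sales."""
    totals = {}
    for sale in transactions:
        key = sale.product.part_number
        totals[key] = totals.get(key, 0.0) + sale.price
    return totals


totals = revenue_by_part(sales)
print(totals)
```

The product record is stored once and applied to both sales, while each sale carries the values that are specific to that one event.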
5.2 Big Data
• Defining Big Data
• Characteristics of Big Data
• Managing Big Data
• Leveraging Big Data
Defining Big Data
• Big data is difficult to define
• Two Descriptions of Big Data
Big Data Description 1 of 2 (from Gartner Research)
• Diverse, high-volume, high-velocity information
assets that require new forms of processing to enable
enhanced decision making, insight discovery, and
process optimization. (www.gartner.com)
Big Data Description 2 of 2 (from the Big Data Institute)
• Exhibit variety
• Include structured, unstructured, and semi-structured
data
• Are generated at high velocity with an uncertain pattern
• Do not fit neatly into traditional, structured, relational
databases
• Can be captured, processed, transformed, and analyzed in
a reasonable amount of time only by sophisticated
information systems.
• (www.the-bigdatainstitute.com)
Defining Big Data
• Big Data generally consist of:
– Traditional enterprise data
– Machine-generated/sensor data
– Social Data
– Images captured by billions of devices located
around the world
• Digital cameras, camera phones, medical scanners, and
security cameras
Characteristics of Big
Data
• Volume
• Velocity
• Variety
Volume
• We have noted the incredible volume of Big Data. Although the
sheer volume of Big Data presents data management problems,
this volume also makes Big Data incredibly valuable. Irrespective
of their source, structure, format, and frequency, data are always
valuable. If certain types of data appear to have no value today,
it is because we have not yet been able to analyze them
effectively. For example, several years ago when Google began
harnessing satellite imagery, capturing street views, and then
sharing these geographical data for free, few people understood
its value. Today, we recognize that such data are incredibly useful.
• Consider machine-generated data, which are generated in much
larger quantities than nontraditional data. For instance, sensors
in a single jet engine can generate 10 terabytes of data in 30
minutes. With more than 25,000 airline flights per day, the daily
volume of data from just this single source is incredible. Smart
electrical meters, sensors in heavy industrial equipment, and
telemetry from automobiles increase the volume of Big Data.
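The jet engine figures above can be turned into a rough lower bound. The sketch below assumes, purely for illustration, that each flight contributes only one engine running for only 30 minutes, and it uses decimal units (1 PB = 1,000 TB). Real flights have more engines and last longer, so the true volume would be higher.

```python
TB_PER_ENGINE_PER_30_MIN = 10   # figure quoted in the slide
FLIGHTS_PER_DAY = 25_000        # "more than 25,000" in the slide, so a floor
TB_PER_PB = 1_000               # decimal units, an assumption of this sketch

# Assumption: one engine and one 30-minute interval per flight (lower bound).
daily_tb = TB_PER_ENGINE_PER_30_MIN * FLIGHTS_PER_DAY
daily_pb = daily_tb / TB_PER_PB

print(f"At least {daily_tb:,} TB per day, or about {daily_pb:,.0f} PB per day")
```

Even under these conservative assumptions, this single source would produce on the order of hundreds of petabytes per day.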
Velocity
• The rate at which data flow into an organization
is rapidly increasing. Velocity is critical because
it increases the speed of the feedback loop
between a company and its customers.
• For example, the Internet and mobile
technology enable online retailers to compile
histories not only on final sales, but on their
customers’ every click and interaction.
Companies that can quickly utilize that
information—for example, by recommending
additional purchases—gain competitive
advantage.
Variety
• Traditional data formats tend to be
structured, relatively well described, and
they change slowly. Traditional data
include financial market data, point-of-
sale transactions, and much more. In
contrast, Big Data formats change rapidly.
They include satellite imagery, broadcast
audio streams, digital music files, Web
page content, scans of government
documents, and comments posted on
social networks.
Managing Big Data
• When properly analyzed, Big Data can
reveal valuable patterns and
information.
• Database environment
• Traditional relational databases
versus NoSQL databases
• Open source solutions
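The contrast between a relational database and a NoSQL store can be sketched roughly as follows. The relational side uses Python's built-in sqlite3 module, where every row must fit a schema declared in advance. The "document" side is only a stand-in: a plain list of JSON strings imitates a schema-free store and is not the API of any real NoSQL product. Table, column, and field names are hypothetical.

```python
import json
import sqlite3

# Relational side: every row must fit the schema declared up front.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sale (sale_id INTEGER PRIMARY KEY, item TEXT, amount REAL)"
)
conn.execute("INSERT INTO sale (item, amount) VALUES (?, ?)", ("television", 2000.0))

try:
    # A column that the schema does not declare is rejected.
    conn.execute(
        "INSERT INTO sale (item, amount, comment) VALUES (?, ?, ?)",
        ("television", 2000.0, "Great TV!"),
    )
    schema_rejected = False
except sqlite3.OperationalError:
    schema_rejected = True

# Document-style side: each record carries its own structure.
documents = [
    json.dumps({"type": "sale", "item": "television", "amount": 2000.0}),
    json.dumps({"type": "social", "user": "hypothetical_user", "text": "Great TV!"}),
    json.dumps({"type": "sensor", "readings": [71.2, 71.9, 72.4]}),
]
fields_seen = sorted({key for doc in documents for key in json.loads(doc)})

print(schema_rejected, fields_seen)
```

The relational table refuses data that does not match its columns, whereas the document list accepts sales, social posts, and sensor readings side by side, which mirrors the variety of Big Data.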
Leveraging Big Data
• Creating Transparency
• Enabling Experimentation
• Segmenting Population to Customize
Actions
• Replacing/Supporting Human Decision
Making with Automated Algorithms
• Innovating New Business Models,
Products, and Services
• Organizations Can Analyze Far More
Data
5.3 The Database Approach
• The Data Hierarchy
• Designing the Database
Databases Minimize Three
Main Problems
• Data Redundancy – The same data are
stored in multiple places
• Data Isolation – Applications cannot
access data associated with other
applications
• Data Inconsistency – Various copies
of the data do not agree
Databases Maximize the
Following
• Data Security – Because data are stored in one place
in a database, databases have extremely high security
measures in place to minimize mistakes and deter attacks.
• Data Integrity – Data meet certain constraints, such
as no alphabetic characters in a Social Security number field.
• Data Independence – Applications and data are
not linked to each other, so all applications are able
to access the same data.
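The data integrity point can be illustrated with a constraint that the database itself enforces. This is a minimal sketch using Python's built-in sqlite3 module; the table name, column names, and sample values are hypothetical, and the nine-digit rule is a simplification chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        -- Integrity rule: exactly nine characters, none of them a non-digit.
        ssn        TEXT NOT NULL
                   CHECK (length(ssn) = 9 AND ssn NOT GLOB '*[^0-9]*')
    )
    """
)


def try_insert(name, ssn):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO student (name, ssn) VALUES (?, ?)", (name, ssn)
            )
        return True
    except sqlite3.IntegrityError:
        return False


valid_accepted = try_insert("Mary Jones", "123456789")
invalid_accepted = try_insert("Bill Roberts", "12345ABCD")
row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]

print(valid_accepted, invalid_accepted, row_count)
```

Because the rule lives in the database rather than in each application, every program that writes to the table is held to the same constraint.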
Figure: A university database.
Data Hierarchy
• Bit – smallest unit of data a computer can
process.
• Byte – Represents a single character (8 bits)
• Field – Logical grouping of characters into a word
or identification number
• Record – Logical grouping of related fields
• Data File or Table – Logical grouping of related
records
• Database – Logical grouping of related files
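The levels listed above can be walked through from the bottom up. This sketch uses ordinary Python values as stand-ins for each level; the student and course data are hypothetical.

```python
# Bit and byte: one character stored as a byte, shown as its 8 bits.
character = "M"
byte_value = character.encode("ascii")[0]
bits = format(byte_value, "08b")

# Field: a logical grouping of characters.
name_field = "Mary Jones"

# Record: a logical grouping of related fields.
record = {"student_id": "1001", "name": name_field, "major": "Finance"}

# Data file or table: a logical grouping of related records.
student_table = [
    record,
    {"student_id": "1002", "name": "Bill Roberts", "major": "Marketing"},
]

# Database: a logical grouping of related files.
database = {
    "student": student_table,
    "course": [{"course_id": "IS101", "title": "Databases"}],
}

print(bits, len(name_field), len(record), len(student_table), len(database))
```

Each level is built only from the one below it, which is what makes the hierarchy a useful way to reason about how data are organized.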
Hierarchy of data for a
computer-based file.
Designing the Database
• Key Terms
– Data Model
– Entity
– Instance
– Attribute
– Primary Key
– Secondary Keys
Designing the Database
• Entity-Relationship Modeling
• Entity-Relationship Diagram
• Cardinality
• Modality
Cardinality and modality
symbols.
5.4 Database Management
Systems
• The Relational Database Model
• Databases in Action
The Relational Database
Model
• Based on the concept of two-
dimensional tables
• Database Management System
(DBMS)
• Query Languages
• Data Dictionary
• Normalization
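Several of the items above can be seen together in one small sketch, using Python's built-in sqlite3 module as the DBMS. It stores data in two-dimensional tables, keeps students and enrollments in separate tables (a simple case of normalization), answers a question with an SQL query, and reads column definitions back from the DBMS in the spirit of a data dictionary. All table names, column names, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Each fact is stored once: student names live only in student.
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    );
    -- Note: SQLite records this foreign key but does not enforce it
    -- unless PRAGMA foreign_keys = ON is set.
    CREATE TABLE enrollment (
        student_id INTEGER NOT NULL REFERENCES student(student_id),
        course     TEXT NOT NULL,
        PRIMARY KEY (student_id, course)
    );
    INSERT INTO student VALUES (1, 'Mary Jones'), (2, 'Bill Roberts');
    INSERT INTO enrollment VALUES
        (1, 'Databases'), (1, 'Accounting'), (2, 'Databases');
    """
)

# Query language: which students take the Databases course?
rows = conn.execute(
    """
    SELECT s.name, e.course
    FROM student AS s
    JOIN enrollment AS e ON e.student_id = s.student_id
    WHERE e.course = 'Databases'
    ORDER BY s.name
    """
).fetchall()

# Data dictionary flavor: ask the DBMS for each column's name and type.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]

print(rows)
print(columns)
```

The join reassembles information that normalization split across tables, so nothing is stored twice and nothing is lost.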
E-R Diagram
Entities, Attributes, and
Identifiers
5.5 Data Warehouses and Data
Marts
• Describing Data Warehouses
and Data Marts
• A Generic Data Warehouse
Environment
Describing Data
Warehouses & Data Marts
• Data Warehouse
– A repository of historical data that are organized
by subject to support decision makers in the
organization
• Data Mart
– A low-cost, scaled-down version of a data
warehouse designed for end-user needs in a
strategic business unit (SBU) or individual
department.
Describing Data
Warehouses & Data Marts
• Basic characteristics of data
warehouses and data marts
– Organized by business dimension or subject
– Use online analytical processing (OLAP)
– Integrated
– Time variant
– Nonvolatile
– Multidimensional
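The "organized by business dimension" and "multidimensional" characteristics can be sketched as a set of sales facts that is summarized along whichever dimensions a decision maker picks. This is only an illustration of the idea in plain Python, not how an OLAP product is implemented; the dimensions and figures are hypothetical.

```python
from collections import defaultdict

DIMENSIONS = ("product", "region", "year")

# Hypothetical historical facts: (product, region, year, sales_amount).
facts = [
    ("television", "East", 2012, 100),
    ("television", "West", 2012, 150),
    ("television", "East", 2013, 120),
    ("laptop", "East", 2013, 200),
    ("laptop", "West", 2013, 250),
]


def roll_up(facts, *dims):
    """Total the sales amount for every combination of the chosen dimensions."""
    positions = [DIMENSIONS.index(dim) for dim in dims]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[position] for position in positions)
        totals[key] += row[3]
    return dict(totals)


by_product = roll_up(facts, "product")
by_region_year = roll_up(facts, "region", "year")

print(by_product)
print(by_region_year)
```

The same stored facts answer both "sales by product" and "sales by region and year", which is the practical meaning of viewing data along several dimensions.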
A Generic Data Warehouse
Environment
• Source Systems
– Data Integration
– Storing the Data
• Metadata
• Data Quality
• Data Governance
• Users
Data Warehouse
Framework
5.6 Knowledge Management
• Concepts and Definitions
• Knowledge Management
Systems
• The KMS Cycle
Concepts & Definitions
• Knowledge Management (KM)
– A process that helps organizations manipulate important
knowledge that comprises part of the organization’s
memory, usually in an unstructured format.
• Knowledge
• Explicit & Tacit Knowledge
• Knowledge Management System
(KMS)
Knowledge Management
Systems (KMS)
• Refer to the use of modern
information technologies – the
Internet, intranets, extranets, and
databases – to systematize,
enhance, and expedite intrafirm and
interfirm knowledge management.
– Best practices
The KMS Cycle
• Create Knowledge
• Capture Knowledge
• Refine Knowledge
• Store Knowledge
• Manage Knowledge
• Disseminate Knowledge
Mumbai University Questions:
• Q1. Explain the architecture of a data mart
and a data warehouse in an organization.
• Q2. Write short notes on: data warehouse
and data mart.
• Q3. Explain data warehouses and data
marts along with their characteristics.
• Q4. Define knowledge management.
Explain knowledge management systems
in detail.
• Q5. Explain the relational database
model in detail.