Unit II Lecture Notes
Analytic Functions
A Data Warehouse works as a central repository where information arrives from one or
more data sources. Data flows into a data warehouse from the transactional system and
other relational databases.
1. Structured
2. Semi-structured
3. Unstructured data
The data is processed, transformed, and ingested so that users can access the processed
data in the Data Warehouse through Business Intelligence tools, SQL clients, and
spreadsheets. A data warehouse merges information coming from different sources into
one comprehensive database.
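The flow described above (process, transform, ingest, then query in one place) can be sketched in a few lines. This is only a minimal illustration, assuming two hypothetical sources (`crm_rows` and `web_rows`) and an in-memory SQLite database standing in for the warehouse; none of these names come from the notes.

```python
import sqlite3

# Hypothetical source records; the names and values are illustrative only.
crm_rows = [
    {"customer": " Alice ", "amount": "120.50"},
    {"customer": "bob", "amount": "80"},
]
web_rows = [
    {"customer": "ALICE", "amount": "30.00"},
]


def transform(row, source):
    """Clean one raw record into the warehouse's common shape."""
    return (row["customer"].strip().lower(), float(row["amount"]), source)


def load(conn, rows):
    """Insert transformed rows into the single warehouse table."""
    conn.executemany(
        "INSERT INTO sales (customer, amount, source) VALUES (?, ?, ?)", rows
    )


conn = sqlite3.connect(":memory:")  # in-memory stand-in for the warehouse
conn.execute("CREATE TABLE sales (customer TEXT, amount REAL, source TEXT)")
load(conn, [transform(r, "crm") for r in crm_rows])
load(conn, [transform(r, "web") for r in web_rows])

# One query now sees the records from every source at once.
totals = dict(
    conn.execute(
        "SELECT customer, SUM(amount) FROM sales GROUP BY customer"
    ).fetchall()
)
print(totals)
```

The transform step is what lets " Alice " from one source and "ALICE" from the other be analyzed as the same customer after merging.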
By merging all of this information in one place, an organization can analyze its customers
more holistically. This helps to ensure that it has considered all the information available.
Data warehousing makes data mining possible. Data mining is looking for patterns in the
data that may lead to higher sales and profits.
An Operational Data Store (ODS) is a data store required when neither the data warehouse
nor the OLTP systems support an organization's reporting needs. In an ODS, the data
warehouse is refreshed in real time. Hence, it is widely preferred for routine
activities like storing records of employees.
3. Data Mart:
A data mart is a subset of the data warehouse. It is specially designed for a particular line
of business, such as sales or finance. In an independent data mart, data can be collected
directly from sources.
1.3 Components of Data warehouse
Load manager: The load manager is also called the front component. It performs all the
operations associated with the extraction and loading of data into the warehouse. These
operations include transformations to prepare the data for entering the data
warehouse.
Query Manager: The query manager is also known as the backend component. It performs all
the operations related to the management of user queries. The operations of this data
warehouse component include directing queries to the appropriate tables and scheduling the
execution of queries.
1. Data Reporting
2. Query Tools
4. EIS tools
Airline:
In the airline industry, a data warehouse is used for operational purposes such as crew
assignment, analysis of route profitability, frequent flyer program promotions, etc.
Banking:
It is widely used in the banking sector to manage the resources available on desk effectively.
A few banks also use it for market research and for performance analysis of products and
operations.
Healthcare:
The healthcare sector also uses a data warehouse to strategize and predict outcomes, generate
patients' treatment reports, and share data with tie-in insurance companies, medical aid
services, etc.
Public sector:
In the public sector, a data warehouse is used for intelligence gathering. It helps
government agencies maintain and analyze tax records and health policy records for every
individual.
In this sector, the warehouses are primarily used to analyze data patterns, customer trends,
and to track market movements.
Retail chain:
In retail chains, a data warehouse is widely used for distribution and marketing. It also helps
to track items, customer buying patterns and promotions, and is used for determining pricing
policy.
Telecommunication:
A data warehouse is used in this sector for product promotions, sales decisions and to make
distribution decisions.
Hospitality Industry:
This Industry utilizes warehouse services to design as well as estimate their advertising
and promotion campaigns where they want to target clients based on their feedback and
travel patterns.
The best way to address the business risk associated with a data warehouse
implementation is to employ a three-prong strategy, as below.
Here are the key steps in data warehouse implementation along with their deliverables.
• Decide a plan to test the consistency, accuracy, and integrity of the data.
• The data warehouse must be well integrated, well defined and time stamped.
• While designing a data warehouse, make sure you use the right tools, stick to the life
cycle, take care of data conflicts, and be ready to learn from your mistakes.
• Never replace operational systems and reports
• Don't spend too much time on extracting, cleaning and loading data.
Step 9: Extract Data from Operational Data Store. Deliverable: Integrated D/W Data Extracts.
• A data warehouse allows business users to quickly access critical data from several
sources all in one place.
• A data warehouse provides consistent information on various cross-functional
activities. It also supports ad-hoc reporting and queries.
• Data Warehouse helps to integrate many sources of data to reduce stress on the
production system.
• Data warehouse helps to reduce total turnaround time for analysis and reporting.
• Restructuring and Integration make it easier for the user to use for reporting and
analysis.
• A data warehouse allows users to access critical data from a number of sources in a
single place. Therefore, it saves the user's time in retrieving data from multiple sources.
• Data warehouse stores a large amount of historical data. This helps users to analyze
different time periods and trends to make future predictions.
1.8 Disadvantages of Data Warehouse:
• Changes in regulatory constraints may limit the ability to combine disparate sources of
data. These disparate sources may include unstructured data, which is difficult to store.
• As the size of databases grows, the estimate of what constitutes a very large
database continues to grow. It is complex to build and run data warehouse systems
that are always increasing in size. The hardware and software resources available
today do not allow a large amount of data to be kept online.
• Multimedia data cannot be manipulated as easily as text data, whereas textual
information can be retrieved by the relational software available today. This could be
a research subject.
There are many data warehousing tools available in the market. Here are some of the most
prominent ones:
1. MarkLogic:
MarkLogic is a useful data warehousing solution that makes data integration easier and
faster using an array of enterprise features. This tool helps to perform very complex
search operations. It can query different types of data like documents, relationships,
and metadata.
2. Oracle:
3. Amazon Redshift:
Amazon Redshift is a data warehouse tool. It is a simple and cost-effective tool to analyze
all types of data using standard SQL and existing BI tools. It also allows running
complex queries against petabytes of structured data, using the technique of query
optimization.
(iii) KM is a continuous process, as the world economy is dynamic and full of challenges. It
requires constant creation of new skills and capabilities and improvement of existing
ones.
KM is not an outgrowth of IT. Rather, KM requires the human skills, creativity and innovative
capabilities of people, which are the base of KM. In fact, there are IT tools like Intranets,
Lotus Notes, MS-Exchange etc., which provide an infrastructure for the free play of human
creativity and innovative powers in the formulation of corporate strategy in a
competitive globalized environment.
The above ideas are illustrated with the help of the following diagram:
[Figure: Knowledge Management, IT and Corporate Strategy]
The first step in KM is an identification of what type of knowledge is required for the
successful designing and implementation of corporate strategy.
(ii) Determination of Knowledge Assets:
The management must identify what are the knowledge assets of the organisation;
which basically are competitors, suppliers, governmental agencies, products and
processes, technology etc. Management must plan to get maximum returns out of
knowledge assets.
(a) Acquisition of knowledge through knowledge assets e.g. knowledge about new
products (from competitors), new technologies, social, economic, political changes. It also
requires transformation of raw information into knowledge, useful to solve business
problems.
(b) Generation of knowledge, by creating conditions for the emergence of a learning
organization. This is the most important internal source of knowledge generation which
makes tacit knowledge of individuals available for organizational purposes.
(iv) Knowledge Storage:
It is a process which allows members of the organisation to have an access to the collective
knowledge of the organisation.
KM enables a corporation to build and sharpen its competitive edge, for survival and
growth in the competitive globalized economy. In fact, KM aided by IT tools enables a
corporation to design and implement most appropriate corporate strategies.
KM is basically built on the knowledge generated, shared and utilized through a learning
organisation. A learning organisation provides the foundation on which KM is built. By
facilitating interaction among people of the organisation, a learning organisation leads to
better human relations, which are a valuable and lasting asset for any organisation.
KM, in both concept and practice, motivates people to enhance their intellectual capabilities,
resulting in new skills, improvement of existing skills etc. Thus KM not only enhances
the intellectual capabilities of people but also indirectly prevents depreciation of human
capital.
(v) Enhancement of Enterprise Goodwill:
Initiation and practices of KM help an enterprise enhance its goodwill in the global market;
enabling it to acquire more success and prosperity.
3. Types of Decisions in Business Intelligence
The characteristics of decisions faced by managers at different levels are quite different.
Decisions can be classified as structured, semi structured, and unstructured. Unstructured
decisions are those in which the decision maker must provide judgment, evaluation, and
insights into the problem definition. Each of these decisions is novel, important, and
nonroutine, and there is no well-understood or agreed-on procedure for making them.
Structured decisions, by contrast, are repetitive and routine, and decision makers can
follow a definite procedure for handling them to be efficient. Many decisions have elements
of both and are considered semi structured decisions, in which only part of the problem
has a clear-cut answer provided by an accepted procedure. In general, structured decisions
are made more prevalently at lower organizational levels, whereas unstructured decision
making is more common at higher levels of the firm.
There are four kinds of systems used to support the different levels.
Management information systems (MIS) provide routine reports and summaries of
transaction-level data to middle and operational-level managers to provide answers to
structured and semi structured decision problems.
1. Decision-support systems (DSS) are targeted systems that combine analytical models
with operational data and supportive interactive queries and analysis for middle managers
who face semi structured decision situations.
2. Executive support systems (ESS) are specialized systems that provide senior
management making primarily unstructured decisions with a broad array of both external
information (news, stock analyses, industry trends) and high-level summaries of firm
performance. The purpose of ESS is to help C-level managers focus on the information
that really affects the overall profitability and success of the firm. The leading methodology
for understanding the really important information needed by a firm’s executives is called
the balanced scorecard method, a framework for operationalizing the firm’s strategic plan
by focusing on measurable outcomes on four dimensions of firm performance: financial,
business process, customer, and learning and growth. Performance on each dimension is
measured using key performance indicators (KPIs).
3. Group decision-support systems (GDSS) are specialized systems that provide a group
electronic environment in which managers and teams can collectively make decisions and
design solutions for unstructured and semi structured problems. GDSS-guided meetings
take place in conference rooms with special software and hardware tools to facilitate
group decision making. GDSS make it possible to increase meeting size while also
increasing productivity, because individuals contribute simultaneously rather than
one at a time.
Making decisions consists of several different activities. Simon (1960) describes four
different stages in decision making: intelligence, design, choice, and implementation.
The decision-making process can be described in four steps that follow one another
in a logical order. In reality, decision makers frequently circle back to reconsider the
previous stages and through a process of iteration eventually arrive at a solution that is
workable.
Design involves identifying and exploring various solutions to the problem. Decision
support systems (DSS) are ideal in this stage for exploring alternatives because they
possess analytical tools for modeling data, enabling users to explore various options
quickly.
Choice consists of choosing among solution alternatives. Here, DSS with access to
extensive firm data can help managers choose the optimal solution. Group decision
support systems can also be used to bring groups of managers together in an electronic
online environment to discuss different solutions and make a choice.
In the real world, the stages of decision making described here do not necessarily
follow a linear path. You can be in the process of implementing a decision, only to discover
that your solution is not working. In such cases, you will be forced to repeat the design,
choice, or perhaps even the intelligence stage.
For instance, in the face of declining sales, a sales management team may strongly
support a new sales incentive system to spur the sales force on to greater effort. If paying
the sales force a higher commission for making more sales does not produce sales
increases, managers would need to investigate whether the problem stems from poor
product design, inadequate customer support, or a host of other causes, none of which
would be “solved” by a new incentive system.
Systems supporting management decision making originated in the early 1960s as early
MIS that created fixed, inflexible paper-based reports and distributed them to managers on
a routine schedule. In the 1970s, the first DSS emerged as standalone applications with
limited data and a few analytic models. ESS emerged during the 1980s to give senior
managers an overview of corporate operations. Early ESS were expensive, based on custom
technology, and suffered from limited data and flexibility.
The rise of client/server computing, the Internet, and Web technologies has made a
major impact on systems that support decision making. Many decision-support
applications are now delivered over corporate intranets. We see six major trends:
4. Business Intelligence
Business intelligence combines business analytics, data mining, data visualization, data
tools and infrastructure, and best practices to help organizations make more data-driven
decisions. In practice, you know you’ve got modern business intelligence when you have a
comprehensive view of your organization’s data and use that data to drive change,
eliminate inefficiencies, and quickly adapt to market or supply changes. Modern BI
solutions prioritize flexible self-service analysis, governed data on trusted platforms,
empowered business users, and speed to insight.
Business Intelligence is a set of processes, architectures, and technologies that convert raw
data into meaningful information that drives profitable business actions. It is a suite of
software and services to transform data into actionable intelligence and knowledge.
Step 3) Using the BI system, the user can ask queries, request ad-hoc reports or conduct any
other analysis.
In an Online Transaction Processing (OLTP) system information that could be fed into
product database could be
Correspondingly, in a Business Intelligence system, a query that could be executed for the
product subject area is: did the addition of a new product line or a change in product
price increase revenues?
Correspondingly, in a BI system, a query that could be executed is: how many new
clients were added due to a change in the radio budget?
In OLTP system dealing with customer demographic data bases data that could be fed
would be
Correspondingly, in the OLAP system, a query that could be executed is: can customer
profile changes support a higher product price?
Example 2:
It also collects statistics on market share and data from customer surveys from each hotel
to decide its competitive position in various markets.
Analyzing these trends year by year, month by month and day by day helps management
to offer discounts on room rentals.
Example 3:
The use of BI tools frees information technology staff from the task of generating analytical
reports for the departments. It also gives department personnel access to a richer data
source.
The data analyst is a statistician who always needs to drill deep down into data. BI
system helps them to get fresh insights to develop unique business strategies.
2. The IT users:
CEO or CXO can increase the profit of their business by improving operational
efficiency in their business.
4. The Business Users:
Business intelligence users can be found across the organization. There are mainly
two types of business users: the casual user and the power user.
The difference between them is that a power user can work with complex data sets,
while a casual user relies on dashboards to evaluate predefined sets of data.
1. Boost productivity
With a BI program, it is possible for businesses to create reports with a single click, which
saves a lot of time and resources. It also allows employees to be more productive in their
tasks.
2. To improve visibility
BI also helps to improve the visibility of these processes and make it possible to identify
any areas which need attention.
3. Fix Accountability
BI system assigns accountability in the organization as there must be someone who should
own accountability and ownership for the organization’s performance against its set goals.
BI system also helps organizations as decision makers get an overall bird’s eye view
through typical BI features like dashboards and scorecards.
BI takes out all complexity associated with business processes. It also automates analytics
by offering predictive analysis, computer modeling, benchmarking and other
methodologies.
6. It allows for easy analytics.
BI software has democratized its usage, allowing even nontechnical or non-analyst users
to collect and process data quickly. This also puts the power of analytics in the
hands of many people.
Business intelligence can prove costly for small as well as for medium-sized enterprises.
The use of such type of system may be expensive for routine business transactions.
2. Complexity:
3. Limited use
Like all improved technologies, BI was first established with the buying power of rich
firms in mind. Therefore, BI systems are not yet affordable for many small and
medium-sized companies.
It takes almost one and a half years for a data warehousing system to be completely
implemented. Therefore, it is a time-consuming process.
➢ To facilitate this kind of analysis, data is collected from multiple data sources and
stored in data warehouses, then cleansed and organized into data cubes.
➢ Each OLAP cube contains data categorized by dimensions (such as customers,
geographic sales region and time period) derived from dimension tables in the data
warehouse.
➢ Dimensions are then populated by members (such as customer names, countries
and months) that are organized hierarchically. OLAP cubes are often pre-summarized
across dimensions to drastically improve query time over relational databases.
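As a rough illustration of what "pre-summarized across dimensions" means, the sketch below builds a tiny cube from fact rows and stores a total for every combination of dimension members. The fact rows, the `build_cube` helper and the use of `"*"` for "all members" are assumptions made for this example, not part of any OLAP product.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (customer, region, month, sales amount).
facts = [
    ("Ann", "East", "Jan", 100),
    ("Ann", "East", "Feb", 150),
    ("Raj", "West", "Jan", 200),
    ("Raj", "East", "Feb", 50),
]
DIMENSIONS = ("customer", "region", "month")


def build_cube(rows):
    """Pre-summarize the measure for every combination of dimensions.

    A key holds one value per dimension, with "*" meaning "all members".
    """
    cube = defaultdict(int)
    n = len(DIMENSIONS)
    for *members, amount in rows:
        # For each subset of dimensions to keep, replace the others by "*".
        for k in range(n + 1):
            for kept in combinations(range(n), k):
                key = tuple(members[i] if i in kept else "*" for i in range(n))
                cube[key] += amount
    return dict(cube)


cube = build_cube(facts)

# A summary question is now a single lookup instead of a scan of the facts.
print(cube[("*", "East", "*")])  # total sales in the East region
print(cube[("Ann", "*", "*")])   # total sales to Ann
print(cube[("*", "*", "*")])     # grand total
```

The trade-off is storage: the number of stored totals grows quickly with the number of dimensions and members, which is why real systems usually pre-compute only selected aggregations.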
1. Relational OLAP
ROLAP servers are placed between relational back-end server and client front-end tools.
To store and manage warehouse data, ROLAP uses relational or extended-relational DBMS.
2. Multidimensional OLAP
3. Hybrid OLAP
Hybrid OLAP is a combination of both ROLAP and MOLAP. It offers the higher scalability of
ROLAP and the faster computation of MOLAP. HOLAP servers allow large volumes of
detailed information to be stored. The aggregations are stored separately in the MOLAP store.
Specialized SQL servers provide advanced query language and query processing support
for SQL queries over star and snowflake schemas in a read-only environment.
OLAP Operations
Since OLAP servers are based on a multidimensional view of data, OLAP operations are
performed on multidimensional data. The main OLAP operations are:
➢ Roll-up
➢ Drill-down
➢ Slice and dice
➢ Pivot (rotate)
➢ Roll-up. Also known as consolidation, or drill-up, this operation summarizes the
data along the dimension.
➢ Drill-down. This allows analysts to navigate deeper among the dimensions of
data, for example drilling down from "time period" to "years" and "months" to
chart sales growth for a product.
➢ Slice. This enables an analyst to take one level of information for display, such as
"sales in 2017."
➢ Dice. This allows an analyst to select data from multiple dimensions to analyze,
such as "sales of blue beach balls in Iowa in 2017."
➢ Pivot. Analysts can gain a new view of data by rotating the data axes of the cube.
OLAP software then locates the intersection of dimensions, such as all products sold in the
Eastern region above a certain price during a certain time period, and displays them. The
result is the "measure"; each OLAP cube has at least one to perhaps hundreds of measures,
which are derived from information stored in fact tables in the data warehouse.
OLAP begins with data accumulated from multiple sources and stored in a data warehouse.
The data is then cleansed and stored in OLAP cubes, which users run queries against.
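The operations listed above can be imitated on a small in-memory fact table. The following is a minimal sketch, assuming made-up rows and two hypothetical helpers (`aggregate` and `select`); a real OLAP server would run these against stored cubes rather than Python lists.

```python
from collections import defaultdict

# Hypothetical fact rows; every name and number here is illustrative.
facts = [
    {"product": "ball", "state": "Iowa", "year": 2017, "month": "Jan", "sales": 10},
    {"product": "ball", "state": "Iowa", "year": 2017, "month": "Feb", "sales": 15},
    {"product": "ball", "state": "Ohio", "year": 2017, "month": "Jan", "sales": 7},
    {"product": "kite", "state": "Iowa", "year": 2016, "month": "Jan", "sales": 4},
    {"product": "kite", "state": "Ohio", "year": 2017, "month": "Feb", "sales": 9},
]


def aggregate(rows, dims):
    """Sum the sales measure grouped by the given dimensions."""
    out = defaultdict(int)
    for r in rows:
        out[tuple(r[d] for d in dims)] += r["sales"]
    return dict(out)


def select(rows, **conditions):
    """Keep the rows that match every dimension=value condition."""
    return [r for r in rows if all(r[d] == v for d, v in conditions.items())]


# Drill-down: view sales at the finer (year, month) level.
by_month = aggregate(facts, ["year", "month"])
# Roll-up: summarize along the time dimension from month up to year.
by_year = aggregate(facts, ["year"])
# Slice: fix a single dimension value ("sales in 2017").
slice_2017 = select(facts, year=2017)
# Dice: fix values on several dimensions at once.
dice = select(facts, product="ball", state="Iowa", year=2017)
# Pivot: the same totals with the two axes swapped.
state_by_product = aggregate(facts, ["state", "product"])
product_by_state = aggregate(facts, ["product", "state"])

print(by_year)
print(sum(r["sales"] for r in dice))
```

Here the "measure" is the summed `sales` value found at each intersection of dimension members.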
Association analysis is useful for discovering interesting relationships hidden in large data
sets. The uncovered relationships can be represented in the form of association rules or
sets of frequent items. Given a set of transactions, the task is to find rules that will
predict the occurrence of an item based on the occurrences of other items in the transaction.
Market-Basket transactions
TID Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
Frequent Item set – An item set whose support is greater than or equal to a minimum
support (minsup) threshold.
Association Rule – An implication expression of the form X -> Y, where X and Y are any
2 item sets.
Example: {Milk, Diaper}->{Beer}
Support(s) –
The number of transactions that include all items in both the {X} and {Y} parts of the rule,
as a percentage of the total number of transactions. It is a measure of how frequently the
collection of items occurs together as a percentage of all transactions.
Confidence(c) –
It measures how often the items in Y appear in transactions that also contain the items in
X. It is the ratio of the number of transactions that include all items in {X} and {Y} to the
number of transactions that include all items in {X}.
Lift(l) – The lift of the rule X=>Y is the confidence of the rule divided by the expected
confidence, assuming that the item sets X and Y are independent of each other. The
expected confidence is the frequency (support) of {Y}.
A lift value near 1 indicates that X and Y appear together about as often as expected, greater
than 1 means they appear together more than expected, and less than 1 means they appear
less than expected. Greater lift values indicate stronger association.
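➢ Example – From the above table, {Milk, Diaper}=>{Beer}
Here σ(·) is the number of transactions containing the listed items and |T| is the total
number of transactions.
➢ s = σ(Milk, Diaper, Beer) ÷ |T|
= 2/5
= 0.4
➢ c = σ(Milk, Diaper, Beer) ÷ σ(Milk, Diaper)
= 2/3
= 0.67
➢ l = Supp(Milk, Diaper, Beer) ÷ (Supp(Milk, Diaper) × Supp(Beer))
= 0.4 / (0.6 × 0.6)
= 1.11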
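The three measures can be checked with a short script. Note the assumption: only the first two rows of the market-basket table appear in these notes, so the sketch uses the five-transaction table that commonly accompanies this example; rows 3 to 5 below are assumed, chosen to be consistent with the counts 2/5 and 2/3 used in the calculation.

```python
# Assumed data: rows 1-2 are from the notes, rows 3-5 are an assumption
# consistent with the counts used in the worked example.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    count = sum(1 for t in transactions if itemset <= t)
    return count / len(transactions)


def confidence(x, y):
    """How often Y appears in transactions that contain X."""
    return support(x | y) / support(x)


def lift(x, y):
    """Confidence divided by the expected confidence, support(Y)."""
    return confidence(x, y) / support(y)


x, y = {"Milk", "Diaper"}, {"Beer"}
print(round(support(x | y), 2))
print(round(confidence(x, y), 2))
print(round(lift(x, y), 2))
```

Writing lift as confidence divided by support(Y) gives the same number as Supp(X and Y) ÷ (Supp(X) × Supp(Y)), the form used in the calculation.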
Association rules are very useful in analyzing datasets. The data is collected using
bar-code scanners in supermarkets. Such databases consist of a large number of transaction
records, each listing all items bought by a customer in a single purchase. So the manager
can know whether certain groups of items are consistently purchased together and use this
data for adjusting store layouts, cross-selling, and promotions based on statistics.
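To see how a manager might find items that are "consistently purchased together", here is a small brute-force sketch that counts every pair of items across checkout records and keeps the pairs that meet a support threshold. The baskets and the 40% threshold are invented for illustration; practical tools use more efficient algorithms such as Apriori for large item sets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical checkout records, one set of items per purchase.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"beer", "chips", "milk"},
]
MIN_SUPPORT = 0.4  # assumed threshold: a pair must appear in 40% of baskets

pair_counts = Counter()
for basket in baskets:
    # sorted() gives each pair one canonical order, e.g. ("bread", "milk").
    pair_counts.update(combinations(sorted(basket), 2))

frequent_pairs = {
    pair: count / len(baskets)
    for pair, count in pair_counts.items()
    if count / len(baskets) >= MIN_SUPPORT
}
print(frequent_pairs)
```

Pairs that survive the threshold are candidates for shelf placement or joint promotions; their confidence and lift can then be examined before acting on them.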
6. Analytic Function
An analytic function is a function that is locally given by a convergent
power series. Analytic functions are classified into two types:
real analytic functions and complex analytic functions. Both real and complex analytic
functions are infinitely differentiable. Generally, complex analytic functions hold some
properties that do not generally hold for real analytic functions.
A function “f” is said to be a real analytic function on the open set D in the real line if, for
any x0 ∈ D, we can write:
\[ f(x) = \sum_{n=0}^{\infty} a_n (x - x_0)^n = a_0 + a_1 (x - x_0) + a_2 (x - x_0)^2 + \cdots \]
where the coefficients a0, a1, a2, … are real numbers and the series converges to
the function f(x) for x in a neighbourhood of x0.
In other words, the real analytic function is defined as an infinitely differentiable function,
such that the Taylor series at any point x0 in its domain converges to the function f(x) for x
in a neighbourhood of x0 pointwise.
The collection of all real analytic functions on a given set D is denoted by C^ω(D).
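A numerical check can make the definition concrete. The sketch below uses the exponential function, a standard example of a real analytic function (this choice of example is not from the notes), and shows that partial sums of its Taylor series about x0 approach the true value as more terms are added. The helper name `taylor_exp` is hypothetical.

```python
import math


def taylor_exp(x, x0=0.0, terms=20):
    """Partial sum of the Taylor series of exp about x0.

    Every derivative of exp at x0 equals exp(x0), so a_n = exp(x0) / n!.
    """
    return sum(
        math.exp(x0) / math.factorial(n) * (x - x0) ** n for n in range(terms)
    )


# The error shrinks as more terms of the series are included.
for terms in (2, 5, 10, 20):
    approx = taylor_exp(1.5, x0=1.0, terms=terms)
    print(terms, approx, abs(approx - math.exp(1.5)))
```

For exp the series converges for every real x, so the neighbourhood of x0 in the definition can be taken to be the whole real line; for other analytic functions it may be only a small interval around x0.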