
Data Warehousing & Data Mining

1. A data warehouse is a central repository where data from multiple sources is stored and organized so that it can be analyzed. It allows data to be analyzed for business intelligence and insights. 2. Data in a data warehouse comes from operational databases and other sources. It is organized and aggregated for analysis rather than transactional purposes. Data warehouses provide historical, integrated views of data across the entire organization. 3. Data warehouses are used across many industries for applications like analyzing customer behavior, monitoring product performance, assessing risk, and gaining strategic insights for planning and decision making.

Uploaded by

Binay Yadav


UNIT-1 INTRODUCTION
Data:

In general, data is any set of characters that is gathered and translated for some purpose, usually analysis. If
data is not put into context, it means nothing to a human or a computer.
When data are processed, organized, structured or presented in a given context so as to make them useful,
they are called Information.
Who creates data?
Data can be created on a computer by the user, software, or hardware connected to the computer.
How is data stored on a computer?
Data and information are stored on a computer using a hard drive or another storage device.
How does a data warehouse work?
A data warehouse works as a central repository where information arrives from one or more data sources.
Data flows into a data warehouse from transactional systems and other relational databases.
Data may be:
1. Structured
2. Semi-structured
3. Unstructured data
Structured data is data that conforms to a data model, has a well-defined structure, follows a
consistent order, and can be easily accessed and used by a person or a computer program.
Structured data is usually stored in well-defined schemas such as databases. It is generally tabular, with
columns and rows that clearly define its attributes.
Sources of Structured Data:
• SQL Databases
• Spreadsheets such as Excel
• OLTP Systems
• Online forms
• Sensors such as GPS or RFID tags
• Network and Web server logs
• Medical devices

Semi-structured data is data that does not conform to a rigid data model but still has some structure. It lacks a
fixed schema. It does not reside in a relational database, but it has organizational properties, such as
tags or key-value pairs, that make it easier to analyze. With some processing, it can be stored in a
relational database.
Sources of Semi-structured Data:
• E-mails
• XML and other markup languages
• TCP/IP packets
• Zipped files
• Integration of data from different sources
• Web pages
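As a minimal sketch of how semi-structured data can be prepared for a relational table, the Python example below flattens JSON records onto a fixed list of columns. The records, field names, and column list are hypothetical and chosen only for illustration.

```python
import json

# Hypothetical semi-structured input: records share some fields but not all.
RAW = """
[
  {"id": 1, "name": "Asha", "contact": {"email": "asha@example.com"}},
  {"id": 2, "name": "Ravi", "tags": ["vip"]}
]
"""

COLUMNS = ["id", "name", "email", "tags"]  # assumed target table columns


def flatten(record):
    """Map one loosely structured record onto the fixed column list."""
    return {
        "id": record.get("id"),
        "name": record.get("name"),
        "email": record.get("contact", {}).get("email"),  # field may be absent
        "tags": ",".join(record.get("tags", [])),          # list stored as text
    }


rows = [flatten(r) for r in json.loads(RAW)]
for row in rows:
    print([row[c] for c in COLUMNS])
```

Fields that a record does not carry become None (or an empty string for the tag list), which is how the missing structure shows up once the data is forced into rows and columns.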
Unstructured data is data that does not conform to a data model and has no easily identifiable
structure, so a computer program cannot use it easily.
Sources of Unstructured Data:
• Web pages
• Images (JPEG, GIF, PNG, etc.)
• Videos
• Memos
• Reports
• Word documents and PowerPoint presentations
• Surveys

Er.Binay Kumar Yadav Page 1


Data Quality Definition

Data quality is the measure of how well suited a data set is to serve its specific purpose. Measures of data
quality are based on data quality characteristics such as accuracy, completeness, consistency, validity,
uniqueness, and timeliness.

Data Quality Dimensions

There are six main dimensions of data quality:
• Accuracy: The data should reflect actual, real-world scenarios; accuracy can be
confirmed against a verifiable source.
• Completeness: Completeness is a measure of the data's ability to deliver all the required
values that are available.
• Consistency: Data consistency refers to the uniformity of data as it moves across networks and
applications. The same data values stored in different locations should not conflict with one
another.
• Validity: Data should be collected according to defined business rules and parameters, and should
conform to the right format and fall within the right range.
• Uniqueness: Uniqueness ensures there are no duplicated or overlapping values across all data
sets. Data cleansing and deduplication can help remedy a low uniqueness score.
• Timeliness: Timely data is data that is available when it is required. Data may be updated in real time
to ensure that it is readily available and accessible.
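Some of these dimensions can be measured directly. The Python sketch below scores completeness, uniqueness, and validity as simple ratios. The records, the e-mail pattern, and the age range of 0 to 120 are assumptions made for illustration, not standard definitions.

```python
import re

# Hypothetical customer records; None marks a missing value.
records = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": None, "age": 29},
    {"id": 2, "email": "b@example.com", "age": 41},   # duplicate id
    {"id": 4, "email": "not-an-email", "age": 230},   # invalid email and age
]

EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # deliberately simple rule


def completeness(rows, field):
    """Share of rows in which the field is present."""
    return sum(r[field] is not None for r in rows) / len(rows)


def uniqueness(rows, field):
    """Share of distinct values among all values of the field."""
    values = [r[field] for r in rows]
    return len(set(values)) / len(values)


def validity(rows, field, rule):
    """Share of present values that satisfy the given business rule."""
    present = [r[field] for r in rows if r[field] is not None]
    if not present:
        return 1.0
    return sum(bool(rule(v)) for v in present) / len(present)


print("completeness(email):", completeness(records, "email"))
print("uniqueness(id):", uniqueness(records, "id"))
print("validity(email):", validity(records, "email", EMAIL.match))
print("validity(age):", validity(records, "age", lambda a: 0 <= a <= 120))
```

Accuracy, consistency, and timeliness need an outside reference (a trusted source, a second copy of the data, or a clock), so they are left out of this sketch.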
What is Data Warehousing?
Data warehousing (DW) is the process of collecting and managing data from varied sources to provide
meaningful business insights. A data warehouse is typically used to connect and analyze business data
from heterogeneous sources.
A data warehouse system is also known by the following names:
• Decision Support System (DSS)
• Executive Information System
• Management Information System
• Business Intelligence Solution
• Analytic Application
• Data Warehouse


Important Features of Data Warehouse

The important features of a data warehouse are given below:
1. Subject-Oriented
A data warehouse is subject-oriented. It provides useful data about a subject rather than about the company's
ongoing operations. These subjects can be customers, suppliers, marketing, products, promotions, etc. A
data warehouse usually focuses on modeling and analysis of data that help the business organization
make data-driven decisions.
2. Time-Variant
The data in the warehouse is associated with a specific time period, so it provides information from a
historical point of view.
3. Integrated
A data warehouse is built by combining data from heterogeneous sources, such as relational databases, flat
files, etc.
4. Non-Volatile
Once data is entered into the warehouse, it is not changed or deleted.
Advantages of Data Warehouse:
• More accurate data access
• Improved productivity and performance
• Cost-efficient
• Consistent and quality data
Types of Data Warehouse
The three main types of Data Warehouses (DWH) are:
1. Enterprise Data Warehouse (EDW):
An Enterprise Data Warehouse (EDW) is a centralized warehouse. It provides decision support services across
the enterprise. It offers a unified approach for organizing and representing data. It also provides the ability to
classify data according to subject and to give access according to those divisions.
2. Operational Data Store:
An Operational Data Store (ODS) is a data store required when neither the data warehouse
nor the OLTP systems support an organization's reporting needs. In an ODS, the data is refreshed
in real time. Hence, it is widely preferred for routine activities like storing employee records.
3. Data Mart:
A data mart is a subset of the data warehouse. It is specially designed for a particular line of business, such as
sales or finance. In an independent data mart, data can be collected directly from the sources.
General stages of Data Warehouse
Organizations started with relatively simple uses of data warehousing. Over time, more
sophisticated uses of data warehousing began.
The following are general stages of use of the data warehouse (DWH):
Offline Operational Database:
In this stage, data is just copied from an operational system to another server. In this way, loading,
processing, and reporting of the copied data do not impact the operational system’s performance.
Offline Data Warehouse:
Data in the Data warehouse is regularly updated from the Operational Database. The data in Data warehouse
is mapped and transformed to meet the Data warehouse objectives.
Real time Data Warehouse:
In this stage, the data warehouse is updated whenever a transaction takes place in the operational database,
for example in an airline or railway booking system.
Integrated Data Warehouse:
In this stage, the data warehouse is updated continuously as the operational system performs
transactions. The data warehouse then generates transactions which are passed back to the operational
system.


Components of Data warehouse

The four components of a data warehouse are:
Load Manager: The load manager is also called the front component. It performs all the operations
associated with the extraction and loading of data into the warehouse. These operations include transformations
to prepare the data for entry into the data warehouse.
Warehouse Manager: The warehouse manager performs operations associated with the management of the data
in the warehouse. It performs operations such as analysis of data to ensure consistency, creation of indexes and
views, generation of denormalizations and aggregations, transformation and merging of source data, and
archiving and backing up data.
Query Manager: The query manager is also known as the backend component. It performs all the
operations related to the management of user queries. It directs queries to the appropriate tables and
schedules the execution of queries.
End-User Access Tools:
These are categorized into five groups: 1. Data reporting tools 2. Query tools 3. Application
development tools 4. EIS tools 5. OLAP tools and data mining tools.
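The extract, transform, and load work of the load manager can be sketched in Python with the standard sqlite3 module. An in-memory SQLite database stands in for the warehouse here, and the source rows, date formats, table name, and column names are all hypothetical.

```python
import sqlite3

# Hypothetical rows as they might arrive from two operational sources.
source_a = [("2023-01-05", "kathmandu", "120.50"), ("2023-01-06", "Pokhara", "80")]
source_b = [("06/01/2023", "POKHARA", "15.25")]   # day/month/year dates


def transform_a(row):
    """Source A already uses ISO dates; fix city case and the amount type."""
    date, city, amount = row
    return date, city.title(), float(amount)


def transform_b(row):
    """Source B uses day/month/year; convert to the warehouse's ISO format."""
    day, month, year = row[0].split("/")
    return f"{year}-{month}-{day}", row[1].title(), float(row[2])


# Load: an in-memory SQLite database stands in for the warehouse.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (sale_date TEXT, city TEXT, amount REAL)")

cleaned = [transform_a(r) for r in source_a] + [transform_b(r) for r in source_b]
warehouse.executemany("INSERT INTO sales VALUES (?, ?, ?)", cleaned)
warehouse.commit()

# A query-manager style request against the integrated data.
totals = warehouse.execute(
    "SELECT city, SUM(amount) FROM sales GROUP BY city ORDER BY city"
).fetchall()
print(totals)
```

Because both sources are brought to one date format and one spelling of each city before loading, the final query can total sales per city across sources.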
What Is a Data Warehouse Used For?
Here are the most common sectors where a data warehouse is used:
Airline:
In the airline system, it is used for operational purposes such as crew assignment, analysis of route profitability,
frequent flyer program promotions, etc.
Banking:
It is widely used in the banking sector to manage available resources effectively. Some banks also
use it for market research and for performance analysis of products and operations.
Healthcare:
The healthcare sector also uses data warehouses to strategize and predict outcomes, generate patients' treatment
reports, share data with tie-in insurance companies, medical aid services, etc.
Public sector:
In the public sector, a data warehouse is used for intelligence gathering. It helps government agencies
maintain and analyze tax records and health policy records for every individual.
Investment and Insurance sector:
In this sector, warehouses are primarily used to analyze data patterns and customer trends, and to track
market movements.
Retail chain:
In retail chains, a data warehouse is widely used for distribution and marketing. It also helps to track items,
customer buying patterns, and promotions, and is used for determining pricing policy.
Telecommunication:
A data warehouse is used in this sector for product promotions, sales decisions, and distribution
decisions.
Hospitality Industry:
This industry uses warehouse services to design and evaluate its advertising and promotion
campaigns, targeting clients based on their feedback and travel patterns.

What is Data Mining?

Data mining is the process of extracting information from huge sets of data to identify patterns, trends, and
useful data that allow a business to make data-driven decisions.

Data mining is also described as the process of investigating hidden patterns of information from various
perspectives and categorizing them into useful data. This data is collected and assembled in particular areas
such as data warehouses, where efficient analysis and data mining algorithms support decision making and
other data requirements, ultimately cutting costs and generating revenue.

Data mining is a process used by organizations to extract specific data from huge
databases to solve business problems. It primarily turns raw data into useful information.


Types of Data Mining

Data mining can be performed on the following types of data:
Relational Database:
A relational database is a collection of multiple data sets formally organized by tables, records, and columns,
from which data can be accessed in various ways without having to reorganize the database tables. Tables
convey and share information, which facilitates data searchability, reporting, and organization.
Data warehouses:
A data warehouse is the technology that collects data from various sources within the organization to
provide meaningful business insights. The huge amount of data comes from multiple places such as
Marketing and Finance. The extracted data is utilized for analytical purposes and helps in decision making
for a business organization. The data warehouse is designed for the analysis of data rather than transaction
processing.
Data Repositories:
A data repository generally refers to a destination for data storage. However, many IT professionals
use the term more specifically to refer to a particular kind of setup within an IT structure, for example a group
of databases in which an organization keeps various kinds of information.
Object-Relational Database:
A combination of an object-oriented database model and relational database model is called an object-
relational model. It supports Classes, Objects, Inheritance, etc.
One of the primary objectives of the Object-relational data model is to close the gap between the Relational
database and the object-oriented model practices frequently utilized in many programming languages, for
example, C++, Java, C#, and so on.
Transactional Database:
A transactional database refers to a database management system (DBMS) that can undo (roll back) a
database transaction if it is not performed appropriately. Although this was once a unique capability,
today most relational database systems support transactional database activities.
FEATURES OF DATA MINING:
• It works well with large databases and datasets
• It predicts future results
• It creates actionable insights
• It utilizes the automated discovery of patterns
ADVANTAGES OF DATA MINING:
• Fraud Detection:
It is used to find which insurance claims, phone calls, and debit or credit card purchases are fraudulent.
• Trend Analysis:
Existing marketplace trends are analyzed, which provides a strategic benefit because it helps reduce costs,
for example by manufacturing according to demand.
• Market Analysis:
It can predict the market and therefore help in making business decisions. For example, it can identify a target
market for a retailer, or certain types of products desired by certain types of customers.
Data Mining Techniques
Data mining uses algorithms and various techniques to convert large collections of data into useful output.
The most popular types of data mining techniques include:
• Association rules, also referred to as market basket analysis, search for relationships between
variables. These relationships create additional value within the data set by linking
pieces of data. For example, association rules would search a company's sales history to see which
products are most commonly purchased together; with this information, stores can plan, promote,
and forecast accordingly.
• Classification assigns objects to predefined classes. These classes describe characteristics of
items or represent what the data points have in common with each other. This data mining technique
allows the underlying data to be more neatly categorized and summarized across similar features or
product lines.
• Clustering is similar to classification. However, clustering does not start from predefined classes: it
identifies similarities between objects, then groups those items based on what makes them different
from other items. While classification may result in groups such as "shampoo", "conditioner", "soap",
and "toothpaste", clustering may identify groups such as "hair care" and "dental health".
• Decision trees are used to classify or predict an outcome based on a set list of criteria or decisions.
A decision tree asks a series of cascading questions that sort the dataset based
on the responses given. Sometimes depicted as a tree-like visual, a decision tree allows for specific
direction and user input when drilling deeper into the data.
• K-Nearest Neighbor (KNN) is an algorithm that classifies data based on its proximity to other
data. KNN is rooted in the assumption that data points that are close to each other are more
similar to each other than to other data points. This non-parametric, supervised technique is used to
predict features of a group based on individual data points.
• Neural networks process data through the use of nodes. These nodes are comprised of inputs,
weights, and an output. Data is mapped through supervised learning, in a network of interconnected
nodes loosely modeled on the human brain. This model can be fit to give threshold values to determine
a model's accuracy.
• Predictive analysis strives to leverage historical information to build graphical or mathematical
models to forecast future outcomes. Overlapping with regression analysis, this data mining
technique aims at estimating an unknown future figure based on the current data on hand.
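Two of the techniques above are small enough to sketch in Python. The first part counts item pairs to compute support and confidence, the usual measures behind association rules. The second part is a bare-bones K-Nearest Neighbor classifier using Euclidean distance. The baskets, the labelled points, and the choice of k = 3 are hypothetical, and the sketch is meant only to illustrate the ideas.

```python
from collections import Counter
from itertools import combinations
from math import dist

# Hypothetical shopping baskets for market basket analysis.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
]

item_counts = Counter(item for b in baskets for item in b)
pair_counts = Counter(
    pair for b in baskets for pair in combinations(sorted(b), 2)
)


def support(a, b):
    """Fraction of all baskets that contain both items."""
    return pair_counts[tuple(sorted((a, b)))] / len(baskets)


def confidence(a, b):
    """Of the baskets containing a, the fraction that also contain b."""
    return pair_counts[tuple(sorted((a, b)))] / item_counts[a]


def knn_predict(train, point, k=3):
    """Classify point by majority label among its k nearest training points."""
    nearest = sorted(train, key=lambda item: dist(item[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical labelled points: (features, label).
train = [((1, 1), "low"), ((1, 2), "low"), ((2, 1), "low"),
         ((8, 8), "high"), ((9, 8), "high"), ((8, 9), "high")]

print(support("bread", "butter"), confidence("bread", "butter"))
print(knn_predict(train, (2, 2)), knn_predict(train, (8.5, 8.5)))
```

A rule such as "bread implies butter" is usually kept only if both its support and its confidence exceed thresholds chosen by the analyst.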
Challenges of Implementation in Data mining
Although data mining is very powerful, it faces many challenges during its execution. Various challenges
could be related to performance, data, methods, and techniques, etc. The process of data mining becomes
effective when the challenges or problems are correctly recognized and adequately resolved.

Incomplete and noisy data:

Data mining is the process of extracting useful data from large volumes of data. Real-world data is
heterogeneous, incomplete, and noisy. Data in huge quantities will usually be inaccurate or unreliable. These
problems may occur because of faulty measuring instruments or because of human errors.
Data Distribution:
Real-world data is usually stored on various platforms in a distributed computing environment. It might be
in a database, on individual systems, or even on the internet. In practice, it is quite a tough task to bring all the
data into a centralized data repository, mainly due to organizational and technical concerns.
Complex Data:
Real-world data is heterogeneous, and it could be multimedia data, including audio and video, images,
complex data, spatial data, time series, and so on. Managing these various types of data and extracting useful
information is a tough task. Most of the time, new technologies, tools, and methodologies have
to be developed or refined to obtain specific information.
Performance:
The data mining system's performance relies primarily on the efficiency of algorithms and techniques used.
If the designed algorithm and techniques are not up to the mark, then the efficiency of the data mining
process will be affected adversely.
Data Privacy and Security:
Data mining usually leads to serious issues in terms of data security, governance, and privacy. For example,
if a retailer analyzes the details of the purchased items, then it reveals data about buying habits and
preferences of the customers without their permission.


Data Visualization:
In data mining, data visualization is a very important process because it is the primary method of showing
the output to the user in a presentable way. The extracted data should convey the exact meaning of what it
intends to express. But many times, representing the information to the end user in a precise and easy way is
difficult. Because both the input data and the output information are complicated, very efficient
data visualization processes need to be implemented for this step to succeed.

Data Mining Applications

Data mining is highly useful in the following domains:
• Market Analysis and Management
• Corporate Analysis & Risk Management
• Fraud Detection
Apart from these, data mining can also be used in the areas of production control, customer retention,
science exploration, sports, astronomy, and Internet Web Surf-Aid.
Market Analysis and Management
Listed below are the various fields of the market where data mining is used:
• Customer Profiling: Data mining helps determine what kind of people buy what kind of products.
• Identifying Customer Requirements: Data mining helps in identifying the best products for
different customers. It uses prediction to find the factors that may attract new customers.
• Cross Market Analysis: Data mining finds associations and correlations between product sales.
• Target Marketing: Data mining helps to find clusters of model customers who share the same
characteristics such as interests, spending habits, income, etc.
• Determining Customer Purchasing Patterns: Data mining helps in determining customer
purchasing patterns.
• Providing Summary Information: Data mining provides various multidimensional summary
reports.
Corporate Analysis and Risk Management
Data mining is used in the following fields of the corporate sector:
• Finance Planning and Asset Evaluation: It involves cash flow analysis and prediction, and contingent
claim analysis to evaluate assets.
• Resource Planning: It involves summarizing and comparing resources and spending.
• Competition: It involves monitoring competitors and market directions.
Fraud Detection
Data mining is also used in the fields of credit card services and telecommunication to detect fraud. For
fraudulent telephone calls, it helps to find the destination of the call, duration of the call, time of the day or
week, etc. It also analyzes the patterns that deviate from expected norms.
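One simple way to flag values that deviate from expected norms is a z-score test, sketched below in Python. This is an illustrative stand-in, not the method of any particular fraud detection system; the call durations and the threshold of 2.0 standard deviations are assumptions.

```python
from statistics import mean, stdev

# Hypothetical call durations in minutes for one subscriber.
durations = [3, 4, 2, 5, 3, 4, 3, 95, 4, 2]


def flag_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []  # all values identical, nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]


print(flag_outliers(durations))
```

Flagged calls would then be examined more closely, for example by destination and time of day as described above.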
Data Preprocessing in Data Mining
Data preprocessing is a data mining technique used to transform raw data into a useful and
efficient format.


Steps Involved in Data Preprocessing:

1. Data Cleaning:
The data can have many irrelevant and missing parts. Data cleaning is done to handle them. It
involves handling of missing data, noisy data, etc.
(a). Missing Data:
This situation arises when some values are missing from the data. It can be handled in various ways.
Some of them are:
i. Ignore the tuples:
This approach is suitable only when the dataset is quite large and multiple values are
missing within a tuple.

ii. Fill the missing values:

There are various ways to do this task. You can choose to fill the missing values manually, with the
attribute mean, or with the most probable value.
(b). Noisy Data:
Noisy data is meaningless data that cannot be interpreted by machines. It can be generated by faulty
data collection, data entry errors, etc. It can be handled in the following ways:
i. Binning Method:
This method works on sorted data in order to smooth it. The data is divided into segments (bins) of
equal size, and each segment is then handled
separately. One can replace all data in a segment by its mean, or the segment's boundary values can be
used instead.
ii. Regression:
Here data is smoothed by fitting it to a regression function. The regression used may be
linear (having one independent variable) or multiple (having multiple independent variables).
iii. Clustering:
This approach groups similar data into clusters. Outliers may go undetected, or they will fall
outside the clusters.
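The two cleaning steps that are easiest to show in code are filling missing values with the attribute mean and smoothing by bin means. The Python sketch below assumes equal-size bins (equal number of values per bin) and uses a small hypothetical list of prices.

```python
from statistics import mean


def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def smooth_by_bin_means(values, bin_size):
    """Sort the data, cut it into equal-size bins, replace each value by its bin mean."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        smoothed.extend([mean(bin_values)] * len(bin_values))
    return smoothed


prices = [4, 8, None, 15, 21, 21, 24, 25, 28, 34]   # hypothetical attribute values
filled = fill_missing_with_mean(prices)
smoothed = smooth_by_bin_means([4, 8, 15, 21, 21, 24, 25, 28, 34], bin_size=3)
print(filled)
print(smoothed)
```

Smoothing by bin boundaries works the same way, except that each value is replaced by the closer of its bin's minimum and maximum instead of the bin mean.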
2. Data Transformation:
This step transforms the data into forms suitable for the mining process. It
involves the following:
i. Normalization:
It is done in order to scale the data values into a specified range (-1.0 to 1.0 or 0.0 to 1.0).
ii. Attribute Selection:
In this strategy, new attributes are constructed from the given set of attributes to help the mining
process (this is also known as attribute construction).
iii. Discretization:
This is done to replace the raw values of a numeric attribute by interval labels or conceptual labels.
iv. Concept Hierarchy Generation:
Here attributes are converted from a lower level to a higher level in a hierarchy. For example, the attribute
“city” can be converted to “country”.
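Normalization and discretization can be sketched in a few lines of Python. The example uses min-max scaling, which is one common way to normalize into a range; the ages, interval edges, and labels are hypothetical.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [new_min for _ in values]  # constant attribute, nothing to scale
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def discretize(value, edges, labels):
    """Return the label of the first interval whose upper edge exceeds value."""
    for edge, label in zip(edges, labels):
        if value < edge:
            return label
    return labels[-1]  # value is at or above the last edge


ages = [18, 25, 40, 62, 80]   # hypothetical numeric attribute
scaled = min_max_normalize(ages)
groups = [discretize(a, edges=[30, 60], labels=["young", "middle", "senior"]) for a in ages]
print(scaled)
print(groups)
```

Passing new_min=-1.0 and new_max=1.0 gives the other range mentioned above.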
3. Data Reduction:
Data mining is a technique used to handle huge amounts of data. When working with such huge
volumes of data, analysis becomes harder. To deal with this, we use data reduction
techniques. Data reduction aims to increase storage efficiency and reduce data storage and analysis costs.
The various steps of data reduction are:
i. Data Cube Aggregation:
Aggregation operations are applied to the data to construct the data cube.
ii. Attribute Subset Selection:
Only the highly relevant attributes should be used; the rest can be discarded. For performing attribute
selection, one can use the level of significance and the p-value of the attribute. An attribute having a p-value
greater than the significance level can be discarded.
iii. Numerosity Reduction:
This makes it possible to store a model of the data instead of the whole data, for example regression models.


iv. Dimensionality Reduction:

This reduces the size of the data by encoding mechanisms. It can be lossy or lossless. If the original data
can be retrieved after reconstruction from the compressed data, the reduction is called lossless; otherwise it
is called lossy. Two effective methods of dimensionality reduction are wavelet transforms
and PCA (Principal Component Analysis).
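Of the data reduction steps above, data cube aggregation is the simplest to illustrate. The Python sketch below rolls detailed sales facts up to coarser levels by summing over the dimensions that are dropped. The facts and the dimension names (year, quarter, region) are hypothetical.

```python
from collections import defaultdict

# Hypothetical detailed sales facts: (year, quarter, region, amount).
sales = [
    (2022, "Q1", "East", 100), (2022, "Q2", "East", 150),
    (2022, "Q1", "West", 80),  (2023, "Q1", "East", 120),
    (2023, "Q2", "West", 60),
]


def aggregate(facts, key):
    """Sum amounts over the dimensions picked out by key."""
    totals = defaultdict(int)
    for fact in facts:
        totals[key(fact)] += fact[3]
    return dict(totals)


by_year_region = aggregate(sales, key=lambda f: (f[0], f[2]))  # drop quarter
by_year = aggregate(sales, key=lambda f: f[0])                 # drop region too
print(by_year_region)
print(by_year)
```

Each roll-up is smaller than the detailed data, which is why aggregated cubes reduce both storage and analysis cost.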

Data Mining – Knowledge Discovery in Databases (KDD)

Why do we need Data Mining?
The volume of information from business transactions, scientific data, sensor data, pictures, videos, etc. is
increasing every day, faster than we can handle. So, we need a system that is capable of extracting the essence of
the information available and that can automatically generate reports,
views, or summaries of data for better decision-making.
Why is Data Mining used in Business?
Data mining is used in business to make better managerial decisions by:
• Automatic summarization of data.
• Extracting the essence of the information stored.
• Discovering patterns in raw data.

Data mining, also known as Knowledge Discovery in Databases, refers to the nontrivial extraction of
implicit, previously unknown, and potentially useful information from data stored in databases.
Steps Involved in KDD Process:

[Figure: KDD process]

1. Data Cleaning: Data cleaning is defined as the removal of noisy and irrelevant data from the collection.
• Cleaning in case of missing values.
• Cleaning noisy data, where noise is a random error or variance in a measured variable.
• Cleaning with data discrepancy detection and data transformation tools.
2. Data Integration: Data integration is defined as combining heterogeneous data from multiple sources into
a common source (data warehouse).
• Data integration using data migration tools.
• Data integration using data synchronization tools.
• Data integration using the ETL (Extract-Transform-Load) process.
3. Data Selection: Data selection is defined as the process where data relevant to the analysis is decided
and retrieved from the data collection.
• Data selection using neural networks.
• Data selection using decision trees.
• Data selection using Naive Bayes.
• Data selection using clustering, regression, etc.
4. Data Transformation: Data transformation is defined as the process of transforming data into the
appropriate form required by the mining procedure.
Data transformation is a two-step process:
• Data Mapping: Assigning elements from the source base to the destination to capture transformations.
• Code Generation: Creation of the actual transformation program.


5. Data Mining: Data mining is defined as the application of intelligent techniques to extract potentially
useful patterns.
• Transforms task-relevant data into patterns.
• Decides the purpose of the model using classification or characterization.
6. Pattern Evaluation: Pattern evaluation is defined as identifying the truly interesting patterns
that represent knowledge, based on given interestingness measures.
• Finds the interestingness score of each pattern.
• Uses summarization and visualization to make the data understandable to the user.
7. Knowledge Representation: Knowledge representation is defined as a technique which utilizes
visualization tools to represent data mining results.
• Generate reports.
• Generate tables.
• Generate discriminant rules, classification rules, characterization rules, etc.

No ratings yet
CSE 421 ID: 18101085 Transport Layer Protocols (TCP) Examination Lab
5 pages
Com - Pubg.imobhile64 Logcat
No ratings yet
Com - Pubg.imobhile64 Logcat
153 pages
Answers
No ratings yet
Answers
11 pages
Operating Systems For Bca
No ratings yet
Operating Systems For Bca
5 pages
Cybersecurity
No ratings yet
Cybersecurity
18 pages
Data Analytics Question Paper
100% (2)
Data Analytics Question Paper
2 pages
Iot IAT-2
No ratings yet
Iot IAT-2
2 pages
Lecture 1 The Security Environment
No ratings yet
Lecture 1 The Security Environment
82 pages
DBMS Notes For B.TECH
No ratings yet
DBMS Notes For B.TECH
80 pages
Computer Science 2210 0478 (2023-2026) Term Wise Breakdown IX, X, XI
No ratings yet
Computer Science 2210 0478 (2023-2026) Term Wise Breakdown IX, X, XI
2 pages
Cyber Incident Recovery Guide
No ratings yet
Cyber Incident Recovery Guide
9 pages
Qt Creator Setup Guide for Ubuntu Users
No ratings yet
Qt Creator Setup Guide for Ubuntu Users
5 pages
Microsoft OST To PST Converter
No ratings yet
Microsoft OST To PST Converter
13 pages
Sy0-501 4-10-2019
No ratings yet
Sy0-501 4-10-2019
174 pages
Ooad Question Bank
No ratings yet
Ooad Question Bank
9 pages
Afzal Yousaf-Resume (Lead D.C - 2024)
No ratings yet
Afzal Yousaf-Resume (Lead D.C - 2024)
1 page
G-LBDA1361NA 002 Tattle-Tape Workstation Datasheet LR2
No ratings yet
G-LBDA1361NA 002 Tattle-Tape Workstation Datasheet LR2
2 pages
Review Enlightenment Now - The Case For Reason, Science, Humanism, and Progress
No ratings yet
Review Enlightenment Now - The Case For Reason, Science, Humanism, and Progress
9 pages
Agile PLM Sample Resume - 1
No ratings yet
Agile PLM Sample Resume - 1
4 pages
السكرتارية PDF
No ratings yet
السكرتارية PDF
1 page
CQRS DDD Notes Greg Young
100% (1)
CQRS DDD Notes Greg Young
17 pages
Z5&Z6 - System Recovery Guide - V2.0 - EN
No ratings yet
Z5&Z6 - System Recovery Guide - V2.0 - EN
12 pages
Mysql - DBeaver Error Resolving Maven Dependencies - Stack Overflow
No ratings yet
Mysql - DBeaver Error Resolving Maven Dependencies - Stack Overflow
1 page

Data Warehousing & Data Mining

Data: UNIT-1 INTRODUCTION

Er.Binay Kumar Yadav Page 1

Data Quality Definition

Data Quality Dimensions

Important Features of Data Warehouse

Components of Data Warehouse

What is Data Mining?

Data Mining is the process of investigating hidden patterns of information to various

Types of Data Mining

Incomplete and noisy data:

Data Mining Applications

Steps Involved in Data Preprocessing:

ii. Fill the Missing values:

iv. Dimensionality Reduction:

Data Mining – Knowledge Discovery in Databases (KDD)
