UNIT – I INTRODUCTION
1. Data Mining
Definition: Data mining is the process of discovering meaningful
patterns, correlations, trends, or anomalies from large datasets using
statistical, mathematical, and machine learning techniques.
Purpose: To extract actionable insights that support decision-making in
business.
Applications:
o Customer segmentation
o Fraud detection
o Market basket analysis
o Sales forecasting
Example: Analyzing supermarket sales data to find that customers buying bread
also often buy butter.
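The bread-and-butter finding is an association rule. Such rules are usually judged by support (the share of all transactions containing both items) and confidence (the share of bread transactions that also contain butter). The following is a minimal Python sketch that assumes a small hypothetical list of baskets. The pair counting is a brute-force illustration, not the Apriori or FP-Growth algorithms normally used at scale.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy transactions (illustrative only, not real sales data).
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(items <= t for t in transactions)

def support(itemset, transactions):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) = count(A and C) / count(A)."""
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / count(antecedent, transactions)

def frequent_pairs(transactions, min_support):
    """Brute force: count every item pair, keep pairs meeting min_support."""
    pair_counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            pair_counts[pair] += 1
    n = len(transactions)
    return {p: c / n for p, c in pair_counts.items() if c / n >= min_support}

print(support({"bread", "butter"}, transactions))        # 0.6
print(confidence({"bread"}, {"butter"}, transactions))   # 0.75
print(frequent_pairs(transactions, min_support=0.4))     # {('bread', 'butter'): 0.6}
```

With these toy baskets, bread and butter appear together in 3 of 5 transactions (support 0.6), and 3 of the 4 baskets containing bread also contain butter (confidence 0.75).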
2. Text Mining
Definition: Text mining refers to the process of extracting useful
information from unstructured textual data.
Purpose: To convert unstructured text into structured data for analysis.
Techniques:
o Natural Language Processing (NLP)
o Sentiment Analysis
o Topic Modeling
Applications:
o Analyzing customer reviews
o Monitoring social media sentiment
o Legal document analysis
Example: A company analyzing online reviews to understand customer
perception of a new product.
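A simple way to turn free text into structured data is lexicon-based sentiment scoring. Each review is tokenized, and its words are counted against lists of positive and negative words. The following sketch assumes tiny hypothetical word lists (`POSITIVE`, `NEGATIVE`) and ignores negation and sarcasm. Real systems typically use much larger curated lexicons or trained NLP models.

```python
import re
from collections import Counter

# Hypothetical, hand-picked sentiment lexicon (illustration only).
POSITIVE = {"great", "love", "excellent", "fast", "easy"}
NEGATIVE = {"bad", "slow", "broken", "poor", "hate"}

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def sentiment(text):
    """Return (score, label): +1 per positive word, -1 per negative word."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return score, label

# Hypothetical customer reviews.
reviews = [
    "Great phone, I love the camera!",
    "Battery is bad and charging is slow.",
    "It arrived on Tuesday.",
]

# Structured output: one (score, label) row per review, plus corpus word counts.
rows = [sentiment(r) for r in reviews]
word_counts = Counter(t for r in reviews for t in tokenize(r))
print(rows)  # [(2, 'positive'), (-2, 'negative'), (0, 'neutral')]
```

Each review becomes a (score, label) row, so the results can be filtered, counted, or charted like any other structured table.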
3. Web Mining
Definition: Web mining is the process of using data mining techniques to
extract information from web-based data.
Types:
o Web Content Mining: Extracting data from web page content.
o Web Structure Mining: Analyzing the hyperlink structure between web
pages and websites.
o Web Usage Mining: Understanding user behavior from web logs.
Applications:
o Personalizing user experiences on websites
o Recommender systems (e.g., YouTube, Amazon)
o Improving web design and navigation
Example: Tracking which pages users visit before making a purchase.
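Tracking pages visited before a purchase is a web usage mining task. Raw log hits are first grouped into sessions, and then the pages seen before the goal page are counted. The following minimal sketch assumes a hypothetical log that is already parsed into (session_id, timestamp, page) tuples, with `/purchase` as the goal page. Real server logs would first need parsing, bot filtering, and sessionization (for example, by an inactivity timeout).

```python
from collections import Counter, defaultdict

# Hypothetical, pre-parsed access-log records: (session_id, timestamp, page).
log = [
    ("s1", 1, "/home"), ("s1", 2, "/product/42"), ("s1", 3, "/cart"), ("s1", 4, "/purchase"),
    ("s2", 1, "/home"), ("s2", 2, "/blog"),
    ("s3", 1, "/search"), ("s3", 2, "/product/42"), ("s3", 3, "/purchase"),
]

def sessionize(records):
    """Group records by session and order each session's pages by timestamp."""
    sessions = defaultdict(list)
    for session_id, ts, page in records:
        sessions[session_id].append((ts, page))
    return {sid: [page for _, page in sorted(hits)] for sid, hits in sessions.items()}

def pages_before_purchase(sessions, goal="/purchase"):
    """Count pages viewed before the first goal hit, only in sessions that reach it."""
    counts = Counter()
    for pages in sessions.values():
        if goal in pages:
            counts.update(pages[:pages.index(goal)])
    return counts

counts = pages_before_purchase(sessionize(log))
print(counts.most_common(1))  # [('/product/42', 2)]
```

Sessions that never reach the goal page (here `s2`) are ignored, so the counts describe only converting visitors.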
4. Spatial Mining
Definition: Spatial mining involves extracting patterns from spatial
(geographical) data.
Purpose: To find patterns related to physical location or movement.
Applications:
o Urban planning
o Logistics and transportation
o Location-based marketing
Example: A retailer using spatial mining to identify high-performing store
locations based on foot traffic and demographics.
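A basic building block of spatial mining is relating data points to places by distance. The following sketch ranks hypothetical candidate store sites by how many foot-traffic location pings fall within a given radius. It uses the haversine great-circle distance formula. The coordinates, the 0.5 km radius, and the function names are assumptions for illustration, and demographics are not modeled.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def foot_traffic_near(sites, pings, radius_km=0.5):
    """Count pings within radius_km of each site; return (site, count) sorted by count."""
    counts = {
        name: sum(haversine_km(lat, lon, plat, plon) <= radius_km for plat, plon in pings)
        for name, (lat, lon) in sites.items()
    }
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical candidate sites and anonymized foot-traffic pings (lat, lon).
sites = {"A": (40.7580, -73.9855), "B": (40.7061, -74.0087)}
pings = [(40.7585, -73.9850), (40.7570, -73.9860), (40.7600, -73.9840),
         (40.7065, -74.0090), (40.8000, -73.9000)]

print(foot_traffic_near(sites, pings))  # [('A', 3), ('B', 1)]
```

Computing every site-to-ping distance is fine for small inputs. Large datasets usually rely on a spatial index, such as a grid or an R-tree, to avoid that full comparison.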
5. Process Mining
Definition: Process mining analyzes business processes based on event
logs to discover, monitor, and improve actual processes.
Purpose: To bridge the gap between process models and real-world
execution.
Techniques:
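o Discovery (build a process model from the event log)
o Conformance checking (compare an existing model with the event log
to find deviations)
o Enhancement (extend or improve an existing model using log data,
e.g. timing information)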
Applications:
o Workflow optimization
o Compliance auditing
o Bottleneck identification
Example: A bank analyzing how loan applications are processed to reduce
approval time.
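Process discovery usually starts from the directly-follows relation, which records which activity came immediately after which within each case. The following minimal sketch assumes a hypothetical loan-application event log of (case_id, activity, timestamp in hours) tuples. It also averages the waiting time on each transition, which is one simple way to spot a bottleneck. It is not a complete discovery algorithm such as the Alpha or Heuristics Miner.

```python
from collections import Counter, defaultdict

# Hypothetical loan-application event log: (case_id, activity, hour_timestamp).
event_log = [
    ("L1", "submit", 0), ("L1", "check_credit", 2), ("L1", "approve", 30),
    ("L2", "submit", 1), ("L2", "check_credit", 3), ("L2", "request_docs", 5),
    ("L2", "check_credit", 20), ("L2", "approve", 44),
    ("L3", "submit", 4), ("L3", "check_credit", 5), ("L3", "reject", 9),
]

def traces(log):
    """Group events by case and order each case's events by timestamp."""
    cases = defaultdict(list)
    for case_id, activity, ts in log:
        cases[case_id].append((ts, activity))
    return {cid: sorted(events) for cid, events in cases.items()}

def directly_follows(log):
    """Map (a, b) -> (times b directly followed a, mean gap in hours)."""
    counts, total_gap = Counter(), Counter()
    for events in traces(log).values():
        for (t1, a), (t2, b) in zip(events, events[1:]):
            counts[(a, b)] += 1
            total_gap[(a, b)] += t2 - t1
    return {edge: (n, total_gap[edge] / n) for edge, n in counts.items()}

dfg = directly_follows(event_log)
slowest = max(dfg, key=lambda edge: dfg[edge][1])
print(slowest, dfg[slowest])  # ('check_credit', 'approve') (2, 26.0)
```

In this toy log, the step from check_credit to approve averages 26 hours, much longer than any other transition. That makes it the first candidate for the bank to investigate.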
6. Data Warehouse
Definition: A data warehouse is a centralized repository that stores
integrated data from multiple sources for reporting and analysis.
Characteristics:
o Subject-oriented
o Integrated
o Time-variant
o Non-volatile
Purpose: To support decision-making by providing a consistent view of
data over time.
Applications:
o Executive dashboards
o Historical data analysis
o Business intelligence tools
Example: A company storing years of sales and financial data in a warehouse
for strategic planning.
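The following tiny star-schema-style example illustrates several warehouse characteristics. Data from two differently shaped sources is integrated into one fact table and kept with dates for historical analysis. The load only appends rows, and the table is then queried for a report. The sketch uses Python's built-in `sqlite3` module as a stand-in for a real warehouse. The table names, columns, and source data are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE fact_sales (
    sale_date  TEXT,     -- ISO date, so history is kept (time-variant)
    product_id INTEGER REFERENCES dim_product(product_id),
    source     TEXT,     -- operational system the row came from
    amount     REAL
);
""")
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "Laptop", "Electronics"), (2, "Desk", "Furniture")])

# Two hypothetical operational sources with different record layouts.
store_pos = [("2022-03-01", 1, 900.0), ("2023-05-10", 2, 150.0)]
web_orders = [{"date": "2023-07-21", "sku": 1, "total": 1100.0}]

# Integration: map both layouts onto one schema. The load only appends rows;
# existing history is not updated in place (non-volatile by convention here).
rows = [(d, p, "pos", a) for d, p, a in store_pos]
rows += [(o["date"], o["sku"], "web", o["total"]) for o in web_orders]
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", rows)

# Typical analytical query: revenue per year and product category.
report = conn.execute("""
    SELECT substr(f.sale_date, 1, 4) AS year, p.category, SUM(f.amount) AS revenue
    FROM fact_sales f JOIN dim_product p USING (product_id)
    GROUP BY year, p.category
    ORDER BY year, p.category
""").fetchall()
print(report)
```

For this data the report is `[('2022', 'Electronics', 900.0), ('2023', 'Electronics', 1100.0), ('2023', 'Furniture', 150.0)]`.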
7. Data Marts
Definition: A data mart is a smaller, subject-specific data store, often a
subset of a data warehouse, focused on a specific business area or
department.
Types:
o Dependent: Created from the central data warehouse
o Independent: Built directly from operational systems
Purpose: To provide fast and easy access to relevant data for a specific
team.
Applications:
o Marketing analytics
o Financial reporting
o Human resources data analysis
Example: The marketing department using a data mart to analyze campaign
performance.
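A dependent data mart can be as simple as a derived table built from warehouse data for one team. The following sketch uses an in-memory `sqlite3` database as a stand-in. It assumes a hypothetical warehouse table `wh_sales` and builds a pre-aggregated `mart_marketing` table that holds only campaign-attributed orders.

```python
import sqlite3

# Hypothetical warehouse table (stands in for the central data warehouse).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE wh_sales (
    order_id INTEGER, order_date TEXT, region TEXT, campaign TEXT, amount REAL)""")
conn.executemany("INSERT INTO wh_sales VALUES (?, ?, ?, ?, ?)", [
    (1, "2024-01-05", "North", "spring_promo", 120.0),
    (2, "2024-01-06", "South", None, 80.0),
    (3, "2024-01-07", "North", "spring_promo", 200.0),
    (4, "2024-01-09", "South", "email_blast", 50.0),
])

# Dependent data mart: a small, department-specific table derived from the
# warehouse, keeping only campaign-attributed orders, pre-aggregated.
conn.execute("""
    CREATE TABLE mart_marketing AS
    SELECT campaign, COUNT(*) AS orders, SUM(amount) AS revenue
    FROM wh_sales
    WHERE campaign IS NOT NULL
    GROUP BY campaign
""")

campaign_performance = conn.execute(
    "SELECT campaign, orders, revenue FROM mart_marketing ORDER BY revenue DESC"
).fetchall()
print(campaign_performance)  # [('spring_promo', 2, 320.0), ('email_blast', 1, 50.0)]
```

The marketing team queries the small `mart_marketing` table directly instead of scanning the full warehouse table. The trade-off is that the mart must be rebuilt or refreshed when the warehouse changes.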
Summary Table
Concept        | Data Type         | Key Use               | Example Use Case
Data Mining    | Structured data   | Pattern discovery     | Sales prediction
Text Mining    | Unstructured text | Sentiment, insights   | Analyzing product reviews
Web Mining     | Web data          | Behavior analysis     | Website user path optimization
Spatial Mining | Location data     | Geo-patterns          | Site selection for new stores
Process Mining | Event logs        | Workflow optimization | Reducing process bottlenecks
Data Warehouse | Integrated data   | Central analytics     | Company-wide performance analysis
Data Mart      | Departmental data | Specific analytics    | Sales team tracking monthly targets