Highly Detailed Data Mining Answers
22. Explain data transformation and data reduction in detail.
**Data Transformation:**
- Converts raw data into a suitable format for mining.
- Techniques include:
1. **Normalization:** Adjusts values to a common scale (e.g., Min-Max Scaling).
2. **Aggregation:** Summarizes data at a higher level.
3. **Smoothing:** Removes noise using moving averages or binning.
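Two of the techniques above, Min-Max normalization and smoothing by bin means, can be sketched in plain Python (a minimal illustration; function names are mine, not from a standard library):

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Min-Max scaling: rescale values linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]

def bin_smooth_by_means(values, bin_size):
    """Smoothing by bin means: sort the values, partition them into
    equal-size bins, and replace each value with its bin's mean."""
    ordered = sorted(values)
    smoothed = []
    for i in range(0, len(ordered), bin_size):
        bin_vals = ordered[i:i + bin_size]
        mean = sum(bin_vals) / len(bin_vals)
        smoothed.extend([mean] * len(bin_vals))
    return smoothed

prices = [8, 16, 9, 15, 24, 21, 26, 30]
print(min_max_normalize(prices))        # smallest -> 0.0, largest -> 1.0
print(bin_smooth_by_means(prices, 4))   # noise flattened within each bin
```

Note that smoothing by bin means discards within-bin variation by design; that loss of detail is exactly the noise reduction being sought.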
**Data Reduction:**
- Reduces data volume while preserving patterns.
- Methods include:
1. **Dimensionality Reduction:** Uses Principal Component Analysis (PCA) to reduce features.
2. **Data Compression:** Encodes data efficiently.
3. **Sampling:** Uses subsets of data instead of the full dataset.
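Dimensionality reduction via PCA and simple random sampling can be sketched as follows (an illustrative sketch assuming NumPy is available; `pca_reduce` and `simple_random_sample` are hypothetical helper names):

```python
import random
import numpy as np

def pca_reduce(X, k):
    """PCA: project the rows of X onto the top-k principal components."""
    Xc = X - X.mean(axis=0)                 # center each feature
    cov = np.cov(Xc, rowvar=False)          # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: for symmetric matrices
    top = eigvecs[:, np.argsort(eigvals)[::-1][:k]]  # top-k eigenvectors
    return Xc @ top

def simple_random_sample(records, n, seed=42):
    """Sampling: mine a random subset instead of the full dataset."""
    rng = random.Random(seed)
    return rng.sample(records, n)

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2],
              [3.1, 3.0], [2.3, 2.7], [2.0, 1.6], [1.0, 1.1]])
reduced = pca_reduce(X, 1)   # 2 features reduced to 1 component
print(reduced.shape)         # (8, 1)
```

In practice a library implementation (e.g., scikit-learn's PCA) would be used; the eigendecomposition above just shows the underlying idea.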
23. Explain with diagrams, various OLAP operations.
**OLAP (Online Analytical Processing) Operations:**
- Used in data warehousing to analyze multidimensional data.
**1. Roll-up:**
- Aggregates data to a higher level (e.g., summarizing sales by year instead of months).
**2. Drill-down:**
- Opposite of roll-up; moves from summarized to detailed data (e.g., breaking down sales from year to quarter).
**3. Slice:**
- Selects a sub-cube by fixing one dimension to a single value (e.g., filtering sales data only for the year 2023).
**4. Dice:**
- Extracts a subset of data based on conditions on multiple dimensions (e.g., sales data for 2023 and product category A).
**5. Pivot:**
- Rotates data for different perspectives (e.g., switching rows and columns in a report).
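The five operations above can be mimicked on a small pandas DataFrame (a toy illustration, assuming pandas is available; the sales figures are made up):

```python
import pandas as pd

# Toy sales cube: dimensions = (year, quarter, product); measure = sales
sales = pd.DataFrame({
    "year":    [2022, 2022, 2023, 2023, 2023, 2023],
    "quarter": ["Q1", "Q2", "Q1", "Q2", "Q1", "Q2"],
    "product": ["A", "A", "A", "A", "B", "B"],
    "sales":   [100, 120, 130, 140, 80, 90],
})

# Roll-up: aggregate quarterly figures up to yearly totals
rollup = sales.groupby("year")["sales"].sum()

# Drill-down: return to (year, quarter) detail
drilldown = sales.groupby(["year", "quarter"])["sales"].sum()

# Slice: fix one dimension to a single value (year == 2023)
slice_2023 = sales[sales["year"] == 2023]

# Dice: select on multiple dimensions (year 2023 AND product A)
dice = sales[(sales["year"] == 2023) & (sales["product"] == "A")]

# Pivot: rotate the view so products become columns
pivot = sales.pivot_table(index="year", columns="product",
                          values="sales", aggfunc="sum")
print(rollup[2023])  # 440 = 130 + 140 + 80 + 90
```

A real OLAP server performs these operations over a pre-built multidimensional cube; the DataFrame version only demonstrates the semantics.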
24. Explain with an example, how to perform correlation using lift.
**Lift Formula:**
- Lift = (Confidence of Rule) / (Expected Confidence)
**Example:**
- Consider a market basket analysis where:
- 20% of transactions include bread.
- 30% of transactions include milk.
- 10% of transactions include both bread and milk.
**Step 1:** Calculate Confidence:
- Confidence(Bread → Milk) = P(Bread and Milk) / P(Bread)
- Confidence = 10% / 20% = 0.5 (50%)
**Step 2:** Calculate Expected Confidence:
- Expected Confidence = P(Milk) = 30% (0.3)
**Step 3:** Calculate Lift:
- Lift = 0.5 / 0.3 ≈ 1.67
**Interpretation:**
- Since Lift > 1, buying bread increases the likelihood of buying milk (positive correlation). Lift = 1 would indicate independence, and Lift < 1 a negative correlation.
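The three steps above reduce to a couple of divisions, sketched here in Python (the `lift` function name is mine):

```python
def lift(p_both, p_antecedent, p_consequent):
    """Lift(A -> B) = Confidence(A -> B) / P(B),
    where Confidence(A -> B) = P(A and B) / P(A)."""
    confidence = p_both / p_antecedent
    return confidence / p_consequent

# Market basket example: P(Bread)=0.20, P(Milk)=0.30, P(both)=0.10
value = lift(p_both=0.10, p_antecedent=0.20, p_consequent=0.30)
print(round(value, 2))  # 1.67 -> positive correlation
```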
25. Explain hierarchical method of clustering.
**Definition:**
- Hierarchical clustering builds a tree-like structure (dendrogram) of nested clusters.
**Types:**
1. **Agglomerative Hierarchical Clustering:**
- Starts with individual points and merges the closest clusters iteratively.
- Uses linkage methods:
- **Single Linkage:** Merges clusters based on the shortest distance.
- **Complete Linkage:** Merges clusters based on the farthest distance.
- **Average Linkage:** Uses the average distance between clusters.
2. **Divisive Hierarchical Clustering:**
- Starts with one large cluster and splits it iteratively.
- Less common than agglomerative clustering.
**Example:**
- Used in gene expression analysis to group similar gene sequences.
- Helps in customer segmentation by grouping similar buying behaviors.
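The agglomerative procedure with single linkage can be sketched in plain Python for one-dimensional points (a minimal toy sketch; production work would use a library such as SciPy's `scipy.cluster.hierarchy`):

```python
def agglomerative_single_linkage(points, target_clusters):
    """Agglomerative clustering: start with one cluster per point and
    repeatedly merge the closest pair until target_clusters remain.
    Single linkage: cluster distance = shortest point-to-point distance."""
    clusters = [[p] for p in points]

    def single_link_dist(c1, c2):
        return min(abs(a - b) for a in c1 for b in c2)

    while len(clusters) > target_clusters:
        # find the closest pair of clusters and merge them
        i, j = min(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda ij: single_link_dist(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

data = [1.0, 1.2, 5.0, 5.1, 9.8]
print(agglomerative_single_linkage(data, 3))
# [[1.0, 1.2], [5.0, 5.1], [9.8]]
```

Recording the order and distance of each merge is what yields the dendrogram; cutting the tree at a chosen height gives the final clusters.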