Highly Detailed 4 Data Mining Answers

The document explains data transformation and reduction techniques, including normalization, aggregation, and dimensionality reduction. It also details various OLAP operations such as roll-up, drill-down, slice, dice, and pivot for analyzing multidimensional data. Additionally, it covers correlation using lift with a market basket analysis example and describes the hierarchical method of clustering, including agglomerative and divisive approaches.
Highly Detailed Data Mining Answers

22. Explain data transformation and data reduction in detail.


**Data Transformation:**
- Converts raw data into a suitable format for mining.
- Techniques include:
1. **Normalization:** Rescales values to a common range such as [0, 1] (e.g., Min-Max Scaling).
2. **Aggregation:** Summarizes data at a higher level.
3. **Smoothing:** Removes noise using moving averages or binning.
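
As a minimal sketch of two of the techniques above, the following Python shows min-max normalization and moving-average smoothing. The function names, the sample values, and the window size are assumptions made for illustration only.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; map everything to the lower bound.
        return [new_min for _ in values]
    span = hi - lo
    return [new_min + (v - lo) * (new_max - new_min) / span for v in values]


def moving_average(values, window=3):
    """Smooth a sequence by averaging each run of `window` consecutive values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical data, chosen only to keep the arithmetic easy to follow.
incomes = [20, 30, 50, 100]
scaled = min_max_scale(incomes)                          # [0.0, 0.125, 0.375, 1.0]
smoothed = moving_average([4, 8, 6, 10, 8], window=3)    # [6.0, 8.0, 8.0]
```

Note that the smoothed series is shorter than the input, because this sketch only averages over complete windows.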

**Data Reduction:**
- Reduces data volume while preserving patterns.
- Methods include:
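1. **Dimensionality Reduction:** Reduces the number of features, for example with Principal Component Analysis (PCA), which projects the data onto the few directions that capture most of its variance.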
2. **Data Compression:** Encodes data efficiently.
3. **Sampling:** Uses subsets of data instead of the full dataset.
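
The two ideas of dimensionality reduction and sampling can be sketched in plain Python. This is an illustrative toy, not a production implementation: the PCA helper handles only the 2-D to 1-D case (using the closed-form eigenvalue of a 2x2 covariance matrix) and assumes at least two points; the function name, the data, and the seed are hypothetical.

```python
import math
import random


def pca_2d_to_1d(points):
    """Project 2-D points onto their first principal component.

    Returns the 1-D scores and the fraction of variance they explain.
    Assumes len(points) >= 2.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Sample covariance matrix [[a, b], [b, c]].
    a = sum(x * x for x, _ in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)

    # Largest eigenvalue of a symmetric 2x2 matrix (closed form).
    top = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)

    # Matching eigenvector; fall back to an axis if the features are uncorrelated.
    if b != 0:
        vx, vy = b, top - a
    elif a >= c:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm

    scores = [x * vx + y * vy for x, y in centered]
    explained = top / (a + c) if (a + c) > 0 else 0.0
    return scores, explained


# Hypothetical points lying exactly on the line y = 2x, so one component
# should capture all of the variance.
points = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
scores, explained = pca_2d_to_1d(points)

# Sampling: draw 50 records without replacement from 1000 record ids.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = rng.sample(range(1000), k=50)
```

For real datasets with many features, a numerical library would normally be used instead of the closed form shown here.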

23. Explain with diagrams, various OLAP operations.


**OLAP (Online Analytical Processing) Operations:**
- Used in data warehousing to analyze multidimensional data.

**1. Roll-up:**
- Aggregates data to a higher level (e.g., summarizing sales by year instead of months).

**2. Drill-down:**
- Opposite of roll-up; moves from summarized to detailed data (e.g., breaking down sales from year to quarter).

**3. Slice:**
- Selects a single value on one dimension, producing a sub-cube with that dimension fixed (e.g., filtering sales data only for 2023).

**4. Dice:**
- Extracts a sub-cube by selecting values on two or more dimensions (e.g., sales data for 2023 and product category A).

**5. Pivot:**
- Rotates data for different perspectives (e.g., switching rows and columns in a report).
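
The five operations can be illustrated on a tiny in-memory fact table. The sketch below uses only the Python standard library; the `Sale` record, its fields, and all figures are hypothetical, and a real data warehouse would run these operations inside an OLAP engine rather than in application code.

```python
from collections import defaultdict, namedtuple

# Hypothetical fact table with three dimensions and one measure.
Sale = namedtuple("Sale", ["year", "quarter", "category", "amount"])
facts = [
    Sale(2022, "Q1", "A", 100),
    Sale(2022, "Q2", "B", 150),
    Sale(2023, "Q1", "A", 120),
    Sale(2023, "Q1", "B", 80),
    Sale(2023, "Q2", "A", 130),
]


def aggregate(rows, dims):
    """Sum the amount measure, grouped by the given dimension names."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(getattr(row, d) for d in dims)] += row.amount
    return dict(totals)


def cross_tab(rows, row_dim, col_dim):
    """Build a nested dict: row value -> column value -> total amount."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[getattr(row, row_dim)][getattr(row, col_dim)] += row.amount
    return {r: dict(cols) for r, cols in table.items()}


# Drill-down view: totals at the (year, quarter) level.
by_quarter = aggregate(facts, ["year", "quarter"])
# Roll-up: climb the time hierarchy from quarter to year.
by_year = aggregate(facts, ["year"])           # {(2022,): 250, (2023,): 330}

# Slice: fix one dimension to a single value.
slice_2023 = [s for s in facts if s.year == 2023]
# Dice: restrict two or more dimensions at once.
dice_2023_a = [s for s in facts if s.year == 2023 and s.category == "A"]

# Pivot: the same totals with rows and columns swapped.
year_by_category = cross_tab(facts, "year", "category")
category_by_year = cross_tab(facts, "category", "year")
```

Drill-down is shown here simply as aggregating at the finer `(year, quarter)` level, since both roll-up and drill-down move along the same hierarchy in opposite directions.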

24. Explain with an example, how to perform correlation using lift.


**Lift Formula:**
- Lift = (Confidence of Rule) / (Expected Confidence)

**Example:**
- Consider a market basket analysis where:
- 20% of transactions include bread.
- 30% of transactions include milk.
- 10% of transactions include both bread and milk.

**Step 1:** Calculate Confidence:


- Confidence(Bread → Milk) = P(Bread and Milk) / P(Bread)
- Confidence = 10% / 20% = 0.5 (50%)

**Step 2:** Calculate Expected Confidence:


- Expected Confidence = P(Milk) = 30% (0.3)

**Step 3:** Calculate Lift:


- Lift = 0.5 / 0.3 ≈ 1.67

**Interpretation:**
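- Since Lift > 1, bread and milk are positively correlated: customers who buy bread are more likely to buy milk than customers in general. A lift of 1 would indicate independence, and a lift below 1 a negative correlation.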
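
The same calculation can be sketched in Python. The transaction list below is hypothetical and was constructed only so that its proportions match the example (bread in 20%, milk in 30%, both in 10% of ten transactions); the function names and the filler item `eggs` are likewise assumptions.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def lift(transactions, antecedent, consequent):
    """Lift of the rule antecedent -> consequent."""
    p_a = support(transactions, [antecedent])
    p_b = support(transactions, [consequent])
    p_ab = support(transactions, [antecedent, consequent])
    confidence = p_ab / p_a     # Step 1: confidence of the rule
    expected = p_b              # Step 2: expected confidence
    return confidence / expected  # Step 3: lift


# Ten hypothetical baskets: bread in 2, milk in 3, both in 1.
transactions = [
    {"bread", "milk"},
    {"bread"},
    {"milk"},
    {"milk"},
    {"eggs"}, {"eggs"}, {"eggs"}, {"eggs"}, {"eggs"}, {"eggs"},
]

bread_milk_lift = lift(transactions, "bread", "milk")  # about 1.67
```

Because lift is symmetric in the two items, `lift(transactions, "milk", "bread")` gives the same value up to floating-point rounding.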

25. Explain hierarchical method of clustering.


**Definition:**
- Hierarchical clustering builds a tree-like structure (dendrogram) of nested clusters.

**Types:**
1. **Agglomerative Hierarchical Clustering:**
- Starts with individual points and merges the closest clusters iteratively.
- Uses linkage methods:
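- **Single Linkage:** Cluster distance is the shortest distance between any two points, one from each cluster.
- **Complete Linkage:** Cluster distance is the farthest distance between any two points, one from each cluster.
- **Average Linkage:** Cluster distance is the average of all pairwise distances between points of the two clusters.
- In every case, the two clusters with the smallest cluster distance are merged at each step.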

2. **Divisive Hierarchical Clustering:**
- Starts with one large cluster and splits it iteratively.
- Less common than agglomerative clustering.
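
A naive agglomerative procedure can be sketched as follows. This is an illustrative toy under stated assumptions: it stops at a requested number of clusters instead of building the full dendrogram, it uses Euclidean distance, and the function name, parameters, and sample points are hypothetical. Its cost grows quickly with the number of points, so it is suitable only for small inputs.

```python
import math


def agglomerative(points, num_clusters, linkage="single"):
    """Merge the closest pair of clusters until num_clusters remain."""

    def cluster_distance(c1, c2):
        # All pairwise Euclidean distances between the two clusters.
        dists = [math.dist(p, q) for p in c1 for q in c2]
        if linkage == "single":
            return min(dists)
        if linkage == "complete":
            return max(dists)
        if linkage == "average":
            return sum(dists) / len(dists)
        raise ValueError(f"unknown linkage: {linkage}")

    # Start with every point in its own cluster.
    clusters = [[p] for p in points]
    while len(clusters) > num_clusters:
        best = None  # (distance, i, j) of the closest pair seen so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge cluster j into cluster i, then drop cluster j.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical 2-D points forming two tight pairs and one outlier.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
three = agglomerative(points, 3, linkage="single")
two = agglomerative(points, 2, linkage="single")
```

With these points, the two tight pairs merge first, so `three` contains the two pairs and the outlier as separate clusters; asking for two clusters then joins the two pairs and leaves the outlier alone.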

**Example:**
- Used in gene expression analysis to group genes with similar expression patterns.
- Helps in customer segmentation by grouping similar buying behaviors.
