Detailed 12 Data Mining Answers

Data mining is the extraction of meaningful patterns from large datasets using statistical and machine learning techniques, with applications in fraud detection and market analysis. Key concepts include interestingness, data preprocessing categories, and classifiers like Support Vector Machines. Additionally, topics such as lazy learning, regression, clustering methods, and text mining are discussed, highlighting their significance in analyzing and interpreting data.
Detailed Answers on Data Mining

1. What do you mean by data mining?


Data mining is the process of extracting meaningful patterns, trends, and knowledge from large
datasets using techniques from statistics, machine learning, and databases. It supports
decision-making and uncovers hidden insights.

**Applications:**
- Fraud detection
- Market analysis
- Customer segmentation

2. What do you mean by interestingness?


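Interestingness measures how significant and useful a pattern discovered by data mining is. For an
association rule A ⇒ B, it is commonly evaluated with these metrics:
- **Support:** Fraction of transactions in the dataset that contain the itemset (A ∪ B).
- **Confidence:** Conditional probability that a transaction containing A also contains B, i.e.,
  support(A ∪ B) / support(A).
- **Lift:** confidence(A ⇒ B) / support(B). It measures how much more often A and B appear together
  than expected if they were independent; a value above 1 indicates positive correlation, and 1 indicates independence.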
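The following minimal sketch computes the three metrics for a rule A ⇒ B over a handful of transactions. The transactions and function names are hypothetical and serve only as illustration. Lift is computed as confidence(A ⇒ B) / support(B).

```python
# Toy transactions (hypothetical data for illustration only).
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(A | B) / support(A): an estimate of P(B given A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """confidence(A => B) / support(B); a value of 1.0 means A and B look independent."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


if __name__ == "__main__":
    a, b = {"bread"}, {"milk"}
    print(f"support    = {support(a | b, TRANSACTIONS):.4f}")
    print(f"confidence = {confidence(a, b, TRANSACTIONS):.4f}")
    print(f"lift       = {lift(a, b, TRANSACTIONS):.4f}")
```

On this toy data the rule {bread} ⇒ {milk} has support 0.6, confidence 0.75, and lift 0.9375. A lift slightly below 1 means bread and milk appear together a little less often than they would if they were independent.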

3. Mention the 4 categories of data preprocessing.


1. **Data Cleaning:** Removing noise, handling missing values.
2. **Data Integration:** Merging data from multiple sources.
3. **Data Transformation:** Converting data into suitable formats (e.g., normalization).
4. **Data Reduction:** Reducing data size while preserving meaningful information (e.g., PCA).
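Two of these categories can be sketched in a few lines of Python: cleaning, by filling missing values, and transformation, by min-max normalization. Mean imputation is only one of several ways to handle missing values. The column and function names are hypothetical.

```python
from statistics import mean


def fill_missing_with_mean(values):
    """Data cleaning: replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Data transformation: linearly rescale values into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: map everything to new_min to avoid dividing by zero
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


if __name__ == "__main__":
    ages = [25, None, 40, 55, None]  # hypothetical column with missing entries
    cleaned = fill_missing_with_mean(ages)
    print(cleaned)
    print(min_max_normalize(cleaned))
```

For the sample column, each missing age is filled with 40, the mean of 25, 40, and 55. Normalization then maps the cleaned values to 0.0, 0.5, 0.5, 1.0, 0.5.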

4. What is technical metadata in a data warehouse?


Technical metadata describes the structure and properties of the stored data:
- **Data types:** e.g., integer, string, date.
- **Indexes:** Which indexes exist on which columns (indexes improve query performance).
- **Relationships:** How tables are connected, e.g., through primary and foreign keys.
- **Data lineage:** Where data originated and which transformations it has undergone.
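Relational databases usually expose technical metadata through their system catalog. SQLite, for example, reports column types, indexes, and foreign-key relationships through its `PRAGMA table_info`, `PRAGMA index_list`, and `PRAGMA foreign_key_list` statements. The schema and the `describe_table` helper below are hypothetical. This sketch does not cover data lineage.

```python
import sqlite3

# Hypothetical schema, created in an in-memory database only so there is metadata to inspect.
SCHEMA = """
CREATE TABLE customer (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    signup_date DATE
);
CREATE TABLE sale (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(id),
    amount REAL
);
CREATE INDEX idx_sale_customer ON sale(customer_id);
"""


def describe_table(conn: sqlite3.Connection, table: str) -> dict:
    """Collect column types, index names, and foreign keys for one table.

    `table` is interpolated into the PRAGMA text, so it must be a trusted name.
    """
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
    columns = {row[1]: row[2] for row in conn.execute(f"PRAGMA table_info({table})")}
    # PRAGMA index_list rows begin with (seq, name, unique, ...)
    indexes = [row[1] for row in conn.execute(f"PRAGMA index_list({table})")]
    # PRAGMA foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
    foreign_keys = [
        (row[3], row[2], row[4])  # (local column, referenced table, referenced column)
        for row in conn.execute(f"PRAGMA foreign_key_list({table})")
    ]
    return {"columns": columns, "indexes": indexes, "foreign_keys": foreign_keys}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for name in ("customer", "sale"):
        print(name, describe_table(conn, name))
```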

5. What do you mean by scalability of a classifier?


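Scalability refers to a classifier's ability to be trained and applied efficiently as the dataset
grows, without an impractical increase in running time or memory. A scalable classifier:
- Keeps training time and memory manageable on large datasets, ideally growing roughly linearly
  with the number of training records, while maintaining accuracy.
- Uses algorithms whose cost grows gracefully with data size, e.g., decision tree induction or deep
  learning models trained with mini-batch gradient descent. Standard kernel SVM training, by contrast,
  scales poorly (roughly quadratic or worse in the number of training examples).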
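The sketch below illustrates a classifier whose cost grows gracefully with data size. It trains a nearest-centroid classifier in a single pass over a data stream, keeping only a running sum and count per class. This classifier is a simple stand-in, not one of the algorithms named above. The data generator and function names are hypothetical.

```python
import random
from typing import Dict, Iterable, Iterator, List, Tuple

Example = Tuple[List[float], str]  # (feature vector, class label)


def stream_examples(n: int, seed: int = 0) -> Iterator[Example]:
    """Hypothetical data source that yields n labelled 2-D points one at a time.

    Points are uniform in [-1, 1] x [-1, 1]; the label is "pos" when x0 + x1 > 0.
    """
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        yield x, ("pos" if x[0] + x[1] > 0 else "neg")


def train_centroids(stream: Iterable[Example]) -> Dict[str, List[float]]:
    """Single pass over the stream, keeping only a running sum and count per class.

    Time is linear in the number of examples; memory depends only on the number
    of classes and features, not on how many examples are streamed.
    """
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for x, label in stream:
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, xi in enumerate(x):
            acc[i] += xi
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids: Dict[str, List[float]], x: List[float]) -> str:
    """Assign x to the class whose centroid is closest (squared Euclidean distance)."""
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


if __name__ == "__main__":
    model = train_centroids(stream_examples(100_000))
    held_out = list(stream_examples(1_000, seed=1))
    accuracy = sum(predict(model, x) == y for x, y in held_out) / len(held_out)
    print(f"hold-out accuracy: {accuracy:.3f}")
```

The training loop never holds the whole dataset in memory, so the same code works whether the stream comes from a list, a file, or a database cursor. The trade-off is that such a simple model only suits data whose classes separate well around their means.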

6. What is the objective of SVM?


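A Support Vector Machine (SVM) aims to find the optimal hyperplane that separates the classes in a
dataset. "Optimal" means the hyperplane with the maximum margin, i.e., the largest distance to the
closest training points of each class (the support vectors). Maximizing the margin tends to improve
generalization, and therefore classification accuracy on unseen data.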
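For linearly separable data, the margin-maximization objective is usually written as the hard-margin primal problem below. Here the training set is $(\mathbf{x}_i, y_i)$ for $i = 1, \dots, n$ with labels $y_i \in \{-1, +1\}$, and the hyperplane is $\mathbf{w}^\top \mathbf{x} + b = 0$. These symbols are introduced here for illustration.

```latex
\begin{aligned}
&\text{hyperplane: } \mathbf{w}^\top \mathbf{x} + b = 0, \qquad
 \text{margin} = \frac{2}{\lVert \mathbf{w} \rVert} \\
&\min_{\mathbf{w},\, b} \ \tfrac{1}{2}\lVert \mathbf{w} \rVert^2
 \quad \text{subject to} \quad y_i\,(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1, \quad i = 1, \dots, n
\end{aligned}
```

Minimizing $\lVert \mathbf{w} \rVert$ maximizes the margin $2 / \lVert \mathbf{w} \rVert$. The support vectors are the training points for which the constraint holds with equality. Soft-margin and kernel variants relax or generalize this formulation for data that are not linearly separable.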

7. What is lazy learning? Give an example.


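Lazy learning defers generalization until a query is made. Unlike eager learning, which builds a
model from the training data in advance, it simply stores the training data and performs the
computation at prediction time. Training is therefore fast, but each prediction can be slow.

**Example:** k-Nearest Neighbors (k-NN) predicts a label by majority vote among the k training
examples closest to the query.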
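A minimal k-NN sketch makes the lazy behavior explicit: `fit` only stores the examples, and all distance computations happen in `predict`. The class name, the toy points, and the choice of Euclidean distance with a majority vote are assumptions for illustration.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+
from typing import List, Sequence, Tuple


class KNNClassifier:
    """Lazy learner: fit() only stores the data; all work happens in predict()."""

    def __init__(self, k: int = 3):
        self.k = k
        self.points: List[Tuple[Sequence[float], str]] = []

    def fit(self, X, y):
        # No model is built here: the stored training set itself is the "model".
        self.points = list(zip(X, y))
        return self

    def predict(self, query):
        # Rank every stored example by distance to the query and keep the k closest.
        nearest = sorted(self.points, key=lambda p: dist(p[0], query))[: self.k]
        # Majority vote among the labels of the k nearest neighbours.
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


if __name__ == "__main__":
    X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 5.0), (7.0, 7.0), (6.5, 6.0)]
    y = ["A", "A", "A", "B", "B", "B"]
    model = KNNClassifier(k=3).fit(X, y)
    print(model.predict((1.2, 1.5)))
    print(model.predict((6.8, 6.1)))
```

Each prediction scans all stored examples, so its cost grows with the size of the training set. Practical implementations often use index structures such as k-d trees to speed up the neighbor search.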

8. What is regression?


Regression is a statistical method that models the relationship between a dependent variable and
one or more independent variables, and is used to predict continuous values.

**Example:** Predicting house prices based on square footage, location, and number of bedrooms.
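The following minimal sketch shows ordinary least squares with a single independent variable. The house-price example above uses several variables, which multiple linear regression handles in the same way. The toy area and price figures are made up and lie exactly on a line, so the fit is exact.

```python
from statistics import mean


def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one independent variable)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


if __name__ == "__main__":
    # Hypothetical toy data: floor area (in 100 sq ft) vs. price (in $1000s).
    area = [10, 15, 20, 25, 30]
    price = [150, 200, 250, 300, 350]
    b0, b1 = fit_simple_linear_regression(area, price)
    print(f"price = {b0:.1f} + {b1:.1f} * area")
    print(f"predicted price for area 22: {b0 + b1 * 22:.1f}")
```

On this data the fitted model is price = 50 + 10 * area, so an area of 22 yields a predicted price of 270.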

9. What is a continuous ordinal variable? Give an example.


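A continuous ordinal variable looks like continuous data measured on an unknown scale: the relative
ordering of its values is meaningful, but the actual magnitudes and the differences between values are not.

**Example:** A customer satisfaction rating on a scale from 1 to 10, where 8 is better than 6, but
the gap between 8 and 6 need not mean the same as the gap between 4 and 2.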
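Only the ordering of an ordinal variable's values is meaningful. One common textbook way to use such a variable in distance-based mining, such as clustering, is to replace each value by its rank r among the M ordered states and rescale with z = (r - 1) / (M - 1). This maps the variable onto [0, 1] and discards the original magnitudes. The function name and sample values below are hypothetical.

```python
def ordinal_to_unit_interval(values, ordered_levels):
    """Map ordinal values to [0, 1] by rank: z = (r - 1) / (M - 1).

    `ordered_levels` lists the M possible states from lowest to highest,
    and r is the 1-based rank of a value among those states.
    """
    M = len(ordered_levels)
    if M < 2:
        raise ValueError("need at least two ordered levels")
    rank = {level: r for r, level in enumerate(ordered_levels, start=1)}
    return [(rank[v] - 1) / (M - 1) for v in values]


if __name__ == "__main__":
    ratings = list(range(1, 11))  # satisfaction ratings 1..10
    print(ordinal_to_unit_interval([1, 4, 10], ratings))
    print(ordinal_to_unit_interval(["gold", "bronze"], ["bronze", "silver", "gold"]))
```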

10. What do you mean by partitioning methods of clustering?


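Partitioning methods divide a dataset into k clusters (k is specified in advance), assigning each
object to exactly one cluster and iteratively relocating objects to improve cluster quality. Examples:
- **k-Means:** Represents each cluster by its mean (centroid) and assigns points so as to minimize the
  within-cluster sum of squared distances.
- **k-Medoids:** Uses actual data points (medoids) as cluster centers, which makes it less sensitive
  to outliers than k-Means.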
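k-Means can be sketched as alternating two steps until the centroids stop moving. First, each point is assigned to its nearest centroid. Then each centroid is recomputed as the mean of its points. The function name, the random initialization from k data points, and the toy points are assumptions. k-Medoids would differ by restricting each center to an actual data point.

```python
import random
from math import dist  # Euclidean distance, Python 3.8+
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def k_means(points: Sequence[Point], k: int, iterations: int = 100, seed: int = 0):
    """Plain Lloyd-style k-Means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids: List[Point] = [tuple(p) for p in rng.sample(list(points), k)]  # k random points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's old center
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, labels


if __name__ == "__main__":
    pts = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
    centers, assignment = k_means(pts, k=2)
    print(centers)
    print(assignment)
```

On these two well-separated groups, the first three points end up in one cluster and the last three in the other. In general, k-Means can converge to a local optimum that depends on the initial centroids.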

11. What do you mean by feature descriptor?


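A feature descriptor is a compact numerical representation (usually a vector) that captures the
essential characteristics of an object or image region in pattern recognition and computer vision,
so that objects can be compared and matched.

**Example:** SIFT (Scale-Invariant Feature Transform) detects keypoints in an image and describes
each one with a 128-dimensional vector built from histograms of local gradient orientations; these
descriptors are used to match objects across images.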
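The sketch below shows the idea behind gradient-based descriptors with a much-simplified orientation histogram for a small grayscale patch. Each interior pixel's gradient adds its magnitude to one of 8 direction bins, and the histogram is then L2-normalized. This is not SIFT: it has no keypoint detection, no scale handling, and no 4×4 spatial grid. The function name and parameters are hypothetical.

```python
import math
from typing import List, Sequence


def orientation_histogram(patch: Sequence[Sequence[float]], bins: int = 8) -> List[float]:
    """Simplified gradient-orientation descriptor for a grayscale patch (not full SIFT).

    Gradients are central differences at interior pixels; each gradient adds its
    magnitude to the bin of its direction, and the histogram is L2-normalized.
    """
    hist = [0.0] * bins
    rows, cols = len(patch), len(patch[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = patch[r][c + 1] - patch[r][c - 1]  # horizontal change
            gy = patch[r + 1][c] - patch[r - 1][c]  # vertical change (rows grow downward)
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = math.atan2(gy, gx) % (2 * math.pi)  # direction in [0, 2*pi)
            hist[int(angle / (2 * math.pi) * bins) % bins] += magnitude
    norm = math.sqrt(sum(h * h for h in hist))
    return [h / norm for h in hist] if norm else hist


if __name__ == "__main__":
    vertical_edge = [[0, 0, 1, 1, 1] for _ in range(5)]  # dark left half, bright right half
    print(orientation_histogram(vertical_edge))
```

For a patch with a vertical edge (dark on the left, bright on the right), all gradient energy lands in bin 0, so the descriptor is [1.0, 0.0, ..., 0.0].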

12. What is text mining?


Text mining extracts meaningful insights from unstructured text data using Natural Language
Processing (NLP) techniques.

**Applications:**
- Sentiment analysis
- Spam detection
- Document classification
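A common first step in text mining is to turn documents into numeric vectors. The sketch below uses TF-IDF weighting: term frequency multiplied by log(N / document frequency). TF-IDF is one of several standard weighting schemes, and the tokenizer here is a deliberately simple regex. The function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep runs of letters/digits as tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf(documents: List[str]) -> List[Dict[str, float]]:
    """Weight each term by tf * log(N / df): frequent in this document, rare in the collection."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {t: (count / len(tokens)) * math.log(n_docs / df[t]) for t, count in tf.items()}
        )
    return vectors


if __name__ == "__main__":
    docs = ["the free prize is yours", "the meeting is on monday", "claim the free prize"]
    for vec in tf_idf(docs):
        print({t: round(w, 3) for t, w in vec.items()})
```

Terms that occur in every document, such as "the" here, receive a weight of 0, while rarer terms like "meeting" score higher. Vectors like these can then feed a classifier for spam detection or document classification.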
