🔹 Data Preprocessing: An Overview
Real-world data are often dirty, incomplete, noisy, and inconsistent.
If we directly apply mining algorithms, we get poor or misleading results.
👉 Hence, we must preprocess the data.
1. What Defines Data Quality?
Data have quality if they satisfy the requirements of the intended use.
Key Factors of Data Quality:
1. Accuracy
o Correctness of values.
o Example: Age recorded as 250 → inaccurate.
o Cause: faulty sensors, entry errors, wrong units.
2. Completeness
o Missing attribute values or tuples.
o Example: Customer record missing phone number.
o Cause: not recorded, ignored, deleted, or lost.
3. Consistency
o Different representations of the same thing.
o Example: “Dept01” vs “D01” for the same department.
4. Timeliness
o Data should be up to date.
o Example: Sales reports submitted late → incomplete at month-end.
5. Believability
o Do users trust the data?
o Example: A database once had many errors → even after fixing, users may still
distrust it.
6. Interpretability
o How easily can users understand the data?
o Example: Database uses obscure accounting codes → hard for sales team to
interpret.
👉 Different users may perceive quality differently.
A marketing analyst may accept 80% accurate customer addresses for campaigns.
A sales manager may find the same data unreliable.
2. Major Tasks in Data Preprocessing
The four major preprocessing tasks are:
2.1 Data Cleaning
Fixes dirty (incomplete, noisy, inconsistent) data.
Methods:
Fill in missing values
o Example: If “Age” is missing:
Replace with mean/median/mode.
Use most probable value (via regression, classification).
Smooth noisy data
o Example: Sensor reading 50, 52, 500, 51 → smooth using binning or moving
average.
Identify/remove outliers
o Example: Salary of 999999 among typical 30k–80k.
Resolve inconsistencies
o Example: Date formats (12/01/24 vs 01-12-2024).
Remove duplicates
o Example: Two identical customer records.
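👉 A minimal pandas sketch of the last two fixes above (resolving inconsistent date formats, then removing duplicates). The table and column names (cust_id, order_date, amount) are made up for illustration:
```python
import pandas as pd

# Hypothetical raw data: mixed date formats plus one duplicate row.
df = pd.DataFrame({
    "cust_id": [101, 102, 102, 103],
    "order_date": ["2024-01-12", "12 Jan 2024", "12 Jan 2024", "2024/03/05"],
    "amount": [250.0, 99.5, 99.5, 40.0],
})

# Resolve inconsistent date representations into one canonical type.
# format="mixed" needs pandas >= 2.0.
df["order_date"] = pd.to_datetime(df["order_date"], format="mixed")

# Remove exact duplicate records.
df = df.drop_duplicates()
print(df)
```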
2.2 Data Integration
Combines data from multiple sources (databases, files, data cubes).
Problems:
Schema mismatch → customer_id vs cust_id.
Value mismatch → “Bill” vs “William.”
Redundancy → same customer appearing in two datasets.
Example:
Sales database has cust_id, purchase_amount.
Customer database has customer_id, income.
👉 Integration merges them for richer analysis.
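👉 A sketch of that merge in pandas, assuming the two hypothetical tables above; the entity-identification step is simply telling the join that cust_id and customer_id name the same thing:
```python
import pandas as pd

# Hypothetical sales and customer tables from two sources.
sales = pd.DataFrame({"cust_id": [1, 2, 3], "purchase_amount": [120.0, 85.5, 300.0]})
customers = pd.DataFrame({"customer_id": [1, 2, 3], "income": [45000, 72000, 58000]})

# Entity identification: cust_id and customer_id refer to the same entity.
merged = sales.merge(customers, left_on="cust_id", right_on="customer_id", how="left")
merged = merged.drop(columns="customer_id")   # drop the now-redundant key
print(merged)
```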
2.3 Data Reduction
Reduces the dataset size while preserving knowledge.
This makes mining faster.
Types:
1. Dimensionality Reduction (reduce attributes)
o Remove irrelevant or redundant features.
o Example: Drop “middle name” from customer dataset if not useful.
o Techniques:
Attribute subset selection
Attribute construction (derive new features, e.g., BMI from
weight/height)
PCA (Principal Component Analysis)
2. Numerosity Reduction (reduce data volume)
o Replace data with smaller representations.
o Techniques:
Parametric models: Regression, log-linear models.
Example: Fit a regression line instead of storing all points.
Non-parametric models: Histograms, clustering, sampling.
Example: Store age distribution as bins (20–29: 2000 people, 30–39:
1500 people) instead of raw ages.
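👉 A sketch of parametric numerosity reduction: instead of storing every (experience, salary) point, keep only the two coefficients of a fitted regression line. The synthetic data and numbers are illustrative:
```python
import numpy as np

rng = np.random.default_rng(0)
experience = rng.uniform(0, 20, size=10_000)                       # 10,000 raw points
salary = 3000 + 200 * experience + rng.normal(0, 150, size=10_000)

# Fit a straight line: two numbers now stand in for 10,000 observations.
slope, intercept = np.polyfit(experience, salary, deg=1)
print(f"salary ≈ {intercept:.0f} + {slope:.0f} * experience")

# Approximate values can be reconstructed on demand instead of stored.
estimated = intercept + slope * 5.0   # estimated salary at 5 years of experience
```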
2.4 Data Transformation
Convert data into formats suitable for mining.
Methods:
1. Normalization (Scaling)
o Adjust values to a smaller range (e.g., [0,1] or [-1,1]).
o Example:
Age = 25, Salary = 75,000.
Without normalization, salary dominates distance-based algorithms
(kNN, clustering).
After normalization: Age = 0.25, Salary = 0.75.
2. Discretization
o Convert continuous attributes into categories.
o Example:
Age = 25 → “Youth”
Income = 65,000 → “Medium Income.”
o Methods:
Static (predefined ranges).
Dynamic (data-driven, e.g., clustering).
3. Concept Hierarchy Generation
o Replace raw values with higher-level concepts.
o Example:
City → State → Country
Age (23, 27, 31) → “Young Adult”
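👉 A compact sketch of the three transformation methods above (min-max scaling, age discretization, and a concept-hierarchy lookup). The cut points and the city-to-state mapping are illustrative assumptions:
```python
import pandas as pd

df = pd.DataFrame({"age": [25, 40, 67], "salary": [75_000, 52_000, 91_000],
                   "city": ["Hyderabad", "Mumbai", "Delhi"]})

# 1. Min-max normalization into [0, 1] so no attribute dominates distance measures.
for col in ["age", "salary"]:
    df[col + "_norm"] = (df[col] - df[col].min()) / (df[col].max() - df[col].min())

# 2. Discretization: continuous age -> category labels (cut points are assumed).
df["age_group"] = pd.cut(df["age"], bins=[0, 30, 60, 120],
                         labels=["Youth", "Adult", "Senior"])

# 3. Concept hierarchy: replace city with a higher-level concept.
city_to_state = {"Hyderabad": "Telangana", "Mumbai": "Maharashtra", "Delhi": "Delhi"}
df["state"] = df["city"].map(city_to_state)
print(df)
```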
3. Putting It All Together (Example Workflow)
Imagine you’re analyzing AllElectronics sales data.
1. Data Cleaning
o Fill missing “on_sale” attribute (if missing, mark as “unknown”).
o Fix date formats (2025-09-10 vs 10/09/2025).
o Remove duplicate transactions.
2. Data Integration
o Merge sales DB with customer demographics DB.
o Resolve mismatch: cust_id vs customer_id.
3. Data Reduction
o Drop irrelevant attributes like “middle name.”
o Use PCA to reduce 50 demographic features to 10 principal components.
o Sample 10% of transactions for testing.
4. Data Transformation
o Normalize attributes: age → [0,1], income → [0,1].
o Discretize income: “Low,” “Medium,” “High.”
o Generalize location: “Hyderabad” → “Telangana” → “India.”
Now, the data are clean, consistent, compact, and ready for mining.
🔹 Data Cleaning
Real-world data is often incomplete, noisy, or inconsistent.
Data cleaning (a.k.a. data cleansing) fixes these problems so mining results are reliable.
It mainly deals with:
1. Missing values
2. Noisy data
3. Inconsistent/discrepant data
1. Handling Missing Values
When some attributes have no values recorded (e.g., customer income).
Methods:
1. Ignore the tuple
o Drop the record if class label is missing.
o Bad if too many missing values → data loss.
2. Fill manually
o Possible for small datasets.
o Not scalable for big data.
3. Global constant
o Replace missing with “Unknown” or NULL.
o Risk: program may treat "Unknown" as a real category.
4. Central tendency (mean/median)
o If distribution is normal → use mean.
o If skewed → use median.
o Example: replace missing income with mean $56,000.
5. Class-wise mean/median
o Replace with average of same class.
o Example: For "High-risk" customers → use mean income of that group.
6. Most probable value (prediction models)
o Estimate using regression, Bayesian inference, or decision trees.
o Uses most info → preserves relationships best.
o Example: predict income using other customer attributes.
👉 Note: A missing value is not always an error. Sometimes it means “Not Applicable” (e.g., no driver’s license number). Metadata (the rules about how nulls should be recorded) should guide any replacement.
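👉 A minimal pandas sketch of methods 4–6 above (overall mean, class-wise mean, and a model-based “most probable value”). Column names and the choice of linear regression as the prediction model are assumptions:
```python
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({
    "risk_class": ["high", "high", "low", "low", "high"],
    "age": [45, 52, 23, 31, 40],
    "income": [58_000, None, 34_000, 39_000, None],
})

# Method 4: fill with the overall mean (use median instead if the data are skewed).
df["income_mean"] = df["income"].fillna(df["income"].mean())

# Method 5: fill with the mean of the same class (class-wise imputation).
df["income_class"] = df["income"].fillna(
    df.groupby("risk_class")["income"].transform("mean"))

# Method 6: predict the most probable value from other attributes (here: age).
known = df[df["income"].notna()]
model = LinearRegression().fit(known[["age"]], known["income"])
df["income_model"] = df["income"]
missing = df["income"].isna()
df.loc[missing, "income_model"] = model.predict(df.loc[missing, ["age"]])
print(df)
```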
2. Handling Noisy Data
Noise = random error or variance in a measured variable.
Techniques:
1. Binning (local smoothing)
o Sort data, put into bins (equal width or equal frequency).
o Replace values in bin with mean, median, or boundaries.
o Example:
Original prices = [4, 8, 15] → bin mean = 9 → replaced as [9, 9, 9].
2. Regression (global smoothing)
o Fit a function (linear/multiple regression).
o Example: Predict price using other attributes.
3. Outlier analysis
o Use clustering or statistical tests.
o Values outside clusters = possible outliers.
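👉 A small sketch of smoothing by bin means, extending the [4, 8, 15] example above to nine values split into three equal-frequency bins (the extra values and bin count are assumptions):
```python
import pandas as pd

prices = pd.Series([4, 8, 15, 21, 21, 24, 25, 28, 34])

# Equal-frequency binning: 3 bins of 3 sorted values each.
bins = pd.qcut(prices, q=3, labels=False)

# Smoothing by bin means: every value is replaced by its bin's mean.
smoothed = prices.groupby(bins).transform("mean")
print(pd.DataFrame({"original": prices, "bin": bins, "smoothed": smoothed}))
```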
3. Data Cleaning as a Process
It’s not just one step, but an iterative process:
Step 1: Discrepancy Detection
Causes:
o Poorly designed forms (optional fields)
o Human error
o Data decay (outdated addresses)
o Integration issues (different formats/names)
Use metadata:
o Attribute domain, range, type.
o Check mean, median, mode, skewness, std dev.
o Identify outliers & anomalies.
Step 2: Data Transformation
Fix issues via:
o Data scrubbing tools (domain knowledge, fuzzy matching, spell-checking).
o Data auditing tools (find rules/relationships → flag violations).
o Data migration & ETL tools (transform formats).
o Custom scripts (SQL, Python, etc.).
Challenges:
Iterative, error-prone.
Some fixes create new errors.
Often requires multiple iterations.
New Approaches:
Interactive cleaning tools (e.g., Potter’s Wheel):
o Spreadsheet-like, immediate feedback, undo transformations.
o Discrepancy detection in background.
Declarative languages:
o SQL extensions for specifying transformations.
Metadata updates:
o Always update metadata after cleaning → future cleaning is easier.
Data Integration (Simplified Explanation)
When we mine data, often the data comes from different sources (databases, files, data
warehouses).
Before using it, we must integrate (combine) this data carefully to avoid errors, duplication,
and inconsistencies.
Challenges in Data Integration
1. Entity Identification Problem
o Same real-world thing may have different names in different databases.
o Example:
One database: customer_id
Another database: cust_number
Both actually mean the same thing.
o Metadata (info about data: name, type, range, rules) helps in matching.
o Need to also check constraints:
Example: Discount applied to an order vs. applied to each item →
must align.
2. Redundancy & Correlation Analysis
o Sometimes, two attributes contain the same information.
o Example: annual_revenue can be derived from monthly_revenue × 12.
o To detect redundancy, use correlation tests (a code sketch follows this list):
For Nominal Data (categories like Male/Female, Fiction/Nonfiction):
Use Chi-square (χ²) test.
It checks if two attributes are independent or correlated.
Example: Gender vs. Preferred Reading → strong correlation
found.
For Numeric Data (numbers like income, marks):
Use Correlation Coefficient (r):
r > 0 → Positive correlation (A ↑, B ↑).
r < 0 → Negative correlation (A ↑, B ↓).
r = 0 → No correlation.
Use Covariance (measures joint variation).
Positive covariance → values rise together.
Negative covariance → one rises, other falls.
Example: Stock prices of two companies moving
together → positive covariance.
3. Tuple Duplication
o Same record may appear multiple times.
o Example: Two identical purchase orders in different systems.
o Can also occur when using denormalized tables (storing same info multiple
times).
o May lead to inconsistencies (e.g., same customer with different addresses).
4. Data Value Conflicts
o Even if the same entity is identified, values may conflict:
Representation difference → Dates (25/12/2010 vs. 2010/12/25).
Unit difference → Weight in kg vs. lbs.
Currency/scale → Room price in ₹ vs. $.
Abstraction level → Sales for a branch vs. sales for a region.
Encoding differences → Pay type stored as H/S in one DB and 1/2 in
another.
o Must apply transformation rules to make them consistent.
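👉 A sketch of the correlation checks from point 2 above: a chi-square test of independence for two nominal attributes, and Pearson correlation plus covariance for two numeric attributes. The contingency counts and the stock series are made up for illustration:
```python
import numpy as np
from scipy.stats import chi2_contingency

# Nominal-nominal: observed counts of gender vs. preferred reading (illustrative).
observed = np.array([[250, 200],     # male:   fiction, non-fiction
                     [50, 1000]])    # female: fiction, non-fiction
chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.1f}, p = {p_value:.3g}")   # tiny p -> attributes are correlated

# Numeric-numeric: two stock price series that tend to move together.
stock_a = np.array([2, 3, 5, 4, 6])
stock_b = np.array([5, 8, 10, 11, 14])
r = np.corrcoef(stock_a, stock_b)[0, 1]          # Pearson correlation coefficient
cov = np.cov(stock_a, stock_b)[0, 1]             # sample covariance
print(f"r = {r:.2f}, cov = {cov:.2f}")           # both positive -> rise and fall together
```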
🌐 What is Data Reduction?
When we mine very large datasets, analysis becomes slow and expensive.
So, we reduce the data size (rows/columns/values) without losing important patterns.
👉 Goal: Make mining faster while keeping results almost the same as with the original data.
Three Main Data Reduction Strategies
1. Dimensionality Reduction
👉 Reduce the number of attributes (columns/features).
Why? Many features are irrelevant, redundant, or weakly correlated.
Techniques:
o Wavelet Transform (DWT): Compresses data by keeping only strong
coefficients.
🔹 Example: An image with 1024 pixels → DWT keeps only the top 100
coefficients → you can still reconstruct the image roughly.
o Principal Component Analysis (PCA): Creates new features (principal
components) that capture most of the data’s variance.
🔹 Example: If a dataset has Height (cm) and Weight (kg), PCA might create a
single new variable “Body size” that summarizes both.
o Attribute Subset Selection: Just drop unnecessary features.
🔹 Example: A dataset with {Name, Age, Age in Months, Height}. “Age in
Months” is redundant → remove it.
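👉 A sketch of PCA compressing two correlated features (height, weight) into a single “body size” component, as in the example above; the synthetic data and scikit-learn usage are illustrative:
```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
height = rng.normal(170, 10, size=500)                   # cm
weight = 0.9 * (height - 100) + rng.normal(0, 5, 500)    # kg, correlated with height
X = np.column_stack([height, weight])

# Standardize first so both features contribute on the same scale.
X_std = StandardScaler().fit_transform(X)

# Keep one principal component: a single "body size" score per person.
pca = PCA(n_components=1)
body_size = pca.fit_transform(X_std)
print("variance explained:", pca.explained_variance_ratio_[0])   # typically ~0.9+
```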
2. Numerosity Reduction
👉 Replace the dataset with a smaller, approximate form.
Parametric (model-based): Use a formula/model instead of raw data.
o Regression: Instead of storing 1M salary records, store a regression line
(Salary = 3000 + 200 × Experience).
o Log-linear Models: Approximate multi-dimensional distributions.
Non-Parametric: Use data structures instead of equations.
o Histograms: Store only frequency counts for ranges.
🔹 Example: Instead of storing 10,000 student scores, store: {0–10: 50, 10–20:
120, 20–30: 300, …}.
o Clustering: Group similar data and store only cluster centers.
🔹 Example: In a dataset of 1M customers, cluster into 50 groups → store only
averages.
o Sampling: Take a small random sample that represents the whole.
🔹 Example: Take 5,000 rows instead of 5M rows.
o Data Cube Aggregation: Pre-compute and store summary values (e.g., totals,
averages).
🔹 Example: Instead of storing every daily sales transaction, keep monthly
sales totals.
3. Data Compression
👉 Transform the data into a smaller form.
Lossless compression: Can fully reconstruct original.
🔹 Example: ZIP compression.
Lossy compression: Can only reconstruct an approximation, but good enough.
🔹 Example: JPEG image compression (not every pixel preserved, but visually same).
Note: Both dimensionality reduction and numerosity reduction are also kinds of
compression.
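👉 A sketch of two non-parametric numerosity-reduction ideas from above: storing frequency counts per range instead of raw scores, and keeping only a small random sample of rows. The bin edges and sampling fraction are assumptions:
```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
scores = rng.integers(0, 100, size=10_000)     # pretend these are 10,000 raw scores

# Histogram: keep only counts per range (a handful of numbers instead of 10,000).
counts, edges = np.histogram(scores, bins=[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
print(dict(zip([f"{lo}-{hi}" for lo, hi in zip(edges[:-1], edges[1:])], counts)))

# Sampling: keep a 0.1% random sample that stands in for the full table.
df = pd.DataFrame({"score": scores})
sample = df.sample(frac=0.001, random_state=7)
print(len(sample), "rows kept out of", len(df))
```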
📌 Data Transformation & Data Discretization
🔹 What is Data Transformation?
It is the process of converting data into a more suitable format for mining.
This makes the mining process more efficient and the patterns easier to understand.
🔹 Main Data Transformation Strategies
1. Smoothing
Removes noise (random errors/outliers) from data.
Techniques:
o Binning → Group values into bins and replace with bin mean/median.
o Regression → Fit a line/curve and smooth data.
o Clustering → Group similar values and smooth within groups.
👉 Example: Exam scores = {40, 42, 38, 100, 41}.
The outlier (100) can be smoothed by binning → bin = {38–42}, mean ≈ 40.
2. Attribute Construction (Feature Construction)
Create new attributes from existing ones to help mining.
👉 Example: From (height, weight), construct “BMI = weight / height²”.
3. Aggregation
Combine data to a higher level.
Often used in data cubes for OLAP.
👉 Example:
Daily sales → Monthly sales → Yearly sales.
4. Normalization
Scale attributes into a smaller range (e.g., [0,1] or [-1,1]).
Useful for distance-based methods (clustering, kNN, neural nets).
Techniques:
Min-Max Normalization
Formula:
$v' = \frac{v - \min(A)}{\max(A) - \min(A)} \times (\text{new\_max} - \text{new\_min}) + \text{new\_min}$
👉 Example: Income $73,600 with min=12,000, max=98,000 → normalized to 0.716 in [0,1].
Z-score Normalization (Standardization)
Formula:
$v' = \frac{v - \mu}{\sigma}$
👉 Example: Income $73,600 with mean=54,000, SD=16,000 → z = 1.225.
Decimal Scaling
Move decimal point by factor of 10^j.
👉 Example: Value 986 → 0.986 (divided by 1000).
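👉 A sketch that reproduces the three worked numbers above (0.716, 1.225, and 0.986); the helper functions are just the formulas written out:
```python
def min_max(v, vmin, vmax, new_min=0.0, new_max=1.0):
    """Min-max normalization into [new_min, new_max]."""
    return (v - vmin) / (vmax - vmin) * (new_max - new_min) + new_min

def z_score(v, mean, std):
    """Z-score normalization (standardization)."""
    return (v - mean) / std

def decimal_scaling(v, j):
    """Divide by 10**j, where j is the smallest integer making |v'| < 1."""
    return v / (10 ** j)

print(round(min_max(73_600, 12_000, 98_000), 3))   # 0.716
print(round(z_score(73_600, 54_000, 16_000), 3))   # 1.225
print(round(decimal_scaling(986, 3), 3))           # 0.986
```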
5. Discretization
Convert continuous values into interval labels or concept labels.
Helps in simplification & concept hierarchy generation.
👉 Example:
Age (numeric) → {0–10, 11–20, …}
Or Age → {Youth, Adult, Senior}.
6. Concept Hierarchy Generation for Nominal Data
Generalize categorical attributes into higher levels.
👉 Example:
Street → City → Country.
Product ID → Category → Department.
🔹 Data Discretization Techniques
Discretization = reducing continuous attributes into a few intervals.
It can be:
Supervised → Uses class info (e.g., Decision Tree, ChiMerge).
Unsupervised → No class info (e.g., Binning, Histogram).
Top-down (Splitting) → Start broad, split further.
Bottom-up (Merging) → Start detailed, merge intervals.
1. Binning (unsupervised, top-down)
Equal-width bins (e.g., 0–10, 10–20, 20–30).
Equal-frequency bins (e.g., 10 students per bin).
👉 Example: Income values into 3 bins of equal width.
2. Histogram Analysis (unsupervised)
Partition values into disjoint ranges (buckets).
Equal-width histogram = same size bins.
Equal-frequency histogram = same number of tuples per bin.
👉 Example: Prices bucketed into {0–100, 100–200, …}.
3. Clustering (unsupervised, data-driven)
Group attribute values into clusters → each cluster = interval.
Closer data points go into the same cluster.
👉 Example: Age values grouped into natural clusters: {0–15}, {16–35}, {36–60}, {60+}.
4. Decision Tree Analysis (supervised, top-down)
Use class labels & entropy to choose best split points.
Produces intervals that improve classification accuracy.
👉 Example: Symptom “Temperature” discretized into {<37.5 = Normal, ≥37.5 = Fever} based
on diagnosis labels.
5. Correlation Analysis (ChiMerge, supervised, bottom-up)
Merge intervals with similar class distributions.
Uses chi-square test to decide merging.
👉 Example: Age values {21–25} and {26–30} both mostly map to “Student” → merge them.
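👉 A sketch contrasting equal-width and equal-frequency binning with pandas; the income values (in $1000s) and the choice of three bins are illustrative:
```python
import pandas as pd

income = pd.Series([12, 15, 18, 22, 25, 31, 40, 44, 60, 95])   # in $1000s

# Equal-width: 3 bins of the same width over [min, max].
equal_width = pd.cut(income, bins=3, labels=["Low", "Medium", "High"])

# Equal-frequency: 3 bins with (roughly) the same number of values in each.
equal_freq = pd.qcut(income, q=3, labels=["Low", "Medium", "High"])

print(pd.DataFrame({"income": income,
                    "equal_width": equal_width,
                    "equal_freq": equal_freq}))
```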
📌 Basic Statistical Descriptions of Data
👉 Before preprocessing, we need an overall picture of the data.
Statistical descriptions help us:
Understand center of the data.
Understand spread (dispersion) of the data.
Identify outliers & noise.
Visualize data distribution.
🔹 1. Measures of Central Tendency
These describe the center of the data distribution.
(a) Mean (Arithmetic Average)
$\text{Mean} = \frac{x_1 + x_2 + \dots + x_N}{N}$
👉 Example: Salaries = {30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110}
$\text{Mean} = \frac{696}{12} = 58$
So, average salary = $58,000.
⚡ Problem: Sensitive to outliers (e.g., 110K pushes the mean up).
✔️Fix: Use trimmed mean (remove extreme top/bottom % before averaging).
(b) Median (Middle Value)
Middle value when data is sorted.
If odd N → median = exact middle.
If even N → average of two middle values.
👉 Example: Salaries above → N=12 (even).
Middle values = 52, 56 → median = (52+56)/2 = 54 (=$54,000).
⚡ Advantage: Less affected by outliers/skewness.
(c) Mode (Most Frequent Value)
Value that occurs most often.
Data can be:
o Unimodal → 1 mode.
o Bimodal → 2 modes.
o Multimodal → >2 modes.
👉 Example: Salaries → Modes = 52, 70 → Bimodal.
💡 Relation (for moderately skewed data):
$\text{Mode} \approx 3 \times \text{Median} - 2 \times \text{Mean}$
(d) Midrange
Average of min and max values.
$\text{Midrange} = \frac{\min + \max}{2}$
👉 Example: Salaries → min=30, max=110
$\text{Midrange} = \frac{30 + 110}{2} = 70$
So, midrange = $70,000.
✅ Summary of Central Measures
Measure   | Strength                         | Weakness
Mean      | Uses all data                    | Affected by outliers
Median    | Resistant to outliers            | Ignores distribution shape
Mode      | Works for categorical & numeric  | May not be unique
Midrange  | Easy to compute                  | Highly sensitive to outliers
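👉 The measures above can be checked on the salary data with a short script; statistics.multimode is used so the bimodal case (52 and 70) shows up:
```python
from statistics import mean, median, multimode

salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]   # in $1000s

print("mean     =", mean(salaries))                        # 58
print("median   =", median(salaries))                      # 54.0
print("modes    =", multimode(salaries))                   # [52, 70] -> bimodal
print("midrange =", (min(salaries) + max(salaries)) / 2)   # 70.0
```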
🔹 2. Measures of Dispersion (Spread)
(You’ll see this in the next part of the book, but summarizing for context):
Range = max – min.
Quartiles & IQR = Q3 – Q1.
Variance & Standard Deviation = average squared deviation from mean.
Boxplots help visualize spread & outliers.
🔹 3. Graphical Displays
Bar charts, pie charts, line graphs → simple summaries.
Histograms → distribution of numeric data.
Scatter plots → correlation between 2 attributes.
Quantile plots, Q-Q plots → compare distributions.
🔹 Skewness of Data
Symmetric → mean = median = mode.
Positively Skewed (right tail) → mean > median > mode.
Negatively Skewed (left tail) → mean < median < mode.
👉 Example:
Salaries (with 110 as outlier) → positively skewed (few very high salaries).
🔹 1. Range
Definition: Difference between the maximum and minimum values.
Formula:
$\text{Range} = \max(X) - \min(X)$
Example: Salaries = {30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110}
$\text{Range} = 110 - 30 = 80$
🔹 2. Quartiles & Interquartile Range (IQR)
Quartiles divide ordered data into 4 equal parts.
o Q1 = 25th percentile (cuts lowest 25%)
o Q2 = Median = 50th percentile
o Q3 = 75th percentile (cuts lowest 75%)
Interquartile Range (IQR): Range of the middle 50% of data.
$IQR = Q3 - Q1$
Example (same salaries, N=12):
o Q1 = 3rd value = 47
o Q2 = (6th + 7th)/2 = (52+56)/2 = 54
o Q3 = 9th value = 63
o $IQR = 63 - 47 = 16$
🔹 3. Five-Number Summary
Provides a quick snapshot of distribution:
$\{\text{Minimum}, Q1, \text{Median}, Q3, \text{Maximum}\}$
Example (salaries):
$\{30, 47, 54, 63, 110\}$
🔹 4. Boxplot (Graphical Display)
Visual representation of the five-number summary.
Components:
o Box → from Q1 to Q3 (IQR)
o Line inside box → Median
o Whiskers → extend to Min & Max (within 1.5 × IQR)
o Outliers → points beyond whiskers
👉 Helps to spot skewness and outliers.
🔹 5. Variance (σ²)
Definition: Average of squared deviations from the mean.
Formula:
$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2$
Example (salaries, mean = 58):
$\sigma^2 = 379.17$
🔹 6. Standard Deviation (σ)
Definition: Square root of variance → expresses spread in original units.
Formula:
$\sigma = \sqrt{\sigma^2}$
Example (above data):
$\sigma = \sqrt{379.17} \approx 19.47$
🔹 7. Outlier Detection
Rule of Thumb:
o Outlier if
$x < Q1 - 1.5 \times IQR \quad \text{or} \quad x > Q3 + 1.5 \times IQR$
Example (salaries):
o Q1 = 47, Q3 = 63, IQR = 16
o Lower fence = 47 - 1.5(16) = 23
o Upper fence = 63 + 1.5(16) = 87
o Any value > 87 (or < 23) is an outlier → 110 is an outlier
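👉 A sketch that verifies the dispersion numbers above. Note that quartile conventions vary; the "lower" method (numpy ≥ 1.22) is chosen here so that Q1 = 47 and Q3 = 63 match the worked example:
```python
import numpy as np

salaries = np.array([30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110])

q1 = np.percentile(salaries, 25, method="lower")   # 47 (convention-dependent)
q3 = np.percentile(salaries, 75, method="lower")   # 63
iqr = q3 - q1                                      # 16

print("five-number summary:",
      salaries.min(), q1, np.median(salaries), q3, salaries.max())   # 30 47 54.0 63 110

# Population variance and standard deviation (ddof=0), as in the formula above.
print("variance =", round(salaries.var(), 2))      # 379.17
print("std dev  =", round(salaries.std(), 2))      # 19.47

# 1.5 * IQR rule for outliers.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr      # 23 and 87
print("outliers:", salaries[(salaries < lower) | (salaries > upper)])   # [110]
```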