
Detailed Notes On Predictive Analytics

Predictive Analytics utilizes historical data and machine learning to forecast future outcomes across various fields such as business, healthcare, and finance. Key steps include defining the problem, understanding data, performing statistical tests, and building models. A structured data audit is essential to ensure data quality and relevance before modeling.

1. Overview of Predictive Analytics


 Definition: Predictive Analytics refers to using historical data, statistical
algorithms, and machine learning techniques to forecast future outcomes.
 Goal: Go beyond describing "what happened" to predict what is likely to happen.
 Applications:
o Business: customer churn prediction, sales forecasting.

o Healthcare: predicting disease risk, patient readmission.

o Finance: credit scoring, fraud detection.

o Engineering: equipment failure prediction.

Key Steps:

1. Define the problem.


2. Understand and preprocess data.
3. Explore variables through visualization.
4. Apply statistical tests for significance.
5. Build models and evaluate performance.

2. Setting Up the Problem


 Problem Formulation: Translate the business/real-world question into a data science
problem.
o Example: “Why are customers leaving?” → Predict whether a customer will
churn (classification problem).
 Define Outcome Variable (Target): What we want to predict (e.g., churn = Yes/No).
 Define Predictors (Features): Variables that might influence the outcome (e.g., age,
income, purchase history).
 Check Feasibility:
o Is relevant data available?

o Is there enough quantity and quality of data?

o Is the timeline practical for predictions?
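
The formulation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the field names (age, income, purchases_last_year, churn) and the helper split_features_target are hypothetical, chosen to mirror the churn example. It shows the target being encoded as 0/1 and a quick look at how many examples of each class are available, which feeds the feasibility check.

```python
from collections import Counter

# Hypothetical customer records; field names are illustrative only.
customers = [
    {"age": 34, "income": 52000, "purchases_last_year": 12, "churn": "No"},
    {"age": 51, "income": 61000, "purchases_last_year": 2, "churn": "Yes"},
    {"age": 27, "income": 38000, "purchases_last_year": 7, "churn": "No"},
    {"age": 45, "income": 75000, "purchases_last_year": 1, "churn": "Yes"},
    {"age": 39, "income": 47000, "purchases_last_year": 9, "churn": "No"},
]

TARGET = "churn"
FEATURES = ["age", "income", "purchases_last_year"]


def split_features_target(records, features, target):
    """Separate predictor values (X) from the binary outcome (y)."""
    X = [[row[name] for name in features] for row in records]
    # Encode the Yes/No outcome as 1/0 so it can serve as a classification label.
    y = [1 if row[target] == "Yes" else 0 for row in records]
    return X, y


X, y = split_features_target(customers, FEATURES, TARGET)
class_counts = Counter(y)  # how many churners vs. non-churners are available
print("first feature row:", X[0], "label:", y[0])
print("class counts:", dict(class_counts))
```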


3. Data Understanding
 Purpose: Build intuition about the dataset before modeling.
 Steps:
1. Data Collection – Gather data from internal sources (databases, logs) or
external sources (APIs, surveys).
2. Data Description – Identify number of observations (rows), variables
(columns), types of variables (categorical, numerical, ordinal).
3. Data Quality Check – Missing values, duplicates, inconsistencies.
4. Initial Insights – Basic statistics: mean, median, standard deviation,
correlations.
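
As a rough sketch of steps 2 to 4, the function below summarises a small table held as a list of dictionaries. The rows, column names and the describe helper are assumptions made for illustration, and None is assumed to mark a missing value. Type detection here is deliberately naive: anything that is not purely int/float is treated as categorical.

```python
import statistics

# Hypothetical raw rows; None marks a missing value.
rows = [
    {"customer_id": 1, "age": 34, "segment": "retail", "monthly_spend": 120.0},
    {"customer_id": 2, "age": None, "segment": "retail", "monthly_spend": 80.5},
    {"customer_id": 3, "age": 45, "segment": "business", "monthly_spend": None},
    {"customer_id": 3, "age": 45, "segment": "business", "monthly_spend": None},
    {"customer_id": 4, "age": 29, "segment": "retail", "monthly_spend": 60.0},
]


def describe(rows):
    """Summarise size, variable types, missing values and duplicate rows."""
    columns = list(rows[0].keys())
    summary = {"n_rows": len(rows), "n_columns": len(columns), "columns": {}}
    for col in columns:
        present = [r[col] for r in rows if r[col] is not None]
        is_numeric = all(isinstance(v, (int, float)) for v in present)
        info = {
            "type": "numerical" if is_numeric else "categorical",
            "missing": len(rows) - len(present),
        }
        if is_numeric and present:
            info["mean"] = statistics.mean(present)
            info["median"] = statistics.median(present)
        summary["columns"][col] = info
    # A row counts as a duplicate if an identical row appeared earlier.
    seen = set()
    duplicates = 0
    for r in rows:
        key = tuple(r.items())
        if key in seen:
            duplicates += 1
        seen.add(key)
    summary["duplicate_rows"] = duplicates
    return summary


report = describe(rows)
for name, info in report["columns"].items():
    print(name, info)
print("duplicate rows:", report["duplicate_rows"])
```

Note that an identifier such as customer_id is stored as a number but is not a quantity, so a real description step would usually override the inferred type for such columns.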

4. Single Variable Analysis


 Focus on understanding one variable at a time.
 For Categorical Variables:
o Frequency counts, mode, proportions.

o Example: Gender distribution (Male: 60%, Female: 40%).

 For Numerical Variables:


o Measures of central tendency: mean, median, mode.

o Dispersion: variance, standard deviation, range, interquartile range (IQR).

o Distribution shape: skewness, kurtosis.

 Purpose: Identify unusual distributions, outliers, or dominant categories.
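
The summaries listed above can be computed with the Python standard library, as in the sketch below. The sample values and the two helper functions are hypothetical. Skewness and kurtosis are computed with the simple moment-based (population) formulas, which is one of several conventions in use; statistical packages may apply bias corrections and report slightly different numbers.

```python
import statistics
from collections import Counter


def categorical_summary(values):
    """Frequency counts, mode and proportions for one categorical variable."""
    counts = Counter(values)
    n = len(values)
    mode_value, _ = counts.most_common(1)[0]
    proportions = {category: count / n for category, count in counts.items()}
    return {"counts": dict(counts), "mode": mode_value, "proportions": proportions}


def numerical_summary(values):
    """Central tendency, dispersion and shape for one numerical variable."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    # Moment-based skewness and excess kurtosis (0 for a normal distribution).
    skewness = sum((v - mean) ** 3 for v in values) / (n * sd ** 3)
    excess_kurtosis = sum((v - mean) ** 4 for v in values) / (n * sd ** 4) - 3
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "variance": sd ** 2,
        "std_dev": sd,
        "range": max(values) - min(values),
        "skewness": skewness,
        "excess_kurtosis": excess_kurtosis,
    }


# Hypothetical data: a 60/40 gender split and spend values with one large outlier.
gender = ["Male"] * 6 + ["Female"] * 4
spend = [10, 12, 11, 13, 12, 11, 95]

cat = categorical_summary(gender)
num = numerical_summary(spend)
print(cat)
print(num)
```

In this sample the single large value pulls the mean well above the median and produces positive skewness, which is the kind of unusual distribution this step is meant to surface.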

5. Data Visualization in One Dimension


 Visualization helps detect patterns, skewness, and anomalies in a single variable.
 For Categorical Variables:
o Bar charts, Pie charts.

o Example: Bar chart of customer segments.

 For Numerical Variables:


o Histograms: show distribution of data.

o Box plots: highlight outliers and spread.

o Density plots: smooth estimation of distribution.
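
To make concrete what a histogram and a box plot actually display, the sketch below computes the underlying numbers without any plotting library. The data and both helper functions are hypothetical. The box-plot outlier rule used here is the common 1.5 × IQR fence, and the "inclusive" quartile method is an assumption; plotting libraries differ in the quartile convention they apply.

```python
import statistics


def histogram_counts(values, n_bins):
    """Count values per equal-width bin, as a histogram would draw them.

    Assumes the values are not all identical (bin width must be non-zero).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value is placed in the last bin rather than a new one.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


def boxplot_stats(values):
    """Quartiles, IQR and the points a box plot would mark as outliers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr, "outliers": outliers}


# Hypothetical order values with one unusually large order.
order_values = [20, 22, 23, 25, 25, 26, 28, 30, 31, 90]

counts = histogram_counts(order_values, n_bins=7)
box = boxplot_stats(order_values)

# Crude text rendering of the histogram bars.
for i, count in enumerate(counts):
    print(f"bin {i}: " + "#" * count)
print(box)
```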

6. Data Visualization in Two or Higher Dimensions


 Two Variables (Bivariate Analysis):
o Helps identify relationships between predictor and target variable.

o Numerical vs. Numerical: Scatter plots, correlation heatmaps.

o Numerical vs. Categorical: Box plots, violin plots.

o Categorical vs. Categorical: Cross-tabulations, stacked bar charts.

 Higher Dimensions (Multivariate Analysis):


o Pair plots (scatterplot matrix).

o Heatmaps for correlations across many variables.

o Dimensionality reduction techniques: PCA (Principal Component Analysis), t-SNE for
visualization.
 Purpose: Understand variable interactions and potential predictors.
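
Two of the bivariate tools above have simple numeric counterparts: the Pearson correlation coefficient behind a scatter plot or correlation heatmap, and the cross-tabulation behind a stacked bar chart. The sketch below computes both from scratch. All variable names and values are hypothetical, and pearson_r assumes that neither variable is constant.

```python
import math
from collections import Counter


def pearson_r(xs, ys):
    """Pearson correlation between two equally long numerical sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def crosstab(a, b):
    """Count how often each pair of categories occurs together."""
    return Counter(zip(a, b))


# Numerical vs. numerical: hypothetical tenure and support-call counts.
tenure_months = [1, 3, 6, 12, 24, 36]
support_calls = [9, 7, 6, 4, 2, 1]
r = pearson_r(tenure_months, support_calls)
print(f"correlation: {r:.3f}")

# Categorical vs. categorical: hypothetical plan type against churn outcome.
plan = ["basic", "basic", "premium", "premium", "basic", "premium"]
churned = ["Yes", "Yes", "No", "No", "No", "Yes"]
table = crosstab(plan, churned)
for (plan_type, outcome), count in sorted(table.items()):
    print(plan_type, outcome, count)
```

A strongly negative r here would suggest that longer-tenured customers call support less often in this made-up sample; it says nothing about cause.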

7. The Value of Statistical Significance


 Definition: Assesses whether an observed relationship between variables is strong enough
that chance alone is an unlikely explanation.
 Key Concepts:
o Null Hypothesis (H₀): No effect/relationship.

o Alternative Hypothesis (H₁): There is an effect/relationship.

o p-value: Probability of observing results at least as extreme as the current ones, assuming
H₀ is true.
 p < 0.05 → Conventionally treated as statistically significant (0.05 is a common
threshold, not a fixed rule).
o Confidence Intervals: Range of plausible values for the true effect, built so that the
procedure captures the true value in a stated share of repeated samples (e.g., 95%).
 Tests Used:
o t-test (difference between two means).
o Chi-square test (categorical associations).

o ANOVA (differences across multiple groups).

o Correlation coefficients (strength of linear relationships).

 Why Important?: Reduces the risk of building models on spurious correlations and helps
confirm that predictors carry real signal rather than noise.
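
As one worked instance of the tests listed above, the sketch below runs a Pearson chi-square test of independence on a 2×2 table of counts. The counts and the function name chi_square_2x2 are hypothetical, and no continuity correction is applied. The p-value uses the fact that, with one degree of freedom, the chi-square upper tail equals erfc(sqrt(x / 2)); this shortcut is valid only for 2×2 tables, so larger tables would need a proper chi-square distribution function from a statistics library.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    Returns (statistic, p_value). No continuity correction is applied.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # With 1 degree of freedom the chi-square tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: rows = plan (basic, premium), columns = (churned, stayed).
observed = [[30, 70], [15, 85]]
statistic, p_value = chi_square_2x2(observed)
print(f"chi-square = {statistic:.3f}, p = {p_value:.4f}")

# A table with identical churn rates in both rows shows no association.
flat_statistic, flat_p = chi_square_2x2([[20, 80], [20, 80]])
print(f"chi-square = {flat_statistic:.3f}, p = {flat_p:.4f}")
```

With these made-up counts the p-value falls below 0.05, so H₀ (plan and churn are independent) would be rejected at that conventional threshold; the second table gives a statistic of zero and no evidence against H₀.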

8. Pulling It All Together into a Data Audit


 Definition: A structured summary of the dataset before predictive modeling.
 Checklist for a Data Audit:
1. Data Availability – Where data is sourced, time span covered.
2. Data Quantity – Number of records, adequacy for modeling.
3. Data Quality – Missing values, noise, duplicates, outliers.
4. Variable Properties – Data types, ranges, distributions.
5. Variable Relationships – Correlations, significant predictors.
6. Potential Issues – Bias, imbalances (e.g., 95% No churn, 5% churn).
7. Documentation – Clear description for transparency and reproducibility.

Outcome:
A Data Audit Report documents that the dataset is clean, well-understood, and ready for
feature engineering and predictive modeling.
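
Parts of the checklist can be automated. The sketch below is a minimal, hypothetical audit covering quantity, missing values and target imbalance only; the function name data_audit, the column names and all three thresholds are assumptions for illustration and would need to be set per project. The sample data is built to reproduce the 95% / 5% churn imbalance mentioned in item 6.

```python
from collections import Counter

# Illustrative thresholds; real projects would choose these per use case.
MIN_RECORDS = 5
MAX_MISSING_RATE = 0.2
MIN_MINORITY_SHARE = 0.1


def data_audit(rows, target):
    """Collect basic audit figures and flag potential issues."""
    n = len(rows)
    columns = list(rows[0].keys())
    missing_rate = {c: sum(r[c] is None for r in rows) / n for c in columns}
    labels = [r[target] for r in rows if r[target] is not None]
    class_share = {k: v / len(labels) for k, v in Counter(labels).items()}

    issues = []
    if n < MIN_RECORDS:
        issues.append("too few records")
    for column, rate in missing_rate.items():
        if rate > MAX_MISSING_RATE:
            issues.append(f"high missing rate in {column}")
    if min(class_share.values()) < MIN_MINORITY_SHARE:
        issues.append("target classes are imbalanced")

    return {
        "n_records": n,
        "missing_rate": missing_rate,
        "class_share": class_share,
        "issues": issues,
    }


# Hypothetical dataset: 19 non-churners and 1 churner (95% / 5%).
rows = [
    {"age": 30 + i, "income": 40000 + 1000 * i, "churn": "No"} for i in range(19)
]
rows.append({"age": 52, "income": None, "churn": "Yes"})

audit = data_audit(rows, target="churn")
print(audit["class_share"])
print(audit["issues"])
```

The remaining checklist items (data sources, time span, documentation) are descriptive and are better recorded by hand alongside figures like these.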
