
Unit 2: Statistical Inference II

Measure of Relationship:
Definition: The statistical measures which show a relationship between two or more variables
are called Measures of Relationship. Correlation and Regression are commonly used measures
of relationship.

Covariance and Karl Pearson's Coefficient of Correlation are measures used in statistics to
quantify the relationship between two variables. Let's explore each of these measures:

1. Covariance:
- Definition: Covariance measures the extent to which two variables change together. It
indicates whether an increase in one variable corresponds to an increase or decrease in another.
- Formula: the sample covariance between two variables X and Y with n paired observations
(xᵢ, yᵢ) is calculated as

cov(X, Y) = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / (n − 1)

where x̄ and ȳ are the sample means (population covariance divides by n instead of n − 1).

Interpretation:
- Positive covariance indicates a direct relationship (both variables increase or decrease
together).
- Negative covariance indicates an inverse relationship (one variable increases while the other
decreases).
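
As an illustrative sketch (with made-up numbers, not data from these notes), the sample
covariance can be computed by hand or with NumPy, whose np.cov returns the full covariance
matrix:

import numpy as np

# Hypothetical paired observations of two variables
x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([1.0, 3.0, 5.0, 11.0])

# Manual sample covariance: sum of (x_i - mean(x))(y_i - mean(y)) over n - 1
cov_manual = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)

# np.cov returns the 2x2 covariance matrix; entry [0, 1] is cov(X, Y)
cov_numpy = np.cov(x, y)[0, 1]

print(cov_manual, cov_numpy)  # both print the same positive value (about 10.67)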

2. Karl Pearson's Coefficient of Correlation (Pearson's r):
- Definition: Pearson's coefficient of correlation measures the strength and direction of a linear
relationship between two variables. It is normalized, providing a value between -1 and 1.
- Formula: r = cov(X, Y) / (σ_X · σ_Y), i.e. the covariance of X and Y divided by the product of
their standard deviations. A value of +1 indicates a perfect positive linear relationship, −1 a
perfect negative one, and 0 no linear relationship.
Key Differences:
- Covariance is not normalized and depends on the scales of the variables, making it difficult to
compare covariances across different datasets.
- Pearson's correlation coefficient is normalized, making it more interpretable and comparable
across datasets.
- Pearson's correlation coefficient specifically measures linear relationships, while covariance
does not provide information about the strength or type of relationship.
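
Continuing the hypothetical data from the covariance sketch above, np.corrcoef performs the
normalization for us:

import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([1.0, 3.0, 5.0, 11.0])

# Pearson's r = cov(X, Y) / (std(X) * std(Y)); np.corrcoef normalizes for us
r = np.corrcoef(x, y)[0, 1]
print(r)  # about 0.956: a strong positive linear relationship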

Measures of Position:
Measures of position in statistics help us understand the relative location of a particular data
point within a dataset.

Percentile:
A percentile is a measure that indicates the relative standing of a particular value within a
dataset.
Percentiles divide a dataset into 100 equal parts, and each percentile represents the percentage of
data points below it.
For example, the 80th percentile indicates that 80% of the data points are below that particular
value.
Z-score (Standard Score):
The Z-score measures how many standard deviations a data point is from the mean (average) of a
dataset:

z = (x − μ) / σ

where μ is the mean and σ is the standard deviation. Z-scores help standardize data, making it
easier to compare values from different datasets.
Quartiles:
Quartiles divide a dataset into four equal parts, each containing approximately 25% of the data.
The three quartiles are:
First Quartile (Q1): The 25th percentile.
Second Quartile (Q2): The median or 50th percentile.
Third Quartile (Q3): The 75th percentile.
Interquartile Range (IQR) is the range between the first and third quartiles and is a measure of
the spread of the middle 50% of the data.
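
A minimal NumPy sketch of these measures of position, using a small made-up dataset:

import numpy as np

data = np.array([4, 7, 8, 10, 12, 15, 18, 21, 23, 30])

# Percentile: the value below which a given percentage of the data falls
p80 = np.percentile(data, 80)

# Z-scores: how many standard deviations each point is from the mean
z = (data - data.mean()) / data.std()

# Quartiles and the interquartile range (spread of the middle 50%)
q1, q2, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

print(p80, (q1, q2, q3), iqr)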
Bayes' Theorem:
Bayes' Theorem is a mathematical formula that describes the probability of an event based on
prior knowledge of conditions that might be related to the event.
It is named after Thomas Bayes, an 18th-century statistician and theologian.
The theorem is often expressed as follows:

P(A|B) = P(B|A) · P(A) / P(B)

where P(A|B) is the posterior probability of A given evidence B, P(B|A) is the likelihood, P(A) is
the prior probability of A, and P(B) is the marginal probability of the evidence.

Bayes' Theorem is widely used in various fields, including statistics, machine learning, and
medical diagnosis. It provides a systematic way to update probabilities as new information
becomes available, making it a powerful tool for reasoning under uncertainty.
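
As a worked sketch with hypothetical numbers (a diagnostic-test setup invented for illustration,
not taken from these notes):

# Hypothetical numbers for a diagnostic-test example of Bayes' Theorem
p_disease = 0.01            # prior P(D)
p_pos_given_disease = 0.95  # sensitivity P(+|D)
p_pos_given_healthy = 0.05  # false-positive rate P(+|not D)

# Total probability of a positive test: P(+) = P(+|D)P(D) + P(+|~D)P(~D)
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Posterior P(D|+) via Bayes' Theorem
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))  # about 0.161: the disease is still fairly unlikely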
Bayes Classifier:
A Bayes classifier, also known as a Naive Bayes classifier, is a probabilistic machine learning
model based on Bayes' Theorem. It is a simple and efficient algorithm for classification tasks,
especially in situations with a large number of features. Despite its simplicity, Naive Bayes often
performs well in practice, making it a popular choice for text classification, spam filtering, and
other similar applications.
The basic idea behind a Bayes classifier is to use prior knowledge about the distribution of
classes and features in the training data to make predictions about the class of new, unseen data.
The working of the Naïve Bayes classifier can be understood with the help of the example below.
Suppose we have a dataset of weather conditions and a corresponding target variable "Play".
Using this dataset, we need to decide whether or not we should play on a particular day according
to the weather conditions. To solve this problem, we follow the steps below (a minimal sketch of
them appears after the list):
1. Convert the given dataset into frequency tables.
2. Generate a likelihood table by finding the probabilities of the given features.
3. Use Bayes' theorem to calculate the posterior probability for each class.
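
A hand-rolled sketch of those three steps on a hypothetical one-feature version of the weather
dataset (the scores printed are unnormalized posteriors; the larger one wins):

from collections import Counter

# Hypothetical weather dataset: (Outlook, Play)
data = [("Sunny", "No"), ("Sunny", "No"), ("Overcast", "Yes"), ("Rainy", "Yes"),
        ("Rainy", "Yes"), ("Rainy", "No"), ("Overcast", "Yes"), ("Sunny", "No"),
        ("Sunny", "Yes"), ("Rainy", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"),
        ("Overcast", "Yes"), ("Rainy", "No")]

# Steps 1-2: frequency and likelihood tables
play_counts = Counter(play for _, play in data)

def likelihood(outlook, play):
    # P(outlook | play) read off the frequency table
    matches = sum(1 for o, p in data if o == outlook and p == play)
    return matches / play_counts[play]

# Step 3: posterior (up to normalization) P(play | Sunny) = P(Sunny | play) * P(play)
n = len(data)
for play in ("Yes", "No"):
    prior = play_counts[play] / n
    print(play, round(likelihood("Sunny", play) * prior, 3))  # "No" scores higher here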
Advantages of Naïve Bayes Classifier:

 Naïve Bayes is one of the fastest and simplest ML algorithms for predicting the class of a
dataset.
 It can be used for binary as well as multi-class classification.
 It performs well in multi-class prediction compared to many other algorithms.
 It is a popular choice for text classification problems.

Disadvantages of Naïve Bayes Classifier:

 Naive Bayes assumes that all features are independent or unrelated, so it cannot learn the
relationship between features.
Applications of Naïve Bayes Classifier:

 It is used for credit scoring.
 It is used in medical data classification.
 It can be used for real-time predictions because the Naïve Bayes classifier is an eager learner.
 It is used in text classification, such as spam filtering and sentiment analysis.

Types of Naïve Bayes Model:

There are three types of Naïve Bayes model, which are given below:

 Gaussian: The Gaussian model assumes that features follow a normal distribution. This
means if predictors take continuous values instead of discrete, then the model assumes
that these values are sampled from the Gaussian distribution.
 Multinomial: The Multinomial Naïve Bayes classifier is used when the data is multinomially
distributed. It is primarily used for document classification problems, i.e. deciding which
category a particular document belongs to, such as sports, politics, or education. The classifier
uses the frequency of words as the predictors.
 Bernoulli: The Bernoulli classifier works similarly to the Multinomial classifier, but the
predictor variables are independent Boolean variables, such as whether or not a particular word
is present in a document. This model is also popular for document classification tasks. (A short
scikit-learn sketch of all three models follows.)
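
A minimal scikit-learn sketch, assuming toy data invented for illustration, showing which feature
type pairs with which model:

import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

y = np.array([0, 0, 1, 1])

# Continuous features -> Gaussian model
X_cont = np.array([[1.2, 3.4], [0.8, 2.9], [5.1, 7.2], [4.9, 6.8]])
print(GaussianNB().fit(X_cont, y).predict([[1.0, 3.0]]))

# Word-count features -> Multinomial model
X_counts = np.array([[3, 0, 1], [2, 1, 0], [0, 4, 2], [0, 3, 3]])
print(MultinomialNB().fit(X_counts, y).predict([[1, 0, 0]]))

# Binary presence/absence features -> Bernoulli model
X_bool = (X_counts > 0).astype(int)
print(BernoulliNB().fit(X_bool, y).predict([[1, 0, 0]]))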
Bayesian network:
A Bayesian belief network is a key technique for dealing with probabilistic events and for
solving problems that involve uncertainty. We can define a Bayesian network as:
"A Bayesian network is a probabilistic graphical model which represents a set of variables and
their conditional dependencies using a directed acyclic graph."
It is also called a Bayes network, belief network, decision network, or Bayesian model.
Bayesian networks are probabilistic because they are built from a probability distribution and
use probability theory for prediction and anomaly detection.
Applications of Bayesian networks include:
1. Medical Diagnosis: Modeling the relationships between symptoms, diseases, and test
results to assist in diagnosing medical conditions.
2. Risk Assessment: Evaluating the probability and impact of different risks in a system.
3. Speech Recognition: Modeling dependencies between phonemes to improve the accuracy
of speech recognition systems.
4. Natural Language Processing: Capturing the probabilistic relationships between words in
a language to enhance language understanding.
Bayesian networks are valuable tools in decision support systems, where they help in reasoning
about uncertain and complex scenarios. They provide a principled framework for representing
and updating knowledge in the presence of uncertainty.
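
A hand-rolled two-node sketch (Rain -> WetGrass, with invented probabilities) showing how a
network's joint distribution factorizes into conditional probability tables and supports queries:

# P(Rain, WetGrass) = P(Rain) * P(WetGrass | Rain)
p_rain = {True: 0.2, False: 0.8}                     # prior P(Rain)
p_wet_given_rain = {True: {True: 0.9, False: 0.1},   # P(WetGrass | Rain=True)
                    False: {True: 0.2, False: 0.8}}  # P(WetGrass | Rain=False)

# Query P(Rain=True | WetGrass=True) by enumerating the joint distribution
joint_wet = {r: p_rain[r] * p_wet_given_rain[r][True] for r in (True, False)}
posterior = joint_wet[True] / sum(joint_wet.values())
print(round(posterior, 3))  # about 0.529: seeing wet grass raises belief in rain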
Discriminative learning with maximum likelihood:
Discriminative learning, often contrasted with generative learning, focuses on modeling the
decision boundary between different classes directly. Maximum Likelihood Estimation (MLE) is
a common approach in discriminative learning, aiming to find the parameters that maximize the
likelihood of the observed data given the model.
Discriminative models focus on finding the boundary that separates different classes, making
them well-suited for classification tasks. Maximum Likelihood Estimation provides a principled
way to estimate the parameters of the model based on the observed data.
It's worth noting that while maximum likelihood is a powerful and widely used approach, other
methods, such as maximum a posteriori estimation (MAP) or Bayesian approaches, also play
important roles in statistical learning.
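
As a sketch, logistic regression is a canonical discriminative model fit by maximum likelihood;
here is a minimal gradient-ascent version on invented one-dimensional data:

import numpy as np

# Toy 1-D data: labels 0 below x of about 2.25, labels 1 above (hypothetical)
x = np.array([0.5, 1.0, 1.5, 3.0, 3.5, 4.0])
y = np.array([0, 0, 0, 1, 1, 1])

w, b = 0.0, 0.0
lr = 0.1
for _ in range(1000):
    p = 1.0 / (1.0 + np.exp(-(w * x + b)))   # model P(y=1 | x)
    # Gradient ascent on the log-likelihood sum(y*log(p) + (1-y)*log(1-p))
    w += lr * np.sum((y - p) * x)
    b += lr * np.sum(y - p)

print(w, b)  # decision boundary is where w*x + b = 0, i.e. x = -b/w, near 2.25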
Probabilistic models with hidden variables:
Probabilistic models with hidden variables are models that involve both observable (measurable)
variables and unobservable or hidden variables. These models are widely used in various fields,
including machine learning, statistics, and artificial intelligence, to represent complex
relationships in data where not all variables can be directly observed.
Here are two common types of probabilistic models with hidden variables:
Hidden Markov Models (HMMs):
Hidden Markov Models are widely used in sequential data analysis, such as speech recognition,
natural language processing, and bioinformatics.
In an HMM, there are observable variables (emissions) and hidden states. The model assumes
that the observed data depend on an underlying sequence of hidden states.
The key components of an HMM include transition probabilities (probabilities of moving from
one hidden state to another), emission probabilities (probabilities of observing a particular value
given the hidden state), and an initial state distribution.
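
A minimal forward-algorithm sketch for a two-state HMM with invented probabilities (states
Rainy/Sunny, observations walk/shop/clean), computing the probability of an observed sequence:

import numpy as np

start = np.array([0.6, 0.4])       # initial state distribution [Rainy, Sunny]
trans = np.array([[0.7, 0.3],      # transition probabilities between hidden states
                  [0.4, 0.6]])
emit = np.array([[0.1, 0.4, 0.5],  # emission probabilities: P(obs | state)
                 [0.6, 0.3, 0.1]])

obs = [0, 1, 2]  # observed sequence: walk, shop, clean

# Forward pass: alpha[s] = P(observations so far, current hidden state = s)
alpha = start * emit[:, obs[0]]
for o in obs[1:]:
    alpha = (alpha @ trans) * emit[:, o]

print(alpha.sum())  # total probability of the observed sequence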
Latent Variable Models:
Latent variable models involve both observed variables and unobserved latent variables that help
explain the structure of the data.
Examples include Principal Component Analysis (PCA), Factor Analysis, and Gaussian Mixture
Models (GMMs).
In PCA and Factor Analysis, the latent variables represent underlying patterns or factors that
explain the observed variability in the data. In GMMs, each data point is assumed to be
generated by a mixture of different Gaussian distributions, and the latent variable indicates the
specific component (cluster) responsible for generating the data point.
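
A minimal scikit-learn sketch of a Gaussian Mixture Model on invented one-dimensional data; the
latent variable is the unobserved component assignment that the EM fit recovers:

import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical 1-D data drawn from two clusters centered at 0 and 6
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0, 1, 100), rng.normal(6, 1, 100)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(data)
print(gmm.means_.ravel())     # recovered component means, near 0 and 6
print(gmm.predict(data[:3]))  # inferred latent component for each point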

Linear models and regression analysis, particularly using the method of least squares, are
fundamental concepts in statistics and machine learning. Let's break down these terms:

1. Linear Models:
Linear models are mathematical representations used to describe the relationship between a
dependent variable (response) and one or more independent variables (features or predictors) in a
linear way.
Linear models describe a continuous response variable as a function of one or more predictor
variables. They can help to understand and predict the behaviour of complex systems or analyse
experimental, financial, and biological data.
Regression Analysis:
- Regression analysis is a statistical technique that aims to model and analyze the relationship
between a dependent variable and one or more independent variables.
- The primary goal is to understand how changes in the independent variables are associated
with changes in the dependent variable.
- Regression analysis can be used for prediction, understanding the strength and nature of
relationships, and making inferences about the population.
Least Squares:
Least squares is a method used to estimate the parameters (coefficients) of a linear model by
minimizing the sum of the squared differences between the observed and predicted values:

minimize Σᵢ (yᵢ − ŷᵢ)², where ŷᵢ is the model's prediction for observation i.

The ordinary least squares (OLS) method generalizes to multiple linear regression with multiple
independent variables.

The steps for performing least squares regression include (a minimal sketch follows the list):
1. Specify the model.
2. Collect data, including the values of the dependent and independent variables.
3. Estimate the model parameters by minimizing the sum of squared residuals.
4. Assess the fit of the model and make inferences about the relationships.
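
A minimal NumPy sketch of steps 2-4 on invented data, fitting y = b0 + b1*x by minimizing the
sum of squared residuals:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 8.0, 9.9])

# Design matrix with an intercept column; lstsq minimizes ||X b - y||^2
X = np.column_stack([np.ones_like(x), x])
coef, residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)
b0, b1 = coef

print(b0, b1)                    # intercept near 0, slope near 2
y_hat = X @ coef                 # fitted values
print(np.sum((y - y_hat) ** 2))  # sum of squared residuals (small)

np.linalg.lstsq solves the minimization directly; the same coefficients can also be obtained
from the normal equations (XᵀX)b = Xᵀy.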
