Inferential Statistics
Introduction to Probability
Exploratory data analysis helped you understand how to discover patterns in data using various techniques and
approaches. As you learnt, EDA is one of the most important parts of the data analysis process. It is also the part on
which data analysts spend most of their time.
However, sometimes you may require a very large amount of data for your analysis, which may need too much time and too many resources to acquire. In such situations, you have to work with a smaller sample of the data instead of the entire data set.
Situations like these arise all the time at big companies like Amazon. For example, let's say the Amazon QC department
wants to know what proportion of the products in its warehouses are defective. Instead of going through all of its
products (which would be a lot!), the Amazon QC team can just check a small sample of 1,000 products and then find, for
this sample, the defect rate (i.e., the proportion of defective products). Then, based on this sample's defect rate, the
team can 'infer' what the defect rate is for all the products in the warehouses.
This process of 'inferring' insights from sample data is called 'inferential statistics'.
Note that even after using inferential statistics, you will arrive at only an estimate of the population data from the sample
data, not the exact values. This is because when you don't have the exact data, you can only make reasonable estimates
about it with a limited level of certainty. Therefore, when certainty is limited, we talk in terms of probability. Probability
is useful and important in inferential statistics.
In this session, you will learn the basic concepts of probability and the various rules associated with it. The broad agenda
of the session covers the following:
1. Permutation and combination
2. Definition of probability and its properties
3. Key terms related to probability
4. Probability rules (Addition and Multiplication)
Permutations
A permutation is a way of arranging a select group of objects in such a way that the order is of significance. As shown in
the example, when you arrange the top order batsmen of a cricket team, you use permutation to find all the possible
orders in which they can be arranged.
If r out of n 'objects' are to be arranged among r available 'spaces', then the number of ways in which this task can be completed is n!/(n−r)!. If there are n 'spaces' as well, then the number of ways is simply n!. Here, n! (pronounced as 'n factorial') is the product of all the numbers from n down to 1 and is given by the following formula:
n! = n∗(n−1)∗(n−2)∗...∗3∗2∗1
nPr or P(n, r) = n!/(n−r)!
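As a quick sanity check, the permutation formula can be computed in Python; the squad of 5 batsmen with 3 top-order spots is just an illustrative number:

```python
import math
from itertools import permutations

# Number of ways to arrange r objects out of n: nPr = n! / (n-r)!
def n_p_r(n, r):
    return math.factorial(n) // math.factorial(n - r)

# Arranging 3 top-order batsmen chosen from a squad of 5
print(n_p_r(5, 3))                           # 60
print(math.perm(5, 3))                       # built-in equivalent (Python 3.8+)

# Enumerating the arrangements explicitly gives the same count
print(len(list(permutations(range(5), 3))))  # 60
```

Enumerating with `itertools.permutations` and counting with the factorial formula agree, which is a useful way to convince yourself the formula is right.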
In the case of counting using the method of permutations, the 'order' is an important factor. Some examples of permutations are as follows.
1. Finding all possible four-letter words that can be formed using the letters R, E, A and D
2. Finding all possible ways in which the final league standings of the eight teams can be in an Indian Premier
League (IPL) tournament
3. Finding all possible ways that a group of 10 people can be seated in a row in a cinema hall, and so on
Combinations
Permutation vs combinations
Permutations (order matters): In how many ways can you arrange three numbers from 1, 2, 3? There are 6 ways: 123, 132, 213, 231, 312, 321.
Combinations (order does not matter): In how many ways can you choose three numbers from 1, 2, 3? There is only 1 way: {1, 2, 3}.
When you just have to choose some objects from a larger set and the order is of no significance, then the rule of
counting that you use is called combination.
Some other examples of combinations are as follows.
1. The number of ways in which you can pick three letters from the word 'UPGRAD'
2. The number of ways a team can win three matches in a league of five matches
3. The number of ways in which you can choose 13 cards from a deck of 52 cards, and so on
The formula for counting the number of ways to choose r objects out of a set of n objects is as follows:
nCr or C(n, r) = n!/(r!(n−r)!)
One way to decide between the two is to see if the order matters or not. If it does, then use the permutations formula, and if it does not, then use the one for combinations.
Note: A helpful hint here would be to look for a keyword in the given scenario to know which method is needed. If the
problem requires you to order/arrange a group of objects, then you would most probably use the method of
permutations. Else, if you are told to pick/choose a group of objects, then more often than not you would be using the
formula for combinations.
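A minimal sketch of the combinations formula in Python, using the 'UPGRAD' and card-deck examples above:

```python
import math

# nCr = n! / (r! * (n-r)!)
def n_c_r(n, r):
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

print(n_c_r(6, 3))        # 20 ways to pick 3 letters from 'UPGRAD'
print(math.comb(52, 13))  # ways to choose 13 cards from a deck of 52

# Order matters vs order does not matter, for the 1, 2, 3 example:
print(math.perm(3, 3), math.comb(3, 3))   # 6 permutations, 1 combination
```

`math.comb` and `math.perm` (Python 3.8+) are the built-in versions of the two counting rules.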
Types of Events
The two main categories of events that you need to know right now are independent events and disjoint or mutually
exclusive events. Let's learn their formal definitions.
Independent events: If you have two or more events and the occurrence of one event has no bearing
whatsoever on the occurrence/s of the other event/s, then all the events are said to be independent of each
other. For example, the chances of rain in Bengaluru on a particular day has no effect on the chances of rain in
Mumbai 10 days later. Hence, these two events are independent of each other.
Disjoint or mutually exclusive events: Now, two or more events are mutually exclusive when they do not occur
at the same time, i.e., when one event occurs, the rest of the events do not occur. For example, if a student has
been assigned grade C for a particular subject in an exam, he or she cannot be awarded grade B for the same
subject in the same exam. So, the events in which a student gets a grade of B or C for the same subject in the
same exam are mutually exclusive or disjoint.
For example
The events 'Customer A buys the product' and 'Customer B buys the product' are independent, whereas the
events 'Customer A buys the product' and 'Customer A does not buy the product' are disjoint.
The events 'You will win Lottery A' and 'You will win Lottery B' are independent, whereas 'You will win Lottery A'
and 'You will not win Lottery A' are disjoint events.
P(A∪B) = P(A) + P(B) − P(A∩B)
where,
P(A∪B) denotes the probability that either event A or B occurs,
P(A) denotes the probability that event A occurs,
P(B) denotes the probability that event B occurs, and
P(A∩B) denotes the probability that both events A and B occur simultaneously.
You can also read
P(A∪B) as P(either event A or B occurs)
P(A∩B) as P(both events A and B occur).
As mentioned in the video, for disjoint events A and B, P(A∩B) = 0 since both cannot occur simultaneously. Hence, the
formula can be rewritten as P(A∪B) = P(A) + P(B).
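A small sketch of the addition rule in Python, using the heart/face card counts from a standard deck; exact fractions avoid rounding. The grade probabilities in the last two lines are made-up numbers, used only to illustrate the disjoint case:

```python
from fractions import Fraction

# P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
p_heart = Fraction(13, 52)   # 13 hearts in a deck
p_face  = Fraction(12, 52)   # 12 face cards (J, Q, K of each suit)
p_both  = Fraction(3, 52)    # J, Q, K of hearts

p_heart_or_face = p_heart + p_face - p_both
print(p_heart_or_face)       # 11/26

# For disjoint events, P(A ∩ B) = 0, so the rule reduces to P(A) + P(B)
p_grade_b, p_grade_c = Fraction(1, 4), Fraction(1, 5)  # hypothetical values
print(p_grade_b + p_grade_c)                           # 9/20
```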
When event A is not dependent on event B and vice versa, the two are known as independent events. The multiplication rule allows us to compute the probability of both of them occurring simultaneously, which is given as:
P(A and B) = P(A)∗P(B)
More generally, for several independent events,
P(A and B and C and D) = P(A)∗P(B)∗P(C)∗P(D).
Comparison Between Addition Rule and Multiplication Rule
Both the addition rule and the multiplication rule allow you to compute the probabilities of the occurrence of
multiple events. However, there is a key difference between the two, which should help you to decide when to
use which rule.
1. The addition rule is generally used to find the probability of multiple events when either of the events can
occur at that particular instance. For example, when you want to compute the probability of picking a face
card or a heart card from a deck of 52 cards, a successful outcome occurs when either of the two events is
true. This includes either getting a face card, a heart card, or even both a face and a heart card. This rule
works for all types of events.
2. The multiplication rule is used to find the probability of multiple events when all the events need to occur
simultaneously. For example, in a coin toss experiment where you toss the coin three times and you need to
find the probability of getting three heads at the end of the experiment, a successful outcome occurs when
you get a head in the first, second and third toss as well. This rule is used for independent events only.
3. Also, in the addition rule, do you remember the P(A⋂B) that we used to compute the final value of P(A⋃B)?
This value is exactly the same as the P(A and B) that we compute in independent events using the
multiplication rule. You can go back and verify it for the same example shown in the video. There we had
P(Heart Card) = P(H) = 13/52, P(Face Card) = P(F) = 12/52 and P(Heart Card and Face Card) = P(H⋂F) = 3/52.
Now, as mentioned by the multiplication rule, you can see that P(H and F) = P(H)*P(F) = (13/52)*(12/52) =
3/52, which is the same as the value of P(H⋂F).
Note: A helpful hint here to decide when to use the addition rule and when to use the multiplication rule is to observe
the language of the question. If the question mentions an 'OR' to denote the relationship between the events, then you
need to apply the addition rule. That is, either of the given events can occur at that time, P(Event A or Event B). Else, if an
'AND' is used to denote the relationship between the events, then the multiplication rule should be used. Here, the
events need to happen simultaneously and must be independent, i.e., P(Event A and Event B).
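The heart/face card check described in point 3 above can be reproduced directly; exact fractions make the equality with P(H∩F) explicit:

```python
from fractions import Fraction

p_heart = Fraction(13, 52)
p_face  = Fraction(12, 52)

# Multiplication rule for independent events: P(A and B) = P(A) * P(B)
print(p_heart * p_face)      # 3/52, matching P(H ∩ F) counted directly

# Three heads in three independent coin tosses
p_head = Fraction(1, 2)
print(p_head ** 3)           # 1/8
```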
Basics of Probability
Random Variables
The random variable X converts the outcomes of experiments to measurable values.
For example, let’s say as a data analyst at a bank, you are trying to find out which of the customers will default
on their loan, i.e., stop paying their loans. Based on some data, you have been able to make the following
predictions:
Customer Number | Yearly Income (in ₹) | Amount of Loan Due (in ₹) | Number of Dependents | Default Prediction (Yes/No)
1 | 10 lakh | 75 lakh | 3 | Yes
2 | 15 lakh | 50 lakh | 2 | No
3 | 20 lakh | 40 lakh | 1 | No
Now, instead of processing the yes/no response, it will be much easier if you define a random variable X to
indicate whether the customer is predicted to default or not. The values will be assigned according to the
following rule:
X = 1, if the customer defaults;
X = 0, if the customer does not default.
Now, the data changes to the following:
Customer Number | Yearly Income (in ₹) | Amount of Loan Due (in ₹) | Number of Dependents | Default Prediction (X)
1 | 10 lakh | 75 lakh | 3 | 1
2 | 15 lakh | 50 lakh | 2 | 0
3 | 20 lakh | 40 lakh | 1 | 0
Now, in this form, the table is entirely quantified, i.e., converted to numbers. Now that the data is entirely in quantitative
terms, it becomes possible to perform a number of different kinds of statistical analyses on it.
E.g., in a casino: in the long run (i.e., if a game is played a lot of times), is the game profitable for the players or for the house? Or will everybody break even in the long run?
We established a three-step process for answering this question:
1. Find all the possible combinations.
2. Find the probability of each combination.
3. Use the probabilities to estimate the profit/loss per player.
Probability Distributions
A probability distribution is a form of representation that tells us the probability of all the possible values of X. It could be represented as a table, a chart or an equation.
Expected Value
The expected value of a variable X is the value of X that we would “expect” to get after performing the experiment an infinite number of times.
It is also called the expectation, average or mean value.
Mathematically speaking, for a random variable X that can take the values: x1, x2, x3, ..........., xn.
The expected value (EV) is given by:
EV(X)=x1∗P(X=x1) + x2∗P(X=x2) + x3∗P(X=x3) + ........... + xn∗P(X=xn)
The expected value should be interpreted as the average value you get after the experiment has been conducted an
infinite number of times.
For example, the expected value for the number of red balls is 2.385. This means that if we conduct the
experiment (play the game) infinite times, the average number of red balls per game would end up being 2.385.
In a bag containing red and blue balls, let’s say that the probability of getting 1 red ball in one trial is equal to p. Hence, the probability of getting 1 blue ball in one trial is equal to (1−p).
So, the probability distribution for X (i.e., the number of red balls drawn after 4 trials), if the probability of getting a red ball in 1 trial is p, is as follows.
Let’s say:
n = the number of trials
p = the probability of success
(1−p) = the probability of failure
r = the number of successes after n trials
Then, the probability of getting r red balls is p^r, the probability of getting (n−r) blue balls is (1−p)^(n−r), and the number of combinations with r successes is nCr.
Probability of getting one combination of r red balls and (n−r) blue balls: P(X=r) for one combination = p^r × (1−p)^(n−r)
Hence, over all nCr combinations: P(X=r) = nCr × p^r × (1−p)^(n−r)
However, there are some conditions that need to be met in order for us to be able to apply the formula.
1. If you toss a coin 20 times to see how many times you get tails, you are following all the conditions required for a
binomial distribution. The total number of trials is fixed (20), and you can only have two outcomes, i.e., tails or
heads. The probability of getting a tail is 0.5 each time you toss a coin.
2. In a way, this is similar to drawing 20 balls out of a bag, replacing each ball after drawing it, and seeing how many
of the balls are red. Here, the probability of getting a red ball in one trial is 0.5.
3. When you toss a coin until you get heads, the total number of trials is not fixed. This is similar to taking out balls
from the bag repeatedly until you draw a red ball. You can still find the probability of getting heads in 1 trial, 2
trials, 3 trials etc. and so on, but you cannot use binomial distribution to find that probability.
4. In the second example, where binomial distribution is not applicable, the experiment does not have only two
outcomes, but several. It is similar to taking out balls from a bag that contains red, blue, black, orange, and
other-colored balls. The probability distribution for this experiment cannot be made using binomial distribution.
5. In the final example, the probability of trials is not equal to each other. For example, the probability of drawing a
red ball in the first trial is 3/5. Now, in the second trial, the probability of drawing a red ball would be equal
to 2/4 not 3/5, as the red ball taken out in the first trial was not put back. Hence, the probability of getting the
combination red-red-red-blue, for example, would be 3/5*2/4*1/3*2/2, which is not the value we got while
deriving binomial distribution (3/5*3/5*3/5*2/5). Again, you cannot use binomial distribution to find the
probability in this case.
In other words, binomial distribution is applicable in situations where there are a fixed number of yes or no questions,
with the probability of a yes or a no remaining the same for all questions.
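A minimal sketch of the binomial formula for the 20-coin-toss example above, using only the standard library:

```python
import math

# Binomial probability: P(X = r) = nCr * p**r * (1-p)**(n-r)
def binom_pmf(n, r, p):
    return math.comb(n, r) * p**r * (1 - p)**(n - r)

# Probability of exactly 10 tails in 20 fair coin tosses
print(round(binom_pmf(20, 10, 0.5), 4))               # ≈ 0.1762

# The probabilities over all possible r sum to 1
print(sum(binom_pmf(20, r, 0.5) for r in range(21)))  # 1.0 (up to float error)
```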
A random variable follows a Bernoulli distribution if it only has two possible outcomes: 0 or 1.
For example, suppose we flip a coin one time. Let the probability that it lands on heads be p. This means the
probability that it lands on tails is 1-p.
Now, if we flip a coin multiple times then the sum of the Bernoulli random variables will follow a Binomial
distribution.
There are some more probability distributions that are commonly seen among discrete random variables. They are
not covered in this course, but if you want to go through some of them, you can use the following links:
1. Poisson Distribution :
It gives the probability of an event happening a certain number of times (k) within a given interval of
time or space. The Poisson distribution has only one parameter, λ (lambda), which is the mean number
of events.
2. Geometric Distribution :
The number of trials required to achieve the first success: P(X = n) = (1 − p)^(n − 1) × p. Trials are performed until the first success.
3. Negative Binomial Distribution :
The number of failures before the nth success in a sequence of draws of Bernoulli random variables,
trials are performed till a certain number of successes.
4. Binomial Distribution:
Trials are fixed
Cumulative Probability
In the previous example, we only discussed the probability of getting an exact value. For example, we know the
probability of X = 4 (4 red balls). But what if the house wants to know the probability of getting < 3 red balls, as the house
knows that for < 3 red balls, the players will lose and the house will make money?
Sometimes, talking in terms of less than is more useful. For example — how many employees can get to work in less than
40 minutes? Let’s explore how you can find the probability for such cases.
The cumulative probability of X, denoted by F(x), is defined as the probability of the variable being less than or equal to x.
F(x) = P(X<=x)
For example:
F(3) = P(X<=3) = P(X=0) + P(X=1) + P(X=2) + P(X=3)
F(2) = P(X<=2) = P(X=0) + P(X=1) + P(X=2)
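The cumulative probability can be sketched by summing the binomial PMF. Here we assume the red-ball example with n = 4 draws and p = 3/5, which is the setting that produces the 0.8704 figure used in this session's discrete CDF:

```python
import math
from fractions import Fraction

def binom_pmf(n, r, p):
    return math.comb(n, r) * p**r * (1 - p)**(n - r)

# F(x) = P(X <= x): add up the PMF for all values up to x
def binom_cdf(n, x, p):
    return sum(binom_pmf(n, r, p) for r in range(x + 1))

# 4 draws with replacement, P(red in one draw) = 3/5
p = Fraction(3, 5)
print(binom_cdf(4, 3, p))          # 544/625, i.e. 0.8704
print(float(binom_cdf(4, 2, p)))   # 0.5248, P(fewer than 3 red balls)
```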
1. The first step is defining the random variable. The random variable (X) is the outcome of a die throw. So, X = {1,
2, 3, 4, 5, 6}
2. The second step is to calculate the probabilities related to each outcome. The probability of each outcome
is 1/6 in a die throw.
Now, you have X and P(X). If you plug these values in the formula E[X]=∑(X×P(X)), you’ll get 3.5 as the expected value. So
how to interpret this number? This means if you were to throw the die a large number of times, the average of those
numbers will tend towards 3.5.
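The die-throw expectation can be verified in a few lines; exact fractions make the 3.5 explicit:

```python
from fractions import Fraction

# E[X] = sum over outcomes of x * P(X = x), for a fair six-sided die
outcomes = range(1, 7)
p = Fraction(1, 6)
expected = sum(x * p for x in outcomes)
print(expected)          # 7/2, i.e. 3.5
```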
So, why do we need the expected value at all? Well, the expected value lets you reason about real-world random phenomena more rationally.
Since the CDF and the PDF describe probabilities in terms of intervals rather than exact values, it is advisable to use them when talking about continuous random variables, rather than the bar chart distribution that we used for discrete variables.
CDF, or a cumulative distribution function, is a distribution that plots the cumulative probability of X against X.
A PDF, or a Probability Density Function, however, is a function in which the area under the curve gives you the
cumulative probability.
The main difference between the cumulative probability distribution of a continuous random variable and a discrete
one lies in the way you plot them. While a continuous variables’ cumulative distribution is a curve, a distribution for
discrete variables looks more like a bar chart.
CDF for Continuous Variables (Commute Time): For the continuous variable, i.e., the daily commute time, you have a different cumulative probability value for every value of X. For example, the value of cumulative probability at 21 will be different from its value at 21.1, which will again be different from the one at 21.2, and so on. Hence, you would show its cumulative probability as a continuous curve, not a bar chart.
CDF for Discrete Variables (Number of Red Balls): The reason for the difference is that for discrete variables, the cumulative probability does not change very frequently. In the discrete variable example, we only care about what the probability is for 0, 1, 2, 3 and 4. This is because the cumulative probability will not change between, say, 3 and 3.999999. For all values between these two, the cumulative probability is equal to 0.8704.
Uniform Distribution
A commonly observed type of distribution among continuous variables is a uniform distribution. For a
continuous random variable following a uniform distribution, the value of probability density is equal for all
possible values. Let’s explore this distribution a little more.
Since all possible values are between 0 and 10, the area under the curve between 0 and 10 is equal to 1.
The value of the PDF for all values between 0 and 10 is 0.1.
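A tiny numeric sketch of the uniform distribution on [0, 10]:

```python
# Uniform distribution on [0, 10]: density f(x) = 1/(b - a) = 0.1 everywhere
a, b = 0.0, 10.0
density = 1 / (b - a)
print(density)              # 0.1

# Area under the curve over [0, 10] (a rectangle): width * height = 1
print((b - a) * density)    # 1.0

# The CDF grows linearly: P(X <= 4) = (4 - a) / (b - a)
print((4 - a) / (b - a))    # 0.4
```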
Well, PDFs are more commonly used in real life. The reason is that it is much easier to see patterns in PDFs as
compared to CDFs. For example, here are the PDF and the CDF of a uniformly distributed continuous random
variable:
The PDF clearly shows uniformity, as the probability density’s value remains constant for all possible values.
However, the CDF does not show any trends that help you identify quickly that the variable is uniformly
distributed.
Again, it is clear that the symmetrical nature of the variable is much more apparent in the PDF than in the CDF.
Hence, generally, PDFs are used more commonly than CDFs.
Normal Distribution
Normally distributed data follows the 1-2-3 rule. This rule states that there is a:
1. 68% probability of the variable lying within 1 standard deviation of the mean,
2. 95% probability of the variable lying within 2 standard deviations of the mean, and
3. 99.7% probability of the variable lying within 3 standard deviations of the mean.
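The 1-2-3 rule can be checked with the standard library's NormalDist:

```python
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)

# Probability of a normal variable lying within k standard deviations of the mean
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}
for k, prob in within.items():
    print(k, round(prob, 4))   # 1 0.6827, 2 0.9545, 3 0.9973
```

The exact values are 68.27%, 95.45% and 99.73%, which the rule rounds to 68-95-99.7.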
For the standard normal distribution, mean (µ) = 0 and standard deviation (σ) = 1.
Basically, the Z value tells us how many standard deviations away from the mean your random variable is: Z = (X − µ)/σ. We can find the cumulative probability corresponding to a given value of Z using the Z table:
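A short sketch of computing a Z value and its cumulative probability; the commute-time mean and standard deviation below are assumed numbers, used only for illustration:

```python
from statistics import NormalDist

# Z = (X - mu) / sigma: how many standard deviations X lies from the mean
mu, sigma = 36.6, 10      # hypothetical commute-time parameters (minutes)
x = 50
z = (x - mu) / sigma
print(round(z, 2))                     # 1.34

# Cumulative probability for this Z, as a Z table would give
print(round(NormalDist().cdf(z), 4))   # ≈ 0.9099
```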
The value of σ is an indicator of how wide the graph is. A low value of σ means that the graph is narrow, while
a high value implies that the graph is wider. This is because a wider graph has more values away from the
mean, resulting in a high standard deviation.
Again, some more probability distributions are commonly seen among continuous random variables. They are
not covered in this course, but if you want to go through some of them, you can use the links below:
1. Exponential Distribution
2. Gamma Distribution
3. Chi-Squared Distribution
Suppose for a business application, you want to find out the average number of times people in urban India visited malls
last year. That’s 400 million (40 crore) people! You cannot possibly go and ask every single person how many times they
visited the mall. That’s a costly and time-consuming process. How can you reduce the time and money spent on finding
this number?
Sampling Distributions
Sampling distributions have certain properties that will help us estimate the population mean from the sample mean.
The sampling distribution, specifically the sampling distribution of the sample means, is a probability density function for the sample means of a population.
This distribution has some very interesting properties, which will later help you estimate the sampling error. Let's take a
look at these properties.
The sampling distribution’s mean is denoted by μX̄, as it is the mean of the sample means. Let’s see what it is for this sampling distribution.
Note that you would divide by n and not n-1 as you have the data for
all 100 entries (i.e. mean of samples) of the distribution and you don't
need to sample the distribution.
Properties of Sampling Distributions
We’ve been saying that the sampling distribution has some interesting properties that will later help you estimate the
error in your samples. Let’s finally see what these properties are.
So, there are two important properties of a sampling distribution of the mean:
So, the central limit theorem says that for any kind of data, provided a large number of samples has been taken, the following properties hold true.
The Central Limit Theorem states that no matter how the population is distributed, the sampling distribution will be approximately normal, with mean μ and standard deviation σ/√n. For n > 30, the sampling distribution can be taken as exactly a normal distribution.
In other words, the sample means of any population follow a normal distribution, whether or not the population itself follows a normal distribution. If all we need is to estimate the population mean, the CLT makes it possible to do that.
CLT Demonstration
Link for python calculations for sample mean of population i.e. CLT
No matter the parent population distribution, when you take samples, compute their means and find the sampling
distribution, it will always be normal, or at least nearly normal. This is one of the most important implications of the
Central Limit Theorem.
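The demonstration can be reproduced as a simulation sketch, here under an assumed exponential parent population (λ = 0.5, so µ = σ = 2, clearly not normal):

```python
import random
import statistics

random.seed(0)

# Parent population: exponential with rate lambda; mean = sd = 1/lambda = 2
lam = 0.5
n, num_samples = 50, 2000

# Draw many samples of size n and record each sample's mean
sample_means = [
    statistics.mean(random.expovariate(lam) for _ in range(n))
    for _ in range(num_samples)
]

print(round(statistics.mean(sample_means), 2))   # close to mu = 2.0
print(round(statistics.stdev(sample_means), 2))  # close to sigma/sqrt(n) ≈ 0.28
```

Plotting a histogram of `sample_means` would show the near-normal, bell-shaped sampling distribution the CLT predicts.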
CLT Demonstration: II
The population mean, i.e., the mean daily commute time of all 30,000 employees, is μ = 36.6 = (sample mean) + some margin of error.
We can find this margin of error using the CLT (central limit theorem).
Consider the food industry, where you want to find the average lead content in a food product (let’s say the maximum permissible lead content is 2.33 ppm).
So, population mean (µ) = sample mean X̄ ± margin of error = 2.3 ppm ± margin of error
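A sketch of the margin-of-error calculation via the CLT; the sample standard deviation (0.6 ppm) and sample size (100) below are assumed values, only the 2.3 ppm sample mean comes from the example:

```python
import math
from statistics import NormalDist

# Margin of error at 95% confidence: z* x sigma / sqrt(n)
x_bar, sigma, n = 2.3, 0.6, 100               # hypothetical sample values
z_star = NormalDist().inv_cdf(0.975)          # ≈ 1.96 for 95% confidence
margin = z_star * sigma / math.sqrt(n)

print(round(margin, 3))                                    # ≈ 0.118
print(round(x_bar - margin, 2), round(x_bar + margin, 2))  # interval around 2.3
```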
1. Random Sampling:
In this method, people in the sample are selected randomly. This is similar to randomly pulling names out of a
hat.
Example: Suppose you want to find out the average internet usage per person in India. You just put the names of
all the Indians in a hat and pull out 100 names at random, and then calculate the average internet usage
of these 100 Indians.
2. Stratified Sampling:
Here, people are divided into subgroups and then selected randomly from those subgroups. But this is done in
such a way that the final sample has the same proportions of these subgroups as the population.
Example: Again, suppose you want to find out the average internet usage per person in India. Note that 70% of
Indians live in rural areas, and 30% live in urban areas. So, you would put the names of all the rural
Indians in hat A and the names of all the urban Indians in hat B. Then, you’d pull 70 names out of hat
A and 30 names out of hat B. Now, again, you’d have a sample of 100 Indians, but this time, your
sample would be more representative of the population as its rural and urban proportions would be
the same as that of the population.
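A sketch of stratified sampling on a toy population with the 70/30 rural/urban split described above (the population itself is synthetic):

```python
import random

random.seed(42)

# Hypothetical population: 70% rural, 30% urban
population = [("rural", i) for i in range(7000)] + \
             [("urban", i) for i in range(3000)]

def stratified_sample(pop, strata_props, size):
    """Sample so that each stratum keeps its population proportion."""
    sample = []
    for stratum, prop in strata_props.items():
        members = [p for p in pop if p[0] == stratum]
        sample += random.sample(members, round(size * prop))
    return sample

sample = stratified_sample(population, {"rural": 0.7, "urban": 0.3}, 100)
print(len(sample))                                 # 100
print(sum(1 for s in sample if s[0] == "rural"))   # 70 rural, 30 urban
```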
3. Volunteer Sampling:
Here, your sample is composed of people who want to volunteer for the survey.
Example: Suppose that once more, you want to find out the average internet usage per person in India. You
could ask people to take an online survey, which asks them how often/much they use the internet. You
could ask the same question through a telephonic survey.
The good thing about this type of sampling is that it looks unbiased and random because the survey participants
are selected at random through the medium (internet, telephone) itself. There is no human interference.
However, the medium will also bring in some bias. For example, an internet survey is more likely to include
people who have high internet usage, whereas a telephone survey is a little more likely to have a balanced
representation of heavy internet users and people who use the internet infrequently.
4. Opportunity Sampling:
In this method, the people around and close to the surveyor form their sample space.
Example: This time, when you want to find out the average internet usage per person in India, you just ask 100
people around you about their internet usage.
Clearly, this sampling method has the potential to become extremely biased. The only good thing here, probably, is that
this is a relatively convenient sampling method.
So, there are four typical cases in which sampling is generally used, though its use is not limited to these:
1. Market research: Suppose your company wants to launch a product whose usage depends on people having a
decent internet connection, such as Hotstar, Netflix, etc. Before launching such a product, you need to
understand the potential market size. For this, you need to conduct a survey with some people and based on
their data, infer parameters such as the average data usage, the willingness to adopt new technologies, etc. for
the entire population.
2. Marketing campaign efficacy: Suppose you work for a company such as Hotstar or Netflix. You want more and
more people to move from your competitors’ platforms to your platform. You plan to do this through a
marketing campaign. But how should you structure this marketing campaign? What should be its budget? Which
strategy should be used (free membership for a week, lower membership fees for a few weeks, etc.)? You can
use the data from your past marketing campaign and your knowledge of sampling techniques to make these
decisions.
3. Pilot testing: Again, let’s take the Hotstar and Netflix example. Suppose you’ve done all the market research
required and developed a product. Now, before putting your product out in the market, you want to give it a trial
run. For this, you can perform what is called a pilot test. It means that instead of giving your product a full-
fledged launch, you can just launch it partially to a few people, who can test your product and help you decide
whether it is good enough for a full launch.
4. Quality control: This is more of a manufacturing-centered application. Let’s say your company produces 10
million smartphones annually. This means that around 30,000 phones are manufactured every day. In such a
situation, quality assurance (QA) becomes a function of utmost importance. Since it is difficult to check all 30,000
phones every day, your company would just “sample” a few and then make decisions based on those samples.
So, now you know how stratified sampling can be used to improve your inferences. Let’s go through the case again:
1. You want to conduct a brand equity survey for e-commerce brands. In other words, you want to find out how
much of the e-commerce market is controlled by Flipkart, Amazon, and Snapdeal, respectively.
2. An important part of this process would be to conduct a survey, the results of which would tell you the
proportion of Indian e-commerce buyers that use each of these websites.
3. However, in order to do this, you would need to perform stratified sampling on the basis of gender
(male/female), age, and location (metro/tier 2/other urban/rural). Not doing this would mean that you run the
risk of erroneous selection, for example, selecting too many people from metro areas or too few women, etc.
Hence, by not using stratified sampling you might end up with an unrepresentative sample.
4. So, you give the questionnaire prepared by your team to the general public. Once you’ve acquired sufficient
sample data, you can make estimations for the general population and estimate the brand equity of major Indian
e-commerce brands.
5. However, you must not accept every entry you get. You can run some checks to screen out fraudulent entries.
For example, if a person takes only one minute to fill a survey that usually takes 10 minutes, he/she is probably
committing fraud.
However, as much as you would like to believe that you have used stratified random sampling, there actually is a big
chance that the sampling done here is closer to stratified volunteer sampling or to stratified opportunity sampling than
to stratified random sampling. Let’s understand why this is the case.
So, let’s say you used email as the medium for your survey. Once you decided on your quota guide, etc. and sent the
emails, you probably used the survey results to estimate population parameters. That’s the entire process. But where
exactly did you make it a volunteer/opportunity sampling exercise?
For many people, the email could have ended up in the spam folder. If this happened, you would probably not get a
response from them. Now, if all these people happened to fall in a specific general category (such as old people who
don’t understand how to filter spam), then your survey would have ended up being biased.
Another potential source of bias is non-response. Let’s say that out of 80 people, only 40 chose to respond to the email.
In that case, the 40 who did not respond would not be represented in your survey results. Hence, again, if these 40
people happened to disproportionately represent a particular segment (such as people who are digitally less savvy), the
survey results would be biased.
To be able to answer all these questions, you would need to perform A/B testing. Basically, you would divide your current
customer base into four groups, say, group A, group B, group C, and group D. Then, each of the groups would be
subjected to one of the above strategies. For example, group A would get a 20% discount coupon, group C would get an
app reminder, etc. Then, when you got the data for these sample groups, you could use the concepts of hypothesis
testing and sampling to answer the questions asked above.
Well, first of all, you’d need to break up your population into various small segments on the basis of factors such
as the acquisition channel, the frequency of shopping, the payment mode generally used, etc.
Once you break your population into microsegments this way, you can then get sample information for each
segment. Remember that the reason for making these divisions would be to ensure that the sample represents
the population as closely as possible.
Finally, once you make these probably unbiased divisions, you would have your sample. And, once you get the
sample, you can perform A/B testing.
So, if you are creating any product, the process you will follow is given below.
Before you even start with the product development process, you will need to test your concept. This can be
accomplished by asking a few people how they would feel about a web streaming service and if such a service existed,
how often would they use it. How much would they be willing to spend on it? For people who don’t want to use it, what
is the reason for not using it? Will they reconsider their decision if certain features are added to the product?
Once you have the results of this concept test, you can start developing the actual product accordingly. Now, when this
product nears completion, you should have a few people try it out and collect their feedback. Based on their feedback,
you can make some last-minute changes which will help you rectify any mistakes or help you add small features you may
have missed. This process of getting your product checked once before its final development is called pilot testing.
Once this product, i.e., the streaming service, is developed in accordance with the results of the concept test and the
pilot test, you can launch it. However, if you wish to be really careful, there is one last thing you could do — you could
have a few people try this developed product, take their feedback, and make one more round of changes before you
launch the product. This process, where you get your final product checked, is called beta testing.
Hence, once you have conducted a concept test, a pilot test, and preferably, a beta test too, you will be ready to launch
your product. Now, let’s listen to Ujjyaini as she further explains this framework for product development.
Now there is scope for using sampling at various stages of the product development process. For example:
1. In the initial stages, you want to talk to people and figure out if they are interested in a digital payment service.
However, you need to be careful about how you design this survey: the people you talk to should be a mix of
those who are already comfortable with cashless products such as credit cards, etc. and those who are only
comfortable with cash. While interpreting your findings, you need to make sure that each stratum of the society
is represented correctly in the survey and that there are enough people of each type in your survey. If you only
interviewed 20 people who are comfortable with digital payments, you may need to use booster methods.
2. In the final stages, during the pilot testing stage, you will need to use the sampling concepts again. Since this
stage also involves you surveying people and making decisions for the population based on the sample you
surveyed, you will need to stratify your sample accordingly, steer clear of biases, use boosters wherever needed,
etc.
Finally, let’s go through the fourth use case of sampling, i.e., its use in quality control. Quality control is a process
followed at manufacturing sites, where batches of the manufactured product are regularly checked to ensure that they
meet the standards the company would like them to meet. Since it will be very expensive and time-consuming to check
each and every product manufactured, companies typically just check a few randomly selected products and decide for
the entire batch based on that.
Let’s say you’re inspecting a batch of bolts to assess their quality. You decide to check every 1000th bolt to see whether
it meets the desired quality standard. Since all the bolts you inspect turn out to be good, you conclude that
there are no defects in the batch.
However, there is a problem with this approach. What if the 6th product made by the machine is defective, and then,
every 1000th product the machine makes after that is defective? In that case, the defective pieces will have ID numbers
as follows: 6, 1006, 2006, 3006, and so on. However, since you’re only checking every 1000th product, i.e., ID
numbers 1000, 2000, 3000, etc., you will never find the defective piece.
The point is, if the defects occur in a pattern, your best chance of catching them is to select batch numbers
randomly. If your selection follows any trend, you risk aligning with the pattern of the defects and missing them entirely.
Hence, it is always advisable to use a table of random numbers to decide which batches you’re going to inspect.
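In code, a "table of random numbers" is simply a pseudo-random selection of IDs. The sketch below (the function name and numbers are illustrative, not from the text) picks which products to inspect without any fixed interval that a periodic defect pattern could slip through:

```python
import random

def pick_inspection_ids(batch_size, num_checks, seed=None):
    """Randomly select product IDs to inspect, avoiding any fixed
    interval that could align with a periodic defect pattern."""
    rng = random.Random(seed)
    # sample() draws without replacement, so all IDs are distinct
    return sorted(rng.sample(range(1, batch_size + 1), num_checks))

# A systematic check of every 1000th ID (1000, 2000, ...) would never
# see defects at IDs 6, 1006, 2006, ...; a random selection might.
ids = pick_inspection_ids(batch_size=10_000, num_checks=10, seed=42)
print(ids)  # ten distinct IDs spread unpredictably across the batch
```

As the note below says, randomness does not guarantee you catch a patterned defect, but it removes the systematic blind spot.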
Note: Using this table will not guarantee that you detect the defective pieces, but it will make detection more likely.
Inferential Statistics - Additional Resources
Basics of Probability
Probability is a measure of the likelihood of the occurrence of an event.
Probability values range from 0 to 1, where 0 denotes an impossible event and 1 denotes a certain event.
Terminology
1. Trial or experiment: An action whose result is uncertain, e.g., rolling a die or tossing a coin.
2. Event: One or more outcomes of an experiment, e.g., getting an even number on a die roll.
3. Sample space: The set of all possible outcomes of an experiment.
4. Sample point: A single one of the possible outcomes.
Formula
Probability of an event = (Number of favourable outcomes) / (Total number of possible outcomes)
Throwing a die
A die throw can lead to six outcomes: any value between 1 and 6, both included.
The probability of the occurrence of any particular number on the die is 1/6.
Addition Rule
(Probability of event A or event B occurring):
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
2. A bag contains three red balls, five green balls and ten black balls. What is the probability of you getting either a
red ball or a green ball when you randomly draw a ball from the bag?
Solution: The probability of getting a red ball = 3/18
The probability of getting a green ball = 5/18
The probability of getting either a red ball or a green ball = 8/18 = 4/9
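The ball-drawing example above can be checked with exact rational arithmetic. The sketch below uses Python's `fractions` module; since drawing a red ball and drawing a green ball are mutually exclusive, P(red ∩ green) = 0 and the addition rule reduces to a plain sum:

```python
from fractions import Fraction

red, green, black = 3, 5, 10
total = red + green + black  # 18 balls in the bag

p_red = Fraction(red, total)      # 3/18
p_green = Fraction(green, total)  # 5/18

# Addition rule with mutually exclusive events: P(A or B) = P(A) + P(B)
p_red_or_green = p_red + p_green
print(p_red_or_green)  # 4/9
```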
Conditional Probability
Conditional probability is the probability of an event, given that some other event has already occurred.
Conditional probability is particularly relevant when the two events are dependent.
Notation P(B|A): This notation denotes the probability of event ‘B’, given that event ‘A’ has already occurred.
Formula:
P(B | A) = P(B ∩ A) / P(A)
3. A student has applied to a university and has a 50% chance of getting an admission. Also, as per the university
guidelines, 50% of the admitted students will get hostel accommodation. What is the probability of the student
getting hostel accommodation, given that he has been admitted?
Solution: P(Hostel | Admission) = P(Hostel ∩ Admission) / P(Admission)
P(Hostel | Admission) = (0.5 * 0.5) / 0.5
P(Hostel | Admission) = 0.5
4. A bag contains four red balls and five green balls. You draw a ball from it without replacing it. What is the
probability of you drawing a red ball in the first draw and a green ball in the second draw?
Solution: The probability of you drawing a red ball in the first draw is 4/9,
and the probability of you drawing a green ball in the second draw is 5/8.
So, the probability of you drawing a green ball after drawing a red ball is (4/9) * (5/8) = 5/18.
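The same without-replacement calculation, done with exact fractions; the key step is that after the first red ball is removed, only 8 balls remain:

```python
from fractions import Fraction

red, green = 4, 5
total = red + green  # 9 balls

p_red_first = Fraction(red, total)  # 4/9
# Without replacement: one red ball is gone, 8 balls remain, 5 are green.
p_green_given_red = Fraction(green, total - 1)  # 5/8

# Multiplication rule for dependent events: P(A and B) = P(A) * P(B|A)
p_red_then_green = p_red_first * p_green_given_red
print(p_red_then_green)  # 5/18
```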
Bayes' Theorem
Bayes’ theorem describes the probability of an event based on prior knowledge of conditions that might be related to
the event.
If you know the conditional probability P(B|A), you can use Bayes’ rule to find out the reverse probability P(A|B).
Formula:
P(A|B) = [P(B|A) * P(A)] / P(B)
Demonstration
We will now understand the application of Bayes’ theorem through a demonstration:
1. Two bags contain red and green balls. The first bag contains two red and three green balls; the second bag
contains five red and seven green balls. If a green ball is drawn from one of the bags, what is the probability that
it was drawn from the first bag?
Now let's compute the different probabilities needed for solving this using the Bayes' Theorem.
Let A be the event that the first bag is chosen and B be the event that a green ball is chosen.
Therefore,
P(A) = 1/2 [each bag is equally likely to be chosen]
P(B) = Probability of getting a green ball = P(bag 1) * P(green | bag 1) + P(bag 2) * P(green | bag 2)
= (1/2 * 3/5) + (1/2 * 7/12)
= 71/120
P(B|A) = 3/5 [the probability of drawing a green ball from bag 1]
Applying Bayes’ theorem to determine the probability of a green ball being drawn from bag 1, we get:
P(A|B) = [P(B|A)*P(A)]/P(B)
= (3/5 * 1/2 )/ (71/120)
= 36/71
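The whole Bayes calculation for the two-bag problem can be reproduced with exact fractions, which makes the 71/120 and 36/71 results easy to verify:

```python
from fractions import Fraction

p_bag1 = Fraction(1, 2)               # P(A): either bag is equally likely
p_bag2 = Fraction(1, 2)
p_green_given_bag1 = Fraction(3, 5)   # 3 green out of 5 balls in bag 1
p_green_given_bag2 = Fraction(7, 12)  # 7 green out of 12 balls in bag 2

# Total probability of drawing a green ball
p_green = p_bag1 * p_green_given_bag1 + p_bag2 * p_green_given_bag2
print(p_green)  # 71/120

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_bag1_given_green = p_green_given_bag1 * p_bag1 / p_green
print(p_bag1_given_green)  # 36/71
```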
You can learn more about conditional probability and Bayes’ theorem here.
A unique combination of mean (μ) and standard deviation (σ) represents or defines a unique normal distribution. So, to
analyze or compare different normal distributions, you make use of a standardized normal distribution. A standardized
normal distribution is a special type of normal distribution where μ = 0 and σ = 1.
A normal distribution is converted into a standardized normal distribution with the help of the Z score:
Z = (x − μ) / σ
Using this formula, for every value of x (the values on the X-axis), we calculate the corresponding Z
score and plot these Z scores against their respective probabilities on the Y-axis.
For example, for a normal distribution with μ= 35 and σ = 5, the normal distribution curve and the standard normal
distribution curve will look like this:
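Standardization is a one-line computation. The sketch below applies it to the distribution from the text (μ = 35, σ = 5); the function name is just for illustration:

```python
def z_score(x, mu, sigma):
    """Standardize a value from N(mu, sigma) to N(0, 1)."""
    return (x - mu) / sigma

# For the distribution in the text (mu = 35, sigma = 5):
print(z_score(35, 35, 5))  # 0.0 (the mean maps to z = 0)
print(z_score(45, 35, 5))  # 2.0 (two standard deviations above the mean)
```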
Sampling Methods
Population: This refers to the entire data.
Sample: This refers to the part of the population selected by a defined procedure to be representative of the data.
Types of Sampling
Random sampling:
In this kind of sampling, each element of the population has the same probability of getting selected in the
sample.
o Simple random sampling with replacement:
In simple random sampling with replacement, for the creation of a sample size n, you select an
element from the population and then return it to the population. This procedure is repeated n
times. Thus, each element of the population can be selected more than once in a sample. This is
used when the population size is small.
o Simple random sampling without replacement:
In simple random sampling without replacement, for the creation of a sample size n, you select
an element from the population and don’t return it to the population. The selection of elements
from the population is repeated n times. This is used when the population size is large.
o Stratified random sampling:
In stratified random sampling, the population is divided into strata on the basis of common
characteristics. The elements are then selected from these strata.
o Cluster sampling:
In cluster sampling, the population is divided into clusters, and then, a simple random sample of
these clusters is selected.
o Systematic sampling:
In systematic sampling, a starting point is selected in the population, and then, the elements are
selected at regular, fixed intervals.
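The random sampling variants above map directly onto Python's standard library. A minimal sketch, using a small illustrative population of 100 elements:

```python
import random

population = list(range(1, 101))  # an illustrative population of 100 elements
rng = random.Random(7)

# Simple random sampling WITH replacement: an element may be picked twice.
with_replacement = rng.choices(population, k=10)

# Simple random sampling WITHOUT replacement: all elements are distinct.
without_replacement = rng.sample(population, k=10)

# Systematic sampling: a random starting point, then every k-th element.
k = 10
start = rng.randrange(k)
systematic = population[start::k]
print(len(systematic))  # 10 elements at regular, fixed intervals
```

`random.choices` can repeat elements (sampling with replacement), while `random.sample` guarantees distinct picks (sampling without replacement).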
Non-random sampling:
In this kind of sampling, each element of the population does not have the same probability of getting
selected in the sample.
o Convenience sampling:
In convenience sampling, the researcher selects the elements from the population on the basis
of the convenient accessibility of these elements.
o Judgemental sampling:
In judgemental sampling, the researcher selects the elements on the basis of his judgement and
bias.
o Quota sampling:
The population is divided into groups, or quotas, on the basis of which you select the sample.
Quota sampling is, to a certain extent, similar to random sampling; the procedure is more or
less the same in both cases, except that in quota sampling the quota is fixed in advance. That is, you
don't consider the entire population, only a section of it, to fill each quota.
o Snowball sampling:
In the case of snowball sampling, a small sample is first selected, say a sample of five people.
Then, each of the five members can suggest five names, and those five can suggest five more
each. This creates a snowball effect.
What is the difference between stratified random sampling and cluster sampling?
In stratified random sampling, the whole population is divided into strata based on common characteristics, and
then, elements are selected from each stratum. In cluster sampling, on the other hand, the whole population is
divided into clusters, and then, some of the clusters are chosen randomly to create a sample.
Sampling Distribution
Sampling distribution is the probability distribution of a particular sample statistic (such as mean) obtained by
drawing all possible samples of a particular sample size ‘n’ from the population and calculating their statistics.
Important properties:
Mean of the sample means (μx̅) = Mean of the population (μ)
Standard error of the sample means: σx̅ = σ / √n, where n is the sample size of each sample.
So, the normal variate, or Z-score, for the sampling distribution of sample means is:
Z = (x̅ − μ) / (σ / √n)
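These two properties can be verified with a small simulation. The sketch below draws many samples of size n = 50 from an artificial uniform population (the population and sample counts are assumptions for illustration) and checks that the sample means cluster around μ with spread close to σ/√n:

```python
import random
import statistics

rng = random.Random(0)
# An artificial population: 100,000 values uniform on [0, 100]
population = [rng.uniform(0, 100) for _ in range(100_000)]
mu = statistics.mean(population)
sigma = statistics.pstdev(population)  # population standard deviation

n = 50  # size of each sample
sample_means = [
    statistics.mean(rng.sample(population, n)) for _ in range(2_000)
]

# The mean of the sample means should be close to the population mean,
# and their spread close to sigma / sqrt(n).
print(round(statistics.mean(sample_means), 1))   # close to mu
print(round(statistics.stdev(sample_means), 2))  # close to sigma / n ** 0.5
```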
Estimation
The process of drawing inferences about a population using the information from its samples is known as estimation.
Types of estimation
1. Point estimate:
Here, a statistic obtained from a sample is used to estimate a population parameter. So, its accuracy
depends on how well the sample represents the population. The population parameters derived from
sample statistics of various samples may vary. This is why interval estimate is preferred to point estimate.
2. Interval estimate:
Here, the lower and upper limits of values (that is, the confidence interval) within which a population
parameter will lie are estimated along with a certain level of confidence.
So, you can say that the population mean μ will lie between:
X̅ − Z(σ/√n) < μ < X̅ + Z(σ/√n)
The formula above is used to calculate the upper and the lower limits of μ for a certain level of confidence (a
certain value of Z), where the value of σ is known.
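For example, the interval can be computed directly from the formula. The numbers below (sample mean, σ, n) are hypothetical, chosen only to illustrate the calculation at a 95% confidence level, for which Z ≈ 1.96:

```python
import math

x_bar = 52.0  # sample mean (hypothetical)
sigma = 6.0   # known population standard deviation (hypothetical)
n = 36        # sample size
z = 1.96      # Z value for a 95% confidence level

margin = z * sigma / math.sqrt(n)
lower, upper = x_bar - margin, x_bar + margin
print(f"95% CI for mu: ({lower:.2f}, {upper:.2f})")  # (50.04, 53.96)
```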
What if the value of σ is not known? In that case, you use the t-distribution.
T-distribution
Properties of T-distribution:
1. It can only be applied when the samples are drawn from a normally distributed population.
2. It is flatter than a normal distribution.
3. It has only one unknown parameter, the population standard deviation, which is estimated from the
sample. Hence, the degrees of freedom for a t-distribution are given by ‘sample size (n) − 1’.
The standard normal variate, or test statistic, for the t-distribution is:
t = (X̅ − μ) / (s / √n)
where ‘s’ is the sample standard deviation.
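The t statistic above differs from the Z-score only in using the sample standard deviation s in place of the unknown σ. A minimal sketch, on a hypothetical sample with a hypothesized population mean of 50:

```python
import math
import statistics

# Hypothetical sample where sigma is unknown
sample = [48.2, 51.7, 50.1, 49.5, 52.3, 47.8, 50.9, 51.2]
mu_0 = 50.0  # hypothesized population mean

n = len(sample)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)

t_stat = (x_bar - mu_0) / (s / math.sqrt(n))
print(f"t = {t_stat:.3f} with {n - 1} degrees of freedom")
```

Note that `statistics.stdev` already uses the n − 1 divisor, matching the n − 1 degrees of freedom of the t-distribution.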