Statistics:-
-- It is the science of collecting, analyzing, and presenting numerical data to
provide useful information.
Types of statistics:
1.Descriptive Statistics:-
-- It is a set of methods used to summarize and describe the main features of a
dataset.
2.Inferential Statistics:-
-- We take data from a sample and use it to draw conclusions or make decisions about
the population.
-- It is the practice of using data from a sample to make conclusions or predictions
about a larger population.
___________________________________________________________________________________
Population:-
-- A population is the entire group that you want to draw conclusions about.
Sample:
-- It is a subset of data from a larger group, or population, that is used to
represent the whole.
___________________________________________________________________________________
Basic Formula
1.AVG (average):- Sum of all observations / Total no. of observations
____________________________________________________________________
A.Descriptive Statistics:-
*1.Measure of central tendency:
-- It is a single value that represents the middle or center of a set of data.
*Subpoints:-
a.Mean:- Mean is the average, i.e.
Sum of all observations / Total no. of observations
b.Median:- To find the median, we first sort the data in ascending order.
for odd n: M = ((n+1)/2)th term
for even n: M = [(n/2)th term + (n/2+1)th term] / 2
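EX- for odd-> 2,3,4,5,9
n=5 ->odd
Median = ((5+1)/2)th term
= (6/2)th term
= 3rd term
i.e. Median = 4
EX- for even->1,2,3,4,5,9
n=6 ->even
n/2 = 3 and n/2+1 = 4, so the median lies between the 3rd and 4th terms
position = (3+4)/2
= 7/2
= 3.5th term
3rd term = 3, 4th term = 4, so Median = (3+4)/2
= 3.5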
c.Mode:- Most frequent value (most repeated value).
EX-2,3,4,5,2,6,6,2,9,10,2
Mode = most frequent value
Mode = 2 (it appears 4 times)
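The three measures above can be sketched in Python as follows. This is a minimal illustration written by hand so each step mirrors the notes; the function names are illustrative, and the standard-library `statistics.median` is used only as a cross-check.

```python
import statistics
from collections import Counter


def mean(xs):
    # Sum of all observations / total number of observations
    return sum(xs) / len(xs)


def median(xs):
    s = sorted(xs)            # the median needs the data in ascending order
    n = len(s)
    if n % 2 == 1:
        # odd n: the ((n+1)/2)th term (1-based) sits at index (n+1)//2 - 1
        return s[(n + 1) // 2 - 1]
    # even n: average of the (n/2)th and (n/2 + 1)th terms
    return (s[n // 2 - 1] + s[n // 2]) / 2


def mode(xs):
    # Most frequent value; most_common(1) returns [(value, count)]
    return Counter(xs).most_common(1)[0][0]


print(mean([2, 3, 4, 5, 9]))                        # 4.6
print(median([2, 3, 4, 5, 9]))                      # 4   (odd example)
print(median([1, 2, 3, 4, 5, 9]))                   # 3.5 (even example)
print(mode([2, 3, 4, 5, 2, 6, 6, 2, 9, 10, 2]))     # 2

# Cross-check against the standard library
assert median([1, 2, 3, 4, 5, 9]) == statistics.median([1, 2, 3, 4, 5, 9])
```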
*2. Measure of dispersion:-
-- It is a statistical tool that describes how spread out a set of data is, or how
much it varies from a central value.
Subpoint:-
1. Range:-
-- Formula = max-min
Ex-4,3,6,7,10
max(10)-min(3)
= 7 Range
2.Variance:-
-- Formula: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$
Ex-{3,5,8,1}
mean (μ) = 4.25
N=4
(3-4.25)^2+(5-4.25)^2+(8-4.25)^2+(1-4.25)^2
= 1.5625+0.5625+14.0625+10.5625
= 26.75
σ² = 26.75/4 = 6.6875
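3.Standard Deviation:-
-- It is the square root of the variance: $\sigma = \sqrt{\sigma^2}$
Ex-{3,5,8,1}: $\sigma = \sqrt{6.6875} \approx 2.586$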
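A minimal sketch of the dispersion formulas, using the examples from the notes. It assumes the population formula (divide by N), as written above; the standard library's `statistics.pvariance`/`pstdev` follow the same convention, while `statistics.variance`/`stdev` divide by N-1 for a sample. Function names are illustrative.

```python
import math


def data_range(xs):
    # Range = max - min
    return max(xs) - min(xs)


def population_variance(xs):
    # sigma^2 = (1/N) * sum of (x_i - mu)^2 over all N observations
    n = len(xs)
    mu = sum(xs) / n
    return sum((x - mu) ** 2 for x in xs) / n


def population_std(xs):
    # Standard deviation is the square root of the variance
    return math.sqrt(population_variance(xs))


print(data_range([4, 3, 6, 7, 10]))              # 7
print(population_variance([3, 5, 8, 1]))         # 6.6875
print(round(population_std([3, 5, 8, 1]), 3))    # 2.586
```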
___________________________________________________________________________________
Sampling Techniques:-
Types of Sampling Techniques:-
1.Simple Random Sampling:-
-- We specify a sample size n (e.g. n=5), and 5 members are picked at random.
-- Randomly selecting a subset of a population.
-- Where each member of the population has an equal chance of being selected.
EX- 1,2,7,6,3
n=3
->e.g. 1,7,6 (any 3 members chosen at random)
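A sketch of simple random sampling with Python's `random` module. `sample` draws without replacement, so every member has the same chance and none is picked twice. The helper name and the optional `seed` parameter are illustrative additions for repeatable runs.

```python
import random


def simple_random_sample(population, n, seed=None):
    # Every member has an equal chance of selection; no member is picked twice
    rng = random.Random(seed)
    return rng.sample(population, n)


population = [1, 2, 7, 6, 3]
print(simple_random_sample(population, 3))   # three distinct members, chosen at random
```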
2.Systematic Sampling:- (follows a pattern / step parameter / equal intervals)
-- It is a probability sampling method where researchers select members of a
population at regular intervals.
-- EX- A researcher might select every 15th person on a list of the population.
EX - 2,3,4,7,6,8,9,10,12
->2,7,9 (every 3rd element, starting from the first)
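The pattern in the example (every 3rd element, starting from the first) maps directly onto list slicing. A minimal sketch; `systematic_sample`, `step` and `start` are illustrative names, and in practice the start is often chosen at random within the first interval.

```python
def systematic_sample(population, step, start=0):
    # Take the element at `start`, then every `step`-th element after it
    return population[start::step]


data = [2, 3, 4, 7, 6, 8, 9, 10, 12]
print(systematic_sample(data, step=3))            # [2, 7, 9]
print(systematic_sample(data, step=3, start=1))   # [3, 6, 10]
```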
3.Stratified Sampling:- It is a technique that involves dividing a population into
subgroups (homogeneous), or strata, and then selecting a random sample from each
stratum independently (an equal number, or a number proportional to the stratum's size).
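A minimal sketch of stratified sampling that draws an equal number from each stratum. The strata and their members are hypothetical, made up only to show the mechanics; `stratified_sample` is an illustrative name.

```python
import random


def stratified_sample(strata, per_stratum, seed=None):
    # strata: dict mapping stratum name -> list of members
    # Draw `per_stratum` members from each stratum independently
    rng = random.Random(seed)
    return {name: rng.sample(members, per_stratum) for name, members in strata.items()}


# Hypothetical population split into homogeneous groups by class year
strata = {
    "first_year": ["A1", "A2", "A3", "A4"],
    "second_year": ["B1", "B2", "B3", "B4", "B5"],
    "third_year": ["C1", "C2", "C3"],
}
print(stratified_sample(strata, per_stratum=2, seed=1))
```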
4.Cluster Sampling:-
-- It is a method that involves dividing a population into smaller groups
(heterogeneous), or clusters, and then randomly selecting some of those clusters
to form the sample.
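A sketch of one-stage cluster sampling, assuming whole clusters are chosen at random and all of their members are kept. The classroom clusters and the helper name are hypothetical.

```python
import random


def cluster_sample(clusters, num_clusters, seed=None):
    # clusters: list of groups, each group a list of members
    # Pick whole clusters at random and keep every member of the chosen ones
    rng = random.Random(seed)
    chosen = rng.sample(clusters, num_clusters)
    return [member for cluster in chosen for member in cluster]


# Hypothetical clusters, e.g. classrooms in a school
clusters = [["s1", "s2", "s3"], ["s4", "s5"], ["s6", "s7", "s8"], ["s9", "s10"]]
print(cluster_sample(clusters, num_clusters=2, seed=3))
```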
5.Convenience Sampling:-
-- It is method where researchers select participants based on convenience,
rather than random selection.
-- In other words, sampling is done according to convenience.
___________________________________________________________________________________
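Correlation:- It is a statistical measure that describes the relationship between two
variables.
-- It shows both direction and strength.
-- Its value always lies between -1 and +1.
-- Strong -> close to +1 (or -1)
-- Moderate -> around the middle of the range, e.g. about 0.5 (or -0.5)
-- Negative -> between -1 and 0
Types of correlation:-
1.Positive correlation (both variables increase together)
2.Negative correlation (one increases while the other decreases)
3.Zero correlation (no linear relationship)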
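One common way to measure correlation is Pearson's coefficient r; the sketch below assumes that is the coefficient meant here. The data lists are hypothetical, chosen so the three types show up clearly.

```python
import math


def pearson_r(xs, ys):
    # r = sum((x - x_mean)(y - y_mean)) / sqrt(sum((x - x_mean)^2) * sum((y - y_mean)^2))
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))   # 1.0  -> perfect positive correlation
print(pearson_r(x, [10, 8, 6, 4, 2]))   # -1.0 -> perfect negative correlation
print(pearson_r(x, [3, 1, 4, 1, 3]))    # approximately 0 -> zero correlation
```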
___________________________________________________________________________________
Covariance :-
-- It is a statistical tool that measures the relationship between two random
variables and how much they change together.
-- It shows only the direction of the relationship, not its strength (its size
depends on the units of the data).
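A minimal sketch of population covariance (dividing by N, like the variance formula). With the hypothetical data below, the sign gives the direction, while the size changes just by rescaling y, which is why covariance alone says nothing about strength.

```python
def population_covariance(xs, ys):
    # cov(X, Y) = (1/N) * sum((x_i - x_mean) * (y_i - y_mean))
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n


x = [1, 2, 3, 4]
print(population_covariance(x, [2, 4, 6, 8]))       # 2.5  -> positive direction
print(population_covariance(x, [20, 40, 60, 80]))   # 25.0 -> same relationship, bigger number
print(population_covariance(x, [8, 6, 4, 2]))       # -2.5 -> negative direction
```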
___________________________________________________________________________________
Causation:-
--It is a relationship between two variables or events where one event or variable
causes an effect on the other.
-- Causation is also known as cause and effect.
___________________________________________________________________________________
Outliers:-
--It is a data point that is significantly different from the other data points in a
set.
-- It lies far away from the rest of the data.
Q1 = 25th percentile of the data
formula- Q1 = ((n+1)/4)th term
Q2 = 50th percentile of the data (the median)
formula- Q2 = ((n+1)/2)th term
Q3 = 75th percentile of the data
formula- Q3 = (3(n+1)/4)th term
Ex-
180 156 9 176 163 1827 166 171
Now sorting the dataset
9 156 163 166 171 176 180 1827
where n = 8
now calculating Q1
Q1 = (n+1)/4
= (8+1)/4
= 9/4
= 2.25th term
= 2nd term + 0.25 * (3rd term - 2nd term)
= 156 + 0.25 * (163 - 156)
= 157.75
Q1 ≈ 158
------------------
Q3 = (n+1)*3/4
= (8+1)*3/4
= 6.75th term
= 6th term + 0.75 * (7th term - 6th term)
= 176 + 0.75 * (180 - 176)
Q3 = 179
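The (n+1) position rule with interpolation can be sketched in Python for this example. `quartile` is an illustrative helper; the standard-library `statistics.quantiles` uses the same rule by default (`method="exclusive"`), so it serves as a cross-check.

```python
import statistics


def quartile(sorted_xs, p):
    # Position rule from the notes: the p*(n+1)-th term (1-based),
    # with linear interpolation when the position is fractional
    n = len(sorted_xs)
    pos = p * (n + 1)
    k = int(pos)          # whole-number part of the position
    frac = pos - k        # fractional part
    if k < 1:
        return sorted_xs[0]
    if k >= n:
        return sorted_xs[-1]
    lower, upper = sorted_xs[k - 1], sorted_xs[k]
    return lower + frac * (upper - lower)


data = sorted([180, 156, 9, 176, 163, 1827, 166, 171])
q1 = quartile(data, 0.25)
q3 = quartile(data, 0.75)
print(q1, q3)                              # 157.75 179.0

# Same (n+1) rule in the standard library
print(statistics.quantiles(data, n=4))     # [157.75, 168.5, 179.0]
```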
____________________________________________
EX-2
26 37 24 28 35 22 31 53 41 64 29
now sorting
22 24 26 28 29 31 35 37 41 53 64
Now
Q2 = (n+1)/2
= (11+1)/2
= 12/2
= 6th term = 31
-------------
Q1 = (n+1)/4
= 12/4
= 3rd term = 26
------------
Q3 = (n+1)*3/4
= (11+1)*3/4
= 9th term = 41
IQR = interquartile range
= Q3 - Q1 -- formula of IQR
=41 - 26
=15
lower fence = Q1-1.5*IQR
=26-1.5*15
= 3.5
Upper fence = Q3+1.5*IQR
= 41+1.5*15
= 63.5
Outlier = 64 (it lies above the upper fence of 63.5)
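The fence rule can be sketched as below, assuming the same (n+1) quartile rule as the worked examples; 1.5 is the conventional multiplier used in the notes. Helper names are illustrative.

```python
def quartile(sorted_xs, p):
    # p*(n+1)-th term (1-based) with linear interpolation, as in the notes
    n = len(sorted_xs)
    pos = p * (n + 1)
    k = int(pos)
    frac = pos - k
    if k < 1:
        return sorted_xs[0]
    if k >= n:
        return sorted_xs[-1]
    return sorted_xs[k - 1] + frac * (sorted_xs[k] - sorted_xs[k - 1])


def iqr_outliers(xs, k=1.5):
    # Anything below Q1 - k*IQR or above Q3 + k*IQR is flagged as an outlier
    s = sorted(xs)
    q1 = quartile(s, 0.25)
    q3 = quartile(s, 0.75)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    outliers = [x for x in s if x < lower_fence or x > upper_fence]
    return lower_fence, upper_fence, outliers


print(iqr_outliers([26, 37, 24, 28, 35, 22, 31, 53, 41, 64, 29]))
# (3.5, 63.5, [64])
print(iqr_outliers([180, 156, 9, 176, 163, 1827, 166, 171]))
# (125.875, 210.875, [9, 1827])
```

Applied to the first example, the same rule flags both 9 and 1827.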
___________________________________________________________________________________
Probability = no. of favourable outcomes / Total no. of outcomes
The probabilities of all possible outcomes always add up to 1.
EX-
One coin is tossed. What is the probability of getting a head?
Ans- 1/2
Q.1 Suppose we toss 3 coins. What is the probability of getting at least 2
heads?
=> Total outcomes = 2**3 = 2*2*2
= 8
combinations = {HHH,TTT,THT,HTH,HHT,TTH,HTT,THH}
Outcomes with at least 2 heads = {HHH,HTH,HHT,THH} = 4
probability = no. of favourable outcomes/Total no. of outcomes
= 4/8
= 1/2
Q.2 At most two tails (every outcome except TTT)
=> 7/8
Q.3 Exactly two heads (HTH, HHT, THH)
=> 3/8
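The three answers can be checked by listing all 8 outcomes with `itertools.product` and counting. This sketch reads Q.2 as "at most two tails", which is an assumption about the intended question; `probability` is an illustrative helper.

```python
from fractions import Fraction
from itertools import product

# All 2**3 = 8 equally likely outcomes of tossing 3 coins, e.g. ('H', 'T', 'H')
outcomes = list(product("HT", repeat=3))


def probability(event):
    # favourable outcomes / total outcomes
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))


print(probability(lambda o: o.count("H") >= 2))   # 1/2  (at least 2 heads)
print(probability(lambda o: o.count("T") <= 2))   # 7/8  (at most 2 tails)
print(probability(lambda o: o.count("H") == 2))   # 3/8  (exactly 2 heads)
```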
___________________________________________________________________________________
Complement Rule:-
formula= P(not A)=1-P(A)
EX- If P(A) = 3/4, then P(not A) = 1-3/4 = 1/4
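A small sketch applying the complement rule to a three-coin experiment; the event "at least one head" is an illustrative choice, not taken from the notes.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=3))   # 8 outcomes for 3 coins
p_all_tails = Fraction(sum(1 for o in outcomes if "H" not in o), len(outcomes))

# Complement rule: P(at least one head) = 1 - P(no heads)
p_at_least_one_head = 1 - p_all_tails
print(p_all_tails, p_at_least_one_head)   # 1/8 7/8
```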
Addition Rule:-
P(AUB)=P(A)+P(B)-P(AnB)
EX-A={1,2,3,4,5}
B={6,7,3,5,9}
union(AUB)={1,2,3,4,5,6,7,9}
intersection(AnB)={3,5}
union gives the unique elements of both sets
intersection gives the common elements
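Python's built-in sets mirror the example directly (`|` is union, `&` is intersection), and counting elements shows the same subtract-the-overlap idea as the addition rule.

```python
A = {1, 2, 3, 4, 5}
B = {6, 7, 3, 5, 9}

union = A | B          # unique elements from both sets
intersection = A & B   # elements common to both sets
print(sorted(union))          # [1, 2, 3, 4, 5, 6, 7, 9]
print(sorted(intersection))   # [3, 5]

# Counting version of the addition rule: |A U B| = |A| + |B| - |A n B|
assert len(union) == len(A) + len(B) - len(intersection)   # 8 == 5 + 5 - 2
```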
___________________________________________________________________________________
Addition Rule :-(or operator)
Dataset:- A deck of 10 cards (5 red and 5 black) labeled as follows:
{R1,R2,R3,R4,R5,B1,B2,B3,B4,B5}
The Addition Rule calculates the probability of the union of two events:
P(AUB)= P(A)+P(B)-P(AnB)
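A sketch applying the rule to this deck. The notes do not fix the events, so A = "red card" and B = "card numbered 1" are assumptions chosen for illustration; the result is checked against a direct count.

```python
from fractions import Fraction

deck = ["R1", "R2", "R3", "R4", "R5", "B1", "B2", "B3", "B4", "B5"]


def p(event):
    # Each card is equally likely: favourable cards / total cards
    return Fraction(sum(1 for card in deck if event(card)), len(deck))


def is_red(card):
    return card.startswith("R")


def is_one(card):
    return card.endswith("1")


# A = red card, B = card numbered 1 (illustrative events)
p_a, p_b = p(is_red), p(is_one)
p_a_and_b = p(lambda c: is_red(c) and is_one(c))
p_a_or_b = p_a + p_b - p_a_and_b      # addition rule
print(p_a_or_b)                         # 3/5
assert p_a_or_b == p(lambda c: is_red(c) or is_one(c))   # matches a direct count
```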
___________________________________________________________________________________
Multiplication Rule:- (for independent events) (and operator)
The multiplication rule calculates the probability of the intersection of two
independent events:
P(AnB)=P(A)*P(B)
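A sketch with two hypothetical independent events (a coin showing heads and a die showing 6), checking the rule against a direct count over all 12 equally likely pairs.

```python
from fractions import Fraction
from itertools import product

# Hypothetical independent events: A = coin shows heads, B = die shows 6
coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]
joint = list(product(coin, die))   # 12 equally likely (coin, die) pairs

p_a = Fraction(coin.count("H"), len(coin))   # 1/2
p_b = Fraction(die.count(6), len(die))       # 1/6
p_a_and_b = Fraction(sum(1 for c, d in joint if c == "H" and d == 6), len(joint))

print(p_a * p_b, p_a_and_b)       # 1/12 1/12
assert p_a_and_b == p_a * p_b     # the multiplication rule holds for independent events
```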
___________________________________________________________________________________
Skewness:-
It is a measure of how asymmetrical (unequal) a distribution is, i.e. how much more
the data is spread out on one side of the mean than on the other.
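There are several skewness formulas; this sketch assumes the population moment coefficient, mean((x - mu)^3) / sigma^3. The data are hypothetical.

```python
def skewness(xs):
    # Population moment coefficient of skewness: mean((x - mu)^3) / sigma^3
    n = len(xs)
    mu = sum(xs) / n
    m2 = sum((x - mu) ** 2 for x in xs) / n
    m3 = sum((x - mu) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


print(skewness([1, 2, 3, 4, 5]))    # 0.0 -> symmetric
print(skewness([1, 2, 3, 4, 10]))   # positive -> longer tail on the right
print(skewness([1, 7, 8, 9, 10]))   # negative -> longer tail on the left
```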
___________________________________________________________________________________
Kurtosis:-
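-- It is a measure of the tailedness of a distribution.
-- It tells us how heavy the tails are (and, loosely, how peaked the data looks)
compared with a normal distribution.
Types :
1.Leptokurtic (positive excess kurtosis: heavy tails, sharp peak)
2.Mesokurtic (normal kurtosis: excess kurtosis = 0)
3.Platykurtic (negative excess kurtosis: light tails, flat top)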
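A sketch assuming the population moment formula for excess kurtosis (kurtosis minus 3, so a normal distribution scores 0). The data and the `kurtosis_type` helper are illustrative; with floating-point data an exact 0 is rare, so "mesokurtic" here only covers that exact case.

```python
def excess_kurtosis(xs):
    # Population excess kurtosis: mean((x - mu)^4) / sigma^4 - 3
    n = len(xs)
    mu = sum(xs) / n
    m2 = sum((x - mu) ** 2 for x in xs) / n
    m4 = sum((x - mu) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3


def kurtosis_type(xs):
    k = excess_kurtosis(xs)
    if k > 0:
        return "leptokurtic"
    if k < 0:
        return "platykurtic"
    return "mesokurtic"


print(kurtosis_type([1, 2, 3, 4, 5]))                    # platykurtic (flat, light tails)
print(kurtosis_type([0, 0, 0, 0, 0, 0, 0, 0, 0, 10]))    # leptokurtic (one extreme value)
```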
___________________________________________________________________________________
Random Variable:
--It is a variable whose value is a numerical outcome of a random experiment, so its
value can change each time the experiment is repeated.
Types of random variable (there are two):
1.Continuous random variable:-
-- It can take any value within a range, i.e. an infinite number of possible values.
Ex- Height, Weight, amount of sugar in an orange
2. Discrete random variable:-
-- It takes a countable number of distinct values, such as heads or tails, playing
cards, or the sides of a die.
-- Often a finite number of possible values
Ex-Head or tail
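As an illustration of a discrete random variable, this sketch takes X = number of heads in 3 coin tosses (the experiment used earlier) and lists the probability of each possible value. X can only be 0, 1, 2 or 3, which is what makes it discrete.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Discrete random variable X = number of heads when 3 coins are tossed
outcomes = list(product("HT", repeat=3))
counts = Counter(o.count("H") for o in outcomes)
pmf = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

for x, p in pmf.items():
    print(x, p)
# 0 1/8
# 1 3/8
# 2 3/8
# 3 1/8
print(sum(pmf.values()))   # 1 (total probability is always 1)
```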
___________________________________________________________________________________
Distribution:-
-- It gives us a snapshot of our data.
-- It reveals patterns, showing us things like the average value (central
tendency), how spread out the values are (variability), and whether more values lie
on one side of the average than the other (skewness).
-- Distributions are the backbone of statistical models.
Types of Distribution:-
1.Uniform Distribution:-
-- It refers to a type of probability distribution in which all outcomes are
equally likely.
-- If equal work gets done in every equal interval of time, it is a uniform distribution.
EX- The possible outcomes of rolling a 6-sided die. The possible values would be
1,2,3,4,5 or 6. In this case, each of the six numbers has an equal chance (1/6) of appearing.
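A quick simulation sketch: rolling a fair die many times should give each face a relative frequency close to 1/6 (about 0.167). The seed and the number of rolls are arbitrary choices.

```python
import random
from collections import Counter

rng = random.Random(42)          # fixed seed so the run is repeatable
rolls = [rng.randint(1, 6) for _ in range(60_000)]
freq = Counter(rolls)

for face in range(1, 7):
    # Each relative frequency should be close to 1/6
    print(face, round(freq[face] / len(rolls), 3))
```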
2.Bernoulli Distribution:-
-- It is a distribution in which only a single trial is conducted,
where the variable can take only two values: 1 (success) or 0 (failure).
EX- Probability of success and failure in a binary outcome.
Ex- Probability of head and tail in a single toss
p = probability of success
q = (1-p) = probability of failure
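A minimal sketch of the Bernoulli probability formula P(X = x) = p^x * (1-p)^(1-x) for x in {0, 1}; its mean is p and its variance is p*q. `bernoulli_pmf` is an illustrative name.

```python
def bernoulli_pmf(x, p):
    # P(X = x) = p^x * (1 - p)^(1 - x), for x in {0, 1}
    if x not in (0, 1):
        raise ValueError("a Bernoulli variable is either 0 or 1")
    return p ** x * (1 - p) ** (1 - x)


p = 0.5                      # fair coin: success = head
q = 1 - p                    # probability of failure
print(bernoulli_pmf(1, p))   # 0.5 (head)
print(bernoulli_pmf(0, p))   # 0.5 (tail)
print("mean =", p, "variance =", p * q)   # mean = 0.5 variance = 0.25
```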
3. Binomial Distribution:-
-- Binomial Distribution shows the probability of a given number of successes (and
failures) in a fixed number of independent trials.
Ex-The number of heads that occur when a coin is flipped a fixed number of times.
-- In this example each flip is a trial and each trial has two possible
outcomes: heads or tails. The probability of getting heads is 0.5,
and the expected number of heads in 50 flips is 50 * 0.5 = 25.
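A sketch of the binomial probability formula P(X = k) = C(n, k) * p^k * (1-p)^(n-k), checking the 50-flip example: the probabilities sum to 1 and the expected number of heads is n*p = 25. `binomial_pmf` is an illustrative name; `math.comb` needs Python 3.8 or later.

```python
from math import comb


def binomial_pmf(k, n, p):
    # P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p = 50, 0.5
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
expected = sum(k * pk for k, pk in enumerate(probs))

print(round(sum(probs), 10))     # 1.0  (all probabilities add up to 1)
print(round(expected, 10))       # 25.0 (= n * p)
print(binomial_pmf(2, 3, 0.5))   # 0.375 = 3/8, exactly two heads in 3 tosses
```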
___________________________________________________________________________________