Descriptive statistics
Descriptive statistics is a branch of statistics that deals with the analysis and summarization of numerical
data. It involves calculating various measures such as central tendency, variability, and distribution of
data. The purpose of descriptive statistics is to provide a concise summary of the data that can be easily
understood and interpreted.
The most commonly used measures of central tendency include the mean, median, and mode. The
mean is the arithmetic average of the data, the median is the middle value when the data is ordered,
and the mode is the most frequently occurring value.
Measures of variability include the range, variance, and standard deviation. The range is the difference
between the largest and smallest values, the variance is a measure of how spread out the data is, and
the standard deviation is the square root of the variance.
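As a minimal illustration, the following Python sketch (using the standard library's statistics module, with arbitrary example values) computes each of these measures for a small sample:

import statistics

data = [4, 8, 6, 5, 3, 8, 9, 5, 8]           # arbitrary example values

mean = statistics.mean(data)                 # arithmetic average
median = statistics.median(data)             # middle value of the ordered data
mode = statistics.mode(data)                 # most frequently occurring value

data_range = max(data) - min(data)           # largest value minus smallest value
variance = statistics.pvariance(data)        # average squared deviation from the mean
std_dev = statistics.pstdev(data)            # square root of the variance

print(mean, median, mode, data_range, variance, std_dev)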
The distribution of data can be represented by graphs such as histograms, frequency polygons, and box
plots. These graphs can give an idea of the shape of the data and any outliers that may be present.
Descriptive statistics is used in various fields such as finance, psychology, medicine, and sociology. It is
often used to summarize and analyze large amounts of data, and to make informed decisions based on
the results.
[email protected]
Measure of dispersion
A measure of dispersion is a statistic that describes the amount of variability or spread in a set of data.
Dispersion measures are important because they help to provide a more complete understanding of the
data than measures of central tendency (such as the mean or median) alone. Here are some common
measures of dispersion:
Range: The range is the simplest measure of dispersion and is calculated as the difference between the
largest and smallest values in a data set.
Variance: The variance is a measure of how spread out the data is from the mean. It is calculated by
taking the average of the squared differences between each value and the mean.
Standard deviation: The standard deviation is the square root of the variance and is another common measure of dispersion. It is expressed in the same units as the data and can be loosely interpreted as the typical deviation of a data point from the mean.
Interquartile range: The interquartile range is the range between the first quartile and the third quartile
of a data set. It is a useful measure of dispersion that is less sensitive to outliers than the range.
Mean absolute deviation: The mean absolute deviation (MAD) is a measure of dispersion that calculates
the average absolute difference between each data point and the mean.
Each of these measures of dispersion has its own strengths and weaknesses, and the choice of which to
use will depend on the specific situation and the type of data being analyzed.
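As a rough sketch of how these measures can be computed (plain Python, arbitrary example values; the quartiles come from the standard library's statistics.quantiles helper, which uses one of several accepted conventions):

import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]              # arbitrary example values
mean = statistics.mean(data)

# Range: largest value minus smallest value
data_range = max(data) - min(data)

# Variance and standard deviation (population versions)
variance = statistics.pvariance(data)
std_dev = statistics.pstdev(data)

# Interquartile range: third quartile minus first quartile
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Mean absolute deviation: average absolute distance from the mean
mad = sum(abs(x - mean) for x in data) / len(data)

print(data_range, variance, std_dev, iqr, mad)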
[email protected]
Coefficient of variation
The coefficient of variation (CV) is a measure of relative variability and is used to compare the degree of
variation of different sets of data, particularly when their means are different. It is defined as the ratio of
the standard deviation (SD) to the mean (M) of a set of data, expressed as a percentage:
CV = (SD/M) x 100%
The coefficient of variation is a useful measure when comparing the variability of two or more data sets
that have different means. For example, if the mean salary of a group of employees is $50,000 with a
standard deviation of $10,000, and the mean salary of another group is $80,000 with a standard
deviation of $20,000, the CVs for these two groups would be:
CV for group 1 = (10,000/50,000) x 100% = 20%
CV for group 2 = (20,000/80,000) x 100% = 25%
In this case, the CVs show that the second group has a higher degree of relative variability than the first
group, despite having a higher mean salary.
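The same comparison can be reproduced with a few lines of Python (the salary figures are the hypothetical ones from the example above):

def coefficient_of_variation(std_dev, mean):
    # CV expressed as a percentage of the mean
    return (std_dev / mean) * 100

cv_group_1 = coefficient_of_variation(10_000, 50_000)   # 20.0
cv_group_2 = coefficient_of_variation(20_000, 80_000)   # 25.0
print(cv_group_1, cv_group_2)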
The coefficient of variation can also be used as a tool for quality control. For example, if a production
process has a low CV, it suggests that the process is producing consistent results, while a high CV
suggests that the process may be inconsistent and require further investigation.
[email protected]
Linear regression analysis with formulas
Linear regression analysis is a statistical method used to model the relationship between a dependent
variable (Y) and one or more independent variables (X) that are believed to affect Y. The basic formula for
linear regression analysis can be expressed as:
Y = b0 + b1X1 + b2X2 + ... + bnXn + ε
Where:
Y is the dependent variable (also known as the response variable)
X1, X2, ..., Xn are the independent variables (also known as predictor variables or regressors)
b0, b1, b2, ..., bn are the regression coefficients that represent the strength and direction of the
relationship between Y and X
ε is the error term, which represents the amount of unexplained variation in Y that is not accounted for
by the independent variables
The goal of linear regression analysis is to estimate the values of the regression coefficients that
minimize the sum of the squared differences between the predicted values of Y and the actual values of
Y. This is known as the method of least squares.
For a simple regression with a single predictor X1, the formula for estimating the slope coefficient is:
b1 = (Σ(Xi - Xbar)(Yi - Ybar)) / Σ(Xi - Xbar)^2
where:
b1 is the estimated regression coefficient for X1
Xi is the value of X1 for observation i
Xbar is the mean value of X1 for all observations
Yi is the value of Y for observation i
Ybar is the mean value of Y for all observations
The formula for the intercept (b0) is:
b0 = Ybar - b1Xbar
Once the regression coefficients have been estimated, the formula for predicting the value of Y for a
given set of values of X is:
Y = b0 + b1X1 + b2X2 + ... + bnXn
Linear regression analysis can be extended to include multiple independent variables, interactions
between variables, and other more complex models, but the basic principles and formulas remain the
same.
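For the single-predictor case, these formulas translate directly into code. The sketch below (plain Python, with made-up x and y values) estimates b0 and b1 and then predicts Y for a new value of X:

def simple_linear_regression(x, y):
    # Ordinary least squares estimates for one predictor
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # b1 = sum((Xi - Xbar)(Yi - Ybar)) / sum((Xi - Xbar)^2)
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    # b0 = Ybar - b1 * Xbar
    b0 = y_bar - b1 * x_bar
    return b0, b1

x = [1, 2, 3, 4, 5]                          # arbitrary example data
y = [2.1, 4.2, 5.9, 8.1, 9.8]
b0, b1 = simple_linear_regression(x, y)
prediction = b0 + b1 * 6                     # predicted Y for X = 6
print(b0, b1, prediction)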
[email protected]
Determination of the correlation coefficient
The correlation coefficient is a statistical measure that indicates the strength and direction of the linear
relationship between two variables. It ranges from -1 to 1, with values closer to -1 or 1 indicating a
stronger relationship, and values closer to 0 indicating a weaker relationship. A correlation coefficient of
0 indicates no linear relationship.
The formula for calculating the correlation coefficient (r) between two variables X and Y is:
r = (nΣXY - ΣXΣY) / sqrt[(nΣX^2 - (ΣX)^2)(nΣY^2 - (ΣY)^2)]
where:
n is the number of observations
ΣXY is the sum of the products of X and Y for all observations
ΣX and ΣY are the sums of X and Y, respectively, for all observations
ΣX^2 and ΣY^2 are the sums of the squared values of X and Y, respectively, for all observations
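A direct implementation of this formula might look like the following (plain Python, arbitrary example data):

import math

def correlation_coefficient(x, y):
    # Pearson correlation coefficient using the raw-sums formula above
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    sum_y2 = sum(yi ** 2 for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(correlation_coefficient(x, y))         # roughly 0.77 for this data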
Once the correlation coefficient has been calculated, a common rough guide for interpreting it is:
r = 1: Perfect positive correlation
0.7 ≤ r < 1: Strong positive correlation
0.4 ≤ r < 0.7: Moderate positive correlation
0.2 ≤ r < 0.4: Weak positive correlation
0 < r < 0.2: Negligible positive correlation
r = 0: No linear correlation
-0.2 < r < 0: Negligible negative correlation
-0.4 < r ≤ -0.2: Weak negative correlation
-0.7 < r ≤ -0.4: Moderate negative correlation
-1 < r ≤ -0.7: Strong negative correlation
r = -1: Perfect negative correlation
These cut-offs are conventional rather than exact, and what counts as a "strong" correlation varies by field.
It is important to note that correlation does not imply causation, and that other factors may be
responsible for any observed relationship between two variables. It is also possible to have a significant
correlation coefficient between two variables that are not causally related, or to have a weak correlation
coefficient between two variables that are causally related.
[email protected]
The least squares criterion
The least squares criterion is a principle used in linear regression analysis to estimate the values of the
regression coefficients that best fit the data. The goal of linear regression analysis is to find the line that
best fits the data, where "best" means that the line minimizes the sum of the squared differences
between the predicted values of the dependent variable and the actual values of the dependent
variable.
In other words, the least squares criterion aims to minimize the sum of the squared residuals, which are
the differences between the observed values of the dependent variable and the predicted values of the
dependent variable based on the regression line. The formula for the sum of the squared residuals is:
SSR = Σ(yi - ŷi)^2
where:
SSR is the sum of squared residuals
yi is the observed value of the dependent variable for observation i
ŷi is the predicted value of the dependent variable based on the estimated regression line
The least squares criterion estimates the values of the regression coefficients that minimize the sum of
the squared residuals. This is achieved by finding the values of the regression coefficients that solve the
following system of equations:
Σyi = nb0 + b1Σxi
Σxiyi = b0Σxi + b1Σxi^2
where:
n is the number of observations
xi and yi are the values of the independent and dependent variables, respectively, for observation i
b0 and b1 are the regression coefficients
These equations can be solved using matrix algebra, and the resulting values of b0 and b1 provide the
estimated regression line that best fits the data according to the least squares criterion.
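As a small illustration, the closed-form solution of this two-equation system can be coded directly (a sketch in plain Python with made-up data, rather than a full matrix-algebra treatment):

def least_squares_fit(x, y):
    # Solve the two normal equations for b0 and b1 (simple linear regression)
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    # From  sum(y)  = n*b0      + b1*sum(x)
    #       sum(xy) = b0*sum(x) + b1*sum(x^2)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = (sum_y - b1 * sum_x) / n
    return b0, b1

x = [1, 2, 3, 4]                             # arbitrary example data
y = [3, 5, 7, 10]
b0, b1 = least_squares_fit(x, y)
ssr = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))   # sum of squared residuals
print(b0, b1, ssr)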
The least squares criterion is widely used in linear regression analysis because it is a simple and intuitive
method for estimating the regression coefficients that best fit the data. However, it is important to note
that there may be other methods for estimating the regression coefficients that may be more
appropriate for specific types of data or research questions.
[email protected]
Skewness and kurtosis
Skewness and kurtosis are two measures of the shape of a probability distribution. Skewness describes the degree of asymmetry in the distribution, while kurtosis describes the heaviness of its tails, often described informally as the peakedness or flatness of the distribution.
Skewness:
Skewness is a measure of the asymmetry of a probability distribution. A distribution is said to be
symmetric if the two halves of the distribution are mirror images of each other. A positively skewed
distribution has a longer tail on the right side of the distribution, while a negatively skewed distribution
has a longer tail on the left side of the distribution. One simple way to quantify the degree of skewness is Pearson's second skewness coefficient, which is calculated as:
skewness = 3 * (mean - median) / standard deviation
where:
mean is the mean of the distribution
median is the median of the distribution
standard deviation is the standard deviation of the distribution
A skewness coefficient of 0 indicates a perfectly symmetric distribution. A positive skewness coefficient
indicates a positively skewed distribution, while a negative skewness coefficient indicates a negatively
skewed distribution. The magnitude of the skewness coefficient indicates the degree of skewness, with
larger magnitudes indicating more extreme skewness.
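A quick sketch of Pearson's second skewness coefficient in Python (arbitrary sample values, chosen to have a long right tail):

import statistics

def pearson_skewness(data):
    # 3 * (mean - median) / standard deviation
    mean = statistics.mean(data)
    median = statistics.median(data)
    std_dev = statistics.pstdev(data)
    return 3 * (mean - median) / std_dev

sample = [2, 3, 3, 4, 4, 4, 5, 5, 12]        # long right tail -> positive skewness
print(pearson_skewness(sample))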
Kurtosis:
Kurtosis is a measure of the heaviness of the tails of a probability distribution, often described informally in terms of peakedness or flatness. A distribution with high kurtosis has heavier tails and a sharper peak, while a distribution with low kurtosis has thinner tails and a flatter peak. The degree of kurtosis is commonly quantified using the excess kurtosis coefficient, which is calculated as:
kurtosis = (Σ(xi - mean)^4 / n) / standard deviation^4 - 3
where:
xi is the ith observation in the distribution
mean is the mean of the distribution
standard deviation is the standard deviation of the distribution
n is the sample size
An excess kurtosis of 0 matches that of a normal distribution (mesokurtic), while positive values indicate a more peaked, heavier-tailed distribution (leptokurtic) and negative values indicate a flatter, thinner-tailed distribution (platykurtic). The magnitude of the kurtosis coefficient indicates how far the shape departs from that of a normal distribution, with larger magnitudes indicating more extreme departures.
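A matching sketch for the excess kurtosis formula above (plain Python, arbitrary sample values):

import statistics

def excess_kurtosis(data):
    # Mean fourth power of deviations divided by std dev^4, minus 3
    n = len(data)
    mean = statistics.mean(data)
    std_dev = statistics.pstdev(data)
    fourth_moment = sum((x - mean) ** 4 for x in data) / n
    return fourth_moment / std_dev ** 4 - 3

sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]     # heavy right tail inflates kurtosis
print(excess_kurtosis(sample))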
It is important to note that skewness and kurtosis are just two measures of the shape of a probability
distribution and should be used in conjunction with other descriptive statistics and visualizations to fully
understand the distribution of the data.
[email protected]
Percentiles and quartiles
Percentiles and quartiles are commonly used measures of the distribution of a set of numerical data.
Percentiles:
A percentile is a value below which a given percentage of the data falls; taken together, the percentiles divide the ordered data into 100 equal parts. For example, the 30th percentile of a set of test scores is a score that is greater than or equal to 30% of the scores and less than or equal to the remaining 70%. Percentiles are useful for comparing individual data points to the rest of the distribution. A common rule for locating the pth percentile is to compute its position in the ordered data:
position of the p-th percentile = (p/100)(n + 1)
where:
p is the desired percentile (e.g., the 50th percentile is the median)
n is the sample size
If the position is not a whole number, the percentile is found by interpolating between the two nearest ordered values.
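A sketch of this position rule in Python, interpolating between neighbouring ordered values when the position is not a whole number (arbitrary scores; this is one common convention among several):

def percentile(data, p):
    # Value at the p-th percentile using the (p/100) * (n + 1) position rule
    values = sorted(data)
    n = len(values)
    position = (p / 100) * (n + 1)           # 1-based position in the ordered data
    k = int(position)                        # whole part of the position
    frac = position - k                      # fractional part of the position
    if k < 1:
        return values[0]
    if k >= n:
        return values[-1]
    # Interpolate between the k-th and (k+1)-th smallest values
    return values[k - 1] + frac * (values[k] - values[k - 1])

scores = [55, 61, 68, 72, 75, 80, 84, 90, 95]
print(percentile(scores, 50))                # the median
print(percentile(scores, 30))                # the 30th percentile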
Quartiles:
Quartiles divide a set of data into four equal parts. The first quartile (Q1) is the value below which 25% of
the data fall, the second quartile (Q2) is the value below which 50% of the data fall (i.e., the median),
and the third quartile (Q3) is the value below which 75% of the data fall. The difference between the
third and first quartiles is called the interquartile range (IQR) and is a measure of the spread of the
middle 50% of the data. The exact values of Q1 and Q3 depend on the convention used to split the data into halves:
If the number of data points is even, the data split cleanly into two halves: Q1 is the median of the lower half and Q3 is the median of the upper half.
If the number of data points is odd, the middle value (the median itself) is usually excluded from both halves, and Q1 and Q3 are the medians of the resulting lower and upper halves; some conventions include the median in each half instead, which gives slightly different values.
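The following sketch computes the quartiles using the convention that excludes the median from both halves when the number of data points is odd (arbitrary example data):

import statistics

def quartiles(data):
    # Q1, Q2, Q3 by the median-of-halves method, excluding the median for odd n
    values = sorted(data)
    half = len(values) // 2
    q2 = statistics.median(values)
    lower = values[:half]                    # lower half of the ordered data
    upper = values[-half:]                   # upper half of the ordered data
    return statistics.median(lower), q2, statistics.median(upper)

data = [6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1                                # spread of the middle 50% of the data
print(q1, q2, q3, iqr)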
Quartiles are useful for summarizing the spread of the data and identifying potential outliers. The first
quartile and third quartile can be used to define a box-and-whisker plot, which is a graphical
representation of the data that shows the quartiles, median, and potential outliers.