Introduction to
Statistics
Chapter - 1
We are on that journey, and we start with:
What is Statistics?
“Statistics is a discipline dedicated to drawing actionable
insights from available data.”
Statistics
Definition
• Science of gathering, presenting, analyzing, and
interpreting data
• Uses mathematics and probability
Parameter vs. Statistic
• Parameter: descriptive measure of the population
o Usually represented by Greek letters
μ denotes population mean
σ² denotes population variance
σ denotes population standard deviation
• Statistic: descriptive measure of a sample
o Usually represented by Roman letters
x̄ denotes sample mean
s² denotes sample variance
s denotes sample standard deviation
Copyright 2011 John Wiley & Sons, Inc.
[Diagram: a sample is a subset of the population. The population has a parameter; the sample has a statistic.]
Populations have Parameters,
Samples have Statistics.
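The distinction can be sketched in a few lines of Python. This is only an illustration: the list of ages is a made-up population, and the names `mu`, `sigma_sq`, `x_bar`, and `s_sq` are chosen here to mirror the Greek and Roman symbols. It uses the standard `statistics` module, where `pvariance` and `pstdev` treat the data as a whole population and `variance` and `stdev` treat the data as a sample.

```python
import statistics

# Hypothetical population: ages of all 8 workers in a small firm (made-up values).
population = [25, 53, 35, 42, 27, 31, 46, 38]

# Parameters describe the whole population.
mu = statistics.mean(population)              # population mean
sigma_sq = statistics.pvariance(population)   # population variance (divides by N)
sigma = statistics.pstdev(population)         # population standard deviation

# A sample is a subset of the population (here, simply the first four workers).
sample = population[:4]

# Statistics describe the sample.
x_bar = statistics.mean(sample)       # sample mean
s_sq = statistics.variance(sample)    # sample variance (divides by n - 1)
s = statistics.stdev(sample)          # sample standard deviation

print(f"Parameters: mu={mu}, sigma^2={sigma_sq:.2f}, sigma={sigma:.2f}")
print(f"Statistics: x_bar={x_bar}, s^2={s_sq:.2f}, s={s:.2f}")
```

The sample values generally differ from the population values, which is why the two sets of symbols are kept apart.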
Indian Census
• Every 10 years, the government attempts to
count every person living in the country.
• Census 2011 was the 15th national census
conducted by the Census Organization of India.
Branches of statistics
Descriptive Statistics
If a business analyst uses data gathered on a group to
describe or reach conclusions about that same group, the
statistics are called descriptive statistics. The methods
include:
• Graphical techniques
• Numerical techniques
The method used depends on what information we
would like to extract. Are we interested in:
• measure(s) of central location? and/or
• measure(s) of variability (dispersion)?
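As a small numerical illustration, the sketch below computes two measures of central location and two measures of variability for the student marks that appear in the Definitions slide of this chapter. The choice of measures (mean, median, range, standard deviation) is an assumption made for the example; other measures would serve equally well.

```python
import statistics

# Student marks taken from the Definitions slide of this chapter.
marks = [67, 74, 71, 83, 93, 55, 48]

# Measures of central location
mean_mark = statistics.mean(marks)
median_mark = statistics.median(marks)

# Measures of variability (dispersion)
value_range = max(marks) - min(marks)
std_dev = statistics.stdev(marks)  # treats the marks as a sample

print(f"mean={mean_mark:.2f}, median={median_mark}")
print(f"range={value_range}, standard deviation={std_dev:.2f}")
```

These numbers describe only the seven students observed; they say nothing yet about any larger group.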
Inferential Statistics
Descriptive statistics describe the data set being
analyzed, but they do not allow us to draw
conclusions or make inferences beyond that data
set. Hence we need another branch of statistics:
inferential statistics.
Inferential statistics is also a set of methods, but it is
used to draw conclusions or inferences about
characteristics of populations based on data from a
sample.
Statistical Inference
Statistical inference is the process of making an
estimate, prediction, or decision about a population
based on a sample.
[Diagram: a sample is drawn from the population; the sample's statistic is used to make an inference about the population's parameter.]
What can we infer about a Population’s Parameters
based on a Sample’s Statistics?
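One simple form of inference is to use a sample mean as an estimate of the population mean. The sketch below simulates this under stated assumptions: the population of 1,000 store sales figures is randomly generated, the sample size of 50 is arbitrary, and the population mean is computed only so the estimate can be compared with it (in practice it would be unknown).

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: monthly unit sales for 1,000 stores (simulated).
population = [random.randint(100, 500) for _ in range(1000)]

# The parameter is normally unknown; it is computed here only for comparison.
mu = statistics.mean(population)

# Draw a simple random sample of 50 stores and compute the statistic.
sample = random.sample(population, 50)
x_bar = statistics.mean(sample)

# How far the estimate lands from the parameter it is estimating.
estimation_error = x_bar - mu

print(f"sample mean = {x_bar:.1f}, population mean = {mu:.1f}")
print(f"estimation error = {estimation_error:.1f}")
```

A different seed or sample would give a different estimate, which is why inference always carries some uncertainty.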
Data and Data Sets
Data are the facts and figures collected, analyzed,
and summarized for presentation and interpretation.
All the data collected in a particular study are referred
to as the data set for the study.
Definitions…
A variable is some characteristic of any entity being studied that
is capable of taking different values.
E.g. student grades, age of a worker, return on
investment, total sales.
Typically denoted with a capital letter: X, Y, Z…
The values of the variable are the range of possible values for a
variable.
E.g. student marks (0..100)
age of a worker (18.. 65)
Data are the observed values of a variable.
E.g. student marks: {67, 74, 71, 83, 93, 55, 48}
age of a worker: {25, 53, 35, 42, 27}
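The three ideas (variable, possible values, data) can be kept apart in code. The sketch below uses the marks and ages from the examples above; the helper name `invalid_observations` is hypothetical and only illustrates checking observed data against a variable's possible values.

```python
# Possible values of each variable, as given in the examples above.
MARKS_VALUES = range(0, 101)   # student marks: 0..100
AGE_VALUES = range(18, 66)     # age of a worker: 18..65

# Data: the observed values of each variable.
marks = [67, 74, 71, 83, 93, 55, 48]
ages = [25, 53, 35, 42, 27]

def invalid_observations(data, allowed):
    """Return the observations that fall outside the variable's possible values."""
    return [x for x in data if x not in allowed]

print(invalid_observations(marks, MARKS_VALUES))
print(invalid_observations(ages, AGE_VALUES))
```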
Types of Data
Data Type                 Information Type                               Measurement Type
Categorical               Do you practice data?                          Yes / No
Numerical (Discrete)      How many books do you have in your library?    Number
Numerical (Continuous)    What is your height?                           Centimeters or inches
Scales of Measurement
• Measurement is the use of a standard process
to assign numbers to particular attributes or
characteristics of a variable.
• Many measurements are obvious, such as the time
a customer spends shopping in a store or the age
of a worker.
• However, some measurements, such as customer
satisfaction or return on investment, have to be
defined by a business researcher.
• Once such measurements are recorded and stored,
they can be denoted as data.
The four common levels of data measurement are
nominal, ordinal, interval, and ratio.
Nominal Level
At the nominal level of measurement, observations of a
qualitative variable can only be classified and counted.
Example: In which of the following departments do you
work?
1. Marketing
2. HR
3. Information Technology
4. Operations
5. Finance and Accounting
6. Any other (please specify)
OTHER EXAMPLES
• Gender
• Religion
• Geographic location
• Place of Birth
• Telephone number
• Employee ID number
The numbers assigned in a nominal scale cannot be
added, subtracted, multiplied or divided.
ORDINAL LEVEL
• The next higher level of data is the ordinal level.
• Example: ratings of a finance professor
RATING         FREQUENCY
5 SUPERIOR     6
4 GOOD         28
3 AVERAGE      25
2 POOR         12
1 INFERIOR     3
One classification is “higher” or “better” than the next
one.
However, we are not able to distinguish the magnitude of
the differences between groups.
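Because the categories are ordered, the middle response can be located even though the gaps between categories have no fixed size. The sketch below finds the median rating from the frequency table above. The function name `median_rating` is hypothetical, and when the two middle responses fall in different categories this sketch returns the higher one, which is only one possible convention.

```python
# Frequency table from the rating example: rating code -> (label, frequency).
ratings = {
    5: ("Superior", 6),
    4: ("Good", 28),
    3: ("Average", 25),
    2: ("Poor", 12),
    1: ("Inferior", 3),
}

def median_rating(table):
    """Return the rating code that contains the middle of the ordered responses."""
    total = sum(freq for _, freq in table.values())
    middle = (total + 1) / 2  # position of the median in the ordered responses
    cumulative = 0
    for code in sorted(table):  # walk from the lowest rating upward
        cumulative += table[code][1]
        if cumulative >= middle:
            return code

code = median_rating(ratings)
print(code, ratings[code][0])
```

Note that averaging the codes would assume equal spacing between categories, which the ordinal level does not guarantee.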
Examples of Ordinal Scale
• Mutual funds as investments are sometimes rated as
high, medium, and low risk.
High Risk is assigned 3
Medium Risk is assigned 2
Low Risk is assigned 1
• Fortune magazine's 2015 ranking of the top 10 companies.
Companies are ranked by total revenues for their
respective fiscal years.
Rank    Company
1       Walmart
2       Exxon Mobil
3       Chevron
4       Berkshire Hathaway
5       Apple
6       General Motors
7       Phillips 66
8       General Electric
9       Ford Motor
10      CVS Health
One can compute median, percentiles and quartiles of the distribution.
INTERVAL LEVEL DATA
• This is the next highest level of data measurement.
• It includes all the characteristics of the ordinal level, but
in addition, the difference between values is a constant
size.
• Example: The high temperatures on three consecutive
summer days in Delhi are 42, 44 and 43 degrees Celsius.
• These temperatures can be easily ranked, but we can
also determine the difference between temperatures.
Important: Zero is just a point on the scale. It does not
represent the absence of the condition.
One can compute the arithmetic mean, standard deviation, and correlation coefficient, and
conduct a t-test, Z-test, regression analysis, and more.
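A quick way to see why zero is arbitrary on an interval scale is to change units. The sketch below converts the three Delhi temperatures to Fahrenheit: differences stay proportional (each Celsius degree is 9/5 of a Fahrenheit degree), but the ratio between two temperatures changes, so statements like "twice as hot" are not meaningful. The helper `to_fahrenheit` is written for this example.

```python
celsius = [42, 44, 43]

def to_fahrenheit(c):
    """Convert a Celsius temperature to Fahrenheit."""
    return c * 9 / 5 + 32

fahrenheit = [to_fahrenheit(c) for c in celsius]

# Differences are meaningful: they scale by the same factor (9/5) in every case.
diff_c = celsius[1] - celsius[0]
diff_f = fahrenheit[1] - fahrenheit[0]

# Ratios are not meaningful: they depend on where the scale puts its zero.
ratio_c = celsius[1] / celsius[0]
ratio_f = fahrenheit[1] / fahrenheit[0]

print(f"difference: {diff_c} C, {diff_f:.1f} F")
print(f"ratio: {ratio_c:.4f} in Celsius, {ratio_f:.4f} in Fahrenheit")
```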
RATIO LEVEL DATA
• The ratio level is the “highest” level of measurement.
It has all the characteristics of the interval level, but
in addition, the 0 point is meaningful and the ratio
between two numbers is meaningful.
• Examples: Wages, weight, changes in stock prices,
distance between branch offices, and height.
• Father-Son Income Combination
NAME      FATHER      SON
Laheyo    $80,000     $40,000
Nale      $90,000     $30,000
Rho       $60,000     $120,000
Steele    $75,000     $130,000
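Since income has a true zero, dividing one income by another gives an interpretable number. The sketch below computes the father-to-son income ratio for each family in the table. It assumes the figures written as 1,20,000 and 1,30,000 use Indian digit grouping and mean 120,000 and 130,000.

```python
# (father income, son income) in dollars, from the table above.
# Assumption: 1,20,000 and 1,30,000 are read as 120,000 and 130,000.
incomes = {
    "Laheyo": (80_000, 40_000),
    "Nale": (90_000, 30_000),
    "Rho": (60_000, 120_000),
    "Steele": (75_000, 130_000),
}

# A ratio above 1 means the father earns more; below 1, the son earns more.
ratios = {name: father / son for name, (father, son) in incomes.items()}

for name, ratio in ratios.items():
    print(f"{name}: father earns {ratio:.2f} times the son's income")
```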
Examples of Ratio Level
• Ratio scales are usually used in organizational
research when exact figures on objective factors
are desired.
1. How many other organizations did you work for
before joining this job?
2. Please indicate the number of children you have
in each of the following categories:
Over 6 years but under 12
12 years and over
3. How many retail outlets do you operate?
Data Level, Operations, and
Statistical Methods
Data Level    Meaningful Operations
Nominal       Classifying and counting
Ordinal       All of the above plus ranking
Interval      All of the above plus addition and subtraction
              (including means, standard deviations, etc.)
Ratio         All of the above plus multiplication and division
              (meaningful ratios between values)
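The cumulative nature of the levels can be sketched as a simple lookup. The mapping below is an illustration assembled from the level descriptions in this chapter (counting at the nominal level, median at the ordinal level, mean and standard deviation at the interval level, ratios at the ratio level); the names `LEVELS`, `MIN_LEVEL`, and `is_meaningful` are hypothetical.

```python
# Measurement levels from lowest to highest.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# Lowest level at which each summary becomes meaningful (illustrative mapping).
MIN_LEVEL = {
    "count": "nominal",
    "median": "ordinal",
    "mean": "interval",
    "standard deviation": "interval",
    "ratio of two values": "ratio",
}

def is_meaningful(summary, level):
    """Return True if the summary is meaningful for data at the given level."""
    return LEVELS.index(level) >= LEVELS.index(MIN_LEVEL[summary])

print(is_meaningful("median", "ordinal"))
print(is_meaningful("mean", "ordinal"))
```

Each level inherits everything allowed at the levels below it, which is why a single "minimum level" per summary is enough.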
Qualitative and Quantitative Data
• Data can be further classified as being qualitative or
quantitative.
• The statistical analysis that is appropriate depends on whether
the data for the variable are qualitative or quantitative.
• In general, there are more alternatives for statistical analysis
when the data are quantitative.
Qualitative Data
• Labels or names used to identify an attribute of each element.
• Often referred to as categorical data.
• Uses either the nominal or ordinal scale of measurement.
• Can be either numeric or nonnumeric.
• Appropriate statistical analyses are rather limited.
Quantitative Data
Quantitative data indicate how many or how much:
• discrete, if measuring how many
• continuous, if measuring how much
Quantitative data are always numeric.
Ordinary arithmetic operations are meaningful for
quantitative data.
Scales of Measurement
Data
  Qualitative
    Numerical: Nominal, Ordinal
    Nonnumerical: Nominal, Ordinal
  Quantitative
    Numerical: Interval, Ratio
Cross-sectional versus Time
series data
• Cross-sectional data are collected at the same or
approximately the same point in time.
Example: data detailing the number of building
(warehouse) permits issued in Jan 2015 in each of
the states of India.
• Time series data are collected over several time
periods.
Example: data detailing the number of building
(warehouse) permits issued in Delhi/NCR, in each of
the last 36 months
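The two kinds of data have different shapes. In the sketch below, all permit counts are made up for illustration: the cross-section holds many units (states) at one period, while the time series holds one unit (Delhi/NCR) over several periods, shortened here to three months.

```python
# Cross-sectional data: many states, one period (Jan 2015). Counts are made up.
permits_jan_2015 = {"Delhi": 120, "Maharashtra": 310, "Karnataka": 205}

# Time series data: one region (Delhi/NCR), several periods. Counts are made up.
permits_delhi_ncr = [("2015-01", 120), ("2015-02", 135), ("2015-03", 128)]

# A cross-section supports comparison across units at one point in time.
top_state = max(permits_jan_2015, key=permits_jan_2015.get)

# A time series supports looking at change from one period to the next.
changes = [
    later[1] - earlier[1]
    for earlier, later in zip(permits_delhi_ncr, permits_delhi_ncr[1:])
]

print(top_state)
print(changes)
```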
Data Sources
• Existing Sources
o Data needed for a particular application might
already exist within a firm. Detailed information is often
kept on customers, suppliers, and employees for
example.
o Substantial amounts of business and economic data
are available from organizations that specialize in
collecting and maintaining data.
o Government agencies are another important source of
data.
o Data are also available from a variety of industry
associations and special-interest organizations.
• Internet
o The Internet has become an important source of data.
o Most government agencies, like the Bureau of the Census
(www.census.co.in), make their data available through a
web site.
o More and more companies are creating web sites and
providing public access to them.
o A number of companies now specialize in making
information available over the Internet.
Some Economic & Corporate databases
Indiastat.com
Economic Outlook Database :CMIE
PROWESS Database
CRISIL Database
Applications in
Business and Economics
Accounting
Public accounting firms use
statistical sampling procedures
when conducting audits for their
clients.
Economics
Economists use statistical
information in making forecasts
about the future of the economy or
some aspect of it.
Marketing
Electronic point-of-sale scanners at
retail checkout counters are used to
collect data for a variety of
marketing research applications.
Production
A variety of statistical quality control
charts are used to monitor the
output of a production process.
Finance
Financial advisors use price-earnings ratios and
dividend yields to guide their investment
recommendations.