Lecture 02

The document discusses various concepts related to data collection and sampling. It defines key terms like population, sample, parameter, and statistic. It explains different types of data like categorical, numerical, time series, and cross-sectional data. It also outlines different scales of measurement for data like nominal, ordinal, interval, and ratio scales. Finally, it covers different sampling methods like simple random sampling, systematic sampling, stratified sampling, cluster sampling, and differences between statistical and non-statistical sampling.

Uploaded by

Yin Yin

LECTURE 2

Data Collection
Contents

 Basic Concepts
 Scales of Measurement
 Sampling Concepts
 Sampling Methods
 Surveys
Objectives
 Explain the distinction between numerical and categorical
data
 Explain the difference between time series and
cross-sectional data
 Recognize levels of measurement in data
 Explain the common sampling methods and how to
implement them
 Describe basic elements of survey design, survey types,
and sources of error
 Recognize a Likert scale and know how to use it
Definition
What is data
 Data are facts and figures… collected for analysis, presentation and interpretation.
 Variable: a characteristic about the items that we want to study (e.g., student name, Gender, DOB).
 Observation: a single member of the collection of items that we want to study, such as a student, firm, or region.
 Data set: all the values of all of the variables for all of the observations we chose.
What is data

Employee Name     Gender    DOB            Annual Income in $
Gladys Simpson    Female    1-May-1971     120,000
Divid Hinds       Male      17-Dec-1968    135,000
Kenneth Henry     Male      3-Sep-1965     98,000

Each column is a variable; each row is an observation.
A data set with 3 observations
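The employee data set can also be written down in code. A minimal sketch, assuming a list of dictionaries as the container; the names `data_set`, `variables`, and `n_observations` are illustrative and not from the lecture.

```python
# Each dictionary is one observation; each key is one variable.
data_set = [
    {"name": "Gladys Simpson", "gender": "Female", "dob": "1-May-1971", "income": 120_000},
    {"name": "Divid Hinds", "gender": "Male", "dob": "17-Dec-1968", "income": 135_000},
    {"name": "Kenneth Henry", "gender": "Male", "dob": "3-Sep-1965", "income": 98_000},
]

variables = list(data_set[0].keys())   # the characteristics we record
n_observations = len(data_set)         # the number of items we study

print(variables)
print(n_observations)
```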


Types of Data
 Categorical or qualitative data have values that
are described by words; the values may be coded numerically.
 Numerical or quantitative data come from
counting, measuring, or a mathematical operation.
Time Series Data

 Each observation represents a different
equally spaced point in time.
 Periodicity may be annual, quarterly,
monthly, weekly, daily, hourly, etc.
 To study trends and patterns over time
 Example: daily closing price of a certain
stock recorded last week.
Cross-Sectional Data
 Each observation represents a different
individual unit at the same point in time.
 To study variation among observations or
relationships.
 Example: daily closing prices of a group of
20 stocks recorded on December 1, 2015.
Pooled Data
 Combine the two data types to get pooled
cross-sectional and time series data.
 Example: daily closing price of a group of
20 stocks recorded last week.
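The three layouts can be compared side by side. A minimal sketch, assuming two made-up tickers and three days; every price below is invented for illustration only.

```python
# Hypothetical closing prices keyed by (ticker, day); values are invented.
prices = {
    ("AAA", "Mon"): 10.0, ("AAA", "Tue"): 10.5, ("AAA", "Wed"): 10.2,
    ("BBB", "Mon"): 20.0, ("BBB", "Tue"): 19.8, ("BBB", "Wed"): 20.4,
}

# Time series: one unit (AAA) observed at several points in time.
time_series = {day: p for (ticker, day), p in prices.items() if ticker == "AAA"}

# Cross-section: several units observed at the same point in time (Tue).
cross_section = {ticker: p for (ticker, day), p in prices.items() if day == "Tue"}

# Pooled data: every unit at every point in time.
pooled = prices

print(time_series)
print(cross_section)
print(len(pooled))
```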
Collecting Data
 Primary (data collection): observation, survey, experimentation
 Secondary (data compilation): print or electronic sources
Scales of Measurement
Scales of Measurement

Scales of measurement include: Nominal, Ordinal, Interval, Ratio.

The scale determines the amount of information contained in the data.

The scale indicates the data summarization and statistical analyses that are most appropriate.
Nominal Scale

 Data are labels or names used to identify an
attribute of the element.
 Example:
 Students of a university are classified by school as Business,
Humanities, Education, and so on.
 Alternatively, a numeric code could be used for the
school variable (e.g. 1 denotes Business, 2 denotes
Humanities, 3 denotes Education, and so on).

 No ordering.
Ordinal Scale

 The data have the properties of nominal data and
the order or rank of the data is meaningful.
 Example:
 Students of a university are classified as Freshman,
Sophomore, Junior, or Senior.
 Alternatively, a numeric code could be used for the
class standing variable (e.g. 1 denotes Freshman, 2
denotes Sophomore, and so on).

 Ordering, but differences have no meaning.


Interval Scale
 The data have the properties of ordinal data, and the difference
between measurements is a meaningful quantity, but the
measurements have no true zero value.
 Example:
 The difference between a temperature of 0 °C and 2 °C is the
same as the difference between 2 °C and 4 °C, but we cannot say
that 4 °C is twice as hot as 2 °C.
 Differences have meaning, but ratios have no meaning.

°C    °F
0     32.0
1     33.8
2     35.6
3     37.4
4     39.2
5     41.0
6     42.8
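A short check of why ratios fail on an interval scale. This is a sketch using the standard Celsius to Fahrenheit conversion; the function name `c_to_f` is illustrative.

```python
import math

def c_to_f(celsius: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

# Equal Celsius differences map to equal Fahrenheit differences.
diff_a = c_to_f(2.0) - c_to_f(0.0)
diff_b = c_to_f(4.0) - c_to_f(2.0)
print(math.isclose(diff_a, diff_b))

# The ratio depends on where the arbitrary zero sits, so it carries no meaning.
ratio_c = 4.0 / 2.0
ratio_f = c_to_f(4.0) / c_to_f(2.0)
print(ratio_c, round(ratio_f, 3))
```

The same two temperatures give a ratio of 2 in Celsius but only about 1.1 in Fahrenheit, so "twice as hot" depends on the unit chosen.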
Ratio Scale

 The data have all the properties of interval data
and the ratio of two values is meaningful.
 The measurements have a true zero value.
 Example:
 Kevin has $200, while Melissa has $100. Kevin has
twice as much money as Melissa.

 Ratios have meaning.


Scales of Measurement

Ratio Data       Differences between measurements and ratios (e.g., money)
Interval Data    Differences between measurements but no ratio (e.g., year, temperature, …)
Ordinal Data     Ordered categories
Nominal Data     Categories (no ordering)


Quiz
Classify each of the following as Nominal, Ordinal, Interval
or Ratio data:
1. letter grade you will receive in this class: ordinal
2. country you were born in: nominal
3. amount of money you have: ratio
4. gender of customer: nominal
5. brand of chocolate you prefer: nominal
6. year of your birth: interval
7. weight of a package: ratio
8. satisfaction rating from 1 to 5: ordinal
9. pizza sizes (Small, Medium, Large, Extra Large): ordinal
Quiz

An investment firm rates bonds for AardCo Inc. as
"B+" while bonds of Deva Corp. are rated "AA."
Which level of measurement would be appropriate
for such data?
Quiz

We can perform different operations on the various
types of data. For each type of data, identify which of the
following operations we can perform
(answers: Nominal 1; Ordinal 1, 2; Interval 1, 2, 3; Ratio 1, 2, 3, 4):
1. count the frequencies
2. put in order
3. add items
4. divide one item by another
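The quiz answers follow a cumulative pattern: each scale permits everything the scale below it permits, plus one more operation. A minimal sketch of that pattern; the names `SCALES`, `OPERATIONS`, and `allowed_operations` are illustrative.

```python
# Operations in the order listed in the quiz.
OPERATIONS = [
    "count the frequencies",
    "put in order",
    "add items",
    "divide one item by another",
]

# Scales from least to most informative.
SCALES = ["nominal", "ordinal", "interval", "ratio"]

def allowed_operations(scale):
    """Return the operations that are meaningful for the given scale."""
    level = SCALES.index(scale) + 1
    return OPERATIONS[:level]

for scale in SCALES:
    print(scale, allowed_operations(scale))
```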
Sampling Concepts
Population vs. Sample
 A population is the collection of all items of
interest or under investigation; it may be finite or
infinite.
 A census is an examination of all items in a
defined population.
 A sample is an observed subset of the
population.
Population vs. Sample
Population: a b c d e f g h i j k l m n o p q r s t u v w x y z
Sample: b c g i n o r u y
Parameters vs. Statistics

A parameter is a specific characteristic of a population.

A statistic is a specific characteristic of a sample.
Sampling Concepts

 The population must be carefully specified and
the sample must be drawn scientifically so that
the sample is representative.
 The target population is the population we are
interested in (e.g., U.S. gasoline prices).
 The sampling frame is the group from which we
take the sample (e.g., 115,000 stations).
Sampling Concepts

 If we allow duplicates when sampling, then we
are sampling with replacement.
 Duplicates are unlikely when n is much smaller
than N.
 If we do not allow duplicates when sampling,
then we are sampling without replacement.
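The two modes map onto two functions in Python's standard `random` module. A minimal sketch, assuming a small population of numbered items and a fixed seed so that the run is repeatable.

```python
import random

rng = random.Random(0)             # fixed seed, chosen arbitrarily
population = list(range(1, 11))    # N = 10 hypothetical item labels

# With replacement: the same item may be drawn more than once.
with_replacement = rng.choices(population, k=5)

# Without replacement: every drawn item is distinct.
without_replacement = rng.sample(population, k=5)

print(with_replacement)
print(without_replacement)
```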
Sampling Methods
Sampling Methods

 Nonstatistical (non-random) sampling: convenience, judgment, focus group
 Statistical sampling: simple random, systematic, stratified, cluster
Nonstatistical Sampling
(Non-random Sampling)
 Convenience Sample
 Use a sample that happens to be available (e.g., ask
co-worker opinions at lunch).
 Judgment Sample
 Use expert knowledge to choose “typical” items (e.g.,
which employees to interview).
 Focus Groups
 In-depth dialog with a representative panel of
individuals (e.g. iPhone users).
Statistical Sampling

 Items of the sample are chosen based on
known or calculable probabilities

Statistical sampling (probability sampling) methods:
simple random, systematic, stratified, cluster


Simple Random Sampling

 Every member of the population has an equal
chance of being selected
 Every possible sample of a given size has an equal
chance of being selected
 Selection may be with replacement or without
replacement
 The sample can be obtained using a table of random
numbers or computer random number generator
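The "equal chance" properties can be verified by listing every possible sample of a tiny population. A sketch assuming sampling without replacement, with N = 4 and n = 2; the population labels are made up.

```python
from fractions import Fraction
from itertools import combinations

population = ["a", "b", "c", "d"]   # N = 4
n = 2

# Every possible sample of size n, each equally likely under simple random sampling.
all_samples = list(combinations(population, n))
prob_each_sample = Fraction(1, len(all_samples))

# Chance that one given member (here "a") ends up in the sample.
samples_with_a = [s for s in all_samples if "a" in s]
prob_a_selected = len(samples_with_a) * prob_each_sample

print(len(all_samples), prob_each_sample, prob_a_selected)
```

Each member appears in the same number of samples, so its selection probability equals n / N.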
Computer Methods
Systematic Random Sampling
 Decide on sample size: n
 Divide frame of N individuals into n groups of k
individuals: k=N/n
 Randomly select one individual from the first
group
 Select every kth individual thereafter
Example: N = 64, n = 8, k = 8 (the figure marks the first group of 8 individuals)
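The steps above can be sketched in a few lines. This assumes the frame is an ordered list; the function name `systematic_sample` and the seed are illustrative choices.

```python
import random

def systematic_sample(frame, n, rng):
    """Pick one random start in the first group of k items, then every kth item."""
    k = len(frame) // n           # group size, k = N / n
    start = rng.randrange(k)      # random position within the first group
    return frame[start::k][:n]

frame = list(range(1, 65))        # N = 64, as in the slide's example
sample = systematic_sample(frame, n=8, rng=random.Random(1))
print(sample)
```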
Stratified Random Sampling
 Divide population into subgroups (called strata)
according to some common characteristic (e.g.
age, gender, occupation)
 Select a simple random sample from each
subgroup
 Combine samples from subgroups into one

(Figure: a population divided into 4 strata, with a sample drawn from each stratum.)
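A minimal sketch of the three steps. It assumes the same number of items is drawn from every stratum, which is only one possible allocation; the population, the helper `stratified_sample`, and the occupation labels are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(items, stratum_of, per_stratum, rng):
    """Draw a simple random sample of `per_stratum` items from each stratum."""
    strata = defaultdict(list)
    for item in items:
        strata[stratum_of(item)].append(item)   # step 1: divide into strata
    sample = []
    for members in strata.values():
        # steps 2 and 3: sample within each stratum, then combine
        sample.extend(rng.sample(members, per_stratum))
    return sample

# Hypothetical population of (person id, occupation) pairs.
people = [(i, "staff" if i % 2 else "faculty") for i in range(20)]
sample = stratified_sample(people, lambda p: p[1], per_stratum=3, rng=random.Random(2))
print(sample)
```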
Cluster Sampling
 Divide population into several “clusters” (e.g.
regions), each representative of the population
 One-stage cluster sampling: randomly select k clusters
 Two-stage cluster sampling: randomly select k clusters and
then choose a random sample of elements within each cluster.

(Figure: a population divided into 16 clusters, with randomly selected clusters forming the sample.)
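Both variants can be sketched with one function. This assumes that one-stage sampling keeps every element of the chosen clusters, which the slide does not state explicitly; the cluster names and the helper `cluster_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, k, rng, per_cluster=None):
    """One-stage if per_cluster is None (keep whole clusters), otherwise two-stage."""
    chosen = rng.sample(sorted(clusters), k)     # randomly select k clusters
    sample = []
    for name in chosen:
        members = clusters[name]
        if per_cluster is None:
            sample.extend(members)               # assumed: take every element
        else:
            sample.extend(rng.sample(members, per_cluster))
    return sample

# Hypothetical population: 16 clusters of 5 elements each.
clusters = {f"region{r}": [f"region{r}-item{i}" for i in range(5)] for r in range(16)}
one_stage = cluster_sample(clusters, k=4, rng=random.Random(3))
two_stage = cluster_sample(clusters, k=4, rng=random.Random(3), per_cluster=2)
print(len(one_stage), len(two_stage))
```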
Quiz

Professor Hardtack chose a sample of 7 students from his
statistics class of 35 students by picking every student who
was wearing red that day. Which kind of sample is this?
Answer: Convenience (no random selection is involved, so it is not a stratified sample)

Thirty work orders are selected from a filing cabinet containing
500 work order folders by choosing every 15th folder. Which
sampling method is this?
Answer: Systematic
 
 
Quiz

A manager chose two people from her team of eight to give an
oral presentation because she felt they were representative of
the whole team's views. What sampling technique did she use
in choosing these two people?
Answer: Judgment sample (she used her own knowledge to pick "typical" members)
 
From its 32 regions, the FAA selects 6 regions, and then
randomly audits 25 departing commercial flights in each
region for compliance with legal fuel and weight requirements.
This is an example of what sampling technique?
Answer: Cluster sampling (two-stage)

 
Survey
Basic Steps of Survey Research
 Step 1: State the goals of the research
 Step 2: Develop the budget (time, money, staff)
 Step 3: Create a research design (target population,
frame, sample size).
 Step 4: Choose a survey type and method.
 Step 5: Design a data collection instrument
(questionnaire).
 Step 6: Pretest the survey instrument and revise as
needed.
 Step 7: Conduct the survey.

 Step 8: Code the data and analyze the data.
Questionnaire Design
 Begin with short, clear instructions.
 State the survey purpose.
 Assure anonymity.
 Instruct on how to submit the completed survey.
 Break survey into naturally occurring sections
 Let respondents bypass sections that are not applicable
(e.g., “if you answered no to question 7, skip directly to
Question 15”).
Types of Questions
 Open-ended
 Fill-in-the-blank
 Check boxes
 Ranked choices
 Pictograms
 Likert scale
Likert Scales
Likert Scales (examples)
Question Wording
 The way a question is asked has a profound
influence on the response. For example,
1. Shall state taxes be cut?
2. Shall state taxes be cut, if it means reducing
highway maintenance?
3. Shall state taxes be cut, if it means firing teachers
and police?
Question Wording
 Make sure you have covered all the
possibilities, for example:
Are you married?  Yes  No
 Avoid overlapping classes or unclear
categories, for example:
How old is your father?

 35 – 45
 45 – 55
 55 – 65
 65 or older
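Overlapping classes like these can be caught mechanically before a questionnaire goes out. A minimal sketch, assuming whole-year ages and inclusive class bounds; the cap `OLDEST` and the revised classes are assumptions added for illustration.

```python
def overlapping_values(classes):
    """Return the ages that fall into more than one (low, high) class, bounds inclusive."""
    seen, shared = set(), set()
    for low, high in classes:
        for age in range(low, high + 1):
            if age in seen:
                shared.add(age)
            else:
                seen.add(age)
    return sorted(shared)

OLDEST = 120  # assumed cap standing in for the open-ended "65 or older" class

slide_classes = [(35, 45), (45, 55), (55, 65), (65, OLDEST)]
revised_classes = [(35, 44), (45, 54), (55, 64), (65, OLDEST)]

print(overlapping_values(slide_classes))     # ages a respondent could tick twice
print(overlapping_values(revised_classes))   # empty when the classes are disjoint
```

The revised classes still leave out fathers younger than 35, which is the separate "cover all the possibilities" problem.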
Sources of Errors
Source of Error      Characteristics
Nonresponse bias     Respondents differ from non-respondents
Selection bias       Self-selected respondents are atypical
Response error       Respondents give false information
Coverage error       Incorrect specification of frame or population
Measurement error    Unclear survey instrument wording
Interviewer error    Responses influenced by interviewer
Sampling error       Random and unavoidable
Coding and Data Screening
 Responses are usually coded numerically
(e.g., 1 = male, 2 = female).
 Missing values are typically denoted by special
characters (e.g., blank, “.” or “*”).
 Discard questionnaires that are flawed or
missing many responses.
 Watch for multiple responses, inconsistent
replies, or range answers.
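Coding and missing-value screening can be sketched as follows. The codes and missing-value markers come from the slide; the raw answers, the use of `None` for missing values, and the name `code_gender` are assumptions for illustration.

```python
GENDER_CODES = {"male": 1, "female": 2}   # numeric codes from the slide
MISSING_MARKERS = {"", ".", "*"}          # blank, "." or "*"

def code_gender(raw):
    """Return the numeric code, or None if the answer is missing or unrecognized."""
    answer = raw.strip().lower()
    if answer in MISSING_MARKERS:
        return None
    return GENDER_CODES.get(answer)

raw_answers = ["Male", "female", ".", "", "FEMALE", "*"]   # hypothetical responses
coded = [code_gender(r) for r in raw_answers]
missing_share = coded.count(None) / len(coded)

print(coded)
print(missing_share)
```

A high `missing_share` for one questionnaire is a signal to consider discarding it.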
Online Survey Tools
 www.surveymonkey.com
 www.sogosurvey.com
