Survey Design and Data Analysis Guide

Business Statistics II

Uploaded by

emjay2010

General Objective 1: Understand survey planning and designs.

1.1 List sources of data.


1.2 Systematize principles of data.
1.3 Define coding and processing
1.4 Design questionnaires

General Objective 2: Understand sampling theory


2.1 Define sampling theory.
2.2 Systematize sampling methods
2.3 Solve problems involving samples.
2.4 Define small samples.
2.5 Define large samples.
2.6 Solve problems involving 2.4 and 2.5 above.

General Objective 3: Know inferential statistics


3.1 Define univariate distributions.
3.2 Define Binomial distributions.
3.3 Define Poisson distribution.
3.4 Solve problems involving 3.1-3.3 above.
3.5 Define Normal distribution.
3.6 Explain the use of tables
3.7 Define hypothesis.
3.8 Test hypothesis for small samples.
3.9 Test hypothesis for large samples.
3.10 Define univariate distributions.
3.11 Define Binomial distributions.
3.12 Define Poisson distribution.
3.13 Solve problems involving 3.1-3.3 above.

General Objective 4: Understand bivariate distributions


4.1 Define bivariate distributions.
4.2 Solve problems involving bivariate distributions.

General Objective 5: Understand point and interval estimates


5.1 Define point and interval estimates
5.2 Solve problems involving point and interval estimates.

General Objective 6: Know mathematical expectation


6.1 Define mathematical expectations.
6.2 Solve problems involving mathematical expectations.

General Objective 7: Understand moment generating functions (MGF)


7.1 Define moment generating functions (MGF).
7.2 Systematize MGF characteristics.
7.3 Solve problems involving MGF.

General Objective 8: Know the tests of Linear Regression


8.1 Explain tests of linear regression
8.2 Solve problems involving simple regression.
8.3 Solve problems involving multiple regression.
8.4 Test for the regression.

General Objective 9: Understand analysis of variance (ANOVA)


9.1 Define analysis of Variance.
9.2 Solve problems involving ANOVA (one way)

General Objective 10: Understand measures of welfare


10.1 Define welfare measures.
10.2 Define index numbers
10.3 Define income
MODULE ONE
Understand survey planning and designs
Types of Data
Data can be classified into types based on different criteria, viz:
(1) Based on sources – Data can be classified based on the sources from which they are
obtained. In this regard, we have:
(a) Primary data – These are data collected directly from the field of enquiry by the user(s)
themselves.
Advantages
- They are always relevant to the subject under study because they are collected primarily for
that purpose.
- They are generally more accurate and reliable.
- They provide an opportunity for the researcher to interact with the study population.
- Information on other relevant issues can be obtained.
Disadvantages
- They are usually costly to collect.
- Cooperation from the study population may be inadequate.
- Collection takes a lot of time and energy.
(b) Secondary data – These are data that have been collected by someone else or some
organization, in either published or unpublished form.
Advantages
- They are easier to obtain.
- They are less expensive.
Disadvantages
- They may not completely meet the needs of the research at hand because they were not
collected primarily for that purpose.
- There are often gaps in the data (missing periods).
(2) Based on form – Data are sometimes classified by the form they take:
(a) Cross-sectional data – These are data collected from a cross-section of subjects (the
population under study) at one point in time. For example, data collected from a cross-section
of households on demand for recharge cards for the month of August 2018.
(b) Time-series data – These are data collected on a particular variable or set of variables over
time, e.g. a set of Nigeria’s Gross Domestic Product (GDP) values from 1970 to 2012.
(c) Panel data – These combine the features of cross-sectional and time-series data. They are
data collected from the same subjects over time. For example, data on monthly recharge card
expenditure collected from about 100 households in Lagos from January to December 2013
form a panel data set.
Note that social and economic data of national importance are collected routinely as a
by-product of governmental activities, e.g. information on trade, wages, prices, education,
health, crime, aid and grants, etc.
Sources of Data
1. Sources of primary data:
(i) Census
(ii) Surveys
2. Sources of secondary data:
(i) Publications of the National Bureau of Statistics (NBS)
(ii) Publications of the Central Bank of Nigeria (CBN)
(iii) Publications of the National Population Commission (NPC)
(iv) Nigeria Customs Service
(v) Nigeria Immigration Service
(vi) Nigerian Ports Authority
(vii) Federal and State Ministries, Departments and Agencies
Some of the publications referred to above are:
(i) Annual Digest of Statistics (by NBS)
(ii) Annual Abstract of Statistics (by NBS)
(iii) Economic and Financial Review (by CBN)
(iv) Population of Nigeria (by NPC)

Data Coding
Data coding in research methodology is a preliminary step in analyzing data. Data obtained
from surveys, experiments, or secondary sources arrive in raw form and need to be refined and
organized before they can be evaluated and conclusions drawn. Data coding is not an easy job,
and the person or persons involved must have knowledge and experience of it.
What is a code?
A code in research methodology is a short word or phrase that captures the meaning and
context of a whole sentence, phrase, or paragraph. Codes make data analysis easier. Numerical
quantities can be assigned to codes, and these quantities can then be interpreted. Codes help
quantify qualitative data and give meaning to raw data.
What is data coding?
Data coding is the process of deriving codes from the observed data. In qualitative research,
the data are obtained from observations, interviews, or questionnaires. The purpose of data
coding is to bring out the essence and meaning of the data that respondents have provided.
The data coder extracts preliminary codes from the observed data; these preliminary codes are
then filtered and refined to obtain more accurate, precise, and concise codes. Later, when
evaluating the data, the researcher assigns values, percentages, or other numerical quantities
to these codes in order to draw inferences. Keep in mind that the purpose of data coding is not
just to eliminate excess data but to summarize it meaningfully. The data coder should make
sure that none of the important points in the data are lost in coding.
Coding Examples
A few examples help to illustrate data coding. Suppose a respondent says:
“I prefer to shop from a store that provides a large inventory of the same product, every brand
and every style in that product range. Usually in these stores you get maximum range of
products you want to purchase. You get profits through deals and sales.”
The data coder can assign different codes to what the respondent narrated above. These codes
might be as follows:
“Preference for horizontal markets”
“Horizontal integration”
“Shopping preference”
Preliminary codes
When a data coder first assigns codes to the observed data, he cannot produce well-refined
codes at the first attempt. He assigns some preliminary codes first so that the data become
concise, and later refines them further to obtain the final codes. Keep in mind that preliminary
codes are not the final words or phrases on which the evaluation will be based. The researcher
filters the preliminary codes to arrive at the final codes. He needs a pattern on the basis of
which he can categorize human behavior, actions, or likes and dislikes.
Final codes
The final codes help you observe a clearer pattern in the data. This pattern is necessary to
reach the final evaluation or analysis stage. Arriving at final codes means finding meaningful
words and phrases in the observed data. Respondents often do not choose meaningful words in
their responses, so the coder needs to extract the meaning from the respondent’s wording. In
their final stage the codes resemble topics and themes, and these themes drive the discussion
that leads to the final results. Sometimes the interviewer or observer writes down codes while
observing the respondent’s behavior. Such codes are valuable because they cannot be derived
from the written responses that respondents provide. The data coder should look for the verbs
and actions that the respondent mentions in the text, and should also observe behavior and
derive codes from it wherever possible. Qualitative data analysis is all about finding meanings
and interpretations, so the coder should have an eye for such things.
Categories
The codes are given meaningful names and put into categories. These categories help refine
the research considerably. When data are coded again and again, they become more refined,
and the refined data lead to patterns and themes. The patterns are the key to finding the true
results of the research. These patterns or categories show where the bulk of the data is
concentrated.

Why Coding?
All research collects data of some sort. In order to make sense of the data, it must be
analyzed. Analysis begins with the labeling of data as to its source, how it was collected, the
information it contains, etc.
Working with original data, however, can be very cumbersome, whether it is hundreds of
mailed questionnaires, figures on yearly accident rates for the fifty states, or observations of
classroom behavior of school children. For this reason, data are often coded.
Codes allow the researcher to reduce large quantities of information into a form that can be
more easily handled, especially by computer programs. Not all data need to be coded. For
example, the accident rates for the fifty states would not be coded, but each state could be
assigned a number (1 through 50) instead of using the state name. There are also content
analysis computer programs that help researchers to code textual data for qualitative or
quantitative analysis.

Steps in data management


a) prepare the data collection instrument and collect the data;
b) prepare the data dictionary or codebook;
c) prepare the data matrix worksheets;
d) prepare instructions for data entry and data analysis.

A. Prepare the data collection instrument and collect the data. Example: Quality of
Work Life Questionnaire
1. Name of Division where you work: _____________________________
2. How long have you been an employee in this company? _______years
3. How many county-sponsored training sessions have you attended? _____
4. What is your job classification?
_____Management
_____Technical
_____Administrative
_____Clerical
5. Is your position
_____supervisory
_____non-supervisory
6. Sex
_____male
_____female
7. In what area would you like to receive additional training? ___________

B. Prepare the data dictionary or codebook.


If data are to be entered into a computer program, whether a spreadsheet, data base, or
statistical program, they must be entered in exactly the same way for each person,
questionnaire, state, or other unit of analysis.
Many computer programs have limits on the way data can be entered, stored, and retrieved.
These limits should be reflected in the codebook. For example, the names of your variables
often cannot exceed eight characters. Use short variable names, preferably all letters. You
generally can use numbers as well as letters in variable names, but you cannot use spaces,
punctuation, or other special characters.
The variable names you assign to the data should reflect the nominal definitions of the
variables themselves, such as "age," "jobclass," "seniority," and so forth. You may want to
adopt a rule such as using only lower case letters for any alphanumeric data that you enter, or
only uppercase letters. This will make typing variable names easier later when you must tell
the computer program which variables to analyze.
Data can be stored in many ways. The most common form for variables is numeric data,
consisting only of numbers. Usually this allows for fractions to be stored as decimals, for
example, 2.3 or 0.888
Data can also be stored as letters, called alpha-numeric format. This allows the variable to
be stored as either letters or numbers or a combination of the two. For example, you could
store first names, such as "Amy," "Brad," "Caroline," etc. or combinations such as apartment
numbers (102b), or license plate numbers (3XGJ429), etc.
In neither case should data ever be entered with spaces, punctuation marks, or any special
characters of any kind. Large numbers should not have any commas placed in them; names
should not have any periods, dashes, quotation marks, etc.
The codebook tells the coder how each questionnaire will be coded for data entry. It
specifies the question on the questionnaire from which the data are taken, the variable name,
the operational definition of the variable, the coding options, the type of variable
(numeric or alpha-numeric), and the number of columns the variable requires.
Example: Quality of Work Life Codebook

Q.   Variable  Operational Definition                   Coding              Col.  Type
No.  Name
     ID        Questionnaire number                     001-999             1-3   num
1    DIVISION  Name of Division where you work?         Planning=1          4     num
                                                        Traffic=2
                                                        Engineering=3
                                                        Enforcement=4
                                                        missing=9
2    LENGTH    How long have you been an employee       01-98               5-6   num
               in this company?                         missing=99
3    TRAINING  How many county-sponsored training       00-98               7-8   num
               sessions have you attended?              missing=99
4    JOBCLASS  What is your job classification?         Management=1        9     num
               Management, Technical,                   Technical=2
               Administrative, Clerical                 Administrative=3
                                                        Clerical=4
                                                        missing=9
5    SUPER     Is your position supervisory or          non-supervisory=0   10    num
               non-supervisory?                         supervisory=1
                                                        missing=9
6    SEX       Sex: male, female                        male=0              11    num
                                                        female=1
                                                        missing=9
7    NEEDS     In what area would you like to receive   supervising=1       12    num
               additional training?                     budgeting=2
                                                        computers=3
                                                        personnel=4
                                                        other=5
                                                        missing=9
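
A codebook like the one above can also be kept in machine-readable form so that raw answers
are always translated the same way. The following is only a minimal sketch: the names
CODEBOOK and encode are hypothetical, only four of the variables are included, and treating
a blank or unlisted answer as "missing" is an assumption rather than a rule stated in the
codebook.

```python
# Hypothetical in-memory version of part of the Quality of Work Life codebook.
# Keys are lower-cased answers; values are the numeric codes from the table.
CODEBOOK = {
    "DIVISION": {
        "codes": {"planning": 1, "traffic": 2, "engineering": 3, "enforcement": 4},
        "missing": 9,
    },
    "JOBCLASS": {
        "codes": {"management": 1, "technical": 2, "administrative": 3, "clerical": 4},
        "missing": 9,
    },
    "SUPER": {"codes": {"non-supervisory": 0, "supervisory": 1}, "missing": 9},
    "SEX": {"codes": {"male": 0, "female": 1}, "missing": 9},
}


def encode(variable, answer):
    """Return the numeric code for one raw answer.

    Blank or unlisted answers receive the variable's missing-data code
    (an assumption made for this sketch).
    """
    entry = CODEBOOK[variable]
    key = (answer or "").strip().lower()
    return entry["codes"].get(key, entry["missing"])


print(encode("DIVISION", "Engineering"))  # 3
print(encode("SEX", "female"))            # 1
print(encode("JOBCLASS", ""))             # 9 (missing)
```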

Tips on Coding:
1. Use numbers to represent response categories. For example:

On a scale of attitudes    On a survey of where      On a survey of
about work:                city residents live:      college majors:
5=Very satisfied           Central=1                 Business=1
4=Satisfied                Eastside=2                Education=2
3=Neutral                  North=3                   Engineering=3
2=Dissatisfied             Westside=4                Health=4
1=Very dissatisfied                                  Liberal Arts=5
                                                     Science=6

2. Use zero and one to code variables with binary response categories, such as:
Are you a supervisor? No=0 Yes=1
Sex: Male=0 Female=1
Are you at headquarters or in the field? Headquarters=0 Field=1
(Be sure to use the number zero, and not the letter "O"; and the number one, not the lowercase
letter "l").
3. The same data can be coded in more than one way. For example, the following data on
what materials the library should acquire can be coded in two different ways:

Data:                           Code for subject matter,   Code for type of material,
                                e.g.:                      e.g.:
- books on the middle ages      History                    reference works
- data bases                    Business                   electronic media
- journals in criminal justice  Art                        books
- videos & films                Government                 journals
- reference works                                          reports
- business reports
- government documents
- Internet contacts

4. One question on a questionnaire can yield more than one variable. For example: What type
of training would you like to receive?
_____supervising _____budgeting _____computers _____personnel

This can be coded as one   Or as two variables,        Or as four variables,
variable:                  indicating first and        indicating a yes/no
                           second choices:             preference for each type:
TRAINING                   TRAIN1                      TSUPER
1=supervising              1=supervising               0=no 1=yes
2=budgeting                2=budgeting                 TBUDGET
3=computers                3=computers                 0=no 1=yes
4=personnel                4=personnel                 TCOMPUT
                           TRAIN2                      0=no 1=yes
                           1=supervising               TPERS
                           2=budgeting                 0=no 1=yes
                           3=computers
                           4=personnel
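
The two multi-variable codings of the training question can be sketched in a few lines. This
is an illustration only: the function names are hypothetical, and using 9 as the missing code
for an absent first or second choice is an assumption borrowed from the codebook convention.

```python
# Training types in the order used by the 1-4 codes above.
TRAINING_TYPES = ["supervising", "budgeting", "computers", "personnel"]
INDICATOR_NAMES = {
    "supervising": "TSUPER",
    "budgeting": "TBUDGET",
    "computers": "TCOMPUT",
    "personnel": "TPERS",
}
MISSING = 9  # assumed missing-data code for a one-column field


def to_indicators(checked):
    """Four yes/no variables: 1 if the respondent ticked that training type, else 0."""
    return {INDICATOR_NAMES[t]: int(t in checked) for t in TRAINING_TYPES}


def to_first_second(ranked):
    """Two variables holding the codes of the first and second choices."""
    codes = {t: i + 1 for i, t in enumerate(TRAINING_TYPES)}
    train1 = codes[ranked[0]] if len(ranked) > 0 else MISSING
    train2 = codes[ranked[1]] if len(ranked) > 1 else MISSING
    return {"TRAIN1": train1, "TRAIN2": train2}


print(to_indicators({"budgeting", "computers"}))
print(to_first_second(["computers", "supervising"]))
```

The four-indicator form keeps every ticked box, while the first/second-choice form keeps
ranking information but drops any third or fourth choice.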

The researcher has to try to anticipate how the data will look. A good idea of this can be
gained from doing a pilot test of the instrument, and a dry run of the data collection process. It
is important to be sure to leave enough columns to properly code the information for each
variable, and to provide enough variables to capture all the richness, complexity, and variety
of data that has been collected.
If a sample of college students is asked about barriers they encounter in attempting to use
the campus library, will students be asked to list the one main barrier, to rank order all the
barriers, or to choose only the barriers relevant to them? And what if the students do not
follow the instructions? Depending on what shape the data come in, the researcher will have
to decide how to code this information, using one, two, or many variables.

C. Prepare the data matrix worksheets;


When data are to be entered into a computer program for statistical analysis, usually this
takes the form of a matrix. The variable names are entered at the tops of the columns which
will contain the data for that variable, and the case records are entered across the rows.
Example:
Data Entry Worksheet: Quality of Work Life Codebook

Variable:  Id    Division  Length  Training  Jobclass  Super  Sex  Needs
Columns:   1-3   4         5-6     7-8       9         10     11   12
           001   3         22      15        4         0      1    4
           002   1         01      03        2         1      0    1
           003   2         09      99        3         0      0    3

Each single numeral or character that is entered into a computer program takes up one column
of space. Each datum can be found by knowing its location by column number in the matrix.
Columns 1 through 3 taken together represent the person's employee ID number.
Column 4 represents the division worked in.
Columns 5-6 represent the length of time employed.
Columns 7-8 represent the number of training classes taken (note that the information on
number of classes taken is missing for person number 003).
Column 9 represents the person's job classification.
Column 10 indicates whether the person is a supervisor or not.
Column 11 indicates whether the person is male or female.
Column 12 indicates what type of training the person wants in the future.
Each record, case, questionnaire, or other unit of analysis is represented by a single row of
data across the matrix. For example, person 001 is found in row 1; person 002 in row 2; and
person 003 in row 3.
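
Reading a datum "by column number" can be sketched as follows. This is a minimal
illustration, not the procedure of any particular statistical package: the names LAYOUT,
MISSING, parse_record, and mean_of are hypothetical, and the three record strings are the
worksheet rows above written as 12-column lines, with one-digit values zero-padded so that
each field fills its columns.

```python
# Column positions (1-indexed, inclusive) as given in the codebook.
LAYOUT = {
    "ID": (1, 3), "DIVISION": (4, 4), "LENGTH": (5, 6), "TRAINING": (7, 8),
    "JOBCLASS": (9, 9), "SUPER": (10, 10), "SEX": (11, 11), "NEEDS": (12, 12),
}
# Missing-data codes from the codebook.
MISSING = {"DIVISION": 9, "LENGTH": 99, "TRAINING": 99, "JOBCLASS": 9,
           "SUPER": 9, "SEX": 9, "NEEDS": 9}


def parse_record(line):
    """Split one 12-column record into a dict of variable name -> value."""
    if len(line) != 12:
        raise ValueError(f"expected 12 columns, got {len(line)}")
    record = {}
    for name, (start, end) in LAYOUT.items():
        field = line[start - 1:end]
        # Keep the ID as text so its leading zeros survive.
        record[name] = field if name == "ID" else int(field)
    return record


def mean_of(records, variable):
    """Average a variable, skipping records that carry its missing-data code."""
    valid = [r[variable] for r in records if r[variable] != MISSING[variable]]
    return sum(valid) / len(valid) if valid else None


lines = ["001322154014", "002101032101", "003209993003"]
records = [parse_record(line) for line in lines]
print(records[0])
print(mean_of(records, "LENGTH"))    # averages 22, 1 and 9
print(mean_of(records, "TRAINING"))  # averages 15 and 3; person 003 is missing
```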
Each record must be entered in exactly the same way. If the data are entered in fixed column
positions, this is referred to as fixed-field format. If data are missing for a record on any of
the variables, something must still be entered into that field. Usually this is a number
indicating that the data are missing. For a one-column field, use the number 9; for a
two-column field, use 99; and so forth. Just make sure that "9" or "99" is not also a valid
response. If it is, use some other number; some computer programs will allow you to use a
period (".") as a placeholder that is also an indicator of missing data.
When you ask the computer, for example, to compute the average length of time employed
of all the employees in your survey, the computer will look in columns 5-6 of each record. It
will take whatever it finds there and attempt to compute an average. It is important, therefore,
that all length of employment data be in columns 5-6 for every record, and that no other type
of data be in columns 5-6. The computer will disregard missing data codes (i.e., values of
"99") in computing the average only if those codes have been declared as missing values in the
program; otherwise 99 is averaged in as if it were a real length of employment.
Many computer programs have a limitation of a total of 80 columns of data per record. This is
a holdover from when data were punched on cardboard cards that were fed into card readers,
rather than entering data directly into the computer. If your data require more than 80
columns, you will have to construct additional data matrices to record the remainder of the
information for each record.

D. Prepare instructions for data entry and data analysis.


Data coding may be done directly on the data collection instrument (e.g., questionnaire)
and then transferred to the data coding sheets, or entered directly into the computer. It is
important to prepare detailed instructions for data coding and data entry, especially if these
tasks are shared among or performed by several different people.
There are a number of statistical, spreadsheet, and database programs that can be used for
data entry. Most programs will save the data and allow it to be output as a plain text (ASCII)
file, which is accepted by most statistical programs, such as SAS, SPSS, or Stata. Most of
these programs are available in a desktop version, and many also come in cheaper student
versions, such as Student Stata and Mystat.
There are also a number of stand-alone products such as DataPerfect, which can be easily
programmed to look just like the data collection instrument, making data entry quite easy and
eliminating the need for a data entry matrix to be filled in. These programs also have built-in
safeguards, so that, for example, alpha-numeric data cannot be entered into a variable that is
for numeric data only; data are constrained to a limited number of columns so that four digits
can't be entered into a three-digit variable; etc.
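
The kind of safeguard described above can be imitated with a small check applied to each
keyed-in field. This is a hedged sketch of the idea only; validate_field is a hypothetical
helper and does not reproduce the behaviour of DataPerfect or any other product.

```python
def validate_field(raw, width, numeric=True):
    """Return a list of problems found in one keyed-in field (empty list means OK)."""
    problems = []
    if len(raw) > width:
        problems.append(f"too wide: {len(raw)} characters for a {width}-column field")
    # isascii() guards against non-ASCII digit characters that isdigit() would accept.
    if numeric and not (raw.isascii() and raw.isdigit()):
        problems.append("non-numeric characters in a numeric field")
    return problems


print(validate_field("042", 3))   # [] : fits and is numeric
print(validate_field("1234", 3))  # four digits in a three-digit variable
print(validate_field("12b", 3))   # letters in a numeric variable
```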
MODULE TWO
MODULE THREE
INFERENTIAL STATISTICS
In statistics, a univariate distribution is a probability distribution of only one random
variable. This is in contrast to a multivariate distribution, the probability distribution of a
random vector (consisting of multiple random variables).
One of the simplest examples of a discrete univariate distribution is the discrete uniform
distribution, where all elements of a finite set are equally likely. It is the probability model for
the outcomes of tossing a fair coin, rolling a fair die, etc. The univariate continuous uniform
distribution on an interval [a, b] has the property that all sub-intervals of the same length are
equally likely.
Other examples of discrete univariate distributions include the binomial, geometric, negative
binomial, and Poisson distributions. At least 750 univariate discrete distributions have been
reported in the literature. Examples of commonly applied continuous univariate
distributions include the normal distribution, Student's t distribution, chi-square distribution, F
distribution, exponential and gamma distributions.

A random variable is a characteristic, measurement, or count that changes randomly
according to some set of probabilities; its notation is X, Y, Z, and so on. A list of all possible
values of a random variable, along with their probabilities, is called a probability distribution.
One of the most well known probability distributions is the binomial. Binomial means “two
names” and is associated with situations involving two outcomes: success or failure (hitting a
red light or not; developing a side effect or not). This section focuses on the binomial
distribution: when you can use it, finding probabilities for it, and finding the expected value
and variance.
Binomial Distribution
A binomial distribution can be thought of as simply the probability of a SUCCESS or
FAILURE outcome in an experiment or survey that is repeated multiple times. The binomial
is a type of distribution that has two possible outcomes (the prefix “bi” means two, or twice).
For example, a coin toss has only two possible outcomes: heads or tails and taking a test could
have two possible outcomes: pass or fail.

Characteristics of a Binomial
A random variable has a binomial distribution if all of the following conditions are met:
1. There are a fixed number of trials (n).
2. Each trial has two possible outcomes: success or failure.
3. The probability of success (call it p) is the same for each trial.
4. The trials are independent, meaning the outcome of one trial does not influence that of any
other.
Let X equal the total number of successes in n trials; if all of the above conditions are met, X
has a binomial distribution with probability of success equal to p.
Checking the binomial conditions step by step
You flip a fair coin 10 times and count the number of heads.
Does this represent a binomial random variable? You can check by reviewing your responses
to the questions and statements in the list that follows:
1. Are there a fixed number of trials?
You’re flipping the coin 10 times, which is a fixed number. Condition 1 is met, and n = 10.
2. Does each trial have only two possible outcomes —success or failure?
The outcome of each flip is either heads or tails, and you’re interested in counting the number
of heads, so flipping a head represents success and flipping a tail is a failure. Condition 2 is
met.
3. Is the probability of success the same for each trial?
Because the coin is fair the probability of success (getting a head) is p = 1⁄2 for each trial. You
also know that 1 – 1⁄2 = 1⁄2 is the probability of failure (getting a tail) on each trial. Condition
3 is met.
4. Are the trials independent?
We assume the coin is being flipped the same way each time, which means the outcome of
one flip does not affect the outcome of subsequent flips. Condition 4 is met.

Finding Binomial Probabilities Using the Formula


After you identify that X has a binomial distribution (the four conditions are met), you will
likely want to find probabilities for X. The good news is that you do not have to find them
from scratch; you get to use previously established formulas for finding binomial
probabilities, using the values of n and p unique to each problem.
Probabilities for a binomial random variable X can be found using the formula
$P(X = x) = \binom{n}{x} p^{x} (1 - p)^{n - x}$, where
✓ n is the fixed number of trials.
✓ x is the specified number of successes.
✓ n – x is the number of failures.
✓ p is the probability of success on any given trial.
✓ 1 – p is the probability of failure on any given trial. (Note: Some textbooks use the letter q
to denote the probability of failure rather than 1 – p.)
These probabilities hold for any value of X between 0 (lowest number of possible successes in
n trials) and n (highest number of possible successes).
The number of ways to arrange x successes among n trials is called “n choose x,” and the
notation is $\binom{n}{x}$. For example, $\binom{3}{2}$ means “3 choose 2” and stands for the
number of ways to get 2 successes in 3 trials. In general, to calculate “n choose x,” you use
the formula

$\binom{n}{x} = \frac{n!}{x!\,(n - x)!}$

The notation n! stands for n-factorial, the number of ways to rearrange n items. To calculate
n!, you multiply n(n – 1)(n – 2) . . . (2)(1). For example, 3! is 3(2)(1) = 6; 2! is 2(1) = 2;
and 1! is 1. By convention, 0! equals 1. To calculate “3 choose 2,” you do the following:

$\binom{3}{2} = \frac{3!}{2!\,(3 - 2)!} = \frac{3 \cdot 2 \cdot 1}{(2 \cdot 1)(1)} = \frac{6}{2} = 3$
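
The factorial formula for “n choose x” can be checked with a short script. The function name
n_choose_x is a hypothetical helper written for this illustration; math.comb is the built-in
equivalent in Python 3.8 and later.

```python
import math


def n_choose_x(n, x):
    """Number of ways to place x successes among n trials: n! / (x! (n - x)!)."""
    return math.factorial(n) // (math.factorial(x) * math.factorial(n - x))


print(n_choose_x(3, 2))  # 3, matching the hand calculation
print(math.comb(3, 2))   # the standard-library function gives the same value
```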
Suppose you cross three traffic lights on your way to work, and the probability of each of
them being red is 0.30. (Assume the lights are independent.) You let X be the number of red
lights you encounter and you want to find the probability distribution for X. You know p =
probability of red light = 0.30; 1 – p = probability of a non-red light = 1 – 0.30 = 0.70; and the
number of non-red lights is 3 – X. Using the formula, you obtain the probabilities for X = 0, 1,
2, and 3 red lights:
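
The worked steps are not reproduced in this copy of the notes. Written out from the binomial
formula with n = 3 and p = 0.30, they would presumably read as follows (a reconstruction, not
the original working; the results agree with Table 4-1):

```latex
\begin{aligned}
P(X = 0) &= \binom{3}{0}(0.30)^{0}(0.70)^{3} = 1(1)(0.343) = 0.343 \\
P(X = 1) &= \binom{3}{1}(0.30)^{1}(0.70)^{2} = 3(0.30)(0.49) = 0.441 \\
P(X = 2) &= \binom{3}{2}(0.30)^{2}(0.70)^{1} = 3(0.09)(0.70) = 0.189 \\
P(X = 3) &= \binom{3}{3}(0.30)^{3}(0.70)^{0} = 1(0.027)(1) = 0.027
\end{aligned}
```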

The final probability distribution for X is shown in Table 4-1.


Notice they all sum to 1 because every possible value of X is listed and accounted for.
Table 4-1 Probability Distribution for X = Number of Red Traffic Lights (n = 3, p = 0.30)
X P(x)
0 0.343
1 0.441
2 0.189
3 0.027
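
Table 4-1 can be reproduced directly from the binomial formula. The sketch below assumes
only the values stated in the example (n = 3, p = 0.30); binomial_pmf is a hypothetical
helper name.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 3, 0.30
distribution = {x: binomial_pmf(x, n, p) for x in range(n + 1)}
for x, prob in distribution.items():
    print(x, round(prob, 3))
print("total:", round(sum(distribution.values()), 3))
```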

Finding probabilities for X greater-than, less-than, or between two values


To find probabilities for X being less-than, greater-than, or between two values, just find the
corresponding values and add their probabilities. For the traffic light example where n = 3 and
p = 0.30 (Table 4-1), if you want P(X > 1), you find P(X = 2) + P(X = 3) and get
0.189 + 0.027 = 0.216. The probability that X is between 1 and 3 (inclusive) is
0.441 + 0.189 + 0.027 = 0.657.
Two phrases to remember: “at-least” means that number or higher; “at-most” means that
number or lower. For example, the probability that X is at least 2 is P(X ≥ 2); the probability
that X is at most 2 is P(X ≤ 2).
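
Adding up the relevant probabilities is easy to automate. The sketch below uses the values
of Table 4-1 (n = 3, p = 0.30); prob_between is a hypothetical helper that sums P(X = x) over
an inclusive range.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def prob_between(low, high, n, p):
    """P(low <= X <= high), found by adding the individual probabilities."""
    return sum(binomial_pmf(x, n, p) for x in range(low, high + 1))


n, p = 3, 0.30
p_greater_than_1 = prob_between(2, n, n, p)  # P(X > 1) = P(X = 2) + P(X = 3)
p_1_to_3 = prob_between(1, 3, n, p)          # P(1 <= X <= 3)
p_at_least_2 = prob_between(2, n, n, p)      # "at least 2" means 2 or higher
p_at_most_2 = prob_between(0, 2, n, p)       # "at most 2" means 2 or lower

print(round(p_greater_than_1, 3), round(p_1_to_3, 3))
print(round(p_at_least_2, 3), round(p_at_most_2, 3))
```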
The Expected Value and Variance of the Binomial
The mean of a random variable is the long-term average of its possible values over the entire
population of individuals (or trials). It is found by taking the weighted average of the
x-values, using their probabilities as the weights. The mean of a random variable is denoted
by μ. For the binomial random variable, the mean is μ = np.
Example 1: Suppose you flip a fair coin 100 times and let X be the number of heads; this is a
binomial random variable with n = 100 and p = 0.50. Its mean is np = 100(0.50) = 50.
The variance of a random variable X is the weighted average of the squared deviations
(distances) from the mean. The variance of a random variable is denoted by σ². The variance
of the binomial distribution is σ² = np(1 – p). The standard deviation of X is just the square
root of the variance, which in this case is σ = √(np(1 – p)).
Example 2: Suppose you flip a fair coin 100 times and let X be the number of heads. The
variance of X is np(1 – p) = 100(0.50)(1 – 0.50) = 25, and the standard deviation is the square
root, which is 5.
The mean and variance of a binomial have intuitive meaning. The p is the probability of a
success, but it also represents the proportion of successes you can expect in n trials.
Therefore, the total number of successes you can expect — that is, the mean of X — equals
np. The only variability in the outcomes of each trial is between success (with probability p)
and failure (with probability 1 – p). Over n trials, it makes sense that the variance of the
number of successes/failures is measured by np(1 – p).
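
The shortcut formulas np and np(1 – p) can be checked against the weighted-average
definitions of the mean and variance. The sketch below does this for the coin example
(n = 100, p = 0.50); the variable names are illustrative only.

```python
from math import comb, sqrt


def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 100, 0.50

# Mean: weighted average of the x-values, with their probabilities as weights.
mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
# Variance: weighted average of the squared deviations from the mean.
variance = sum((x - mean) ** 2 * binomial_pmf(x, n, p) for x in range(n + 1))

print(round(mean, 6), n * p)                # both give the mean, np
print(round(variance, 6), n * p * (1 - p))  # both give the variance, np(1 - p)
print(round(sqrt(variance), 6))             # standard deviation
```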

The Normal Distribution


There are two major types of random variables: discrete and continuous. Discrete random
variables count things (number of heads on 10 coin flips, number of female Democrats in a
sample, and so on). The most well known discrete random variable is the binomial. A
continuous random variable measures things and takes on values within an interval, or has so
many possible values that it might as well be deemed continuous (for example, time to
complete a task, exam scores, and so on).
Basics of the Normal Distribution
We say that X has a normal distribution if its values fall into a smooth (continuous) curve with
a bell-shaped, symmetric pattern, meaning it looks the same on each side when cut down the
middle. The total area under the curve is 1. Each normal distribution has its own mean, and its
own standard deviation.
Figure 5-1 illustrates three different normal distributions with different means and standard
deviations.
Figure 5-1: Three normal distributions
Note that the inflection points (highlighted by arrows in Figure 5-1 on either side of the mean)
on each graph are where the graph changes from concave down to concave up. The distance
from the mean out to either inflection point is equal to the standard deviation for the normal
distribution. For any normal distribution, almost all its values (about 99.7%) lie within three
standard deviations of the mean.
The Standard Normal (Z) Distribution
One very special member of the normal distribution family is called the standard normal
distribution, or Z-distribution. It serves as the reference distribution for solving the types of
problems that arise when working with any normal distribution.
The standard normal (Z ) distribution has a mean of zero and a standard deviation of 1; its
graph is shown in Figure 5-2. A value on the Z-distribution represents the number of standard
deviations the data is above or below the mean; these are called z-scores or z-values. For
example, z = 1 on the Z-distribution represents a value that is 1 standard deviation above the
mean. Similarly, z = –1 represents a value that is one standard deviation below the mean
(indicated by the minus sign on the z-value).
Figure 5-2: The Z-distribution has a mean of 0 and standard deviation of 1.
Because probabilities for any normal distribution are nearly impossible to calculate by hand,
we use tables to find them. All the basic results you need to find probabilities for any normal
distribution can be boiled down into one table based on the standard normal (Z) distribution.
This table is called the Z-table and is found in the appendix as Table A-1. All you need is one
formula to transform your normal distribution (X) to the standard normal (Z) distribution, and
you can use the Z-table to find the probability you need.
The general formula for changing a value of X into a value of Z is Z = (X − μ)/σ. You take
your x-value, subtract the mean, and divide by the standard deviation; this gives you its
corresponding z-value.
For example, if X is a normal distribution with mean 16 and standard deviation 4, the value 20
on the X-distribution would transform into 20 – 16 divided by 4, which equals 1. So, the value
20 on the X-distribution corresponds to the value 1 on the Z-distribution. Now use the Z-table
to find probabilities for Z, which are equivalent to the corresponding probabilities for X. Table
A-1 (appendix) shows the probability that Z is less than any value between –3 and +3.

To use the Z-table to find probabilities, do the following:


1. Go to the row that represents the leading digit of your z-value and the first digit after
the decimal point.
2. Go to the column that represents the second digit after the decimal point of your z-
value.
3. Intersect the row and column.
That number represents P(Z < z).
For example, suppose you want to look at P(Z < 2.13). Using Table A-1 (appendix), find the
row for 2.1 and the column for 0.03. Put 2.1 and 0.03 together as one three-digit number to
get 2.13. Intersect that row and column to find the number: 0.9834. You find that P(Z < 2.13)
= 0.9834.
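The two look-ups above can also be reproduced in software. The sketch below is one possible way to do it, assuming Python 3.8 or later, whose standard library provides statistics.NormalDist; the helper name z_score is illustrative only.

```python
from statistics import NormalDist

# Standard normal (Z) distribution: mean 0, standard deviation 1.
Z = NormalDist(mu=0, sigma=1)

def z_score(x, mean, sd):
    """Convert an x-value to its z-value: z = (x - mean) / sd."""
    return (x - mean) / sd

# The value 20 on a normal distribution with mean 16 and standard deviation 4.
z = z_score(20, 16, 4)

# P(Z < 2.13), the quantity read from the Z-table.
p = Z.cdf(2.13)

print(z)            # 1.0
print(round(p, 4))  # 0.9834
```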

Finding Probabilities for X


Here are the steps for finding a probability for X:
1. Draw a picture of the distribution.
2. Translate the problem into one of the following:
P(X < a), P(X > b), or P(a < X < b). Shade in the area on your picture.
3. Transform a (and/or b) into a z-value, using the Z-formula: Z = (X − μ)/σ.
4. Look up the transformed z-value on the Z-table (see the preceding section) and find its
probability.
5a. If you have a less-than problem, you’re done.
5b. If you have a greater-than problem, take one minus the result from Step 4.
5c. If you have a between-values problem, do Steps 1–4 for b (the larger of the two
values) and then for a (the smaller of the two values), and subtract the results.

You need not worry about whether to include an “equal to” in a less-than or greater-than
probability because the probability of a continuous random variable equaling one number
exactly is zero. (There is no area under the curve at one specific point.)
Suppose, for example, that you enter a fishing contest. The contest takes place in a pond
where the fish lengths have a normal distribution with mean = 16 inches and standard
deviation = 4 inches.
Problem 1: What is the chance of catching a small fish — say, less than 8 inches?
Problem 2: Suppose a prize is offered for any fish over 24 inches. What is the chance of
catching a fish at least that size?
Problem 3: What is the chance of catching a fish between 16 and 24 inches?
To solve these problems, first draw a picture of the distribution.
Figure 5-3 shows a picture of X’s distribution for fish lengths. You can see where each of the
fish lengths mentioned in each of the three fish problems falls.

Figure 5-3: The distribution of fish lengths in a pond.


Next, translate each problem into probability notation.
Problem 1 means find P(X < 8). For Problem 2, you want P(X > 24). And Problem 3 is asking
for P(16 < X < 24).

Step 3 says change the x-values to z-values using the Z-formula Z = (X − μ)/σ. For Problem 1
of the fish example, you have P(X < 8) = P(Z < (8 − 16)/4) = P(Z < –2). Similarly for
Problem 2, P(X > 24) becomes P(Z > 2). Problem 3 translates from P(16 < X < 24) to
P(0 < Z < 2). Figure 5-4 shows a comparison of the X-distribution and Z-distribution for the
values x = 8, 16, and 24, which transform into z = –2, 0, and +2, respectively.

Figure 5-4: Transforming numbers on the normal distribution to numbers on the Z-distribution.

Now that you have changed x-values to z-values, you move to Step 4 and find probabilities for
those z-values using the Z-table (Table A-1 in the appendix). In Problem 1 of the fish
example, you want P(Z < –2); go to the Z-table and look at the row for –2.0 and the column
for 0.00, intersect them, and you find 0.0228 — according to Step 5a you’re done. So, the
chance of a fish being less than 8 inches is equal to 0.0228.
For Problem 2, find P(Z > 2.00). Because it’s a “greater-than” problem, this calls for Step 5b.
To be able to use the Z-table you need to rewrite this in terms of a “less-than” statement.
Because the entire probability for the Z-distribution equals 1, we know P(Z > 2.00) = 1 – P(Z
< 2.00) = 1 – 0.9772 = 0.0228.
So, the chance that a fish is greater than 24 inches is 0.0228. (Note the answers to Problems 1
and 2 are the same because the Z-distribution is symmetric; see Figure 5-3.)
In Problem 3, you find P(0 < Z < 2.00); this requires Step 5c. First find P(Z < 2.00), which is
0.9772 from the Z-table, and then subtract off the part you don’t want, which is P(Z < 0) =
0.500
from the Z-table. This gives you 0.9772 – 0.500 = 0.4772. So the chance of a fish being
between 16 and 24 inches is 0.4772.
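The three fish problems can be checked without a Z-table. The following is a minimal sketch, assuming Python's standard-library statistics.NormalDist; the variable names are illustrative.

```python
from statistics import NormalDist

# Fish lengths: normal with mean 16 inches and standard deviation 4 inches.
fish = NormalDist(mu=16, sigma=4)

p_small = fish.cdf(8)                     # Problem 1: P(X < 8)
p_prize = 1 - fish.cdf(24)                # Problem 2: P(X > 24), one minus the less-than area
p_between = fish.cdf(24) - fish.cdf(16)   # Problem 3: P(16 < X < 24)

print(round(p_small, 4), round(p_prize, 4), round(p_between, 4))
# 0.0228 0.0228 0.4772
```

NormalDist works directly on the X scale, so the transformation to z-values happens internally; the answers agree with the table-based results.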

Normal Approximation to the Binomial


Suppose you flip a fair coin 100 times, and you let X equal the number of heads. What is the
probability that X is greater than 60? In the previous section, you solved problems like this using
the binomial distribution. For binomial problems where n is small, you can either use the
direct formula or the binomial table (Table A-3 in the appendix). However, when n is large,
the calculations get unwieldy and the table runs out of numbers. What to do?
Turns out, if n is large enough, you can use the normal distribution to get an approximate
answer that’s very close to what you would get with the binomial distribution. To determine
whether n is large enough to use the normal approximation, two (not just one) conditions must
hold:
1. (n ∗ p) ≥ 10
2. n ∗ (1 – p) ≥ 10
In general, follow these steps to find the approximate probability for a binomial distribution
when n is large:
1. Verify whether n is large enough to use the normal approximation by checking the
two conditions.
For the coin-flipping question, the conditions are met since n ∗ p = 100 ∗ 0.50 = 50, and n ∗ (1
– p) = 100 ∗ (1 – 0.50) = 50, both of which are at least 10. So go ahead with the normal
approximation.
2. Write down what you need to find as a probability statement about X.
For the coin-flipping example, find P(X > 60).
3. Transform the x-value to a z-value, using the Z-formula Z = (X − μ)/σ.
For the mean of the normal distribution, use μ = n ∗ p (the mean of the binomial), and for the
standard deviation σ, use √(np(1 − p)) (the standard deviation of the binomial).
For the coin-flipping example, use μ = n ∗ p = 100 ∗ 0.50 = 50 and σ = √(np(1 − p)) =
√(100 ∗ 0.50(1 − 0.50)) = 5.
Now put these values into the Z-formula to get Z = (60 − 50)/5 = 2. Now find P(Z > 2).
4. Proceed as you usually would for any normal distribution.
That is, do Steps 4 and 5 described in the earlier section “Finding Probabilities for X.”
For the coin flips, P(X > 60) = P(Z > 2.00) = 1 – 0.9772 = 0.0228. The chance of getting more
than 60 heads in 100 flips of a coin is about 2.28 percent.
When you use the normal approximation to find a binomial probability, your answer is an
approximation (not exact), so be sure you state that. Also, show that you checked the
necessary conditions for using the normal approximation.
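To see how close the approximation is, the sketch below compares the normal approximation for the coin-flipping example with the exact binomial sum. It is an illustration only, assuming Python 3.8+ (math.comb and statistics.NormalDist); the variable names are not part of this guide.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, cutoff = 100, 0.50, 60

# Step 1: check both conditions for the normal approximation.
conditions_ok = n * p >= 10 and n * (1 - p) >= 10

# Steps 2-4: approximate P(X > 60) with a normal that has the binomial's mean and sd.
mu = n * p
sigma = sqrt(n * p * (1 - p))
approx = 1 - NormalDist(mu, sigma).cdf(cutoff)

# Exact binomial probability P(X > 60) = P(X = 61) + ... + P(X = 100).
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(cutoff + 1, n + 1))

print(conditions_ok, round(approx, 4))
print(exact)
```

The two numbers differ slightly, which is why the answer should always be reported as an approximation.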

The Poisson distribution is the discrete probability distribution of the number of events
occurring in a given time period, given the average number of times the event occurs over that
time period.
A certain fast-food restaurant gets an average of 3 visitors to the drive-through per minute.
This is just an average, however. The actual amount can vary.
A Poisson distribution can be used to analyze the probability of various events regarding how
many customers go through the drive-through. It can allow one to calculate the probability of
a lull in activity (when there are 0 customers coming to the drive-through) as well as the
probability of a flurry of activity (when there are 5 or more customers coming to the drive-
through). This information can, in turn, help a manager plan for these events with staffing and
scheduling.
The Poisson distribution is applicable only when several conditions hold.
Conditions for Poisson Distribution:
• An event can occur any number of times during a time period.
• Events occur independently. In other words, if an event occurs, it does not affect the
probability of another event occurring in the same time period.
• The rate of occurrence is constant; that is, the rate does not change based on time.
• The probability of an event occurring is proportional to the length of the time period.
For example, it should be twice as likely for an event to occur in a 2 hour time period
as it is for an event to occur in a 1 hour period.
For example, the Poisson distribution is appropriate for modeling the number of phone calls
an office would receive during the noon hour, if they know that they average 4 calls per hour
during that time period.
• Although the average is 4 calls, they could theoretically get any number of calls
during that time period.
• The events are effectively independent since there is no reason to expect a caller to
affect the chances of another person calling.
• The occurrence rate may be assumed constant.
• It is reasonable to assume that (for example) the probability of getting a call in the first
half hour is the same as the probability of getting a call in the final half hour.
Of course, this situation isn't a perfect theoretical fit for the Poisson distribution.
For instance, the office certainly cannot receive a trillion calls during the period, as
there are fewer than a trillion people alive to be making calls. Practically speaking, the
situation is close enough that the Poisson distribution does a good job of modeling its
behavior.
Probabilities with the Poisson Distribution
Given that a situation follows a Poisson distribution, there is a formula, which allows one to
calculate the probability of observing k events over a period of time for any non-negative
integer value of k.
Let X be the discrete random variable that represents the number of events observed over a
given time period. Let λ be the expected value (average) of X. If X follows a Poisson
distribution, then the probability of observing k events over the time period is
P(X = k) = λ^k e^(−λ) / k!, where e is Euler's number.
Example: In the World Cup, an average of 2.5 goals are scored each game. Modeling this
situation with a Poisson distribution, what is the probability that k goals are scored in a game?
In this instance, λ = 2.5. The above formula applies directly:
P(X = 0) = 2.5^0 e^(−2.5) / 0! ≈ 0.082
P(X = 1) = 2.5^1 e^(−2.5) / 1! ≈ 0.205
P(X = 2) = 2.5^2 e^(−2.5) / 2! ≈ 0.257
P(X = 3) = 2.5^3 e^(−2.5) / 3! ≈ 0.214
P(X = 4) = 2.5^4 e^(−2.5) / 4! ≈ 0.134
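The Poisson formula is straightforward to evaluate in code. Below is a minimal sketch using only Python's math module; the function name poisson_pmf is an illustrative choice. It tabulates the World Cup probabilities for k = 0 to 4 with λ = 2.5.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with mean lam."""
    return lam**k * exp(-lam) / factorial(k)

# Probability of k goals in a game when the average is 2.5 goals per game.
for k in range(5):
    print(k, round(poisson_pmf(k, 2.5), 3))
```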
A fast food restaurant gets an average of 2.8 customers approaching the register every minute.
Assuming the number of customers approaching the register per minute follows a Poisson
distribution, what is the probability that 4 customers approach the register in the next minute?
Round your answer to 3 decimal places.
The Poisson distribution can be used to calculate the probabilities of "less than" and "more
than" using the rule of sum and complement probabilities.
Example: A statistician records the number of cars that approach an intersection. He finds that
an average of 1.6 cars approach the intersection every minute. Assuming the number of cars
that approach this intersection follows a Poisson distribution, what is the probability that 3 or
more cars will approach the intersection within a minute?
For this problem, λ = 1.6. The goal of this problem is to find P(X ≥ 3), the probability that
there are 3 or more cars approaching the intersection within a minute. Since there is no upper
limit on the value of k, this probability cannot be computed by summing terms directly.
However, its complement, P(X ≤ 2), can be computed to give P(X ≥ 3):
P(X = 0) = 1.6^0 e^(−1.6) / 0! ≈ 0.202
P(X = 1) = 1.6^1 e^(−1.6) / 1! ≈ 0.323
P(X = 2) = 1.6^2 e^(−1.6) / 2! ≈ 0.258
⇒ P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2) ≈ 0.783
⇒ P(X ≥ 3) = 1 − P(X ≤ 2) = 1 − 0.783 ≈ 0.217
Therefore, the probability that there are 3 or more cars approaching the intersection within a
minute is approximately 0.217.
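The complement rule used in this example can be sketched as follows. The helper names poisson_pmf and poisson_cdf are illustrative and use only Python's standard library.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with mean lam."""
    return lam**k * exp(-lam) / factorial(k)

def poisson_cdf(k, lam):
    """P(X <= k): sum the probabilities of 0, 1, ..., k events."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))

# P(X >= 3) = 1 - P(X <= 2) for an average of 1.6 cars per minute.
p_at_least_3 = 1 - poisson_cdf(2, 1.6)
print(round(p_at_least_3, 3))  # 0.217
```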
When a computer disk manufacturer tests a disk, it writes to the disk and then tests it using a
certifier. The certifier counts the number of missing pulses or errors. The number of errors in
a test area on a disk has a Poisson distribution with λ=0.2. What percentage of test areas have
two or fewer errors?
There are other applications of the Poisson distribution that come from more open-ended
problems. For example, it can be used to help determine the amount of staffing that is needed
in a call center.
Example: A call center receives an average of 4.5 calls every 5 minutes. Each agent can
handle one of these calls over the 5 minute period. If a call is received, but no agent is
available to take it, then that caller will be placed on hold. Assuming that the calls follow a
Poisson distribution, what is the minimum number of agents needed on duty so that calls are
placed on hold at most 10% of the time?
In order for all calls to be taken, the number of agents on duty should be greater than or equal
to the number of calls received. If X is the number of calls received and k is the number of
agents, then k should be set such that P(X > k) ≤ 0.1, or equivalently, P(X ≤ k) ≥ 0.9.
The average number of calls is 4.5, so λ=4.5:
P(X = 0) = 4.5^0 e^(−4.5) / 0! ≈ 0.011
P(X = 1) = 4.5^1 e^(−4.5) / 1! ≈ 0.050 ⇒ P(X ≤ 1) ≈ 0.061
P(X = 2) = 4.5^2 e^(−4.5) / 2! ≈ 0.112 ⇒ P(X ≤ 2) ≈ 0.174
P(X = 3) = 4.5^3 e^(−4.5) / 3! ≈ 0.169 ⇒ P(X ≤ 3) ≈ 0.342
P(X = 4) = 4.5^4 e^(−4.5) / 4! ≈ 0.190 ⇒ P(X ≤ 4) ≈ 0.532
P(X = 5) = 4.5^5 e^(−4.5) / 5! ≈ 0.171 ⇒ P(X ≤ 5) ≈ 0.703
P(X = 6) = 4.5^6 e^(−4.5) / 6! ≈ 0.128 ⇒ P(X ≤ 6) ≈ 0.831
P(X = 7) = 4.5^7 e^(−4.5) / 7! ≈ 0.082 ⇒ P(X ≤ 7) ≈ 0.913
If the goal is for calls to be placed on hold at most 10% of the time, then 7 agents should
be on duty, since 7 is the smallest k for which P(X ≤ k) ≥ 0.9.
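The search for the smallest adequate number of agents can be automated. The sketch below accumulates Poisson probabilities until the cumulative total reaches the target; the function name min_agents and its parameters are hypothetical, chosen only for this illustration.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with mean lam."""
    return lam**k * exp(-lam) / factorial(k)

def min_agents(lam, target):
    """Smallest k such that P(X <= k) >= target."""
    k = 0
    cumulative = poisson_pmf(0, lam)
    while cumulative < target:
        k += 1
        cumulative += poisson_pmf(k, lam)
    return k

# 4.5 calls per 5 minutes; calls should be answered at least 90% of the time.
agents = min_agents(4.5, 0.9)
print(agents)  # 7
```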
Properties of the Poisson Distribution
The expected value of a Poisson distribution should come as no surprise, as each Poisson
distribution is defined by its expected value.
Expected Value of Poisson Random Variable:
Given a discrete random variable X that follows a Poisson distribution with parameter λ, the
expected value of this variable is E[X] = λ.

Variance of Poisson Random Variable:
Given a discrete random variable X that follows a Poisson distribution with parameter λ, the
variance of this variable is Var[X] = λ.
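Both properties can be checked numerically. The sketch below sums over k = 0 to 59 for λ = 4.5; truncating the infinite sum at 59 is an assumption made for this illustration, justified because the remaining probabilities are negligible at this rate.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with mean lam."""
    return lam**k * exp(-lam) / factorial(k)

lam = 4.5
ks = range(60)  # truncated support; the tail beyond 59 is negligible for lam = 4.5

mean = sum(k * poisson_pmf(k, lam) for k in ks)
variance = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in ks)

print(round(mean, 6), round(variance, 6))  # both are approximately 4.5
```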

HYPOTHESIS
Hypothesis testing is a decision-making process for evaluating claims about a population.
• We must define the population under study
• state the particular hypotheses that will be investigated
• give the significance level
• select a sample from the population
• collect the data
• perform the calculations required for the statistical test
• reach a conclusion.

Steps in Hypothesis Testing


• A statistical hypothesis is a conjecture about a population parameter. This conjecture
may or may not be true.
• The null hypothesis, symbolized by H0, is a statistical hypothesis that states that there
is no difference between a parameter and a specific value or that there is no difference
between two parameters.
• The alternative hypothesis, symbolized by H1, is a statistical hypothesis that states a
specific difference between a parameter and a specific value or states that there is a
difference between two parameters.
Example 1
A medical researcher is interested in finding out whether a new medication will have any
undesirable side effects. The researcher is particularly concerned with the pulse rate of the
patients who take the medication.
• What are the hypotheses to test whether the pulse rate will be different from the
mean pulse rate of 82 beats per minute?
• H0: µ = 82 H1: µ ≠ 82
• This is a two-tailed test.
Example 2
A chemist invents an additive to increase the life of an automobile battery. If the mean
lifetime of the battery is 36 months, then his hypotheses are
• H0: µ ≤ 36 H1: µ > 36
• This is a right-tailed test.
Example 3
A contractor wishes to lower heating bills by using a special type of insulation in houses. If
the average of the monthly heating bills is ₦78, her hypotheses about heating costs will be
• H0: µ ≥ ₦78 H1: µ < ₦78
• This is a left-tailed test.

• A statistical test uses the data obtained from a sample to make a decision about whether or
not the null hypothesis should be rejected.
• The numerical value obtained from a statistical test is called the test value.
• In the hypothesis-testing situation, there are four possible outcomes.
• In reality, the null hypothesis may or may not be true, and a decision is made to reject or
not to reject it on the basis of the data obtained from a sample.

                    H0 True             H0 False
Reject H0           Type I error        Correct decision
Do not reject H0    Correct decision    Type II error
• A type I error occurs if one rejects the null hypothesis when it is true.
• A type II error occurs if one does not reject the null hypothesis when it is false.
• The level of significance is the maximum probability of committing a type I error. This
probability is symbolized by α (Greek letter alpha). That is, P(type I error) = α.
• P(type II error) = β (Greek letter beta).
• Typical significance levels are: 0.10, 0.05, and 0.01.
• For example, when α = 0.10, there is a 10% chance of rejecting a true null hypothesis.
• The critical value(s) separates the critical region from the noncritical region.
• The symbol for critical value is C.V.
• The critical or rejection region is the range of values of the test value that indicates that
there is a significant difference and that the null hypothesis should be rejected.
• The noncritical or non-rejection region is the range of values of the test value that indicates
that the difference was probably due to chance and that the null hypothesis should not be
rejected.
• A one-tailed test (right or left) indicates that the null hypothesis should be rejected when the
test value is in the critical region on one side of the mean.
• In a two-tailed test, the null hypothesis should be rejected when the test value is in either of
the two critical regions.

Large Sample Mean Test (z test)


• The z test is a statistical test for the mean of a population. It can be used when n ≥ 30,
or when the population is normally distributed and σ is known.
• The formula for the z test is given as
z = (X̄ − μ)/(σ/√n)
where X̄ = sample mean
μ = hypothesized population mean
σ = population standard deviation
n = sample size
Steps for hypothesis-testing:
• State the hypotheses and identify the claim.
• Find the critical value(s).
• Compute the test value.
• Make the decision to reject or not reject the null hypothesis.
• Summarize the results.
Example 1:
A researcher reports that the average salary of assistant professors is more than ₦42,000. A
sample of 30 assistant professors has a mean salary of ₦43,260. At α = 0.05, test the claim
that assistant professors earn more than ₦42,000 a year. The standard deviation of the
population is ₦5230.
Solution:
Step 1: State the hypotheses and identify the claim.
H0: µ ≤ ₦42,000 H1: µ > ₦42,000 (claim)
Step 2: Find the critical value. Since α = 0.05 and the test is a right-tailed test, the critical
value is z = +1.65.
Step 3: Compute the test value.
z = [43,260 – 42,000]/[5230/√30] = 1.32.
Step 4: Make the decision. Since the test value, +1.32, is less than the critical value, +1.65,
and not in the critical region, the decision is “Do not reject the null hypothesis.”
Step 5: Summarize the results. There is not enough evidence to support the claim that
assistant professors earn more on average than ₦42,000 a year.

Example 2:
A national magazine claims that the average college student watches less television than the
general public. The national average is 29.4 hours per week, with a standard deviation of 2
hours. A sample of 30 college students has a mean of 27 hours. Is there enough evidence to
support the claim at α = 0.01?
Solution:
Step 1: State the hypotheses and identify the claim. H0: µ ≥ 29.4 H1: µ < 29.4 (claim)
Step 2: Find the critical value. Since α = 0.01 and the test is a left-tailed test, the critical value
is z = –2.33.
Step 3: Compute the test value.
z = [27– 29.4]/[2/√30] = – 6.57.
Step 4: Make the decision. Since the test value, – 6.57, falls in the critical region, the decision
is to reject the null hypothesis.
Step 5: Summarize the results. There is enough evidence to support the claim that college
students watch less television than the general public.

Example 3:
The Medical Rehabilitation Education Foundation reports that the average cost of
rehabilitation for stroke victims is ₦24,672. To see if the average cost of rehabilitation is
different at a large hospital, a researcher selected a random sample of 35 stroke victims and
found that the average cost of their rehabilitation is ₦25,226.
The standard deviation of the population is ₦3,251. At α = 0.01, can it be concluded that the
average cost at a large hospital is different from ₦24,672?
Solution:
Step 1: State the hypotheses and identify the claim. H0: µ = ₦24,672 H1: µ ≠ ₦24,672
(claim)
Step 2: Find the critical values. Since α = 0.01 and the test is a two-tailed test, the critical
values are z = –2.58 and +2.58.
Step 3: Compute the test value.
z = [25,226 – 24,672]/[3,251/√35] = 1.01.
Step 4: Make the decision. Do not reject the null hypothesis, since the test value falls in the
noncritical region.
Step 5: Summarize the results. There is not enough evidence to support the claim that the
average cost of rehabilitation at the large hospital is different from ₦24,672.
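The test values in the three examples above can be reproduced with a few lines of Python. This is a sketch only: the function name z_test_value is illustrative, and the critical values (1.65, –2.33, ±2.58) are copied from the worked solutions rather than computed.

```python
from math import sqrt

def z_test_value(sample_mean, mu0, sigma, n):
    """Test value z = (sample mean - hypothesized mean) / (sigma / sqrt(n))."""
    return (sample_mean - mu0) / (sigma / sqrt(n))

# Example 1: right-tailed test, reject H0 if z > 1.65.
z1 = z_test_value(43260, 42000, 5230, 30)
reject1 = z1 > 1.65

# Example 2: left-tailed test, reject H0 if z < -2.33.
z2 = z_test_value(27, 29.4, 2, 30)
reject2 = z2 < -2.33

# Example 3: two-tailed test, reject H0 if |z| > 2.58.
z3 = z_test_value(25226, 24672, 3251, 35)
reject3 = abs(z3) > 2.58

print(round(z1, 2), reject1)  # 1.32 False
print(round(z2, 2), reject2)  # -6.57 True
print(round(z3, 2), reject3)  # 1.01 False
```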

Small Sample Mean Test (t test)


• When the population standard deviation is unknown and n < 30, the z test is
inappropriate for testing hypotheses involving means.
• The t test is used in this case.
• Properties for the t distribution are
Formula for t test
t = (X̄ − μ)/(s/√n)
where X̄ = sample mean
μ = hypothesized population mean
s = sample standard deviation
n = sample size
degrees of freedom = n − 1

Example 1:
A job placement director claims that the average starting salary for nurses is ₦24,000. A
sample of 10 nurses has a mean of ₦23,450 and a standard deviation of ₦400. Is there enough
evidence to reject the director’s claim at α = 0.05?
Solution:
Step 1: State the hypotheses and identify the claim. H0: µ = ₦24,000 (claim) H1: µ ≠
₦24,000
Step 2: Find the critical value. Since α = 0.05 and the test is a two-tailed test, the
critical values are t = –2.262 and +2.262 with d.f. = 9.
Step 3: Compute the test value. t = [23,450 – 24,000]/[400/√10] = – 4.35.
Step 4: Reject the null hypothesis, since – 4.35 < – 2.262.
Step 5: There is enough evidence to reject the claim that the starting salary of nurses is
₦24,000.
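The nurse-salary calculation can be sketched in Python as follows. Python's standard library has no t distribution, so the critical value 2.262 (d.f. = 9, two-tailed, α = 0.05) is taken from the solution above rather than computed; the variable names are illustrative.

```python
from math import sqrt

n = 10
sample_mean = 23450
s = 400          # sample standard deviation
mu0 = 24000      # hypothesized mean (the director's claim)

degrees_of_freedom = n - 1
t = (sample_mean - mu0) / (s / sqrt(n))

t_critical = 2.262               # from the t table, as quoted in Step 2
reject = abs(t) > t_critical     # two-tailed decision rule

print(degrees_of_freedom, round(t, 2), reject)  # 9 -4.35 True
```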

Confidence Intervals and Hypothesis Testing


Example 1:
Sugar is packed in 5-pound bags. An inspector suspects the bags may not contain 5 pounds. A
sample of 50 bags produces a mean of 4.6 pounds and a standard deviation of 0.7 pound. Is
there enough evidence to conclude that the bags do not contain 5 pounds as stated, at α =
0.05? Also, find the 95% confidence interval of the true mean.
Solution:
• H0: µ = 5 H1: µ ≠ 5 (claim)
• The critical values are +1.96 and –1.96.
• The test value is z = (4.6 − 5.0)/(0.7/√50) = –4.04.
• Since –4.04 < –1.96, the null hypothesis is rejected.
• There is enough evidence to support the claim that the bags do not weigh 5 pounds.

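The 95% confidence interval for the mean is given by
X̄ − 1.96(s/√n) < µ < X̄ + 1.96(s/√n)
4.6 − 1.96(0.7/√50) < µ < 4.6 + 1.96(0.7/√50)
4.406 < µ < 4.794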
Notice that the 95% confidence interval of µ does not contain the hypothesized value µ = 5.
Hence, there is agreement between the hypothesis test and the confidence interval.
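The agreement between the test and the interval can be verified numerically. The sketch below assumes the usual large-sample interval, sample mean ± 1.96 × s/√n, with the values from the sugar example; the variable names are illustrative.

```python
from math import sqrt

n, sample_mean, s, mu0 = 50, 4.6, 0.7, 5.0
z_critical = 1.96                 # two-tailed critical value for alpha = 0.05

standard_error = s / sqrt(n)
z = (sample_mean - mu0) / standard_error

lower = sample_mean - z_critical * standard_error
upper = sample_mean + z_critical * standard_error

reject = abs(z) > z_critical
contains_mu0 = lower < mu0 < upper

print(round(z, 2))                        # -4.04
print(round(lower, 3), round(upper, 3))   # 4.406 4.794
print(reject, contains_mu0)               # True False
```

Rejecting H0 at α = 0.05 and finding that the 95% interval excludes 5 are two views of the same result.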

Using the z or t test


MODULE FOUR
BIVARIATE DISTRIBUTIONS
What Is a Bivariate Distribution?
When one measurement is made on each observation, univariate analysis is applied. If more
than one measurement is made on each observation, multivariate analysis is applied.
In this section, we focus on bivariate analysis, where exactly two measurements are made on
each observation. The two measurements will be called X and Y . Since X and Y are obtained
for each observation, the data for one observation is the pair (X, Y ).

A bivariate distribution, put simply, gives the probability of each combination of outcomes
when there are two random variables in your scenario. For example, having two bowls,
each filled with two different types of candies, and pulling one candy from each bowl gives
you two random variables, the two different candies. (Here the two draws happen to be
independent, although a bivariate distribution does not require that.) Since you are pulling
one candy from each bowl at the same time, you have a bivariate distribution when calculating
your probability of ending up with particular kinds of candies.
Some examples:
– Height (X) and weight (Y ) are measured for each individual in a sample.
– Stock market valuation (X) and quarterly corporate earnings (Y ) are recorded for each
company in a sample.
– A cell culture is treated with varying concentrations of a drug, and the growth rate (X) and
drug concentration (Y ) are recorded for each trial.
– Temperature (X) and precipitation (Y ) are measured on a given day at a set of weather
stations.
Be clear about the difference between bivariate data and two sample data. In two sample data,
the X and Y values are not paired, and there aren’t necessarily the same number of X and Y
values.
Two-sample data:
Sample 1: 3,2,5,1,3,4,2,3
Sample 2: 4,4,3,6,5
What Does It Look Like?
So, what does a bivariate distribution look like? Such a distribution actually doesn't have a
standard look. You can create a table with these distributions or you can list each probability
out one by one. In any case, you always have two random variables in any given
scenario.
Here is what a bivariate distribution looks like in table form.
This bivariate distribution shows you the probability of picking red or blue candies from a red
bowl and a blue bowl if you pick one candy from each bowl and there are an equal number of
red and blue candies in each bowl.
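A table of this kind can be built by enumerating every pair of outcomes. The sketch below assumes, as stated above, that each bowl holds equal numbers of red and blue candies and that the two draws are independent, so each joint probability is the product of the two single-bowl probabilities. The variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

colours = ("red", "blue")

# Equal numbers of red and blue candies in each bowl (assumption from the text).
p_red_bowl = {c: Fraction(1, 2) for c in colours}
p_blue_bowl = {c: Fraction(1, 2) for c in colours}

# Joint (bivariate) distribution: one entry per (red-bowl candy, blue-bowl candy) pair.
joint = {(x, y): p_red_bowl[x] * p_blue_bowl[y] for x, y in product(colours, colours)}

for (x, y), probability in joint.items():
    print(f"red bowl: {x:4}  blue bowl: {y:4}  probability: {probability}")
```

Each of the four combinations has probability 1/4, and the four probabilities sum to 1.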
MODULE FIVE
POINT AND INTERVAL ESTIMATES
Estimation theory
Estimation theory is a branch of statistics that deals with estimating the values of parameters
based on measured/empirical data that has a random component.
An estimate is a single value that is calculated based on samples and used to estimate a
population value.
An estimator is a function that maps the sample space to a set of estimates.
The entire purpose of estimation theory is to arrive at an estimator, which takes the sample as
input and produces an estimate of the parameters with the corresponding accuracy.

There are two types of estimators


• Point estimator
• Interval estimator

Point Estimator
• A point estimator is a statistic (that is, a function of the data) that is used to infer the value of
an unknown parameter in a statistical model.
• A point estimate is one of the possible values a point estimator can assume.
• Mathematically, suppose there is a fixed parameter θ that needs to be estimated and X is a
random variable corresponding to the observed data. Then an estimator of θ, usually denoted
by the symbol θ̂, is a function of the random variable X, and hence itself a random variable
θ̂(X).
• A point estimate for a particular observed dataset (i.e. for X = x) is then θ̂(x), which is a
fixed value.
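The distinction between an estimator (a function) and an estimate (a number) can be illustrated in code. In the sketch below the sample mean plays the role of the estimator of a population mean; the data values are hypothetical and chosen only for illustration.

```python
from statistics import mean

def estimator(sample):
    """Point estimator of the population mean: the sample mean."""
    return mean(sample)

# Hypothetical observed data x; a different sample would give a different estimate.
observed = [3, 2, 5, 1, 3, 4, 2, 3]

estimate = estimator(observed)   # the point estimate for this particular dataset
print(estimate)  # 2.875
```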
MODULE SIX
MATHEMATICAL EXPECTATION
Mathematical expectation, also known as the expected value, is the summation or
integration of the possible values of a random variable, each weighted by its probability.
For a single outcome, the contribution is the product of the probability of the event
occurring, denoted P(x), and the value corresponding with the actual observed occurrence of
the event. The expected value is a useful property of any random variable. Usually notated as
E(X), the expected value can be computed by summing over all the distinct values that the
random variable can take. The mathematical expectation is given by the formula
E(X) = Σ xi pi = x1p1 + x2p2 + … + xnpn, where X is a random variable with the probability
function f(x), pi is the probability of the value xi occurring, and n is the number of all
possible values. An indicator variable for an event A takes the value one if A occurs and zero
if it does not, so its mathematical expectation equals the probability of A. Thus, it
is a useful tool to find the probability of event A.

Properties and Assumptions:


The first property is that if X and Y are the two random variables, then the mathematical
expectation of the sum of the two variables is equal to the sum of the mathematical
expectation of X and the mathematical expectation of Y, provided that the mathematical
expectation exists. In other words, E(X+Y)=E(X)+E(Y).

The second property is that the mathematical expectation of the product of the two random
variables will be the product of the mathematical expectation of those two variables, provided
that the two variables are independent in nature. In other words, E(XY)=E(X)E(Y).
The generalization of this property states that the mathematical expectation of the product of
the n number of independent random variables is equal to the product of the mathematical
expectation of the n independent random variables.

The third property states that the mathematical expectation of a constant times a function of
a random variable is equal to the constant times the mathematical expectation of that
function, and that the mathematical expectation of a constant plus a function of a random
variable is equal to the constant plus the mathematical expectation of that function, provided
that the expectation exists. In other words, E(a f(X))=a E(f(X)) and E(a+f(X))=a+E(f(X)),
where a is a constant and f(X) is a function of X.

The fourth property combines the two parts of the third: multiplying a random variable by a
constant a and then adding a constant b multiplies its mathematical expectation by a and adds
b, provided that the expectation exists. In other words, E(aX+b)=aE(X)+b, where a and b are
constants.

The fifth property states that the mathematical expectation of a linear combination of n
random variables is equal to the same linear combination of their mathematical expectations.
In other words, E(Σ aiXi) = Σ ai E(Xi), where both sums run over i = 1, …, n and the
ai (i = 1, …, n) are constants.
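
The properties above can be checked numerically on a small example. The sketch below is only an illustration, not part of the original notes: it assumes two independent fair six-sided dice X and Y, and the constants a = 3 and b = 5 are arbitrary choices. Exact fractions are used so that the equalities hold without rounding error.

```python
from fractions import Fraction
from itertools import product

# Assumed example: two independent fair six-sided dice X and Y.
# Each of the 36 (x, y) outcomes has probability 1/36.
faces = range(1, 7)
outcomes = list(product(faces, faces))
p = Fraction(1, len(outcomes))

def expectation(g):
    """Expected value of g(x, y) over the joint distribution of the two dice."""
    return sum(g(x, y) * p for x, y in outcomes)

e_x = expectation(lambda x, y: x)
e_y = expectation(lambda x, y: y)
e_sum = expectation(lambda x, y: x + y)    # property 1: E(X+Y)
e_prod = expectation(lambda x, y: x * y)   # property 2: E(XY), X and Y independent

a, b = 3, 5  # arbitrary constants chosen for illustration
e_linear = expectation(lambda x, y: a * x + b)          # property 4: E(aX+b)
e_combo = expectation(lambda x, y: a * x + b * y)       # property 5: E(aX+bY)

print("E(X) =", e_x, " E(Y) =", e_y)
print("E(X+Y) =", e_sum, " E(XY) =", e_prod)
print("E(aX+b) =", e_linear, " E(aX+bY) =", e_combo)
```

Note that the product rule E(XY)=E(X)E(Y) holds here only because the two dice are independent; for dependent variables it generally fails.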

Example 1
A casino is considering a dice game that would pay the winner of the game $10. The game is
similar to craps: the participant rolls two fair, 6-sided dice and wins if they sum to 7 or
11; otherwise the participant loses. What is the expected payout the casino will make each
time the game is played?
Solution:
One first needs to identify the probability distribution f(x) of the sum of two dice.
Below is a table that identifies these probabilities:
Sum          2     3     4     5    6     7    8     9    10    11    12
Probability  1/36  1/18  1/12  1/9  5/36  1/6  5/36  1/9  1/12  1/18  1/36
Since the casino stands to lose $10 each time a participant rolls a 7 or 11, and pays nothing
for any other sum, the mathematical expected value (or expected cost) of this game to the
casino is:
E(payout) = (1/6 × 10) + (1/18 × 10) + (0 for every other sum)
= 10/6 + 10/18 = 40/18 ≈ $2.22
The expected value of this game is therefore about $2.22 per game.
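
As a cross-check of Example 1, the following sketch builds the distribution of the sum of two fair dice by enumeration and then computes the expected payout. The function name `payout` and the variable names are illustrative choices, not part of the original notes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely rolls give each sum (2 to 12).
counts = Counter(x + y for x, y in product(range(1, 7), repeat=2))
pmf = {total: Fraction(count, 36) for total, count in counts.items()}

def payout(total):
    """Casino pays $10 when the dice sum to 7 or 11, and nothing otherwise."""
    return 10 if total in (7, 11) else 0

# Expected payout = sum over all sums of payout(sum) * P(sum).
expected_payout = sum(payout(total) * prob for total, prob in pmf.items())

print("P(sum = 7) =", pmf[7], " P(sum = 11) =", pmf[11])
print("Expected payout per game: $%.2f" % float(expected_payout))
```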

Variance, covariance, and correlation


The variance of a random variable X is a measure of how spread out it is. Are the values of X
clustered tightly around their mean, or can we commonly observe values of X a long way
from the mean value? The variance measures how far the values of X are from their mean, on
average.
Definition: Let X be any random variable. The variance of X is
Var(X) = E[(X − μX)²]
= E(X²) − (E(X))²,
where μX = E(X) is the mean of X.
The variance is the mean squared deviation of a random variable from its own mean.
If X has high variance, we can observe values of X a long way from the mean.
If X has low variance, the values of X tend to be clustered tightly around the mean value.
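
The two forms of the variance formula give the same result. The sketch below illustrates this on an assumed example, a single fair six-sided die, which is not taken from the original notes; exact fractions are used so the comparison is not affected by rounding.

```python
from fractions import Fraction

# Assumed example distribution: one fair six-sided die.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# Mean: E(X) = sum of x * P(X = x).
mean = sum(x * p for x, p in pmf.items())

# Definition: the mean squared deviation from the mean, E[(X - mu)^2].
var_definition = sum((x - mean) ** 2 * p for x, p in pmf.items())

# Shortcut form: E(X^2) - (E(X))^2.
var_shortcut = sum(x**2 * p for x, p in pmf.items()) - mean**2

print("E(X) =", mean)
print("Var(X) by definition =", var_definition)
print("Var(X) by shortcut   =", var_shortcut)
```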

Example 2: Let X be a continuous random variable with p.d.f.

Find E(X) and Var(X).


Example 3
Investing in Sharma Furniture Co. has a 60% chance of resulting in a ₦10,000 gain and a 40%
chance of resulting in a ₦3,000 loss. What is the expected value of investing in Sharma
Furniture Co.?
E(X) = 0.6 · 10000 + 0.4 · (−3000) = 6000 − 1200 = 4800
Thus, the expected value is ₦4,800.

Example 4:
In a contest sponsored by 7Up Bottling Co., you win a prize if the cap on your bottle of
minerals says “WINNER”; however, you may only claim one prize. Eager to win, you blow
all your savings on minerals; as a result, you have a 0.05% chance of winning ₦1,000,000, a
1% chance of winning ₦20,000, and a 90% chance of winning ₦10. Ignoring the money you
spent on minerals, what is your expected value and standard deviation?

First, we calculate the mean.


μ = 1000000 · 0.0005 + 0.01 · 20000 + 0.9 · 10 = 500 + 200 + 9 = 709
Now, the variance.
σ² = 1000000² · 0.0005 + 20000² · 0.01 + 10² · 0.9 − 709²
= 500000000 + 4000000 + 90 − 502681
= 503497409
Finally, the standard deviation.
σ = √503497409 ≈ 22439
The standard deviation is over ₦22,000! Although the expected value looks nice, the spread is
so large that you are very likely to end up with far less than the expected ₦709.
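
The calculation in Example 4 can be reproduced with a short script. This is a sketch under one stated assumption: the three listed probabilities add up to 0.9105, so the remaining 0.0895 is treated as the chance of winning nothing (₦0), which does not change the mean or the variance. The variable names are illustrative.

```python
import math
from fractions import Fraction

# (prize in naira, probability) as given in Example 4.
prizes = [
    (1_000_000, Fraction("0.0005")),  # 0.05% chance
    (20_000, Fraction("0.01")),       # 1% chance
    (10, Fraction("0.9")),            # 90% chance
]

# Assumption: the leftover probability corresponds to winning nothing.
p_nothing = 1 - sum(p for _, p in prizes)
outcomes = prizes + [(0, p_nothing)]

# Mean: sum of x * P(x).
mean = sum(x * p for x, p in outcomes)

# Variance via the shortcut E(X^2) - (E(X))^2.
variance = sum(x**2 * p for x, p in outcomes) - mean**2

# Standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

print("Mean:", mean)
print("Variance:", variance)
print("Standard deviation: %.2f" % std_dev)
```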
