PREFACE
This instructional material, From Data to Decisions: A Practical
Guide to Engineering Data Analysis, was developed to guide
engineering students in understanding and applying statistical methods
to real-world engineering problems. As future engineers, students must
not only know how to compute but also how to interpret and use data
to make informed, evidence-based decisions.
My almost eight years of teaching ES 214 (Engineering Data Analysis) at the College of Engineering, University of Eastern Philippines, have both motivated and deeply interested me in creating this instructional material. Over the years, I have observed the
challenges students face in understanding statistical concepts and
applying them to engineering contexts. This experience has inspired
me to design a resource that bridges the gap between theory and
application — one that presents concepts in simple and clear language,
explains procedures step-by-step, and uses practical engineering
examples that connect learning directly to professional practice.
The lessons in this material cover both fundamental and advanced
topics — from descriptive statistics and probability distributions to
hypothesis testing, correlation, and regression analysis — all designed
to build students’ confidence and analytical thinking. Step-by-step
solutions, clear explanations, and worked problems are provided to
help learners apply statistical tools effectively in design, quality control,
research, and decision-making.
This work is dedicated to my students at the College of Engineering,
University of Eastern Philippines, whose curiosity and determination
continue to inspire me. It is my hope that this material will serve not
only as a course reference but also as a companion in your journey
toward becoming analytical, critical-thinking, and solution-driven
engineers.
Merewina Llanie A. Tapong
I. THE INTRODUCTION
Learning Outcomes:
At the end of the topic, students should be able to:
1. demonstrate an in-depth understanding of the key concepts,
principles, symbols, techniques and procedures in statistical
analysis
2. apply appropriate statistical approaches to analyze and interpret
data effectively.
1.1 Statistics and its Definition
The term Statistics refers to "data" in a general sense, but it also refers to the statistical techniques concerned with the collection, organization, presentation, analysis, and interpretation of data, and with drawing conclusions from them.
There are several reasons why we should study statistics.
Among the most important reasons are the following:
1. Knowledge in statistics helps us use the proper methods to collect
the data, employ the correct analyses, and effectively present the
results. Statistics is a crucial process behind how to make
discoveries in science, make decisions based on data and make
predictions.
2. Another reason to study statistics is to be able to conduct research effectively and to read and evaluate journal articles, thereby further developing critical thinking and analytical skills.
3. We may have to make decisions based on the data and
information of statistical studies such as what product to purchase
based on consumer studies, how much budget should be allotted
by a company for advertisement expense etc.
1.2 Descriptive and Inferential Statistics
The approach to statistical analysis involves two aspects: (1) the collection of numerical information, in terms of a set of numbers called data, for the particular phenomenon to be studied, and (2) the drawing together of these data into meaningful relationships or theories.
Statistics has two main branches: (1) descriptive statistics and (2) inferential statistics. Descriptive statistics consists of the collection, organization, presentation and analysis of data. It aims to summarize raw data of any size or value, and it facilitates accurate description of observations as well as comparison. Thus, descriptive statistics is concerned with ordering and summarizing a given set of data without direct reference to any inference that may be drawn from it.
If a sample is drawn from a total set of observations, some method is required to draw conclusions about the characteristics of the total population from the characteristics of the sample. The statistics of drawing such inferences from numerical data is known as inferential statistics. Thus, inferential statistics involves a higher degree of analysis, interpretation and inference.
1.3 Variables and Types of Data
A variable is a characteristic of a population or sample which makes one member different from another. It is a quantity or quality that can be measured or counted. A variable may also be called a data item. Age, sex, business income and expenses, place of birth, capital expenditures, class grades and vehicle type are examples of variables.
There are different ways variables can be described according
to the ways they can be studied, measured and presented.
Numeric Variables
Numeric variables have values that describe a measurable quantity as a number, like "how many" or "how much". Therefore, numeric variables are quantitative variables.
Numeric variables may be further described as either continuous or discrete.
➢ A continuous variable is one for which all values are possible, including fractions, within the total range of the data. Examples of continuous variables include height, time, age, rainfall and temperature.
➢ A discrete variable is one for which measurements are in whole units or integers only (including zero). It cannot take the value of a fraction between one value and the next closest value. Examples of discrete variables include the number of registered cars, number of business locations, number of persons in the household, and number of children in the family, all of which are measured in whole units.
Categorical Variables
Categorical variables have values that describe a "quality" or characteristic of a data unit, like "what type" or "which category". Categorical variables are qualitative variables and tend to be represented by a non-numeric value.
Categorical variables may be further measured and described
as nominal and ordinal:
➢ Nominal scale is the most elementary form of measurement
where data exist only in the form of categories in terms of
present or absent, male or female, rural or urban, religion, and
brand.
➢ Ordinal scale. At this level we have sufficient information not only to establish differences between objects but also to place our data in rank order, either individually or in classes. Examples of such categorical variables include academic grades (e.g., 75, 80, 85), clothing size (e.g., small, medium, large, extra large) and attitudes (e.g., strongly agree, agree, disagree, strongly disagree).
Types of Variable Flowchart
II. COLLECTION, ORGANIZATION AND
PRESENTATION OF DATA
Learning Outcomes
By the end of this lesson, students should be able to:
1. Identify various methods of data collection and presentation.
2. Differentiate between probability and non-probability sampling
techniques.
3. Organize raw data into a frequency distribution table.
4. Illustrate data using different types of graphical presentations.
2.1 Data Collection
Data collection is important in statistics since it gives the raw
data needed for research, analysis, and decision making. It serves
as the basis for creating meaningful insights, drawing conclusions,
and making evidence-based decisions for individuals, companies,
and organizations. For instance, in our choice of career or partner
in life, we make decisions based on the data and information that
we have gathered.
Data may be gathered in two (2) types. Primary data is first-hand information collected by the researcher. It is collected for the first time, is original, and is more reliable. For example, the population census conducted periodically by the government is primary data. Secondary data, on the other hand, refers to second-hand information. It is not originally collected but is obtained from an already published or unpublished source such as newspapers, journals and magazines. For instance, a reporter who goes directly to the crime scene to interview the victim and witnesses has gathered primary data, while the readers who read the news item about the scene have received secondary data.
Here are some of the most common data collection methods:
1. Interview Method. This is a direct method of data collection. It is simply a process in which the interviewer asks questions and the interviewee responds to them. It provides a high degree of flexibility because questions can be adjusted and changed anytime according to the situation.
2. Survey and Questionnaire Method. This method provides a broad perspective from large groups of people. Surveys can be conducted face-to-face, mailed, or even posted on the internet to get respondents from anywhere in the world. The answers can be yes or no, true or false, multiple choice, and even open-ended questions. However, a drawback of surveys and questionnaires is delayed response and the possibility of ambiguous answers.
3. Registration Method. This method of collecting data is governed by our existing laws. The researcher gathers data from the offices concerned, e.g. the Philippine Statistics Authority (PSA), the Commission on Elections (COMELEC), Municipal/City Halls or Barangay Offices. The PSA keeps the complete records of births and deaths of the population. The COMELEC takes care of the list of registered voters.
4. Observation Method. In this method, researchers observe a situation around them and record the findings. It can be used to evaluate the behavior of different people in controlled (everyone knows they are being observed) and uncontrolled (no one knows they are being observed) situations. This method is highly effective because it is straightforward and not directly dependent on other participants.
5. Experimental Method. This method of data collection involves the manipulation of the samples by applying some form of treatment prior to data collection. It refers to manipulating one variable to determine its effect on another variable.
6. Focus Groups. This is similar to an interview, but it is conducted with a group of people who all have something in common. The data collected are similar to in-person interviews, but focus groups offer a better understanding of why a certain group of people thinks in a particular way. However, some drawbacks of this method are lack of privacy and domination of the discussion by one or two participants. Focus groups can also be time-consuming and challenging, but they help reveal some of the best information for complex situations.
2.2 Determining the Sample Size
Most surveys are done on a sample basis because of the time and cost involved if the whole population is used. Sample size is a research term for the number of individuals included in a research study to represent the population. The sample size refers to the total number of respondents included in the study, and this number is often broken down into subgroups by demographics such as age, gender and location so that the total sample represents the entire population.
Determining the appropriate sample size is one of the most important factors in statistical analysis. If the sample size is too small, it will not yield valid results or adequately represent the realities of the population being studied. On the other hand, while larger sample sizes yield smaller margins of error and are more representative, a sample size that is too large may significantly increase the cost and time needed to conduct the research.
Slovin's Formula is used to calculate the sample size necessary to achieve a certain confidence interval when sampling a population. This formula is used when you don't have enough information about a population's behavior to otherwise know the appropriate sample size.

n = N / (1 + Ne²)

where: N = population size
       e = margin of error
Margin of error is the error we expect to commit in getting the
sample. If for instance we want to conduct a survey on the average
income of the families in the province of Northern Samar, then we
can only probably use 5 municipalities. This is due to the difficulty of
obtaining data on the income of families from all the municipalities
of the province. Hence, we cannot avoid having an error in the
results of the study since we are using only a sample of the
population.
Example 1
An Statistics student is conducting an inquiry regarding the
reaction of the students from the College of Engineering of a certain
university to the recent tuition fee increase. If there are 3,500
engineering students and the research wants to have a 99%
accuracy, then determine the sample size that should be taken as
respondents.
Solution to the Problem.
a. Determine the value of the Population N from the problem,
N = 3,500 (engineering students population)
b. Determine the value of the Margin of Error, “e”, to have a 99%
accuracy ( 100% - 99% )
e = 0.01
c. Substitute the values of "N" and "e" in Slovin's Formula:

n = N / (1 + Ne²) = 3500 / (1 + 3500 × 0.01²) = 3500 / (1 + 0.35) = 3500 / 1.35 = 2592.59

Therefore, the sample size "n" that should be taken as respondents = 2593 engineering students.
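For readers who want to check or reuse this computation, here is a minimal sketch in plain Python (the helper name slovin_sample_size is ours, not from any library):

    import math

    def slovin_sample_size(population, margin_of_error):
        # Slovin's formula: n = N / (1 + N * e^2), rounded up to a whole respondent
        n = population / (1 + population * margin_of_error ** 2)
        return math.ceil(n)

    print(slovin_sample_size(3500, 0.01))   # prints 2593, matching the example above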
2.3 Sampling Techniques
A sample should not be selected in a haphazard way, because the information obtained from the study might be unbelievable and unrealistic. When you conduct research about a group of people, it is rarely possible to collect data from every person in the group. Instead, you select a sample. The sample is the group of individuals who will actually participate in the research. To draw valid conclusions from your results, you have to carefully decide how to select a sample that is representative of the group as a whole. This is done through a sampling technique. There are two primary types of sampling methods that you can use in your research.
➢ Probability Sampling means that every member of the population has a chance of being selected. It is mainly used in quantitative research. If you want to produce results that are representative of the whole population, probability (random) sampling techniques are the most valid choice.
Among the types of probability sampling techniques are the following:
1. Simple Random Sampling. In this type of random sampling, every member of the population has an equal chance of being selected. An example is lottery sampling. Each member of the population is numbered on a piece of paper. These pieces of paper should be identical (equal in size and weight) and rolled evenly. They are placed in a lottery box and shaken very well. The desired number of samples is then drawn one after the other.
2. Systematic Sampling. This is similar to simple random sampling, but it is slightly easier to conduct. Every member of the population is listed with a number, but instead of randomly generating numbers, individuals are chosen at regular intervals. For example, there are 1,000 (N) employees in the University of Eastern Philippines and 50 samples are needed. We divide 1,000 by 50 and obtain an interval of k = 20. We then select one number from 1 to 20 by lottery. If the number 6 happens to come out, then the first sample is 6. The second sample is 6 + 20 = 26, and so on. The process is continued until you end up with a sample of 50.
3. Stratified Sampling. To use this sampling technique, you divide the population into subgroups called strata, based on relevant characteristics (e.g. gender identity, age range, income bracket). If the desired sample is 50 and there are 10 subgroups, then we obtain a proportional sample from each subgroup. You then use random or systematic sampling to select a sample from each group. For example, the University of Eastern Philippines has 800 female employees and 200 male employees. You want to ensure that the sample reflects the gender balance of the institution, so you sort the population into strata based on gender. Then you use random sampling on each group, selecting 80 women and 20 men, which gives you a representative sample of 100 people.
4. Cluster Sampling. This involves dividing the population into subgroups, but each subgroup should have similar characteristics to the whole sample. Instead of sampling individuals from each subgroup, you randomly select entire subgroups. This is sometimes called area sampling because it is used for large populations. For instance, a certain company has 10 offices in cities across the country (all with the same number of employees in similar roles). You don't have the capacity to travel to every office to collect your data, so you use random sampling to select 3 cities; these are your clusters.
➢ Non-Probability Sampling means individuals are selected on non-random criteria, and not every individual has a chance of being included. This is easier and cheaper, but it has a higher risk of sampling bias and is therefore less reliable, such as samples drawn by researchers based on their own judgement.
Among the types of non-probability sampling are the following:
1. Convenience Sampling. This is used because it is convenient to the researcher. A convenience sample simply includes the individuals who happen to be most accessible to the researcher. Convenience samples are at risk for both sampling bias and selection bias. For example, a researcher may find out which hair shampoo is the most popular among households by making phone calls using the numbers found in the telephone directory. While the data may easily be obtained, their accuracy may not be reliable since not all households have telephone connections.
2. Purposive Sampling. This type of sampling, also known as judgement sampling, involves the researcher using their expertise to collect a sample that is most useful to the purpose of the research. The researcher usually gets this sample from respondents purposely related or close to them. For instance, you want to know more about the opinions and experiences of disabled students at your university, so you purposefully select a number of students with different support needs in order to gather a varied range of data on their experiences with student services.
3. Quota Sampling. This sampling relies on the non-random selection of a predetermined number or proportion of units. In this method, the researcher uses the proportions of the different strata, and from the strata, selections are made to fill a quota. Quota sampling is a quick, easy and inexpensive way to get survey results. The drawback is that, because of the lack of randomization, there is a greater potential for survey bias. For example, a store determines that its customer base of 1,000 comprises 600 women and 400 men, and samples based on this proportion. The quota sizes should be representative of the subgroup populations, so the researcher should select 60 women and 40 men.
2.4 Presentation of Data.
As soon as data collection is over, the researcher needs to find a way of presenting the data in a meaningful, efficient and easily understood way, so that the main features of the data can be identified at a glance using a suitable presentation method. Generally, data in statistics can be presented in three different ways: the textual method, the tabular method, and the graphical method.
1. Textual Method. Also called the paragraph method, this is used to present purely qualitative data or only a few numerical data. This method is desirable and effective when data are presented in paragraph form using small columns like those in newspapers. One has to read through the whole text in order to understand and comprehend the main point of the data. For example, there are 50 students in a class; among them, 30 are boys and 20 are girls. This is data that can be understood with the help of simple text, and no table or pie diagram is required.
2. Tabular Method. Statistically, tables are effective devices for presenting both qualitative and quantitative data. A table is a systematic and logical arrangement of data in rows and columns with respect to the characteristics of the data. Tables can be used conveniently to make comparisons and draw relationships between and among variables. They present the data in a simple form, save space, facilitate comparison, facilitate statistical analysis and reduce the chances of error.
Among the most commonly used tabular methods is the frequency distribution table. This is a way to organize data so that they become more meaningful. Data need to be organized and summarized before statistical analysis can be carried out. Thus, the first step in a statistical analysis of a set of raw data often consists of constructing a frequency distribution table. This involves grouping the data on the basis of class intervals or class limits, specified by the lowest and highest values in the frequency table.
To construct a frequency distribution table, the following
rules shall be followed:
1. As a general rule, the number of classes should be k = 1 + 3.3 log n, where n is the total number of samples.
2. There should be no overlapping of samples in the class intervals.
3. Include all classes. A class interval with no frequency that is located between the first and last class intervals should still be included.
4. There should be enough classes to accommodate all the data.
5. The classes must be equal in size, except when the classes are open-ended, such as the classes below.
75 and below
76-80
81-85
86-90
91 and above
Example: Construct a frequency distribution table for the data below on the percentage of students in the total population of 54 universities in the Philippines.
63.5 31.5 26.6 33.5 35.0 30.4
58.0 30.5 27.3 32.3 53.5 35.2
51.5 54.4 27.5 30.0 51.7 32.4
45.4 56.9 27.1 28.7 53.9 30.2
40.1 32.7 28.6 32.7 61.5 28.7
38.8 34.8 27.8 34.2 59.4 26.8
33.5 29.8 27.4 33.2 34.4 27.9
28.1 28.7 29.4 31.9 31.8 29.6
29.2 27.6 27.3 31.2 31.3 29.0
Solution:
Step 1. Find the Range
From the above data it is found that the highest value
HV =63.5 and the lowest value LV = 26.6, the range therefore
is :
Range (R) = HV – LV
= 63.5 – 26.6
= 36.9
Step 2. Determine the number of classes (usually between 5 and 20). The number of classes has to be rounded up to a whole number.
The number of classes can be
k = 1 + 3.3 log n
= 1 + 3.3 log 54
k = 6.7≈ 7
Step 3. Find the class width or class size
Class size (c) = R / k = 36.9 / 6.7 ≈ 5.5
Step 4. Select a starting point. It must be equal or lower
than the smallest value
Table 1. Frequency Distribution Table of Percentage of Students

Class interval | Frequency | < cumulative frequency | > cumulative frequency | Percentage relative frequency
26.6 - 32.0    |    29     |          29            |          54            |   53.7
32.1 - 37.5    |    12     |          41            |          25            |   22.2
37.6 - 43.0    |     2     |          43            |          13            |    3.7
43.1 - 48.5    |     1     |          44            |          11            |    1.9
48.6 - 54.0    |     4     |          48            |          10            |    7.4
54.1 - 59.5    |     4     |          52            |           6            |    7.4
59.6 - 65.0    |     2     |          54            |           2            |    3.7
TOTAL          |    54     |                        |                        |  100.0
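As a rough cross-check of Table 1, the class frequencies can be tallied with a short Python script. The class limits below simply copy the intervals of Table 1; the loop and variable names are ours:

    data = [63.5, 31.5, 26.6, 33.5, 35.0, 30.4, 58.0, 30.5, 27.3, 32.3, 53.5, 35.2,
            51.5, 54.4, 27.5, 30.0, 51.7, 32.4, 45.4, 56.9, 27.1, 28.7, 53.9, 30.2,
            40.1, 32.7, 28.6, 32.7, 61.5, 28.7, 38.8, 34.8, 27.8, 34.2, 59.4, 26.8,
            33.5, 29.8, 27.4, 33.2, 34.4, 27.9, 28.1, 28.7, 29.4, 31.9, 31.8, 29.6,
            29.2, 27.6, 27.3, 31.2, 31.3, 29.0]

    # Class intervals of Table 1 as (lower limit, upper limit)
    classes = [(26.6, 32.0), (32.1, 37.5), (37.6, 43.0), (43.1, 48.5),
               (48.6, 54.0), (54.1, 59.5), (59.6, 65.0)]

    cumulative = 0
    for low, high in classes:
        freq = sum(1 for value in data if low <= value <= high)
        cumulative += freq
        relative = 100 * freq / len(data)
        print(f"{low:4.1f}-{high:4.1f}  f = {freq:2d}  <cf = {cumulative:2d}  {relative:5.1f}%")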
3. Graphical Method. These are visual tools for presenting statistical data in an organized, easily interpretable manner. They help simplify complex datasets, reveal patterns, trends, and distributions, and make comparisons easier. Graphs enhance communication of data findings, making them essential tools for analysis, reporting, and decision-making. Below are common graphical methods, how to use them, and their concepts:
3.1. BAR GRAPH
The purpose of this graph is to compare categories of data using rectangular bars. Each bar represents a category, and the length or height of the bar corresponds to the value or frequency of that category. It is used for discrete or categorical data, can be drawn vertically or horizontally, and allows easy comparison between groups.
Example Scenario:
You conducted a survey asking 100 people about their
favorite type of fruit. The results are as follows:
Categories Values
Apple 30
Bananas 25
Oranges 20
Grapes 15
Mangoes 10
3.2. Histogram
The purpose of this graph is to represent the frequency distribution of continuous data. It divides the data into intervals and is ideal for showing the shape of a distribution (e.g., normal distribution, skewness).
Example Scenario
You conducted a test for 50 students and recorded their
scores out of 100. The scores are as follows:
Scores:
45, 56, 67, 48, 90, 72, 65, 59, 82, 78, 91, 66, 47, 64, 68, 74,
80, 87, 92, 55, 60, 70, 62, 77, 85, 95, 40, 58, 63, 75, 52, 61,
83, 69, 71, 88, 50, 57, 53, 81, 79, 76, 89, 54, 49, 46, 73, 84,
93, 51.
➢ Divide the data into continuous intervals
Minimum score: 40
Maximum score: 95
Range: 95 - 40 = 55
Suggested interval width: 10 (you can adjust based on preference).

Interval | Frequency
40-49    |     6
50-59    |    10
60-69    |    10
70-79    |    10
80-89    |     9
90-99    |     5
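These frequencies can be tallied quickly with NumPy's histogram function; this is only a sketch assuming NumPy is installed, with bin edges chosen to reproduce the 10-point intervals above:

    import numpy as np

    scores = [45, 56, 67, 48, 90, 72, 65, 59, 82, 78, 91, 66, 47, 64, 68, 74,
              80, 87, 92, 55, 60, 70, 62, 77, 85, 95, 40, 58, 63, 75, 52, 61,
              83, 69, 71, 88, 50, 57, 53, 81, 79, 76, 89, 54, 49, 46, 73, 84,
              93, 51]

    # Bin edges 40, 50, ..., 100 give the intervals 40-49, 50-59, ..., 90-99
    freq, edges = np.histogram(scores, bins=range(40, 101, 10))
    for low, f in zip(edges[:-1], freq):
        print(f"{low}-{low + 9}: {f}")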
3.3. Pie Chart
The purpose of this graph is to show proportions of a whole. It divides a circle into slices, where each slice represents a category and the size of each slice is proportional to the percentage or frequency of the category. This graph is best for showing relative contributions.
Example Scenario for a Pie Chart:
A class of 40 students participated in a school election. The
number of votes received by each candidate is as follows:
Candidate A: 16 votes
Candidate B: 12 votes
Candidate C: 8 votes
Candidate D: 4 votes
Candidate   | Frequency (Votes) | Percentage
Candidate A |        16         |    40
Candidate B |        12         |    30
Candidate C |         8         |    20
Candidate D |         4         |    10
Total       |        40         |   100
(Pie chart: Percentage of votes received by each candidate)
3.4. Line Graph
The purpose of the line graph is to display trends over time or over a continuous variable. To use it, you plot the data points on the graph and connect the points with a line. It is useful for observing changes, patterns and trends, and it is commonly used for time-series data.
Example Scenario for a Line Graph:
You are tracking the temperature of the municipality of
Catarman over a week to observe trends. Below are the
daily temperature readings (in degrees Celsius):
Day 1 (Monday): 25°C
Day 2 (Tuesday): 28°C
Day 3 (Wednesday): 30°C
Day 4 (Thursday): 27°C
Day 5 (Friday): 26°C
Day 6 (Saturday): 29°C
Day 7 (Sunday): 31°C
(Line graph: Temperature reading for the week, Day 1 to Day 7, in °C)
General Concepts of Using Graphical Methods:
1. Clarity: Choose a method that simplifies interpretation
and avoids clutter.
2. Accuracy: Ensure scales and proportions accurately
reflect data.
3. Relevance: Match the graph type to the data and
research objectives.
4. Accessibility: Label axes, include a legend (if
necessary), and use consistent scales.
5. Interpretation: Use graphs to draw attention to key
findings or trends.
6. Comparison: When comparing datasets, use consistent
scales and formats.
III. MEASURES OF CENTRAL TENDENCY (Mean,
Median and Mode)
Learning Outcomes
By the end of this lesson, students should be able to:
1. Explain the concept and importance of measures of central tendency
in statistics.
2. Differentiate among the mean, median, and mode, and describe their
characteristics.
3. Compute the mean, median, and mode for both ungrouped and
grouped data.
4. Analyze real-life datasets to determine the most appropriate measure
of central tendency.
5. Evaluate the strengths and limitations of each measure, considering
the impact of outliers.
6. Solve practical and theoretical problems involving measures of
central tendency.
3.1 Mean
The mean, often referred to as the average, is one of the
most widely used measures of central tendency in statistics. It
provides a single value that represents the center or typical
value of a dataset, offering a quick summary of the data.
The mean is calculated by summing all the values in a
dataset and dividing this total by the number of observations.
For example, if a dataset consists of the numbers 5, 10, and
15, the mean would be:
Mean = (5 + 10 + 15) / 3 = 10
This measure is particularly useful when all data points are of
equal importance and the dataset is symmetrically distributed. It
helps identify the "balance point" of the data, making it a powerful
tool for comparing groups or trends.
Characteristics of the Mean
1. Sensitive to Every Value
The mean considers all values in the dataset, making it a
comprehensive summary of the data. However, this also makes it
sensitive to outliers (extremely high or low values), which can skew
the mean and make it less representative of the central tendency.
2. Applicability Across Fields
The mean is used in various disciplines, including economics,
education, and science, to analyze data such as average
income, test scores, and experimental results.
3. Ease of Interpretation
As a single number, the mean is easy to interpret and
communicate, making it ideal for presenting findings to diverse
audiences.
The mean is a fundamental statistical concept that simplifies
complex data into a single, interpretable value. By
understanding its calculation, uses, and limitations, one can
effectively apply it to summarize and analyze datasets in a variety
of real-world contexts.
Formula for Computing the Mean
1. Mean for Ungrouped Data
The formula for the mean ( 𝑥̅ ) of ungrouped data is:
x̄ = ∑x / n
Where: ∑x = the sum of all data values.
n = the number of data values.
2. Mean for Grouped Data
The formula for the mean (𝑥̅ ) of grouped data is:
x̄ = ∑fx / ∑f
Where:
f = the frequency of each class interval.
x = the midpoint of each class interval.
∑fx = the sum of the products of frequencies and midpoints.
∑f = the total frequency.
Example 1: Ungrouped Data
A student scored the following marks in 5 subjects:
Marks: 85,90,78,92,88
Solution:
x̄ = ∑x / n = (85 + 90 + 78 + 92 + 88) / 5 = 86.6
The mean score is 86.6.
Example 2: Grouped Data
A frequency table of test scores is given below:
Class Interval Frequency (f)
40-49 5
50-59 8
60-69 10
70-79 6
80-89 4
Total 33
Solution:
1. Find the midpoint (x) of each class interval:
x = (lower limit + upper limit) / 2
For example: x = (40 + 49) / 2 = 44.5

Class Interval | Frequency (f) | Midpoint (x) |   fx
40-49          |       5       |     44.5     |  222.5
50-59          |       8       |     54.5     |  436.0
60-69          |      10       |     64.5     |  645.0
70-79          |       6       |     74.5     |  447.0
80-89          |       4       |     84.5     |  338.0
Total          |      33       |              | 2088.5
2. Compute the mean
x̄ = ∑fx / ∑f = 2088.5 / 33 = 63.3
The mean score is 63.3.
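The grouped-data mean above can be reproduced with a few lines of plain Python (the list of classes is transcribed from the frequency table; the variable names are ours):

    # Class intervals as (lower limit, upper limit, frequency)
    classes = [(40, 49, 5), (50, 59, 8), (60, 69, 10), (70, 79, 6), (80, 89, 4)]

    sum_fx = sum(f * (low + high) / 2 for low, high, f in classes)  # sum of f times midpoint
    sum_f = sum(f for _, _, f in classes)                           # total frequency

    print(sum_fx, round(sum_fx / sum_f, 2))   # 2088.5 and about 63.29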
3.2 Median
The median is a measure of central tendency that represents
the middle value of a dataset when it is arranged in ascending or
descending order. Unlike the mean, the median is not influenced by
extreme values, making it particularly useful for datasets that
contain outliers or are skewed.
Median of Ungrouped Data
• For an odd number of data points, the median is the exact
middle value. For example, in the dataset 3,7,9, the median is 7,
as it is the middle number when the data is ordered.
• For an even number of data points, the median is calculated as
the average of the two middle values. For instance, in the dataset
4,6,8,10, the median is:
Median = (6 + 8) / 2 = 7
Median in Grouped Data
In grouped data, the median is determined using the
median class, which is the class interval that contains the
middle value of the cumulative frequency distribution. The
formula for the median is:
Median = L + ((n/2 − CF) / f) × h
Where:
• L = Lower boundary of the median class.
• n = Total frequency.
• CF = Cumulative frequency before the median class.
• f = Frequency of the median class.
• h = Width of the class interval.
Characteristics of the Median
1. Resistant to Outliers:
Since the median depends only on the order of data values, it is
unaffected by extreme values that could distort the mean.
2. Represents the Center of a Distribution:
The median divides the dataset into two equal parts, with 50% of
the data lying below it and 50% above.
3. Applicable Across Various Scenarios:
The median is widely used in fields such as economics (e.g.,
median income), healthcare (e.g., median survival time), and
education (e.g., median test scores).
The median is a valuable measure of central tendency,
especially for analyzing skewed data or distributions with outliers.
By providing the "middle value," it offers insights into the central
location of data, ensuring a fair representation even when
extreme values are present.
Example of Computing Median
1. For Ungrouped Data
Find the median of the following dataset:
8, 12, 15, 9, 11
Solution:
1. Arrange the Data in Ascending Order:
8,9,11,12,15
2. Count the Total Number of Observations (n):
n=5 (odd number of data points)
3. Find the Median Position:
The median is the middle value when n is odd, i.e., the (n + 1)/2 = 3rd value.
4. Identify the Median Value:
The 3rd value in the ordered data set is 11.
Median = 11
2. Grouped Data
A teacher recorded the scores of 50 students in a
mathematics test and organized them into a frequency
distribution table as follows:
Score Interval | Frequency (f) | Cumulative Frequency (CF)
40 – 49 4 4
50 – 59 6 10
60 – 69 10 20
70 – 79 15 35
80 – 89 9 44
90 – 99 6 50
Question:
Using the given frequency distribution table, determine the
median score.
Step-by-Step Solution
1. Find the median class:
⚫ The total number of students is n = 50.
⚫ The median position is at n/2 = 50/2 = 25.
⚫ Locate the cumulative frequency (CF) where 25 is
found or first exceeded.
⚫ Looking at the CF column, 35 (corresponding to the
class 70–79) is the first cumulative frequency greater
than 25.
⚫ So, the median class is 70–79.
2. Identify the values needed for the median formula:
Median = L + ((n/2 − CF) / f) × h
⚫ L = Lower boundary of the median class = 69.5 (since the class interval is 70–79, the lower boundary is 70 − 0.5 = 69.5)
⚫ n = Total number of students = 50
⚫ CF = Cumulative frequency before the median class = 20
⚫ f = Frequency of the median class = 15
⚫ h = Class width = 10 (since the intervals are 40–49, 50–59, etc.)
3. Substituting values
Median = 69.5 + ((25 − 20) / 15) × 10
       = 69.5 + (5/15) × 10
       = 69.5 + 3.33
       = 72.83
4. Final Answer :
The median score is 72.83
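The same interpolation can be scripted; a minimal sketch in plain Python (the helper name grouped_median and its argument layout are ours):

    def grouped_median(freqs, first_lower_limit, width):
        # freqs: class frequencies in order; boundaries start 0.5 below the lower limit
        n = sum(freqs)
        cf = 0
        for i, f in enumerate(freqs):
            if cf + f >= n / 2:                      # median class: first CF reaching n/2
                L = first_lower_limit - 0.5 + i * width
                return L + ((n / 2 - cf) / f) * width
            cf += f

    print(round(grouped_median([4, 6, 10, 15, 9, 6], 40, 10), 2))   # 72.83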
3.3 Mode
The mode is the value that appears most frequently in a
dataset. In grouped data, the modal class is the class interval with
the highest frequency. Since the mode is not directly observable in
grouped data, we use an interpolation formula to estimate it.
Mode of Grouped Data:
Mode = L + ((f₁ − f₀) / (2f₁ − f₀ − f₂)) × h
Where:
• L = Lower boundary of the modal class
• f₁ = Frequency of the modal class
• f₀ = Frequency of the class before the modal
class
• f₂ = Frequency of the class after the modal class
• h = Class width
• Modal class: The class interval with the highest
frequency in the frequency distribution table.
Characteristics of the Mode
1. If the dataset has a single mode, it is called unimodal.
2. If it has two modes, it is bimodal.
3. If it has more than two, it is multimodal.
4. If all values appear with similar frequency, the data is uniform
and has no mode.
Example Problem:
A teacher recorded the test scores of 60 students and
organized them into a frequency distribution table:
Score Interval Frequency (f)
30 – 39 5
40 – 49 8
50 – 59 12
60 – 69 20
70 – 79 10
80 – 89 5
Step-by-Step Solution:
1. Identify the modal class
⚫ The class with the highest frequency is 60–69 (frequency f₁ = 20).
2. Identify the values for the formula
⚫ L = Lower boundary of modal class = 59.5 (60 – 0.5)
⚫ f₀ = Frequency before modal class = 12
⚫ f₁ = Frequency of modal class = 20
⚫ f₂ = Frequency after modal class = 10
⚫ h = Class width = 10
3. Apply the formula
Mode = 59.5 + ((20 − 12) / (2 × 20 − 12 − 10)) × 10
     = 59.5 + (8/18) × 10
     = 59.5 + 4.44
     = 63.94
4. Final Answer: The mode of the dataset is 63.94.
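A matching sketch for the mode interpolation formula, again in plain Python with a helper name of our own choosing:

    def grouped_mode(freqs, first_lower_limit, width):
        i = freqs.index(max(freqs))                      # index of the modal class
        L = first_lower_limit - 0.5 + i * width          # lower boundary of the modal class
        f1 = freqs[i]
        f0 = freqs[i - 1] if i > 0 else 0                # frequency before the modal class
        f2 = freqs[i + 1] if i + 1 < len(freqs) else 0   # frequency after the modal class
        return L + ((f1 - f0) / (2 * f1 - f0 - f2)) * width

    print(round(grouped_mode([5, 8, 12, 20, 10, 5], 30, 10), 2))   # 63.94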
IV. FUNDAMENTALS OF PROBABILITY
Learning Outcomes
By the end of this lesson, students should be able to:
1. Define probability and explain its importance in
real-world applications.
2. Describe sample space, events, and their
relationships.
3. Apply counting techniques in probability problems.
4. Use probability rules to compute probabilities of
different events.
4.1 Sample Space and Relationship Among Events
4.1.1 Probability is a measure of how likely an event is to occur.
It is expressed as a number between 0 and 1, where 0
means the event is impossible and 1 means the event is
certain.
4.1.2 Sample Space and Events.
• Sample Space (S) is the set of all possible outcomes of
an experiment.
Example: Tossing a coin --- S = {Heads, Tails}
• An event (E) is a subset of the sample space.
Example: Getting heads in a coin toss --- E = {Heads}
4.1.3 Types of Events
• Mutually Exclusive Events. Events that cannot
happen at the same time.
Example : Rolling a die and getting a 3 or a 5 (Cannot
be both)
• Independent Events. Events where the outcome of one does not affect the other.
Example : Flipping a coin and rolling a die
• Complementary Events. Events where one event
occurring means the other cannot occur.
Example : If A is the event of rolling a 6, its complement
A’ is rolling anything except 6
4.2 Counting Rules Useful in Probability
4.2.1 Fundamental Counting Principles. If an event can occur
in m ways and another can occur in n ways, then the
total ways both can occur is : m x n
Example : If you have 3 shirts and 2 pants, the number
of outfits you can make : 3 x 2 = 6
4.2.2 Permutations ( Ordered Arrangements). The number of
ways to arrange n items when order matters is:
P(n, r) = n! / (n − r)!
Example: Arranging 3 letters out of 5:
P(5, 3) = 5! / (5 − 3)! = 5! / 2! = 5 × 4 × 3 = 60 ways
4.2.3 Combination (Selection without Order). The number of ways to choose r items from n when order does NOT matter is:
C(n, r) = n! / (r!(n − r)!)
Example: In how many ways can 3 students be chosen from a group of 5?
C(5, 3) = 5! / (3!(5 − 3)!) = 5! / (3! 2!) = (5 × 4) / (2 × 1) = 10 ways
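Python's math module has these counting functions built in, so the results above can be verified directly (math.perm and math.comb are available from Python 3.8 onward):

    import math

    print(3 * 2)             # fundamental counting principle: 3 shirts x 2 pants = 6 outfits
    print(math.perm(5, 3))   # P(5, 3) = 5!/2! = 60 ordered arrangements
    print(math.comb(5, 3))   # C(5, 3) = 5!/(3! 2!) = 10 selections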
4.3 Rules of Probability
4.3.1 Probability of a Single Event. If an event E has a number of favorable outcomes and the sample space S contains the total number of outcomes, then:
P(E) = Number of favorable outcomes / Total number of outcomes
Example: Rolling a die and getting a 4
P(4) = 1/6
4.3.2 Addition Rule (for “OR” Events). If A and B are two
events, then :
P(A∪B) = P(A) +P(B) – P(A∩B)
Example: Rolling a die and getting a 3 OR an even
number
P(3) = 1/6, P(even) = 3/6
P(3 ∩ even) = 0
P(3 ∪ even) = 1/6 + 3/6 = 4/6 = 2/3
4.3.3 Multiplication Rule (for “AND” Events). If A and B are
independent events, then :
P(A∩B) = P(A) x P(B)
Example: Tossing a coin and rolling a die
P(H) = 1/2, P(5) = 1/6
P(H ∩ 5) = 1/2 × 1/6 = 1/12
4.3.4 Complement Rule. The probability that an event does not
occur is :
P(A’) = 1 – P(A)
Example: If P(rain) = 0.3, then P(no rain) = 1 − 0.3 = 0.7.
V. DISCRETE PROBABILITY DISTRIBUTIONS
Learning Outcomes :
By the end of this lesson, students should be able to:
1. Define discrete random variables and construct their
probability distribution.
2. Interpret and compute cumulative distribution functions
3. Calculate expected values, variance and standard
deviation of discrete random variables.
4. Apply binomial and Poisson distribution in solving real
world problems.
5.1 Random Variables and Their Probability Distributions.
5.1.1 Random Variable (RV): A variable whose values depend on the outcomes of a random experiment.
5.1.1.1 Discrete Random Variables: take countable values (e.g., 0, 1, 2, …)
5.1.1.2 Continuous Random Variables: take uncountable values within an interval
Probability Distribution Table
x (Number of Heads in 2 Tosses) | P(x)
              0                 | 0.25
              1                 | 0.50
              2                 | 0.25
Properties:
• 0 ≤ P(x) ≤ 1
• ∑ 𝑃(𝑥) = 1
Example: Two fair coins are tossed at the same time. Let the
random variable x represent the number of heads
that appear.
Requirements:
1. List the sample space
2. Define the random variable x
3. Construct the probability distribution of x
4. Verify that the distribution is valid
5. Compute the expected value E(x)
Solutions:
1. Sample Space (S) :
S = {𝐻𝐻, 𝐻𝑇, 𝑇𝐻, 𝑇𝑇 }
There are 4 equally likely outcomes when tossing two
coins.
2. Define the random variable (x)
Let x = the number of heads observed
__________________________________
Outcome x(Number of Heads)
HH 2
HT 1
TH 1
TT 0
___________________________________
3. Probability Distribution Table:
We now determine the probability of each
value of x
x P(x)
0 1 outcome – TT – ¼ = 0.25
1 2 outcome – HT,TH- 2/4 =0.50
2 1 outcome – HH – ¼ = 0.25
Probability Distribution
P(x=0) = 0.25
P(x=1) = 0.50
P(x=2) = 0.25
4. Validity Check
• All probabilities are between 0 and 1
• Sum of all probabilities
= 0.25 + 0.50 + 0.25
= 1.0 (probability distribution is
valid)
5. Expected Value E(x)
E(x) = ∑ x · P(x) = (0)(0.25) + (1)(0.50) + (2)(0.25) = 1.0
The expected number of heads when tossing two coins is 1.
5.2 Cumulative Distribution Functions (CDF). The cumulative distribution function gives the probability that the random variable X is less than or equal to a value x.
F(x) = P(X ≤ x)
Example CDF Table
x | P(x) | F(x) = P(X ≤ x)
0 | 0.25 |      0.25
1 | 0.50 |      0.75
2 | 0.25 |      1.00
5.3 Expected Values and Variance.
5.3.1 Expected value (mean): E(x) = ∑ x · P(x)
5.3.2 Variance and Standard Deviation:
Var(x) = ∑ x² · P(x) − [E(x)]²
Example: From the table above,
E(x) = ∑ x · P(x) = (0)(0.25) + (1)(0.50) + (2)(0.25) = 1.0
Var(x) = (0²)(0.25) + (1²)(0.50) + (2²)(0.25) − (1.0)² = 1.5 − 1 = 0.5
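The expected value, variance and standard deviation of the two-coin distribution can be checked with a few lines of plain Python:

    x_values = [0, 1, 2]
    probs = [0.25, 0.50, 0.25]

    mean = sum(x * p for x, p in zip(x_values, probs))                    # E(x)
    variance = sum(x**2 * p for x, p in zip(x_values, probs)) - mean**2   # Var(x)
    print(mean, variance, variance ** 0.5)   # 1.0, 0.5 and a standard deviation of about 0.71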
5.4 The Binomial Distribution. This is used when there is a fixed
number trials(n); two outcomes (success/failures);
independent trial and a constant probability of success
(p)
P(X = r) = C(n, r) p^r (1 − p)^(n − r)
Where: r = number of successes
       n = fixed number of trials
       p = constant probability of success
Example :
An engineer tests electronic components from a
production line. Based on past data, the probability that
a component is defective is 0.1 (or 10%). If the engineer
randomly selects 5 components, what is the probability
that exactly 2 components are defective?
Solution:
This is a binomial experiment because:
• There is a fixed number of trials: n = 5
• Each trial has only two outcomes: defective (success) or not defective (failure)
• The probability of success (a defective component) is constant: p = 0.1
• The trials are independent
Using the Binomial Probability Formula
P(X = r) = C(n, r) p^r (1 − p)^(n − r)
Where: n = 5, r = 2, p = 0.1
P(X = 2) = C(5, 2) (0.1)² (1 − 0.1)^(5 − 2) = 10 × 0.01 × 0.729
P(X = 2) = 0.0729
Answer: The probability that exactly 2 out of 5
components are defective is 0.0729 or 7.29%.
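The same binomial probability can be obtained either from the formula or from SciPy, assuming SciPy is available:

    from math import comb
    from scipy.stats import binom

    n, r, p = 5, 2, 0.1
    manual = comb(n, r) * p**r * (1 - p)**(n - r)   # C(5,2)(0.1)^2(0.9)^3
    print(round(manual, 4))                          # 0.0729
    print(round(binom.pmf(r, n, p), 4))              # same value from scipy.stats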
5.5 The Poisson Distribution
The Poisson Distribution is a discrete probability
distribution that models the number of times an event occurs in a
fixed interval of time, area, volume, or distance, given that the
events occur independently and at a constant average rate. Poisson
Distribution is used when you are counting number of events in a
fixed interval (time, area, volume). Also when the event occurs
randomly and independently.
Probability Mass Function (PMF):
P(X = k) = (e^(−λ) λ^k) / k!
Where:
• P(X=k): probability of observing k events in the interval
• λ: average number of occurrences in the
interval(mean)
• e: Euler’s number (approximately 2.71828)
• k: number of events (0, 1, 2, …)
• k! : factorial of k
Characteristics of Poisson Distribution
• Discrete: The variable takes on whole number values
(0, 1, 2, …)
• Events are independent
• The mean and variance of a Poisson distribution are
both equal to λ
• Appropriate when events happen rarely but at a
constant average rate
Example :
A machine produces metal parts, and on average, 2 defective
parts are found per hour. What is the probability that exactly
3 defective parts are found in an hour?
Given: λ=2 , k=3
Using the formula:
P(X = k) = (e^(−λ) λ^k) / k!
P(X = 3) = (e^(−2) 2³) / 3! = (0.1353)(8) / 6 = 0.1804
So, there’s an 18.04% chance of finding exactly 3 defective
parts in one hour.
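A matching check for the Poisson example; the formula itself needs only the standard math module, while the scipy.stats line is an optional cross-check assuming SciPy is installed:

    import math
    from scipy.stats import poisson

    lam, k = 2, 3
    p = math.exp(-lam) * lam**k / math.factorial(k)
    print(round(p, 4))                     # 0.1804
    print(round(poisson.pmf(k, lam), 4))   # same value from scipy.stats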
VI. TEST OF HYPOTHESIS FOR A SINGLE SAMPLE
Learning Outcomes :
By the end of this lesson, students should be able to:
1. define hypothesis testing, differentiate between null and
alternative hypotheses, and explain the significance of Type I and
Type II errors in engineering decision-making.
2. apply the Z-test and t-test to compare population means,
determine when to use each test, and interpret the results in
engineering and scientific studies.
3. use Analysis of Variance (ANOVA) to compare means across
multiple groups, assess statistical significance, and apply the
method in engineering experiments and quality control.
4. perform the Chi-square test to analyze categorical data, test for
independence and goodness-of-fit, and interpret results in real-
world engineering applications.
6.1 Hypothesis Testing
In engineering, making data-driven decisions is
essential—whether it's improving product quality, testing new
designs, or evaluating system performance. Hypothesis
testing is a statistical method that allows engineers to make
objective decisions based on sample data. It provides a
structured way to determine whether observed results are due
to random chance or if they reflect true differences or effects.
What is Hypothesis Testing?
Hypothesis testing is a process of making inferences or
judgments about a population parameter based on sample
data. It involves formulating two competing hypotheses and
using statistical evidence to decide which one is more likely
to be true.
Null Hypothesis (H0) vs. Alternative Hypothesis (H1)
• The null hypothesis (H0) is a statement of no effect,
no difference, or status quo. It assumes that any
observed variation is purely due to chance.
Example: The mean tensile strength of steel rods is
500 MPa.
• The alternative hypothesis (H1) is a statement that
contradicts the null hypothesis. It suggests that there is
an effect or a significant difference.
Example: The mean tensile strength is not 500 MPa
(i.e., it has changed due to a new
manufacturing process).
In hypothesis testing, we assume H0 is true and use
statistical evidence to decide whether we should reject it in
favor of H1.
Types of Errors in Hypothesis Testing
Since decisions are based on sample data (not the
entire population), there's always a risk of making a wrong
conclusion. These risks are classified into Type I and Type II
errors:
Type I Error (α): Rejecting the null hypothesis when it is actually true.
Implication: In engineering, this could mean rejecting a
reliable design or manufacturing process based
on misleading sample data.
Example: You conclude that a new batch of materials is
defective when it is actually within acceptable
limits.
Type II Error (β): Failing to reject the null hypothesis when the alternative is actually true.
Implication: This could result in missing a real problem,
such as accepting a weak material that should
have been rejected.
Example: You accept that two machines produce the same
quality output when one is actually
underperforming.
Z-Test in Hypothesis Testing
A Z-test is a statistical method used to determine
whether there is a significant difference between sample data
and a population parameter (mean or proportion), or between
two population means/proportions, when the population
variance is known or the sample size is large (n ≥ 30).
When to use Z-test
Use Z-test if all of these conditions are met:
1. The population standard deviation (σ) is known.
2. The sample size is large (n ≥ 30), or the population is
normally distributed.
3. The data are quantitative and randomly sampled.
Types of Z-test
▪ One-sample Z-test for mean
▪ Two-sample Z-test for comparing means
▪ Z-test for proportions (one or two samples)
One-sample Z-test for mean formula:
Z = (X̄ − μ) / (σ / √n)
Where:
• 𝑋̅ = sample mean
• μ = population mean
• σ = population standard deviation
• n = sample size
Example :
A mechanical engineer claims the average thickness of a
machine part is 10 mm. A sample of 36 parts has a mean
thickness of 9.6 mm. The population standard deviation is 1.2
mm. At a 0.05 level of significance, test the claim.
Given: x̄ = 9.6 mm, n = 36
       μ = 10 mm
       σ = 1.2 mm
Solution:
Step 1. Hypothesis testing
Null hypothesis, Ho: 𝜇 = 10
Alternative hypothesis, Ha: 𝜇 ≠ 10
Step 2. Compute the Z-statistic using the formula:
Z = (X̄ − μ) / (σ / √n)
Z = (9.6 − 10) / (1.2 / √36) = −0.4 / 0.2
Z = −2.0
Step 3. Critical Value (𝜶= 0.05, two-tailed)
Zcritical = ± 1.96
Step 4. Conclusion
Since |Z| = 2.0 > 1.96, reject Ho.
Interpretation: The average thickness is significantly different from 10 mm.
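A sketch of the same one-sample Z-test in Python, using scipy.stats.norm for the two-tailed p-value (assuming SciPy is available):

    from math import sqrt
    from scipy.stats import norm

    x_bar, mu, sigma, n = 9.6, 10, 1.2, 36
    z = (x_bar - mu) / (sigma / sqrt(n))
    p_value = 2 * norm.cdf(-abs(z))           # two-tailed p-value
    print(round(z, 2), round(p_value, 4))     # -2.0 and about 0.0455, so H0 is rejected at 0.05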
Two-sample Z-test for comparing means formula:
Z = (X̄₁ − X̄₂) / √(σ₁²/n₁ + σ₂²/n₂)
Where: X̄₁ = mean of sample 1
       X̄₂ = mean of sample 2
       σ₁ = standard deviation of population 1
       σ₂ = standard deviation of population 2
       n₁ = size of the first sample
       n₂ = size of the second sample
Example :
Two different suppliers provide aluminum rods used in
aircraft construction. An engineer wants to compare the
average tensile strength of rods from Supplier A and
Supplier B.
A random sample is taken from each supplier:
Supplier A:
Sample size (n₁) = 40
Sample mean (X̄₁) = 310 MPa
Population standard deviation (σ₁) = 15 MPa
Supplier B:
Sample size (n₂) = 50
Sample mean (X̄₂) = 305 MPa
Population standard deviation (σ₂) = 18 MPa
At a 0.05 level of significance, test if there is a significant
difference in the mean tensile strengths.
Solution :
Step 1 : State the hypothesis
Null hypothesis, H0: μ1=μ2 (no difference in mean tensile
strengths)
Alternative hypothesis, Ha: μ1≠μ2 (there is a difference) — two-
tailed test
Step 2. Use the formula:
Z = (X̄₁ − X̄₂) / √(σ₁²/n₁ + σ₂²/n₂)
Substitute the values:
Z = (310 − 305) / √(15²/40 + 18²/50)
  = 5 / √(225/40 + 324/50)
  = 5 / √(5.625 + 6.48)
  = 5 / √12.105
  = 5 / 3.48
Z ≈ 1.44
Step 3. Critical Value
For a two-tailed test at α = 0.05, the critical Z-values are:
Zcritical = ± 1.96
Step 4. Conclusion
Since Z=1.44 is within the range of −1.96< Z <1.96, we
fail to reject the null hypothesis.
There is no significant difference in the average
tensile strength of aluminum rods from the two suppliers at the
0.05 significance level.
Z-test for Proportions (one or two samples)
Z-test for Proportions Formula:
Z = (p̂ − p) / √(p(1 − p)/n)
Where: p̂ = sample proportion
       p = hypothesized population proportion
       n = sample size
Example :
An electronics manufacturer claims that no more than 5% of
products are defective. In a recent batch of 200 items, 15 were
found defective. Test the claim at the 0.05 level.
Solution :
Step 1. Hypotheses testing
Null hypothesis, Ho: p = 0.05
Alternative hypothesis, Ha: p > 0.05
Step 2. Compute the Z-statistic
Using the formula:
Z = (p̂ − p) / √(p(1 − p)/n)
p̂ = 15/200 = 0.075
Z = (0.075 − 0.05) / √(0.05(1 − 0.05)/200) = 0.025 / 0.0154 ≈ 1.62
Step 3. Critical Value (right-tailed test, α = 0.05)
Zcritical = 1.645
Step 4. Conclusion
Since Z=1.62 < 1.645 fail to reject Ho
Interpretation: There is no significant evidence that the
defect rate exceeds 5%.
t-Test in Hypothesis Testing
In many engineering applications, professionals make
inferences about a population based on sample data. When
the population standard deviation is unknown and the sample
size is small (n < 30), the t-Test is used. The t-Test is a
valuable statistical tool in engineering for determining if
differences in sample means are statistically significant.
Choosing the correct type of t-Test and following the
hypothesis testing procedure ensures reliable decisions
based on sample data.
Types of t-Test
1. One-Sample t-Test:
Used to compare the sample mean to a known or
hypothesized population mean.
One-Sample t-test formula:
t = (x̄ − μ) / (s / √n)
where:
x̄ : sample mean
μ : hypothesized population mean
s : sample standard deviation
n : sample size
Example :
A quality control engineer is testing whether a new type of
cement mix meets the required compressive strength of 30
MPa. A random sample of 10 concrete cylinders yielded the
following compressive strengths (in MPa):
28.5, 30.2, 29.8, 27.9, 31.1, 30.0, 29.5, 30.3, 28.9, 29.7
At α = 0.05, can we conclude that the average compressive
strength of the new mix is different from the required 30 MPa?
Solution:
Step 1. State the hypothesis
Null Hypothesis (H₀): μ = 30 (The population mean is
30MPa)
Alternative Hypothesis (H₁): μ ≠ 30 (The population mean
is not 30 MPa)
This is a two-tailed test.
Step 2. Compute the Test Statistics
Using the formula:
t = (x̄ − μ) / (s / √n)
Step 2.1 Find the sample mean x̄
x̄ = (28.5 + 30.2 + 29.8 + 27.9 + 31.1 + 30.0 + 29.5 + 30.3 + 28.9 + 29.7) / 10
x̄ = 29.6
Step 2.2. Find the sample standard deviation
First compute the squared difference
x x – 𝑥̅ (x – 𝑥̅ )2
28.5 -1.1 1.21
30.2 0.6 0.36
29.8 0.2 0.04
27.9 -1.7 2.89
31.1 1.5 2.25
30.0 0.4 0.16
29.5 -0.1 0.01
30.3 0.7 0.49
28.9 -0.7 0.49
29.7 0.1 0.01
total 7.91
s² = ∑(x − x̄)² / (n − 1) = 7.91 / (10 − 1) = 7.91 / 9 = 0.879
s = √0.879 = 0.938
Step 2.3 Compute the t-statistic
t = (x̄ − μ) / (s / √n) = (29.6 − 30) / (0.938 / √10) = −0.4 / 0.2967 = −1.348
Step 3. Determine the Critical Value
Degrees of freedom:
df = n – 1 = 10 – 1 = 9
From the t-table, the critical t-value for a two-tailed test at
α = 0.05 and df = 9 is:
tcritical = ± 2.262
Step 4. Decision Rule
If t<−2.262 or t >2.262, reject H₀.
The computed t = –1.348 lies within the acceptance region
(–2.262 < t < 2.262), so: we fail to reject H0
Step 5. Conclusion
At a 5% level of significance, there is not enough evidence
to conclude that the average compressive strength of the
new cement mix is different from 30 MPa.
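The same test can be run with scipy.stats.ttest_1samp, assuming SciPy is available. It returns a t-statistic of about -1.38 (slightly different from the hand computation above, which rounds the sample mean to 29.6) and a p-value of about 0.2, so the decision is the same:

    from scipy.stats import ttest_1samp

    strengths = [28.5, 30.2, 29.8, 27.9, 31.1, 30.0, 29.5, 30.3, 28.9, 29.7]
    t_stat, p_value = ttest_1samp(strengths, popmean=30)
    print(round(t_stat, 3), round(p_value, 3))   # about -1.38 and a p-value of about 0.2, fail to reject H0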
2. Independent Two- Sample Test:
This is used when comparing the means of two unrelated
groups
Independent Two-Sample Test Formula (Equal Variances):
t = (x̄₁ − x̄₂) / √(sp² (1/n₁ + 1/n₂))
where sp² is the pooled variance:
sp² = ((n₁ − 1)s₁² + (n₂ − 1)s₂²) / (n₁ + n₂ − 2)
Example :
An engineer wants to determine if two different welding
methods produce significantly different tensile strengths of
steel joints. Two independent samples were tested:
Method A Tensile Method B Tensile
Strength (MPa) Strength (MPa)
552 545
548 538
563 549
559 541
549 536
555 542
n1 = 6 n2 = 6
Assume that the population variances are equal, and test
at α = 0.05 if the two methods produce significantly different
results.
Solution :
Step 1. State the Hypotheses
H0 (null): 𝜇1 = 𝜇2 (The two welding methods produce the
same mean tensile strength)
Ha (alternative) : 𝜇1 ≠ 𝜇2 ( The mean tensile strength are
different)
Step 2. Compute the test statistic
We will use the formula for the t-statistic with pooled variance:
t = (x̄₁ − x̄₂) / √(sp² (1/n₁ + 1/n₂))
where sp² is the pooled variance:
sp² = ((n₁ − 1)s₁² + (n₂ − 1)s₂²) / (n₁ + n₂ − 2)
Method A Tensile Method B Tensile
Strength (MPa) Strength (MPa)
552 545
548 538
563 549
559 541
549 536
555 542
∑A = 3,326                    ∑B = 3,251
x̄A = 3,326 / 6 = 554.33      x̄B = 3,251 / 6 = 541.83
Find the squared deviations from the mean.

Method A:
x₁   | x₁ − x̄A | (x₁ − x̄A)²
552  |  −2.33  |    5.43
548  |  −6.33  |   40.07
563  |   8.67  |   75.17
559  |   4.67  |   21.81
549  |  −5.33  |   28.41
555  |   0.67  |    0.45
Total          |  171.33

Method B:
x₂   | x₂ − x̄B | (x₂ − x̄B)²
545  |   3.17  |   10.04
538  |  −3.83  |   14.67
549  |   7.17  |   51.41
541  |  −0.83  |    0.69
536  |  −5.83  |   33.99
542  |   0.17  |    0.03
Total          |  110.83

s₁² = ∑(x − x̄)² / (n − 1) = 171.33 / (6 − 1) = 34.27
s₁ = 5.85
s₂² = ∑(x − x̄)² / (n − 1) = 110.83 / (6 − 1) = 22.17
s₂ = 4.71
Compute the pooled variance using the formula:
sp² = ((n₁ − 1)s₁² + (n₂ − 1)s₂²) / (n₁ + n₂ − 2)
sp² = ((6 − 1)(34.27) + (6 − 1)(22.17)) / (6 + 6 − 2) = (171.33 + 110.83) / 10 = 28.22

Compute the t-statistic for the two-sample test using:
t = (x̄₁ − x̄₂) / √(sp² (1/n₁ + 1/n₂)) = (554.33 − 541.83) / √(28.22 (1/6 + 1/6))
t = 12.5 / 3.07 ≈ 4.07
Step 3. Determine the critical value
Degree of Freedom, df = n1 + n2 – 2 = 6 + 6 – 2 = 10
From t-table, critical value at 𝛼 = 0.05 (two tailed) and df =10:
tcritical = ± 2.228
Step 4. Decision Rule:
Since the computed t-value 4.07 > 2.228, we reject the null hypothesis.
Step 5. Conclusion
There is sufficient evidence at the 5% level of significance to
conclude that the two welding methods produce significantly
different tensile strengths.
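SciPy's ttest_ind performs this pooled-variance test directly (equal_var=True requests the pooled form); a sketch assuming SciPy is available:

    from scipy.stats import ttest_ind

    method_a = [552, 548, 563, 559, 549, 555]
    method_b = [545, 538, 549, 541, 536, 542]
    t_stat, p_value = ttest_ind(method_a, method_b, equal_var=True)
    print(round(t_stat, 2), round(p_value, 4))   # roughly 4.1 with a p-value well below 0.05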
Analysis of Variance (ANOVA) in Hypothesis Testing
Analysis of Variance (ANOVA) is a statistical method
used to compare the means of three or more independent
groups to determine if at least one group mean is significantly
different from the others.
In engineering, ANOVA is commonly used in quality
control, process optimization, and design of experiments to
analyze how different factors or treatments affect outcomes.
ANOVA is used when you want to compare three or
more group means; the data are quantitative (numerical) and
collected from independent samples, and the populations are
assumed to be normally distributed with equal variances.
Types of ANOVA

Type | Purpose | Example in Engineering
One-Way ANOVA | Tests differences among means of one factor (independent variable) | Testing if the mean strength of a material differs across three curing temperatures
Two-Way ANOVA | Tests the effects of two factors and their interaction | Studying how machine type and operator skill affect production output
Repeated Measures ANOVA | Used when the same subjects are tested under different conditions | Measuring engine emissions before, during, and after a modification
For most undergraduate engineering courses, One-Way ANOVA is the
primary focus.
One-Way ANOVA
Let’s say we want to compare the mean breaking
strength of a wire across 3 different manufacturers.
The core idea is to partition the total variability in the data
into two parts:
• Between-group variability: How different the group means
are from the overall mean
• Within-group variability: The variability among observations
within each group
Total Variation = Variation Between Groups + Variation Within Groups
We then form an F-ratio
𝑴𝒆𝒂𝒏 𝑺𝒒𝒖𝒂𝒓𝒆 𝑩𝒆𝒕𝒘𝒆𝒆𝒏 𝑮𝒓𝒐𝒖𝒑𝒔 (𝑴𝑺𝑩)
F= 𝑴𝒆𝒂𝒏 𝑺𝒒𝒖𝒂𝒓𝒆 𝑾𝒊𝒕𝒉𝒊𝒏 𝑮𝒓𝒐𝒖𝒑𝒔 (𝑴𝑺𝑾)
If the F-ratio is large, it suggests that the group means are not all
equal, indicating a statistically significant difference.
Example :
A materials engineer wants to know whether the mean
tensile strength (in MPa) of a metal specimen differs when
subjected to three different heat treatment temperatures.
Four specimens are tested at each temperature.
The Tensile strength data (MPa)
Temperature A Temperature B Temperature C
72 80 77
75 82 76
78 79 78
74 81 75
Test at α = 0.05 whether there is a statistically significant
difference in mean tensile strength among the three
temperatures.
Solution:
Step 1. State the Hypothesis
Null hypothesis(Ho) : All group means are equal
𝜇𝐴 = 𝜇𝐵 = 𝜇𝐶
Alternative hypothesis (Ha): At least one group mean is
different
Step 2: Compute Group Means and Grand Mean

x̄A = (72 + 75 + 78 + 74) / 4 = 74.75

x̄B = (80 + 82 + 79 + 81) / 4 = 80.50

x̄C = (77 + 76 + 78 + 75) / 4 = 76.50

Grand Mean:

x̄ = (72 + 75 + 78 + 74 + 80 + 82 + 79 + 81 + 77 + 76 + 78 + 75) / 12 = 927 / 12 = 77.25
Step 3: Compute Sums of Squares
Between-Groups Sum of Squares (SSB):

SSB = Σ ni(x̄i − x̄)² = 4[ (74.75 − 77.25)² + (80.50 − 77.25)² + (76.50 − 77.25)² ]
    = 4[ (−2.5)² + (3.25)² + (−0.75)² ]
SSB = 4(17.375) = 69.5
Within-Groups Sum of Squares (SSW):
Compute the squared deviations within each group:

For A: (72 − 74.75)² + (75 − 74.75)² + (78 − 74.75)² + (74 − 74.75)² = 18.75
For B: (80 − 80.5)² + (82 − 80.5)² + (79 − 80.5)² + (81 − 80.5)² = 5.00
For C: (77 − 76.5)² + (76 − 76.5)² + (78 − 76.5)² + (75 − 76.5)² = 5.00

SSW = 18.75 + 5.00 + 5.00 = 28.75
Total Sum of Squares (SST):
SST = SSB + SSW = 69.5 + 28.75
SST = 98.25
Step 4: Degrees of Freedom
Between groups dfB = k – 1 = 3-1 = 2
Within groups dfw = N – k = 12 – 3 = 9
Total dft = N – 1 = 12-1 = 11
Step 5: Mean Squares
Mean Square Between (MSB):
MSB = SSB / dfB = 69.5 / 2 = 34.75
Mean Square Within (MSW):
MSW = SSW / dfw = 28.75 / 9 = 3.19
Step 6: Compute the F-Statistic
F = MSB / MSW = 34.75 / 3.19 = 10.88
Step 7: Decision Rule
Degrees of freedom: dfB = 2, dfw = 9
At α = 0.05, the critical value from the F-distribution table is
approximately Fcritical = 4.256
Since Fcomputed = 10.88 is greater than Fcritical = 4.256, we reject
the null hypothesis.
Step 8: Conclusion
There is sufficient evidence at the 5% significance level to
conclude that not all mean tensile strengths are equal; that is, heat
treatment temperature has a statistically significant effect on tensile
strength.
ANOVA Summary Table

Source of Variation    SS       df    MS       F
Between Groups         69.50    2     34.75    10.88
Within Groups          28.75    9     3.19
Total                  98.25    11
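The same one-way ANOVA can be reproduced in a few lines of Python. The sketch below (not part of the original example) computes the sums of squares by hand and then cross-checks the F-statistic with SciPy; it assumes NumPy and SciPy are installed.

import numpy as np
from scipy import stats

# Tensile strength (MPa) for the three heat treatment temperatures
groups = {
    "A": np.array([72, 75, 78, 74]),
    "B": np.array([80, 82, 79, 81]),
    "C": np.array([77, 76, 78, 75]),
}

all_data = np.concatenate(list(groups.values()))
grand_mean = all_data.mean()

# Partition the total variation into between-group and within-group parts
ssb = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups.values())
ssw = sum(((g - g.mean()) ** 2).sum() for g in groups.values())

df_b = len(groups) - 1
df_w = all_data.size - len(groups)
msb, msw = ssb / df_b, ssw / df_w
print(f"SSB = {ssb:.2f}, SSW = {ssw:.2f}, F = {msb / msw:.2f}")

# Cross-check with SciPy's built-in one-way ANOVA
f_stat, p_value = stats.f_oneway(groups["A"], groups["B"], groups["C"])
print(f"SciPy: F = {f_stat:.2f}, p = {p_value:.4f}")

Both computations give F ≈ 10.88, matching the summary table, and the small p-value leads to the same decision to reject the null hypothesis.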
Chi- Square Test in Hypothesis Testing
The Chi-square (χ²) test is a non-parametric statistical
method used to determine whether there is a significant
difference between observed frequencies and expected
frequencies in categorical data. It is especially useful when
dealing with qualitative (nominal or ordinal) variables
rather than numerical measurements.
In engineering applications, the Chi-square test can help
evaluate:
• Goodness-of-Fit: How well an observed frequency
distribution matches a theoretical or expected distribution.
This is used to determine whether the distribution of a categorical
variable matches a hypothesized distribution. For instance: testing
whether defects in manufactured parts occur equally across different
machine shifts.
Formula:

χ² = Σ (Oi − Ei)² / Ei

where: Oi = observed frequency for category i
       Ei = expected frequency for category i
• Test for Independence: Whether two categorical variables
are associated or independent. For instance: testing
whether the type of welding method is independent of defect
occurrence.
Formula:

χ² = Σ (Oij − Eij)² / Eij

where: Oij = observed frequency in cell (i, j) of the contingency table
       Eij = (Row total × Column total) / Grand total
Example: Goodness-of-fit
A quality engineer wants to know if defects are equally
distributed across three production lines, at 5% level of
significance.
Production lines Observed defects(O) Expected defects
A 18 20
B 22 20
C 20 20
Solution :
Step 1. State the hypothesis
Null hypothesis (Ho): The defects are equally distributed
among the three production lines.
Alternative hypothesis(Ha): The defects are not equally
distributed among the three
production lines.
Step 2. Set the significance level.
𝜶 = 0.05
Step 3. Compute the expected frequencies
The problem states that the defects are expected to be equally
distributed. Since there is a total of 60 observed defects, the expected
frequency for each line is 60 / 3 = 20 (refer to the table above).
Step 4. Calculate the chi-square statistic using the formula:

χ² = Σ (Oi − Ei)² / Ei

χ² = (18 − 20)²/20 + (22 − 20)²/20 + (20 − 20)²/20

χ² = 4/20 + 4/20 + 0

χ² = 0.40
Step 5. Determine the degrees of freedom
df = k − 1 = 3 − 1 = 2
Step 6. Find the critical value
From the chi-square distribution table with df = 2 and α = 0.05,
χ²critical = 5.991
Step 7. Decision Rule
Since χ²computed = 0.40 is less than χ²critical = 5.991, we fail to reject H0;
there is no significant evidence that the defects are unequally distributed
among the three production lines.
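The goodness-of-fit test above can be verified with a short Python sketch (not part of the original example) using scipy.stats.chisquare; SciPy is assumed to be installed.

from scipy import stats

# Observed defects on production lines A, B, C and the expected counts
observed = [18, 22, 20]
expected = [20, 20, 20]   # equal distribution: 60 defects / 3 lines

chi2, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")   # chi-square = 0.40

# With df = k - 1 = 2, fail to reject H0 at alpha = 0.05 because p > 0.05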
Example- Test of Independence
An engineer investigates whether the type of welding
method is related to defect occurrence.
Welding Method    Defective    Non-defective    Total
A                 12           38               50
B                 8            42               50
Total             20           80               100
Step 1. State the Hypothesis
Null hypothesis (Ho): The type of welding method is
independent of defect occurrence.
Alternative hypothesis (Ha): The type of welding method is not
independent of defect occurrence; that is, there is an
association between welding method and defect
occurrence.
Step 2. Set the significance level.
𝛼 = 0.05
Step 3. Compute the expected frequencies.
Using the formula:

E = (Row total × Column total) / Grand total

Each welding method has a row total of 50, so for each row:

For defective:     E = (50 × 20) / 100 = 10
For non-defective: E = (50 × 80) / 100 = 40
Step 4. Calculate the chi-square statistic using the formula:

χ² = Σ (Oij − Eij)² / Eij

χ² = (12 − 10)²/10 + (38 − 40)²/40 + (8 − 10)²/10 + (42 − 40)²/40

χ² = 0.4 + 0.1 + 0.4 + 0.1 = 1.0
Step 5. Determine the degrees of freedom
df = (r − 1)(c − 1), where r = number of rows, c = number of columns
df = (2 − 1)(2 − 1) = 1
Step 6. Find the critical value
From the chi-square distribution table with df = 1 and α = 0.05,
χ²critical = 3.841
Step 7. Decision Rule
Since χ²computed = 1.0 is less than χ²critical = 3.841, we fail to reject H0;
there is no significant evidence of an association, so the welding method
is independent of defect occurrence.
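The test of independence can likewise be checked in Python. This sketch (not part of the original example) applies scipy.stats.chi2_contingency to the 2×2 table; the continuity correction is turned off so that the result matches the hand computation above.

import numpy as np
from scipy import stats

# Rows: welding methods A and B; columns: defective, non-defective
table = np.array([[12, 38],
                  [ 8, 42]])

chi2, p_value, dof, expected = stats.chi2_contingency(table, correction=False)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p_value:.4f}")
print("Expected frequencies:")
print(expected)   # [[10. 40.] [10. 40.]], matching Step 3

# Since p > 0.05, fail to reject H0: welding method and defect occurrence
# appear to be independent.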
VII. REGRESSION ANALYSIS
Learning Outcomes :
By the end of this lesson, students should be able to:
1. Define the purpose of regression analysis in an engineering
context.
2. Differentiate between simple and multiple linear regression
models.
3. Compute the regression coefficients (intercept and slope) for a
simple linear regression from raw data.
4. Interpret the meaning of the slope, intercept, correlation
coefficient (r), and coefficient of determination (R²) in real
engineering scenarios.
5. Apply multiple linear regression to model a dependent
variable using two or more predictors.
7.1 Regression Analysis
In engineering, we often encounter situations where one
variable depends on another. Regression analysis is a
statistical method used to examine the relationship between
a dependent variable (response) and one or more
independent variables (predictors). This helps engineers predict
future values, understand the strength and direction of relationships,
optimize processes, and improve designs.
Example in Engineering:
• Predicting tensile strength from material composition
• Estimating fuel consumption based on vehicle speed
• Relating temperature to electrical resistance in a
component
7.2 Types of Regression
• Simple Linear Regression. This involves one
independent variable and one dependent variable. A
mathematical equation that allows us to predict values
of the dependent variable from a known value of the
independent variable is called the regression equation.
ŷ = b0 + b1x

where:
ŷ = predicted value of y
b0 = intercept
b1 = slope
x = independent variable
• Multiple Linear Regression. This is an extension of
simple linear regression that allows the modeling of a
dependent variable using two or more independent
variables. In engineering applications, many factors
often influence a response simultaneously, and
analyzing them together provides a more realistic and
accurate model. The general form of a multiple linear
regression equation is
ŷ = b0 + b1x1 + b2x2 + … + bkxk
where 𝑦̂ is the predicted value of the dependent
variable, b0 is the intercept, b1,b2,…,bk are the
regression coefficients that represent the change in 𝑦̂
for a one-unit change in the corresponding predictor
variable x1 ,x2 ,…,xk holding the other predictors
constant. This “holding other variables constant”
property is important because it enables engineers to
assess the individual effect of each factor while
controlling for the influence of the others. For example,
in predicting the heat loss from a pipe, multiple
regression can be used to analyze the combined effects
of temperature difference, insulation thickness, and pipe
diameter. The strength of the model is measured by the
coefficient of determination (R2), which indicates the
proportion of variation in the dependent variable
explained by all predictors together. Multiple regression
also provides p-values for each coefficient, allowing
hypothesis testing to determine which factors
significantly affect the response. When applied
correctly, multiple linear regression is a powerful tool for
engineering decision-making, enabling accurate
predictions, optimization of designs, and identification of
key process variables.
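As an illustration of how such a model can be fitted in practice, the sketch below estimates the coefficients of ŷ = b0 + b1x1 + b2x2 by least squares using NumPy. The data are hypothetical values invented purely for this illustration, not results from an actual experiment; packages such as statsmodels can additionally report a p-value for each coefficient.

import numpy as np

# Hypothetical illustrative data (not from a real experiment):
# x1 = temperature difference, x2 = insulation thickness, y = heat loss
x1 = np.array([10, 20, 30, 40, 50, 60], dtype=float)
x2 = np.array([5, 5, 10, 10, 15, 15], dtype=float)
y = np.array([12, 20, 24, 33, 36, 44], dtype=float)

# Design matrix with a leading column of ones for the intercept b0
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least-squares estimates of b0, b1, b2
coeffs, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = coeffs
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")

# Coefficient of determination R^2
y_hat = X @ coeffs
ss_res = ((y - y_hat) ** 2).sum()
ss_tot = ((y - y.mean()) ** 2).sum()
print(f"R^2 = {1 - ss_res / ss_tot:.3f}")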
Example- Simple Linear Regression
An engineer measures the load (in kN) applied to a steel bar and
the resulting elongation (in mm). The goal is to develop a
predictive model.

Load (kN), x        Elongation (mm), y
10 0.21
20 0.45
30 0.69
40 0.92
50 1.15
Step 1. Compute the means

x̄ = (10 + 20 + 30 + 40 + 50) / 5 = 30

ȳ = (0.21 + 0.45 + 0.69 + 0.92 + 1.15) / 5 = 0.684
Step 2. Compute regression coefficients
Slope:

b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²

b1 = [ (10−30)(0.21−0.684) + (20−30)(0.45−0.684) + (30−30)(0.69−0.684)
     + (40−30)(0.92−0.684) + (50−30)(1.15−0.684) ]
     / [ (10−30)² + (20−30)² + (30−30)² + (40−30)² + (50−30)² ]

b1 = 23.5 / 1000 = 0.0235

Intercept:

b0 = ȳ − b1x̄ = 0.684 − 0.0235(30) = −0.021

So, the regression equation is:

ŷ = −0.021 + 0.0235x
Step 3. Strength of the linear relationship
Correlation coefficient (r): to compute r we use the Pearson
correlation coefficient formula:

r = Σ(xi − x̄)(yi − ȳ) / √[ Σ(xi − x̄)² · Σ(yi − ȳ)² ]
Means: x̄ = 30, ȳ = 0.684

xi     yi     (xi − x̄)   (yi − ȳ)   (xi − x̄)²   (yi − ȳ)²
10     0.21   −20        −0.474     400         0.224676
20     0.45   −10        −0.234     100         0.054756
30     0.69   0          0.006      0           0.000036
40     0.92   10         0.236      100         0.055696
50     1.15   20         0.466      400         0.217156
Total                               1000        0.55232

Σ(xi − x̄)(yi − ȳ) = 23.5
Applying the above formula for r:

r = 23.5 / √[ (1000)(0.55232) ] = 23.5 / √552.32 = 23.5 / 23.502 = 0.99994

Computing R²:
For simple linear regression, R² = r²
R² = (0.99994)² = 0.99987

This means that 99.987% of the variation in elongation
is explained by the variation in load.
Step 4. Engineering interpretation
• r close to + 1: Very strong positive linear relationship.
This means that as load increases, elongation increases
proportionally.
• R² near 1: The regression line fits the data almost perfectly.
This figure shows the simple linear regression of elongation versus
load. The red line is the best-fit regression line ŷ = −0.021 + 0.0235x,
showing a strong positive linear relationship. The Pearson correlation
coefficient r ≈ 0.99994 quantifies the strength and direction of the linear
association, and R² ≈ 0.99987 indicates that about 99.987% of the
variability in elongation is explained by the applied load.
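The entire worked example can be reproduced with scipy.stats.linregress, as in the minimal sketch below (not part of the original text); NumPy and SciPy are assumed to be installed.

import numpy as np
from scipy import stats

load = np.array([10, 20, 30, 40, 50])                    # kN
elongation = np.array([0.21, 0.45, 0.69, 0.92, 1.15])    # mm

result = stats.linregress(load, elongation)
print(f"slope b1     = {result.slope:.4f}")      # approx. 0.0235 mm per kN
print(f"intercept b0 = {result.intercept:.4f}")  # approx. -0.021 mm
print(f"r = {result.rvalue:.5f}, R^2 = {result.rvalue ** 2:.5f}")

# Use the fitted line to predict elongation at a 35 kN load
print(f"predicted elongation at 35 kN: {result.intercept + result.slope * 35:.3f} mm")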
LEARNING EXERCISES
Chapter 1
1. In your own words, define statistics and explain why it is important in
everyday decision making.
2. Give three real-life examples where statistical analysis is applied in your
community.
3. Identify whether each statement refers to Descriptive or Inferential statistics.
a. A survey found that 70% of residents in a town own a motorcycle.
b. Based on a sample, predicting that the average monthly income of all
farmers is P10,500.
c. A graph showing the monthly rainfall for the past year.
d. A company tests a product in 3 stores and concludes it will sell well
nationwide.
4. Classify each as Continuous, Discrete, Nominal, or Ordinal variables and
create two examples of your own for each type of variable.
a. Body temperature of patients
b. Number of Engineering graduates in 2024
c. Religion of survey respondents
d. Clothing size (S, M, L, XL)
e. Rainfall in millimeters
5. Draw a simple flowchart categorizing variables into Quantitative/Qualitative
and their subtypes
Chapter II
1. Match each scenario to the most appropriate data collection method
a. PSA conducts a national census.
b. A teacher records the behavior of students during group activities.
c. A food company tests a new recipe in two different cities before
launching nationwide.
d. Researchers gather the opinions of 8 farmers about a new irrigation
system.
e. Customers answer an online questionnaire about a store’s service.
2. From the following sources, identify if the data is Primary or Secondary:
a. Reading a newspaper article about an election
b. Interviewing a barangay captain about local projects.
c. Using PSA records for birth rates in a province.
d. Conducting your own traffic survey.
3. A researcher wants to know the opinion of residents on building a new
public market. There are 5,000 households in the municipality. If the
desired margin of error is 5%, compute the sample size using Slovin’s
formula.
4. Identify the sampling technique used in the following situations:
a. Selecting students by drawing lots.
b. Picking every 5th person from the list.
c. Selecting equal numbers of males and females from each college
department.
d. Choosing three barangays at random and surveying all households in
them.
e. Interviewing only friends and classmates.
5. Given the dataset below, construct a frequency distribution table using the
steps provided in the lesson.
15, 18, 20, 22, 25, 25, 27, 28, 30, 31
16, 18, 19, 21, 23, 24, 26, 28, 29, 32
6. Decide which type of graph (Bar graph, histogram, Pie chart, Line graph) is
most appropriate for each situation:
a. Showing the monthly electricity consumption of a household for 1 year.
b. Comparing the number of male and female students in intervals of 10.
c. Showing the percentage share of different types of transport used by
employees.
7. Using the data below, draw a bar graph
Barangay     Number of Households
San Jose     120
Del Pilar    95
Mabini       85
San Roque    60
8. The scores of 40 students in a mathematics test are
45, 56, 67, 48, 90, 72, 65, 59, 82, 78
91, 66, 47, 64, 68, 74, 80, 87, 92, 55
60, 70, 62, 77, 85, 95, 40, 58, 63, 75
52, 61, 83, 69, 71, 88, 50, 57, 53, 81
Construct a histogram using an appropriate class interval.
9. The daily sales (in pesos) of a store for one week are:
Monday : 1,200 Friday : 2,000
Tuesday : 1,450 Saturday : 2,200
Wednesday : 1,800 Sunday : 1,750
Thursday : 1,650
Plot these data on a line graph and describe the trend.
Chapter III
1. A survey of monthly allowances (in pesos) of 40 college students produced
the following distributions:
Allowance Interval Frequency (f)
500 - 999 6
1,000 – 1,499 9
1,500 – 1,999 12
2,000 – 2,499 8
2,500 – 2,999 5
a. Identify the modal class
b. Compute the mode using the formula
2. The following are the daily sales (in pesos) of a food stall over 7 days: 1,500;
1,800; 1,750; 2,000; 1,900; 1,600; 15,000.
a. Compute the mean sales
b. Compute the median sales
c. Which measures better represents the typical sales and why?
3. A researcher recorded the ages of 20 farmers in a rural community.
a. Organize the data into a grouped frequency table using a class
width of 5
b. Compute the mean, median and mode.
c. Interpret which measures best describes the central age of the
farmers.
4. Explain why the mean is more affected by outliers than the median. Give an
example.
5. In what situations is the mode the most appropriate measure of central
tendency? Provide two real-life examples.
Chapter IV
1. Define the following terms:
a. Sample space
b. Event
c. Mutually Exclusive event
d. Independent event
e. Complementary event
2. List the sample space for each of the following
a. Tossing 2 coins
b. Rolling a die
c. Drawing a card from a standard deck (no joker)
3. Determine if the following pairs of events are mutually exclusive,
independent, or complementary.
a. Drawing a heart and drawing a red card from a deck
b. Rolling a 4 and rolling an even number on a die
c. Event A: Tossing a head, Event B: Tossing a tail
d. A: Student passes, A′: Student fails
4. A menu offers 3 types of burgers and 4 types of drinks. How many different
meals can be formed consisting of one burger and one drink?
5. A password is made of 4 letters followed by 2 digits. How many different
passwords are possible if:
a. Repetition is allowed
b. Repetition is not allowed
6. Find the number of permutations:
a. Arranging 4 students in a line
b. Choosing and ordering 3 books from a shelf of 6.
7. Find the number of combinations:
a. Choosing 3 students from a group of 5
b. Choosing 2 toppings from 6 available pizza toppings
8. A card is drawn from a standard 52-card deck. Find the probability of :
a. Drawing an Ace
b. Drawing a red card or a face card
c. Not drawing a spade
9. A bag contains 4 red balls and 6 blue balls. One ball is drawn:
a. What is the probability of drawing a red ball?
b. What is the probability of not drawing a red ball
c. If two balls are drawn with replacement, what is the probability that
both are blue?
10. A die is rolled. Let A be the event of getting an odd number, and B be the
event of getting a number greater than 3. Find :
a. P(A), P(B)
b. P(A∩ B), P(A∪ B)
Chapter V
1. Two dice are rolled. Let the random variable x be the sum of the numbers
on the two dice.
a. List the sample space
b. Determine the possible value of x
c. Construct a probability distribution table for x = 2 to x = 12.
d. Verify if the distribution is valid
2. A coin is tossed 3 times. Let x be the number of heads.
a. List the sample space
b. Create the probability distribution of x
c. Compute E(x)
3. Given the following probability distribution of x;
x P(x)
0 0.2
1 0.3
2 0.4
3 0.1
a. Construct the cumulative distribution function table
b. Find F(2)
c. Interpret what F(2) means in the context of probability.
4. The probability distribution of the number of defective bulbs in a box of 3 is
:
x P(x)
0 0.5
1 0.3
2 0.1
3 0.1
a. Compute the expected value E(x)
b. Compute the variance and standard deviation of x.
5. A coin is tossed 5 times. What is the probability of getting;
a. Exactly 2 heads
b. At most 2 heads
c. At least 1 head
6. In a class, 60% of students bring their own calculator. If 8 students are
randomly selected, what is the probability that ;
a. Exactly 5 bring a calculator
b. All 8 bring a calculator
c. None brings a calculator
7. A small bakery receives an average of 3 phone orders per hour. What is the
probability that;
a. Exactly 2 phone orders are received in one hour.
b. More than 2 phone orders are received
c. No orders are received in an hour.
8. A call center receives on average 5 complaints per day. Use the Poisson
distribution to compute the probability that :
a. Exactly 3 complaints are received
b. Fewer than 3 complaints are received
c. At least 1 complaint is received
Chapter VI
1. The average tensile strength of a type of steel rod is claimed to be
500MPa. A sample of 36 rods has a mean strength of 492 MPa. The
population standard deviation is known to be 18 MPa.
a. At 5% level of significance, test the claim that the average tensile
strength is still 500 MPa.
b. State the null and alternative hypothesis
c. Compute the test statistics
d. Draw your conclusion
2. A researcher believes that the average weekly working hours of engineers
is more than 40 hours. A sample of 10 engineers showed the following
working hours last week: 44, 42, 39, 46, 41, 43, 40, 44, 42
a. Use a 0.05 significance level to test the claim
b. State Ho and Ha
c. Calculate the sample mean and standard deviation.
d. Find the t-statistics and make a conclusion
3. Two types of fertilizer are being compared. The yields (in kg) of crops using
each fertilizer are shown below:
Fertilizer A: 45, 47, 50, 44, 46
Fertilizer B: 51, 53, 52, 54, 55
Assuming equal variances:
a. Test whether the two fertilizers result in different yields at the 0.01 level
of significance.
b. Compute the pooled variance.
c. Find the t-statistic and the critical value.
d. State your decision.
4. Three different machines are tested for the time (in minute) it takes to
produce a certain component. The data are:
Machine A: 20, 22, 29
Machine B: 24, 23, 25
Machine C: 21, 20, 22
a. Perform the one-way ANOVA at α = 0.05
b. State the null and alternative hypotheses
c. Show the computation of SST, SSB and SSE
d. Draw a conclusion
5. A study is conducted to examine the relationship between gender and preference
for a new product
Gender Like Neutral Dislike
Male 20 10 5
Female 15 25 15
a. Test the independence of gender and product preference at
α = 0.05
b. State Ho and Ha
c. Compute the expected frequencies
d. Calculate the chi-square statistics and make a conclusion
Chapter VII
1. Given the following dataset :
Study Hours(x) Exam Score(y)
2 65
3 70
5 75
7 85
8 90
a. Compute the Mean, slope and the intercept
b. Write the regression equation
c. What does the slope mean in this context?
d. Predict the exam score of a student who studied for 6 hours.
2. Based on the regression equation from Problem 1
a. Compute the predicted 𝑦̂
b. Compute the residual e = y - 𝑦 ̂
c. Calculate SSR, SSE, SST
d. Compute R² = SSR / SST
e. How well does the model explain the variability?
f. Is the model a good fit?
3. An engineer wants to predict the fuel consumption of a machine based on
the load (kg) applied and operating time (hrs.). Data from 10 trials are
collected.
a. Perform multiple linear regression
b. Interpret the model output
c. Predict fuel consumption when load = 100kg and time = 4 hours
d. Validate the model using residual analysis
REFERENCES
Montgomery, D. C., & Runger, G. C. (2014). Applied Statistics and Probability
for Engineers.
Walpole, R. E., Myers, R. H., Myers, S. L., & Ye, K. (2012). Probability and
Statistics for Engineers and Scientists.
Devore, J. L. (2011). Probability and Statistics for Engineering and the
Sciences (8th Edition).
Navidi, W. (2015). Statistics for Engineers and Scientists (4th Edition).
McGraw-Hill Education.
NIST/SEMATECH e-Handbook of Statistical Methods
https://www.itl.nist.gov/div898/handbook/
Khan Academy – Statistics and Probability
https://www.khanacademy.org/math/statistics-probability
MIT OpenCourseWare – Probability and Statistics in Engineering
https://ocw.mit.edu