
ES-214-IMs Data Analysis For Engineering

It is for Engineering data analysis it has a 5 chapters that you can use in your engineering subjects

PREFACE

This instructional material, From Data to Decisions: A Practical
Guide to Engineering Data Analysis, was developed to guide
engineering students in understanding and applying statistical methods
to real-world engineering problems. As future engineers, students must
not only know how to compute but also how to interpret and use data
to make informed, evidence-based decisions.

Having taught ES 214 (Engineering Data Analysis) for almost eight
years at the College of Engineering, University of Eastern
Philippines, I have been both motivated and deeply interested in
creating this instructional material. Over the years, I have observed the
challenges students face in understanding statistical concepts and
applying them to engineering contexts. This experience has inspired
me to design a resource that bridges the gap between theory and
application: one that presents concepts in simple and clear language,
explains procedures step by step, and uses practical engineering
examples that connect learning directly to professional practice.

The lessons in this material cover both fundamental and advanced
topics, from descriptive statistics and probability distributions to
hypothesis testing, correlation, and regression analysis, all designed
to build students’ confidence and analytical thinking. Step-by-step
solutions, clear explanations, and worked problems are provided to
help learners apply statistical tools effectively in design, quality control,
research, and decision-making.

This work is dedicated to my students at the College of Engineering,
University of Eastern Philippines, whose curiosity and determination
continue to inspire me. It is my hope that this material will serve not
only as a course reference but also as a companion in your journey
toward becoming analytical, critical-thinking, and solution-driven
engineers.

Merewina Llanie A. Tapong


I. THE INTRODUCTION

Learning Outcomes:
At the end of the topic, students should be able to:

1. demonstrate an in-depth understanding of the key concepts,
principles, symbols, techniques and procedures in statistical
analysis; and

2. apply appropriate statistical approaches to analyze and interpret
data effectively.

1.1 Statistics and its Definition

The term statistics may refer to data themselves in a general
sense, but it also refers to the statistical techniques concerned
with the collection, organization, presentation, analysis and
interpretation of data, and with drawing conclusions from the data.

There are several reasons why we should study statistics.
Among the most important are the following:

1. Knowledge of statistics helps us use the proper methods to collect
data, employ the correct analyses, and present the results
effectively. Statistics is a crucial process behind how we make
discoveries in science, make decisions based on data, and make
predictions.

2. Studying statistics enables us to conduct research effectively and
to read and evaluate journal articles, which further develops
critical thinking and analytical skills.

3. We may have to make decisions based on the data and
information from statistical studies, such as what product to purchase
based on consumer studies or how much budget a company should
allot for advertising expenses.

1.2 Descriptive and Inferential Statistics

Statistical analysis involves two aspects: (1) the collection of
numerical information, in the form of a set of numbers called data,
about a particular phenomenon to be studied; and (2) the drawing
together of these data into meaningful relationships or theories.

Statistics has two main branches: (1) descriptive statistics and
(2) inferential statistics. Descriptive statistics consists of the
collection, organization, presentation and analysis of data. It aims
to summarize raw data of any size or value, and it facilitates the
accurate description and comparison of observations. Thus,
descriptive statistics orders and summarizes a given set of data
without direct reference to any inference that may be drawn from it.

If a sample is drawn from a total set of observations, some
method is required to draw conclusions about the characteristics of
the total population from the characteristics of the sample. The
branch of statistics concerned with drawing such inferences from
numerical data is known as inferential statistics. Thus, inferential
statistics involves a higher degree of analysis, interpretation and
inference.

1.3 Variables and Types of Data

A variable is a characteristic of a population or sample that
makes one member different from another. It is a quantity or quality
that can be measured or counted. A variable may also be called a
data item. Age, sex, business income and expenses, place of birth,
capital expenditures, class grades and vehicle type are examples of
variables.

Variables can be described in different ways according to how
they are studied, measured and presented.

Numeric Variables

Numeric variables have values that describe a measurable
quantity as a number, like “how many” or “how much”. Therefore,
numeric variables are quantitative variables.

Numeric variables may be further described as either
continuous or discrete.

➢ A continuous variable is one for which all values are possible,
including fractions, within the total range of the data. Examples
of continuous variables include height, time, age, rainfall and
temperature.

➢ A discrete variable is one for which measurements are in
whole units or integers only (including zero). It cannot take the
value of a fraction between one value and the next closest
value. Examples of discrete variables include the number of
registered cars, number of business locations, number of
persons in the household, and number of children in the family,
all of which are measured in whole units.

Categorical Variables

Categorical variables have values that describe a “quality” or
characteristic of a data unit, like “what type” or “which category”.
Categorical variables are qualitative variables and tend to be
represented by a non-numeric value.

Categorical variables may be further measured and described
as nominal or ordinal:

➢ Nominal scale. This is the most elementary form of measurement,
where data exist only in the form of categories, such as
present or absent, male or female, rural or urban, religion, and
brand.

➢ Ordinal scale. At this level we have sufficient information not
only to establish differences between objects but also to place
our data in rank order, either individually or in classes.
Examples of ordinal variables include academic grades
(e.g. 75, 80, 85), clothing size (e.g. small, medium, large, extra
large) and attitudes (e.g. strongly agree, agree, disagree,
strongly disagree).

Figure: Types of Variables Flowchart
II. COLLECTION, ORGANIZATION AND
PRESENTATION OF DATA

Learning Outcomes
By the end of this lesson, students should be able to:

1. Identify various methods of data collection and presentation.

2. Differentiate between probability and non-probability sampling
techniques.

3. Organize raw data into a frequency distribution table.

4. Illustrate data using different types of graphical presentations.

2.1 Data Collection

Data collection is important in statistics because it provides the
raw data needed for research, analysis, and decision making. It serves
as the basis for creating meaningful insights, drawing conclusions,
and making evidence-based decisions for individuals, companies,
and organizations. For instance, in our choice of career or partner
in life, we make decisions based on the data and information that
we have gathered.

Data may be classified into two (2) types according to source.
Primary data is first-hand information collected by a researcher. It
is collected for the first time, is original, and is generally more
reliable. For example, the population census conducted periodically
by the government is primary data. Secondary data, on the other
hand, refers to second-hand information. It is not collected
originally but is obtained from already published or unpublished
sources such as newspapers, journals and magazines. For instance, a
reporter who goes directly to a crime scene to interview the victim
and the witnesses has gathered primary data, while the readers of
the resulting news item have received secondary data.

Here are some of the most common data collection methods:

1. Interview Method. This is a direct method of data collection. It
is simply a process in which the interviewer asks questions and
the interviewee responds to them. It provides a high degree of
flexibility because questions can be adjusted and changed
at any time according to the situation.

2. Survey and Questionnaire Method. This method provides a
broad perspective from large groups of people. Surveys can be
conducted face-to-face, mailed, or even posted on the internet to
get respondents from anywhere in the world. The answers can be
yes or no, true or false, multiple choice, or even open-ended.
However, a drawback of surveys and questionnaires
is delayed response and the possibility of ambiguous answers.

3. Registration Method. This method of collecting data is governed
by existing laws. The researcher gathers data from the offices
concerned, e.g. the Philippine Statistics Authority (PSA), the
Commission on Elections (COMELEC), Municipal/City Halls or
Barangay Offices. The PSA keeps the complete records of births
and deaths of the population. The COMELEC keeps the list of
registered voters.

4. Observation Method. In this method, researchers observe a
situation around them and record the findings. It can be used to
evaluate the behavior of different people in controlled (everyone
knows they are being observed) and uncontrolled (no one knows
they are being observed) situations. This method is highly
effective because it is straightforward and not directly dependent
on other participants.

5. Experimental Method. This method of data collection involves the
manipulation of the samples by applying some form of treatment
prior to data collection. It refers to manipulating one variable to
determine its effect on another variable.

6. Focus Groups. This is similar to an interview, but it is conducted
with a group of people who all have something in common. The data
collected is similar to that from in-person interviews, but focus
groups offer a better understanding of why a certain group of people
thinks in a particular way. However, some drawbacks of this method
are the lack of privacy and the domination of the discussion by one
or two participants. Focus groups can also be time-consuming and
challenging, but they help reveal some of the best information for
complex situations.
2.2 Determining the Sample Size

Most surveys are conducted on a sample basis because of the time
and cost involved if the whole population is used. Sample size is a
research term for the number of individuals included in a research
study to represent the population. The sample size refers to the
total number of respondents included in the study, and the number is
often broken down into subgroups by demographics such as age, gender
and location so that the total sample represents the entire
population.

Determining the appropriate sample size is one of the most
important factors in statistical analysis. If the sample size is too
small, it will not yield valid results or adequately represent the
realities of the population being studied. On the other hand, while
larger sample sizes yield smaller margins of error and are more
representative, a sample size that is too large may significantly
increase the cost and time needed to conduct the research.

Slovin’s Formula is used to calculate the sample size
necessary to achieve a desired level of accuracy when sampling a
population. This formula is used when you don’t have enough
information about a population’s behavior to otherwise know the
appropriate sample size.

$$n = \frac{N}{1 + Ne^2}$$

where: n = sample size
N = population size
e = margin of error

Margin of error is the error we expect to commit in taking a
sample. If, for instance, we want to conduct a survey on the average
income of the families in the province of Northern Samar, then we
can probably use only 5 municipalities. This is due to the difficulty of
obtaining data on the income of families from all the municipalities
of the province. Hence, we cannot avoid having an error in the
results of the study since we are using only a sample of the
population.
Example 1

A Statistics student is conducting an inquiry regarding the
reaction of the students from the College of Engineering of a certain
university to the recent tuition fee increase. If there are 3,500
engineering students and the researcher wants to have 99%
accuracy, determine the sample size that should be taken as
respondents.

Solution to the Problem.

a. Determine the value of the population N from the problem:

N = 3,500 (engineering student population)

b. Determine the value of the margin of error e for 99%
accuracy (100% - 99% = 1%):

e = 0.01

c. Substitute the values of N and e into Slovin’s Formula:

$$n = \frac{N}{1 + Ne^2} = \frac{3500}{1 + (3500)(0.01)^2} = \frac{3500}{1 + 0.35} = \frac{3500}{1.35} = 2592.59$$

Therefore, rounding up, the sample size n that should be taken as
respondents is 2,593 engineering students.
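
The substitution in Example 1 can be checked with a short Python sketch. The function name `slovin_sample_size` and the choice to round up with `math.ceil` are assumptions made for this illustration; the formula itself is the one given above.

```python
import math

def slovin_sample_size(population: int, margin_of_error: float) -> int:
    """Slovin's formula n = N / (1 + N e^2), rounded up to a whole respondent."""
    if population <= 0:
        raise ValueError("population must be positive")
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be between 0 and 1")
    n = population / (1 + population * margin_of_error ** 2)
    return math.ceil(n)

# Example 1: N = 3,500 engineering students, e = 0.01
print(slovin_sample_size(3500, 0.01))  # 2593
```

Rounding up rather than to the nearest integer keeps the sample from falling short of the computed requirement.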

2.3 Sampling Techniques

A sample should not be selected in a haphazard way
because the information obtained from the study might not be
credible or realistic. When you conduct research about a
group of people, it is rarely possible to collect data from every
person in the group. Instead, you select a sample. The sample is the
group of individuals who will actually participate in the research. To
draw valid conclusions from your results, you have to carefully
decide how to select a sample that is representative of the group as
a whole. This is called a sampling technique. There are two
primary types of sampling methods that you can use in your
research.

➢ Probability Sampling means that every member of the
population has a chance of being selected. It is mainly used in
quantitative research. If you want to produce results that are
representative of the whole population, probability sampling
techniques, also called random sampling techniques, are the most
valid choice.

Among the types of probability sampling techniques are:

1. Simple Random Sampling. In this type of random
sampling, every member of the population has an equal
chance of being selected. An example is lottery sampling.
Each member of the population is assigned a number written on a
piece of paper. The pieces of paper should be identical (equal in
size and weight) and rolled evenly. They are placed in a lottery box
and shaken very well. The desired number of samples is then drawn
one after the other.

2. Systematic Sampling. This is similar to simple random
sampling, but it is slightly easier to conduct. Every member
of the population is listed with a number, but instead of
randomly generating numbers, individuals are chosen at
regular intervals. For example, there are 1000 (N) employees
in the University of Eastern Philippines and 50 samples are
needed. We divide 1000 by 50 and obtain the sampling interval
k = 20. We then select one number from 1 to 20 by lottery. If the
number 6 happens to come out, then the first sample is employee 6.
The second sample is 6 + 20 = 26, the third is 46, and so on. The
process is continued until you end up with a sample of 50.

3. Stratified Sampling. To use this sampling technique, you
divide the population into subgroups called strata, based
on relevant characteristics (e.g. gender identity, age range,
income bracket). If, for instance, the desired sample is 50 and
there are 10 subgroups, the sample of 50 is divided among the
subgroups in proportion to their sizes. Then you use random or
systematic sampling to select a sample from each group. For example,
the University of Eastern Philippines has 800 female
employees and 200 male employees. You want to ensure
that the sample reflects the gender balance of the
workforce, so you sort the population into strata based on
gender. Then you use random sampling on each group,
selecting 80 women and 20 men, which gives you a
representative sample of 100 people.

4. Cluster Sampling. This also involves dividing the population
into subgroups, but each subgroup should have characteristics
similar to those of the whole population. Instead of sampling
individuals from each subgroup, you randomly select entire
subgroups. This is sometimes called area sampling
because it is used for large populations. For instance, a
certain company has 10 offices in cities across the
country (all with the same number of employees in similar
roles). You don’t have the capacity to travel to every office
to collect your data, so you use random sampling to select
3 offices. These are your clusters.
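
The four probability sampling techniques above can be sketched in Python using only the standard `random` module. This is a minimal illustration, not a prescribed procedure: the function names, the seeded generator, and the made-up employee and office labels are assumptions introduced here, while the sizes (1000 employees, sample of 50, 800 women and 200 men, 10 offices with 3 chosen) come from the examples in the text.

```python
import random

def simple_random_sample(population, n, rng):
    """Lottery method: every member has an equal chance of being drawn."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Random start within the first interval, then every k-th member."""
    k = len(population) // n        # sampling interval, e.g. 1000 // 50 = 20
    start = rng.randrange(k)        # 0-based position of the first pick
    return [population[start + i * k] for i in range(n)]

def proportional_allocation(strata_sizes, n):
    """Divide a sample of size n among strata in proportion to their sizes.
    Note: with awkward proportions, rounding may not sum exactly to n."""
    total = sum(strata_sizes.values())
    return {name: round(n * size / total) for name, size in strata_sizes.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters; all members of a chosen cluster are kept."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

rng = random.Random(42)             # fixed seed so the sketch is repeatable
employees = list(range(1, 1001))    # employees numbered 1 to 1000

lottery = simple_random_sample(employees, 50, rng)
systematic = systematic_sample(employees, 50, rng)
allocation = proportional_allocation({"female": 800, "male": 200}, 100)

# Hypothetical offices with 5 placeholder employees each
offices = {f"Office {i}": [f"O{i}-E{j}" for j in range(1, 6)] for i in range(1, 11)}
picked = cluster_sample(offices, 3, rng)

print(systematic[:3])   # three consecutive picks, 20 apart
print(allocation)       # {'female': 80, 'male': 20}
print(sorted(picked))   # names of the 3 selected offices
```

For stratified sampling, the allocation step is followed by a simple random or systematic draw inside each stratum, for example `simple_random_sample(women, allocation["female"], rng)`, where `women` would be the list of female employees.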

➢ Non-Probability Sampling means that individuals are selected
on non-random criteria, and not every individual has a chance
of being included. This is easier and cheaper, but it carries a
higher risk of sampling bias and is therefore less reliable, as
with samples drawn by researchers based on their own
judgement.

Among the types of non-probability sampling are:

1. Convenience Sampling. This is used because it is
convenient for the researcher. A convenience sample
simply includes the individuals who happen to be most
accessible to the researcher. Convenience samples are at
risk of both sampling bias and selection bias. For example, a
researcher may try to find out which hair shampoo is the most
popular among households by making phone calls using
the phone numbers found in the telephone directory. While
the data may be obtained easily, their accuracy
may not be reliable since not all households have
telephone connections.

2. Purposive Sampling. This type of sampling, also known
as judgement sampling, involves the researcher using their
expertise to select a sample that is most useful to the
purpose of the research. The researcher usually draws this
sample from respondents purposely related or close to
him or her. For instance, you want to know more about the
opinions and experiences of students with disabilities at your
university, so you purposefully select a number of students
with different support needs in order to gather a varied
range of data on their experiences with student services.

3. Quota Sampling. This sampling relies on the non-random
selection of a predetermined number or proportion of units. In this
method, the researcher uses the proportions of the different
strata, and from each stratum, selection is done by quota.
Quota sampling is a quick, easy and inexpensive way to get
survey results. The drawback is that, because of the lack of
randomization, there is a greater potential for survey bias.
For example, a store determines that its customer base of 1000
is composed of 600 women and 400 men. The sample is based
on these proportions: the quota sizes should be representative of
the subgroup populations. In this example, for a sample of 100,
the researcher should select 60 women and 40 men.
2.4 Presentation of Data.

As soon as the data collection is over, the researcher needs
to present the data in a meaningful, efficient and easily understood
way, so that the main features of the data can be identified at a
glance. Generally, data in statistics can be presented in three
different ways: the textual method, the tabular method, and the
graphical method.

1. Textual Method. Also called the paragraph method, this is
used to present purely qualitative data or when there are only
a few numerical data. This method is desirable and effective
when data are presented in paragraph form using small
columns like those in a newspaper. One has to read through
the whole text in order to understand the
main point of the data. For example, there are 50 students
in a class; among them, 30 are boys and 20 are girls. This
is data that can be understood with the help of simple
text, and no table or pie diagram is required.

2. Tabular Method. Statistically, tables are effective devices
for presenting both qualitative and quantitative data. A table is a
systematic and logical arrangement of data in the form of
rows and columns with respect to the characteristics of the
data. A table can be used conveniently to make
comparisons and draw relationships between and among the
variables. It presents the data in a simple form, saves space,
facilitates comparison, facilitates statistical analysis and
reduces the chance of error.

Among the most commonly used tabular methods is the
Frequency Distribution Table. This is a way of organizing
data so that the data become more meaningful. Data need
to be organized and summarized before carrying out statistical
analysis. Thus, the first step in a statistical analysis of a set
of raw data often consists of arranging the data in the form
of a frequency distribution table. This involves grouping the
data on the basis of class intervals or class limits, determined
by the lowest and highest values in the data.

To construct a frequency distribution table, the following
rules shall be followed:

1. As a general rule, the number of classes should be
k = 1 + 3.3 log n, where n is the total number of samples.
2. There should be no overlapping of samples in the class
intervals.
3. Include all classes. A class interval with no frequency that is
located between the first and last class intervals should be
included.
4. There should be enough classes to accommodate all the
data.
5. The classes must be equal in size, except when the classes are
open-ended, such as the first and last classes below.
75 and below
76-80
81-85
86-90
91 and above

Example: Construct a frequency distribution table for the set
of data below on the percentage of students in the total
population of 54 universities in the Philippines.

63.5   31.5   26.6   33.5   35.0   30.4
58.0   30.5   27.3   32.3   53.5   35.2
51.5   54.4   27.5   30.0   51.7   32.4
45.4   56.9   27.1   28.7   53.9   30.2
40.1   32.7   28.6   32.7   61.5   28.7
38.8   34.8   27.8   34.2   59.4   26.8
33.5   29.8   27.4   33.2   34.4   27.9
28.1   28.7   29.4   31.9   31.8   29.6
29.2   27.6   27.3   31.2   31.3   29.0

Solution:

Step 1. Find the range.

From the data above, the highest value is HV = 63.5 and the
lowest value is LV = 26.6. The range is therefore:

Range (R) = HV – LV
= 63.5 – 26.6
= 36.9

Step 2. Determine the number of classes (usually between
5 and 20). The number of classes has to be rounded
up to a whole number.

k = 1 + 3.3 log n
= 1 + 3.3 log 54
= 6.7 ≈ 7

Step 3. Find the class width or class size.

$$c = \frac{R}{k} = \frac{36.9}{6.7} = 5.5$$

Step 4. Select a starting point. It must be equal to or lower
than the smallest value. Here the starting point is the lowest
value, 26.6.

Table 1. Frequency Distribution Table of Percentage of Students

Class interval   Tally                                Frequency   <cf   >cf   Relative frequency (%)
26.6 - 32.0      ||||| ||||| ||||| ||||| ||||| ||||   29          29    54    53.7
32.1 - 37.5      ||||| ||||| ||                       12          41    25    22.2
37.6 - 43.0      ||                                   2           43    13    3.7
43.1 - 48.5      |                                    1           44    11    1.9
48.6 - 54.0      ||||                                 4           48    10    7.4
54.1 - 59.5      ||||                                 4           52    6     7.4
59.6 - 65.0      ||                                   2           54    2     3.7
TOTAL                                                 54                      100

<cf = less-than cumulative frequency; >cf = greater-than cumulative
frequency.
3. Graphical Method. Graphs are visual tools for presenting
statistical data in an organized, easily interpretable manner.
They help simplify complex datasets, reveal patterns, trends,
and distributions, and make comparisons easier. Graphs
enhance the communication of data findings, making them
essential tools for analysis, reporting, and decision-making.
Below are common graphical methods, how to use
them, and their concepts:

3.1. Bar Graph

The purpose of this graph is to compare categories of
data using rectangular bars. Each bar represents a
category. The length or height of the bar corresponds to the
value or frequency of the category. This graph is used for
discrete or categorical data. It can be vertical or
horizontal, and it allows easy comparison between groups.

Example Scenario:

You conducted a survey asking 100 people about their
favorite type of fruit. The results are as follows:

Category    Number of respondents
Apples      30
Bananas     25
Oranges     20
Grapes      15
Mangoes     10
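
As a rough illustration of how bar length encodes frequency, the survey results can be drawn as a text bar chart in Python. The function name `text_bar_chart` and the scale of one `#` per 5 respondents are arbitrary choices made for this sketch; a plotting library would normally be used for a real graph.

```python
survey = {"Apples": 30, "Bananas": 25, "Oranges": 20, "Grapes": 15, "Mangoes": 10}

def text_bar_chart(counts, unit=5):
    """Return one line per category; each '#' stands for `unit` respondents."""
    lines = []
    for category, value in counts.items():
        bar = "#" * (value // unit)
        lines.append(f"{category:<8} {bar} {value}")
    return lines

for line in text_bar_chart(survey):
    print(line)
```

The longest bar (Apples) is immediately visible, which is the kind of comparison a bar graph is meant to make easy.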

3.2. Histogram

The purpose of this graph is to represent the
frequency distribution of continuous data. It divides the data
into intervals and is ideal for showing the shape of a distribution
(e.g., normal distribution, skewness).

Example Scenario:

You conducted a test for 50 students and recorded their
scores out of 100. The scores are as follows:

Scores:
45, 56, 67, 48, 90, 72, 65, 59, 82, 78, 91, 66, 47, 64, 68, 74,
80, 87, 92, 55, 60, 70, 62, 77, 85, 95, 40, 58, 63, 75, 52, 61,
83, 69, 71, 88, 50, 57, 53, 81, 79, 76, 89, 54, 49, 46, 73, 84,
93, 51.

➢ Divide the data into continuous intervals

Minimum score: 40
Maximum score: 95
Range: 95 - 40 = 55
Suggested interval: 10 (you can adjust based on preference).

Interval    Frequency
40-49       6
50-59       10
60-69       10
70-79       10
80-89       9
90-99       5
Total       50
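
Counting scores into intervals by hand is error-prone, so a direct tally is a useful check on a hand-built table. The sketch below groups the 50 scores listed above into intervals of width 10; the helper name `bin_label` is an assumption introduced for this illustration.

```python
from collections import Counter

scores = [
    45, 56, 67, 48, 90, 72, 65, 59, 82, 78, 91, 66, 47, 64, 68, 74,
    80, 87, 92, 55, 60, 70, 62, 77, 85, 95, 40, 58, 63, 75, 52, 61,
    83, 69, 71, 88, 50, 57, 53, 81, 79, 76, 89, 54, 49, 46, 73, 84,
    93, 51,
]

def bin_label(score, width=10):
    """Label of the interval containing `score`, e.g. 45 -> '40-49'."""
    lower = (score // width) * width
    return f"{lower}-{lower + width - 1}"

frequency = Counter(bin_label(s) for s in scores)
for label in sorted(frequency):
    print(label, frequency[label])
# 40-49 6
# 50-59 10
# 60-69 10
# 70-79 10
# 80-89 9
# 90-99 5
```

The heights of the histogram bars are exactly these interval frequencies.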

3.3. Pie Chart

The purpose of this graph is to show proportions of a
whole. It divides a circle into slices, where each slice
represents a category. The size of each slice is proportional
to the percentage or frequency of the category. This
graph is best for showing relative contributions.

Example Scenario for a Pie Chart:

A class of 40 students participated in a school election. The
number of votes received by each candidate is as follows:

Candidate      Frequency (Votes)   Percentage
Candidate A    16                  40
Candidate B    12                  30
Candidate C    8                   20
Candidate D    4                   10
Total          40                  100
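
The percentage column, and the central angle of each slice, follow directly from the vote counts. The sketch below is a minimal illustration; the angle calculation (share of 360 degrees) is an addition not shown in the table above, and the variable names are assumptions.

```python
votes = {"Candidate A": 16, "Candidate B": 12, "Candidate C": 8, "Candidate D": 4}
total = sum(votes.values())

# For each candidate: (percentage of votes, central angle of the slice in degrees)
shares = {name: (100 * v / total, 360 * v / total) for name, v in votes.items()}

for name, (percent, angle) in shares.items():
    print(f"{name}: {percent:.0f}% of votes, slice angle {angle:.0f} degrees")
```

Because the slices must fill the whole circle, the percentages sum to 100 and the angles sum to 360.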

Figure: Pie chart of the percentage of votes received by each
candidate.

3.4. Line Graph

The purpose of the line graph is to display trends over
time or across continuous variables. To use it, plot the
data points on the graph and connect the points with a line.
This is useful for observing changes, patterns and
trends, and it is commonly used for time-series data.

Example Scenario for a Line Graph:

You are tracking the temperature of the municipality of
Catarman over a week to observe trends. Below are the
daily temperature readings (in degrees Celsius):

Day 1 (Monday): 25°C
Day 2 (Tuesday): 28°C
Day 3 (Wednesday): 30°C
Day 4 (Thursday): 27°C
Day 5 (Friday): 26°C
Day 6 (Saturday): 29°C
Day 7 (Sunday): 31°C

Figure: Temperature Reading of the Week (line graph; horizontal
axis: Day 1 to Day 7; vertical axis: temperature in °C).

General Concepts of Using Graphical Methods:

1. Clarity: Choose a method that simplifies interpretation
and avoids clutter.

2. Accuracy: Ensure scales and proportions accurately
reflect data.

3. Relevance: Match the graph type to the data and
research objectives.

4. Accessibility: Label axes, include a legend (if
necessary), and use consistent scales.

5. Interpretation: Use graphs to draw attention to key
findings or trends.

6. Comparison: When comparing datasets, use consistent
scales and formats.

III. MEASURES OF CENTRAL TENDENCY (Mean,
Median and Mode)

Learning Outcomes

By the end of this lesson, students should be able to:

1. Explain the concept and importance of measures of central tendency
in statistics.

2. Differentiate among the mean, median, and mode, and describe their
characteristics.

3. Compute the mean, median, and mode for both ungrouped and
grouped data.

4. Analyze real-life datasets to determine the most appropriate measure
of central tendency.

5. Evaluate the strengths and limitations of each measure, considering
the impact of outliers.

6. Solve practical and theoretical problems involving measures of
central tendency.

3.1 Mean

The mean, often referred to as the average, is one of the
most widely used measures of central tendency in statistics. It
provides a single value that represents the center or typical
value of a dataset, offering a quick summary of the data.

The mean is calculated by summing all the values in a
dataset and dividing this total by the number of observations.
For example, if a dataset consists of the numbers 5, 10, and
15, the mean would be:

$$\text{Mean} = \frac{5 + 10 + 15}{3} = 10$$

This measure is particularly useful when all data points are of
equal importance and the dataset is symmetrically distributed. It
helps identify the "balance point" of the data, making it a powerful
tool for comparing groups or trends.

Characteristics of the Mean

1. Sensitive to Every Value

The mean considers all values in the dataset, making it a
comprehensive summary of the data. However, this also makes it
sensitive to outliers (extremely high or low values), which can skew
the mean and make it less representative of the central tendency.

2. Applicability Across Fields

The mean is used in various disciplines, including economics,
education, and science, to analyze data such as average
income, test scores, and experimental results.

3. Ease of Interpretation

As a single number, the mean is easy to interpret and
communicate, making it ideal for presenting findings to diverse
audiences.

The mean is a fundamental statistical concept that simplifies
complex data into a single, interpretable value. By
understanding its calculation, uses, and limitations, one can
effectively apply it to summarize and analyze datasets in a variety
of real-world contexts.

Formula for Computing the Mean

1. Mean for Ungrouped Data

The formula for the mean ($\bar{x}$) of ungrouped data is:

$$\bar{x} = \frac{\sum x}{n}$$

Where: ∑x = the sum of all data values.

n = the number of data values.

2. Mean for Grouped Data

The formula for the mean ($\bar{x}$) of grouped data is:

$$\bar{x} = \frac{\sum fx}{\sum f}$$

Where:

f = the frequency of each class interval.

x = the midpoint of each class interval.

∑fx = the sum of the products of frequencies and midpoints.

∑f = the total frequency.

Example 1: Ungrouped Data

A student scored the following marks in 5 subjects:

Marks: 85, 90, 78, 92, 88

Solution:

$$\bar{x} = \frac{\sum x}{n} = \frac{85 + 90 + 78 + 92 + 88}{5} = \frac{433}{5} = 86.6$$

The mean score is 86.6.
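
The same computation can be written as a short Python sketch. The function name `mean` is chosen here for illustration; Python's standard library also provides `statistics.mean` for this purpose.

```python
def mean(values):
    """Arithmetic mean of ungrouped data: sum of the values divided by their count."""
    if not values:
        raise ValueError("mean of an empty dataset is undefined")
    return sum(values) / len(values)

marks = [85, 90, 78, 92, 88]
print(sum(marks))    # 433
print(mean(marks))   # 86.6
```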

Example 2: Grouped Data

A frequency table of test scores is given below:

Class Interval   Frequency (f)
40-49            5
50-59            8
60-69            10
70-79            6
80-89            4
Total            33

Solution:

1. Find the midpoint (x) of each class interval:

$$x = \frac{\text{lower limit} + \text{upper limit}}{2}$$

For example: $x = \frac{40 + 49}{2} = 44.5$

Class Interval   Frequency (f)   Midpoint (x)   fx
40-49            5               44.5           222.5
50-59            8               54.5           436
60-69            10              64.5           645
70-79            6               74.5           447
80-89            4               84.5           338
Total            33                             2088.5

2. Compute the mean:

$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{2088.5}{33} = 63.3$$

The mean score is approximately 63.3.
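
A short Python sketch is a useful check on the hand arithmetic for grouped data. It recomputes each midpoint and each product fx from the class limits and frequencies of Example 2; the function name `grouped_mean` and the tuple layout are assumptions made for this illustration.

```python
# Each entry: ((lower limit, upper limit), frequency), from the table in Example 2
classes = [((40, 49), 5), ((50, 59), 8), ((60, 69), 10), ((70, 79), 6), ((80, 89), 4)]

def grouped_mean(classes):
    """Return (sum of fx, sum of f, mean), using class midpoints as x."""
    total_f = sum(f for _, f in classes)
    total_fx = sum(f * (low + high) / 2 for (low, high), f in classes)
    return total_fx, total_f, total_fx / total_f

total_fx, total_f, mean_score = grouped_mean(classes)
print(total_fx, total_f)       # 2088.5 33
print(round(mean_score, 1))    # 63.3
```

Note that a grouped mean is an approximation, because every score in a class is treated as if it were equal to the class midpoint.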

3.2 Median
The median is a measure of central tendency that represents
the middle value of a dataset when it is arranged in ascending or
descending order. Unlike the mean, the median is not influenced by
extreme values, making it particularly useful for datasets that
contain outliers or are skewed.

Median of Ungrouped Data

• For an odd number of data points, the median is the exact
middle value. For example, in the dataset 3, 7, 9, the median is 7,
as it is the middle number when the data is ordered.
• For an even number of data points, the median is calculated as
the average of the two middle values. For instance, in the dataset
4, 6, 8, 10, the median is:

$$\text{Median} = \frac{6 + 8}{2} = 7$$

Median in Grouped Data

In grouped data, the median is determined using the


median class, which is the class interval that contains the
middle value of the cumulative frequency distribution. The
formula for the median is:

$$\text{Median} = L + \left(\frac{\frac{n}{2} - CF}{f}\right) \times h$$

Where:

• L = Lower boundary of the median class.
• n = Total frequency.
• CF = Cumulative frequency before the median class.
• f = Frequency of the median class.
• h = Width of the class interval.

Characteristics of the Median

1. Resistant to Outliers:
Since the median depends only on the order of data values, it is
unaffected by extreme values that could distort the mean.

2. Represents the Center of a Distribution:


The median divides the dataset into two equal parts, with 50% of
the data lying below it and 50% above.

3. Applicable Across Various Scenarios:


The median is widely used in fields such as economics (e.g.,
median income), healthcare (e.g., median survival time), and
education (e.g., median test scores).

The median is a valuable measure of central tendency,


especially for analyzing skewed data or distributions with outliers.
By providing the "middle value," it offers insights into the central
location of data, ensuring a fair representation even when
extreme values are present.

Example of Computing Median

1. For Ungrouped Data

Find the median of the following dataset:


8, 12, 15, 9, 11
Solution:

1. Arrange the Data in Ascending Order:


8,9,11,12,15

2. Count the Total Number of Observations (n):


n=5 (odd number of data points)

3. Find the Median Position:

The median is the middle value when n is odd, located at position (n + 1)/2 = (5 + 1)/2 = 3.

4. Identify the Median Value:

The 3rd value in the ordered dataset is 11.

Median = 11

2. Grouped Data

A teacher recorded the scores of 50 students in a


mathematics test and organized them into a frequency
distribution table as follows:
Score Interval    Frequency (f)    Cumulative Frequency (CF)
40 – 49           4                4
50 – 59           6                10
60 – 69           10               20
70 – 79           15               35
80 – 89           9                44
90 – 99           6                50

Question:

Using the given frequency distribution table, determine the


median score.

Step-by-Step Solution

1. Find the median class:

⚫ The total number of students is n = 50.

⚫ The median position is at n/2 = 50/2 = 25.

⚫ Locate the cumulative frequency (CF) where 25 is


found or first exceeded.

⚫ Looking at the CF column, 35 (corresponding to the


class 70–79) is the first cumulative frequency greater
than 25.

⚫ So, the median class is 70–79.

2. Identify the values needed for the median formula:

$$\text{Median} = L + \left(\frac{\frac{n}{2} - CF}{f}\right) \times h$$

⚫ L = Lower boundary of median class = 69.5 (since the class interval is 70–79, the lower boundary is 70 – 0.5 = 69.5)

⚫ n = Total number of students = 50

⚫ CF = Cumulative frequency before median class = 20

⚫ f = Frequency of median class = 15

⚫ h = Class width = 10 (since the intervals are 40–49, 50–59, etc.)

3. Substituting values
$$\text{Median} = 69.5 + \left(\frac{25 - 20}{15}\right) \times 10 = 69.5 + \left(\frac{5}{15}\right) \times 10$$

= 69.5 + 3.33
= 72.83

4. Final Answer :

The median score is 72.83
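
The grouped-median procedure can be sketched in code as follows. The function name `grouped_median` and its argument layout are hypothetical, and the lower boundary is taken as the lower limit minus 0.5, which assumes integer class limits as in this example.

```python
def grouped_median(lower_limits, freqs, width):
    """Estimate the median of grouped data by interpolation.

    Assumes integer class limits, so each lower boundary is
    the lower limit minus 0.5 (an assumption of this sketch).
    """
    n = sum(freqs)
    half = n / 2
    cf_before = 0  # cumulative frequency before the current class
    for lower, f in zip(lower_limits, freqs):
        if cf_before + f >= half:  # first class whose CF reaches n/2
            boundary = lower - 0.5
            return boundary + (half - cf_before) / f * width
        cf_before += f
    raise ValueError("empty frequency table")


# Test-score table from the example above
lower_limits = [40, 50, 60, 70, 80, 90]
freqs = [4, 6, 10, 15, 9, 6]
median = grouped_median(lower_limits, freqs, width=10)
print(round(median, 2))
```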

3.3 Mode
The mode is the value that appears most frequently in a
dataset. In grouped data, the modal class is the class interval with
the highest frequency. Since the mode is not directly observable in
grouped data, we use an interpolation formula to estimate it.

Mode of the Grouped Data:

$$\text{Mode} = L + \left(\frac{f_1 - f_0}{2f_1 - f_0 - f_2}\right) \times h$$

Where:

• L = Lower boundary of the modal class


• f₁ = Frequency of the modal class
• f₀ = Frequency of the class before the modal
class
• f₂ = Frequency of the class after the modal class
• h = Class width
• Modal class: The class interval with the highest
frequency in the frequency distribution table.

Characteristics of the Mode

1. If the dataset has a single mode, it is called unimodal.


2. If it has two modes, it is bimodal.
3. If it has more than two, it is multimodal.
4. If all values appear with the same frequency, the data is uniform
and has no mode.

Example Problem:

A teacher recorded the test scores of 60 students and


organized them into a frequency distribution table:

Score Interval    Frequency (f)
30 – 39           5
40 – 49           8
50 – 59           12
60 – 69           20
70 – 79           10
80 – 89           5

Step-by-Step Solution:

1. Identify the modal class

⚫ The class with the highest frequency is 60–69 (frequency f₁ = 20).

2. Identify the values for the formula

⚫ L = Lower boundary of modal class = 59.5 (60 – 0.5)


⚫ f₀ = Frequency before modal class = 12
⚫ f₁ = Frequency of modal class = 20
⚫ f₂ = Frequency after modal class = 10
⚫ h = Class width = 10

3. Apply the formula

$$\text{Mode} = 59.5 + \left(\frac{20 - 12}{2 \times 20 - 12 - 10}\right) \times 10 = 59.5 + \left(\frac{8}{40 - 22}\right) \times 10$$

= 59.5 + (0.444) x 10

= 63.94

4. Final Answer: The mode of the dataset is 63.94
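
As a cross-check, the interpolation formula for the mode can be written as a small function. This is a sketch only: the name `grouped_mode` is hypothetical, a missing neighboring class is treated as having frequency 0, and integer class limits are assumed so that the lower boundary is the lower limit minus 0.5.

```python
def grouped_mode(lower_limits, freqs, width):
    """Estimate the mode of grouped data from the modal class."""
    i = freqs.index(max(freqs))                      # position of the modal class
    f1 = freqs[i]                                    # frequency of the modal class
    f0 = freqs[i - 1] if i > 0 else 0                # class before (0 if none)
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0   # class after (0 if none)
    boundary = lower_limits[i] - 0.5                 # lower boundary of modal class
    return boundary + (f1 - f0) / (2 * f1 - f0 - f2) * width


# Test-score table from the example above
lower_limits = [30, 40, 50, 60, 70, 80]
freqs = [5, 8, 12, 20, 10, 5]
mode = grouped_mode(lower_limits, freqs, width=10)
print(round(mode, 2))
```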

IV. FUNDAMENTALS OF PROBABILITY

Learning Outcomes

By the end of this lesson, students should be able to:

1. Define probability and explain its importance in


real-world applications.
2. Describe sample space, events, and their
relationships.
3. Apply counting techniques in probability problems.
4. Use probability rules to compute probabilities of
different events.

4.1 Sample Space and Relationship Among Events

4.1.1 Probability is a measure of how likely an event is to occur.


It is expressed as a number between 0 and 1, where 0
means the event is impossible and 1 means the event is
certain.

4.1.2 Sample Space and Events.

• Sample Space (S) is the set of all possible outcomes of


an experiment.

Example: Tossing a coin: S = {Heads, Tails}

• An event (E) is a subset of the sample space.

Example: Getting heads in a coin toss: E = {Heads}

4.1.3 Types of Events

• Mutually Exclusive Events. Events that cannot


happen at the same time.

Example : Rolling a die and getting a 3 or a 5 (Cannot


be both)

• Independent Events. Events where the outcome of
one does not affect the other.

Example : Flipping a coin and rolling a die

• Complementary Events. Two events where exactly one of them
must occur: the complement A’ contains every outcome in the
sample space that is not in A.

Example: If A is the event of rolling a 6, its complement
A’ is rolling anything except 6.

4.2 Counting Rules Useful in Probability

4.2.1 Fundamental Counting Principles. If an event can occur


in m ways and another can occur in n ways, then the
total ways both can occur is : m x n

Example : If you have 3 shirts and 2 pants, the number


of outfits you can make : 3 x 2 = 6

4.2.2 Permutations (Ordered Arrangements). The number of
ways to arrange r items chosen from n items when order matters is:

$$P(n,r) = \frac{n!}{(n-r)!}$$

Example: Arranging 3 letters out of 5

$$P(5,3) = \frac{5!}{(5-3)!} = \frac{5!}{2!} = 5 \times 4 \times 3 = 60 \text{ ways}$$

4.2.3 Combinations (Selection without Order). The number
of ways to choose r items from n when order does NOT
matter is:

$$C(n,r) = \frac{n!}{r!\,(n-r)!}$$

Example: In how many ways can 3 students be chosen
from a group of 5?

$$C(5,3) = \frac{5!}{3!\,(5-3)!} = \frac{5!}{3!\,2!} = \frac{5 \times 4 \times 3 \times 2 \times 1}{(3 \times 2 \times 1)(2 \times 1)} = \frac{5 \times 4}{2 \times 1} = 10 \text{ ways}$$
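
The three counting rules can be verified with Python's standard `math` module (`math.perm` and `math.comb` are available from Python 3.8). This is a small illustrative check of the examples in this section; the variable names are not from the original material.

```python
import math

# Fundamental counting principle: 3 shirts and 2 pants
outfits = 3 * 2

# Permutations: ordered arrangements of 3 letters chosen from 5
arrangements = math.perm(5, 3)

# Combinations: unordered selections of 3 students from 5
selections = math.comb(5, 3)

# Each unordered selection of 3 can be ordered in 3! ways
assert arrangements == selections * math.factorial(3)

print(outfits, arrangements, selections)
```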

4.3 Rules of Probability

4.3.1 Probability of a Single Event. If an event E has a number of
favorable outcomes and the sample space S consists of equally
likely outcomes, then:

$$P(E) = \frac{\text{Number of favorable outcomes}}{\text{Total outcomes}}$$

Example: Rolling a die and getting a 4

$$P(4) = \frac{1}{6}$$

4.3.2 Addition Rule (for “OR” Events). If A and B are two


events, then :

P(A∪B) = P(A) +P(B) – P(A∩B)

Example: Rolling a die and getting a 3 OR an even


number
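$$P(3) = \frac{1}{6}, \quad P(\text{even}) = \frac{3}{6}, \quad P(3 \cap \text{even}) = 0$$

$$P(3 \cup \text{even}) = \frac{1}{6} + \frac{3}{6} - 0 = \frac{4}{6} = \frac{2}{3}$$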

4.3.3 Multiplication Rule (for “AND” Events). If A and B are


independent events, then :

P(A∩B) = P(A) x P(B)

Example: Tossing a coin and rolling a die


$$P(H) = \frac{1}{2}, \quad P(5) = \frac{1}{6}$$

$$P(H \cap 5) = \frac{1}{2} \times \frac{1}{6} = \frac{1}{12}$$

4.3.4 Complement Rule. The probability that an event does not


occur is :
P(A’) = 1 – P(A)

Example: If P(rain) = 0.3, then P(no rain) = 1 – 0.3 = 0.7.
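
The probability rules in this section can be checked by enumerating a small sample space with exact fractions. This is a minimal sketch; the helper `prob` and the event names are hypothetical and only reproduce the die, coin, and rain examples above.

```python
from fractions import Fraction

die = {1, 2, 3, 4, 5, 6}  # sample space for one roll of a fair die


def prob(event, space):
    """Classical probability: favorable outcomes over total outcomes."""
    return Fraction(len(event & space), len(space))


three = {3}
even = {2, 4, 6}

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_union = prob(three, die) + prob(even, die) - prob(three & even, die)
assert p_union == prob(three | even, die)  # agrees with direct counting

# Multiplication rule for independent events: heads on a coin AND a 5 on a die
p_head_and_five = Fraction(1, 2) * prob({5}, die)

# Complement rule: P(no rain) = 1 - P(rain)
p_no_rain = 1 - Fraction(3, 10)

print(p_union, p_head_and_five, p_no_rain)
```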

V. DISCRETE PROBABILITY DISTRIBUTIONS
Learning Outcomes :

By the end of this lesson, students should be able to:

1. Define discrete random variables and construct their


probability distribution.
2. Interpret and compute cumulative distribution functions
3. Calculate expected values, variance and standard
deviation of discrete random variables.
4. Apply binomial and Poisson distribution in solving real
world problems.

5.1 Random Variables and Their Probability Distributions.

5.1.1 Random Variable (RV): A variable whose values depend on


the outcomes of a random experiment.

5.1.1.1 Discrete Random Variables : This takes countable


values (e.g., 0, 1, 2, …)

5.1.1.2 Continuous Random Variables: This takes


uncountable values within an interval

Probability Distribution Table


x (Number of Heads in 2 Tosses)    P(x)
0                                  0.25
1                                  0.50
2                                  0.25

Properties:
• 0 ≤ P(x) ≤ 1
• ∑ 𝑃(𝑥) = 1

Example: Two fair coins are tossed at the same time. Let the
random variable x represent the number of heads
that appear.
Requirement :
1. List the sample space
2. Define the random variable x
3. Construct the probability distribution x
4. verify that the distribution is valid
5. Compute the expected value E(x)

Solutions:

1. Sample Space (S) :

S = {𝐻𝐻, 𝐻𝑇, 𝑇𝐻, 𝑇𝑇 }

There are 4 equally likely outcomes when tossing two


coins.

2. Define the random variable (x)

Let x = the number of heads observed


__________________________________
Outcome x(Number of Heads)

HH 2
HT 1
TH 1
TT 0
___________________________________

3. Probability Distribution Table:


We now determine the probability of each
value of x

x    Outcomes               P(x)
0    1 outcome (TT)         1/4 = 0.25
1    2 outcomes (HT, TH)    2/4 = 0.50
2    1 outcome (HH)         1/4 = 0.25

Probability Distribution

P(x=0) = 0.25
P(x=1) = 0.50
P(x=2) = 0.25

4. Validity Check

• All probabilities are between 0 and 1


• Sum of all probabilities
= 0.25 + 0.50 + 0.25
= 1.0 (probability distribution is
valid)

5. Expected Value E(x)

E(x) = ∑ x · P(x) = (0)(0.25) + (1)(0.50) + (2)(0.25) = 1.0

The expected number of heads when tossing two


coins is 1

5.2 Cumulative Distribution Functions (CDF). The cumulative
distribution function gives the probability that the
random variable X is less than or equal to a value x.

F(x) = P(X ≤ x)

Example CDF Table

x    P(x)    F(x) = P(X ≤ x)
0    0.25    0.25
1    0.50    0.75
2    0.25    1.00

5.3 Expected Values and Variance.

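5.3.1 Expected value (mean). This is E(x) = ∑ x · P(x)

5.3.2 Variance and Standard Deviation:

$$\text{Var}(x) = \sum x^2 \cdot P(x) - [E(x)]^2, \quad \text{SD}(x) = \sqrt{\text{Var}(x)}$$

Example: From the above table

E(x) = ∑ x · P(x) = (0)(0.25) + (1)(0.50) + (2)(0.25) = 1.0

Var(x) = (0)²(0.25) + (1)²(0.50) + (2)²(0.25) – (1.0)² = 1.5 – 1.0 = 0.5

SD(x) = √0.5 ≈ 0.707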
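
The two-coin example can be reproduced end to end by enumerating the sample space. This is a sketch under the assumption of fair, independent coins; the variable names are illustrative. Exact fractions are used so that the validity check (probabilities summing to 1) is not affected by rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import accumulate, product

# Sample space for two fair coin tosses: HH, HT, TH, TT
sample_space = list(product("HT", repeat=2))

# Random variable x = number of heads in each outcome
counts = Counter(outcome.count("H") for outcome in sample_space)
values = sorted(counts)  # possible values of x
pmf = [Fraction(counts[x], len(sample_space)) for x in values]

# Validity check: each probability lies in [0, 1] and the total is 1
assert all(0 <= p <= 1 for p in pmf) and sum(pmf) == 1

cdf = list(accumulate(pmf))  # F(x) = P(X <= x)
expected = sum(x * p for x, p in zip(values, pmf))
variance = sum(x**2 * p for x, p in zip(values, pmf)) - expected**2
std_dev = float(variance) ** 0.5

for x, p, big_f in zip(values, pmf, cdf):
    print(x, float(p), float(big_f))
print(float(expected), float(variance), round(std_dev, 3))
```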

5.4 The Binomial Distribution. This is used when there is a fixed
number of trials (n), two possible outcomes per trial (success/failure),
independent trials, and a constant probability of success (p).

$$P(X = r) = \binom{n}{r} p^r (1-p)^{n-r}$$

Where: r = number of successes
       n = fixed number of trials
       p = constant probability of success

Example :

An engineer tests electronic components from a


production line. Based on past data, the probability that
a component is defective is 0.1 (or 10%). If the engineer
randomly selects 5 components, what is the probability
that exactly 2 components are defective?

Solution:

This is a binomial experiment because:

• There is a fixed number of trials: n = 5
• Each trial has only two outcomes: defective
(success) or not defective (failure)
• The probability of success (defective component)
is constant: p = 0.1
• The trials are independent

Using the Binomial Probability Formula

$$P(X = r) = \binom{n}{r} p^r (1-p)^{n-r}$$

Where: n = 5, r = 2, p = 0.1

$$P(X = 2) = \binom{5}{2} (0.1)^2 (1-0.1)^{5-2} = (10)(0.01)(0.729)$$

P(X = 2) = 0.0729

Answer: The probability that exactly 2 out of 5
components are defective is 0.0729 or 7.29%.
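
The binomial formula translates directly into code. The function name `binomial_pmf` below is hypothetical; it is a minimal sketch using `math.comb` for the binomial coefficient.

```python
import math


def binomial_pmf(r, n, p):
    """P(X = r) for n independent trials with success probability p."""
    return math.comb(n, r) * p**r * (1 - p) ** (n - r)


# Exactly 2 defective components out of 5, with p = 0.1
p_two_defective = binomial_pmf(2, 5, 0.1)
print(round(p_two_defective, 4))

# Sanity check: probabilities over r = 0..n should sum to 1
total = sum(binomial_pmf(r, 5, 0.1) for r in range(6))
```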

5.5 The Poisson Distribution

The Poisson Distribution is a discrete probability


distribution that models the number of times an event occurs in a
fixed interval of time, area, volume, or distance, given that the
events occur randomly, independently, and at a constant average rate.

Probability Mass Function (PMF):

$$P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}$$
Where:

• P(X=k): probability of observing k events in the interval


• λ: average number of occurrences in the
interval(mean)
• e: Euler’s number (approximately 2.71828)
• k: number of events (0, 1, 2, …)
• k! : factorial of k

Characteristics of Poisson Distribution

• Discrete: The variable takes on whole number values


(0, 1, 2, …)
• Events are independent
• The mean and variance of a Poisson distribution are
both equal to λ
• Appropriate when events happen rarely but at a
constant average rate

Example :

A machine produces metal parts, and on average, 2 defective


parts are found per hour. What is the probability that exactly
3 defective parts are found in an hour?

Given: λ=2 , k=3

Using the formula

$$P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}$$

$$P(X = 3) = \frac{e^{-2}\, 2^3}{3!} = \frac{(0.1353)(8)}{6} = 0.1804$$

So, there’s an 18.04% chance of finding exactly 3 defective


parts in one hour.
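
The Poisson probability above can be reproduced with the standard `math` module. The function name `poisson_pmf` is hypothetical; this is only a sketch of the PMF formula.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) when events occur at an average rate of lam per interval."""
    return math.exp(-lam) * lam**k / math.factorial(k)


# Exactly 3 defective parts in an hour when the average is 2 per hour
p_three = poisson_pmf(3, 2)
print(round(p_three, 4))
```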

VI. TEST OF HYPOTHESIS FOR A SINGLE SAMPLE
Learning Outcomes :

By the end of this lesson, students should be able to:

1. define hypothesis testing, differentiate between null and


alternative hypotheses, and explain the significance of Type I and
Type II errors in engineering decision-making.
2. apply the Z-test and t-test to compare population means,
determine when to use each test, and interpret the results in
engineering and scientific studies.
3. use Analysis of Variance (ANOVA) to compare means across
multiple groups, assess statistical significance, and apply the
method in engineering experiments and quality control.
4. perform the Chi-square test to analyze categorical data, test for
independence and goodness-of-fit, and interpret results in real-
world engineering applications.

6.1 Hypothesis Testing

In engineering, making data-driven decisions is


essential—whether it's improving product quality, testing new
designs, or evaluating system performance. Hypothesis
testing is a statistical method that allows engineers to make
objective decisions based on sample data. It provides a
structured way to determine whether observed results are due
to random chance or if they reflect true differences or effects.

What is Hypothesis Testing?

Hypothesis testing is a process of making inferences or


judgments about a population parameter based on sample
data. It involves formulating two competing hypotheses and
using statistical evidence to decide which one is more likely
to be true.

Null Hypothesis (H0) vs. Alternative Hypothesis (H1)

• The null hypothesis (H0) is a statement of no effect,


no difference, or status quo. It assumes that any
observed variation is purely due to chance.

Example: The mean tensile strength of steel rods is
500 MPa.

• The alternative hypothesis (H1) is a statement that


contradicts the null hypothesis. It suggests that there is
an effect or a significant difference.

Example: The mean tensile strength is not 500 MPa


(i.e., it has changed due to a new
manufacturing process).

In hypothesis testing, we assume H0 is true and use


statistical evidence to decide whether we should reject it in
favor of H1.

Types of Errors in Hypothesis Testing

Since decisions are based on sample data (not the


entire population), there's always a risk of making a wrong
conclusion. These risks are classified into Type I and Type II
errors:

Type I Error (α): Rejecting the null hypothesis when


it is actually true.

Implication: In engineering, this could mean rejecting a


reliable design or manufacturing process based
on misleading sample data.

Example: You conclude that a new batch of materials is


defective when it is actually within acceptable
limits.

Type II Error (β): Failing to reject the null hypothesis


when the alternative is actually true.

Implication: This could result in missing a real problem,


such as accepting a weak material that should
have been rejected.

Example: You accept that two machines produce the same


quality output when one is actually
underperforming.

Z-Test in Hypothesis Testing
A Z-test is a statistical method used to determine
whether there is a significant difference between sample data
and a population parameter (mean or proportion), or between
two population means/proportions, when the population
variance is known or the sample size is large (n ≥ 30).

When to use Z-test

Use Z-test if all of these conditions are met:

1. The population standard deviation (σ) is known.


2. The sample size is large (n ≥ 30), or the population is
normally distributed.
3. The data are quantitative and randomly sampled.

Types of Z-test

▪ One-sample Z-test for mean


▪ Two-sample Z-test for comparing means
▪ Z-test for proportions (one or two samples)

One-sample Z-test for mean formula:

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

Where:

• 𝑋̅ = sample mean
• μ = population mean
• σ = population standard deviation
• n = sample size

Example :

A mechanical engineer claims the average thickness of a


machine part is 10 mm. A sample of 36 parts has a mean
thickness of 9.6 mm. The population standard deviation is 1.2
mm. At a 0.05 level of significance, test the claim.

Given: x̄ = 9.6 mm, n = 36
μ = 10 mm
σ = 1.2 mm

Solution:

Step 1. Hypothesis testing

Null hypothesis, Ho: 𝜇 = 10

Alternative hypothesis, Ha: 𝜇 ≠ 10

Step 2. Compute the Z-statistic using the formula:

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{9.6 - 10}{1.2 / \sqrt{36}} = \frac{-0.4}{0.2}$$

Z = –2.0

Step 3. Critical Value (𝜶= 0.05, two-tailed)

Zcritical = ± 1.96

Step 4. Conclusion

Since Z = –2.0 < –1.96 (that is, |Z| = 2.0 > 1.96), reject Ho

Interpretation: The average thickness is significantly


different from 10 mm.
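
The one-sample Z-test can be sketched with the standard library, using `statistics.NormalDist` to obtain the critical value instead of a printed table. The function name `one_sample_z` is hypothetical, and the p-value is an addition of this sketch that does not appear in the worked example.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed one-sample Z-test with known population sigma."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)     # about 1.96 for alpha = 0.05
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))     # two-tailed p-value
    return z, z_crit, p_value


# Machine-part thickness example: x-bar = 9.6, mu = 10, sigma = 1.2, n = 36
z, z_crit, p_value = one_sample_z(9.6, 10, 1.2, 36)
reject_h0 = abs(z) > z_crit
print(round(z, 2), round(z_crit, 2), round(p_value, 4), reject_h0)
```

Comparing |Z| with the critical value and comparing the p-value with α lead to the same decision.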

Two-sample Z-test for comparing means formula:

$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$

Where: x̄1 = mean of sample 1
x̄2 = mean of sample 2
σ1 = population standard deviation of sample 1
σ2 = population standard deviation of sample 2
n1 = sample size of the first sample
n2 = sample size of the second sample

Example :

Two different suppliers provide aluminum rods used in


aircraft construction. An engineer wants to compare the
average tensile strength of rods from Supplier A and
Supplier B.

A random sample is taken from each supplier:

Supplier A:

Sample size (n1) = 40
Sample mean (x̄1) = 310 MPa
Population standard deviation (σ1) = 15 MPa

Supplier B:

Sample size (n2) = 50
Sample mean (x̄2) = 305 MPa
Population standard deviation (σ2) = 18 MPa

At a 0.05 level of significance, test if there is a significant


difference in the mean tensile strengths.

Solution :

Step 1 : State the hypothesis

Null hypothesis, H0: μ1=μ2 (no difference in mean tensile


strengths)
Alternative hypothesis, Ha: μ1≠μ2 (there is a difference) — two-
tailed test

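Step 2. Use the formula:

$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$

Substitute the values:

$$Z = \frac{310 - 305}{\sqrt{\frac{15^2}{40} + \frac{18^2}{50}}} = \frac{5}{\sqrt{\frac{225}{40} + \frac{324}{50}}} = \frac{5}{\sqrt{5.625 + 6.48}} = \frac{5}{\sqrt{12.105}} = \frac{5}{3.48}$$

Z ≈ 1.44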

Step 3. Critical Value

For a two-tailed test at α = 0.05, the critical Z-values are:

Zcritical = ± 1.96

Step 4. Conclusion

Since Z=1.44 is within the range of −1.96< Z <1.96, we


fail to reject the null hypothesis.

There is no significant difference in the average


tensile strength of aluminum rods from the two suppliers at the
0.05 significance level.
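
A two-sample Z-test follows the same pattern. The sketch below uses the values substituted in the computation above (means 310 and 305 MPa, standard deviations 15 and 18 MPa, sample sizes 40 and 50); the function name `two_sample_z` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2):
    """Z statistic for the difference of two means with known sigmas."""
    standard_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return (mean1 - mean2) / standard_error


z = two_sample_z(310, 305, 15, 18, 40, 50)
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)  # two-tailed, alpha = 0.05
reject_h0 = abs(z) > z_crit
print(round(z, 2), round(z_crit, 2), reject_h0)
```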

Z- test for Proportions (one or two samples)

Z-test for Proportions Formula:

$$Z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}}$$

Where : 𝑝̆ = sample proportion


p = hypothesized population proportion
n = sample size

Example :

An electronics manufacturer claims that no more than 5% of


products are defective. In a recent batch of 200 items, 15 were
found defective. Test the claim at the 0.05 level.

Solution :

Step 1. Hypotheses testing

Null hypothesis, Ho: p = 0.05

Alternative hypothesis, Ha: p > 0.05

Step 2. Compute the Z-statistic

Using the formula:

$$Z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}}$$

$$\hat{p} = \frac{15}{200} = 0.075$$

$$Z = \frac{0.075 - 0.05}{\sqrt{\frac{0.05(1-0.05)}{200}}} = \frac{0.025}{0.0154} \approx 1.62$$

Step 3. Critical Value (right-tailed test, α = 0.05)

Zcritical = 1.645

Step 4. Conclusion

Since Z = 1.62 < 1.645, fail to reject Ho

Interpretation: There is no significant evidence that the


defect rate exceeds 5%.
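
The proportion test can be sketched the same way. The function name `proportion_z` is hypothetical; the critical value for the right-tailed test is taken from `statistics.NormalDist` rather than a table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z(successes, n, p0):
    """Z statistic for a one-sample proportion against hypothesized p0."""
    p_hat = successes / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


# 15 defective items out of 200, claimed defect rate of 5%
z = proportion_z(15, 200, 0.05)
z_crit = NormalDist().inv_cdf(1 - 0.05)  # right-tailed, alpha = 0.05
reject_h0 = z > z_crit
print(round(z, 2), round(z_crit, 3), reject_h0)
```

The statistic falls just below the critical value, so the decision is sensitive to small changes in the sample.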

t-Test in Hypothesis Testing


In many engineering applications, professionals make
inferences about a population based on sample data. When
the population standard deviation is unknown and the sample
size is small (n < 30), the t-Test is used. The t-Test is a
valuable statistical tool in engineering for determining if
differences in sample means are statistically significant.
Choosing the correct type of t-Test and following the

hypothesis testing procedure ensures reliable decisions
based on sample data.

Types of t-Test

1. One-Sample t-Test:

Used to compare the sample mean to a known or


hypothesized population mean.

One-Sample t-test formula:

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

where:

x̄: sample mean
μ: hypothesized population mean
s: sample standard deviation
n: sample size

Example :

A quality control engineer is testing whether a new type of


cement mix meets the required compressive strength of 30
MPa. A random sample of 10 concrete cylinders yielded the
following compressive strengths (in MPa):

28.5, 30.2, 29.8, 27.9, 31.1, 30.0, 29.5, 30.3, 28.9, 29.7

At α = 0.05, can we conclude that the average compressive


strength of the new mix is different from the required 30 MPa?

Solution:

Step 1. State the hypothesis

Null Hypothesis (H₀): μ = 30 (The population mean is


30MPa)

Alternative Hypothesis (H₁): μ ≠ 30 (The population mean


is not 30 MPa)

This is a two-tailed test.

Step 2. Compute the Test Statistics

Using the formula:

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

Step 2.1 Find the sample mean x̄

$$\bar{x} = \frac{28.5+30.2+29.8+27.9+31.1+30.0+29.5+30.3+28.9+29.7}{10} = \frac{295.9}{10} = 29.59 \approx 29.6$$

Step 2.2. Find the sample standard deviation

First compute the squared differences (using x̄ = 29.6)

x       x – x̄    (x – x̄)²
28.5    -1.1      1.21
30.2     0.6      0.36
29.8     0.2      0.04
27.9    -1.7      2.89
31.1     1.5      2.25
30.0     0.4      0.16
29.5    -0.1      0.01
30.3     0.7      0.49
28.9    -0.7      0.49
29.7     0.1      0.01
Total             7.91

$$s^2 = \frac{\sum (x - \bar{x})^2}{n-1} = \frac{7.91}{10-1} = \frac{7.91}{9} = 0.879$$

$$s = \sqrt{0.879} = 0.938$$
Step 2.3 Compute the t-statistic

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{29.6 - 30}{0.938 / \sqrt{10}} = \frac{-0.4}{0.2967} = -1.348$$

Step 3. Determine the Critical Value

Degrees of freedom:

df = n – 1 = 10 – 1 = 9

From the t-table, the critical t-value for a two-tailed test at


α = 0.05 and df = 9 is:

tcritical = ± 2.262

Step 4. Decision Rule

If t<−2.262 or t >2.262, reject H₀.

The computed t = –1.348 lies within the acceptance region


(–2.262 < t < 2.262), so: we fail to reject H0

Step 5. Conclusion

At a 5% level of significance, there is not enough evidence


to conclude that the average compressive strength of the
new cement mix is different from 30 MPa.
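
The one-sample t-test can be reproduced from the raw data with the `statistics` module. This is a sketch: the standard library has no t-distribution, so the critical value 2.262 is copied from the t-table step above rather than computed. Because the code keeps the unrounded mean (29.59 instead of 29.6), its t value is about –1.38, slightly different from the hand computation, with the same conclusion.

```python
from math import sqrt
from statistics import mean, stdev

# Compressive strengths (MPa) of the 10 concrete cylinders
strengths = [28.5, 30.2, 29.8, 27.9, 31.1, 30.0, 29.5, 30.3, 28.9, 29.7]
mu0 = 30            # required compressive strength (hypothesized mean)
T_CRITICAL = 2.262  # from the t-table: two-tailed, alpha = 0.05, df = 9

n = len(strengths)
x_bar = mean(strengths)
s = stdev(strengths)  # sample standard deviation (divides by n - 1)
t_stat = (x_bar - mu0) / (s / sqrt(n))

reject_h0 = abs(t_stat) > T_CRITICAL
print(round(x_bar, 2), round(s, 3), round(t_stat, 2), reject_h0)
```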

2. Independent Two-Sample t-Test:

This is used when comparing the means of two unrelated


groups

Independent Two-Sample t-Test Formula (Equal Variance)

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

where $s_p^2$ is the pooled variance

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

Example :

An engineer wants to determine if two different welding


methods produce significantly different tensile strengths of
steel joints. Two independent samples were tested:

Method A Tensile     Method B Tensile
Strength (MPa)       Strength (MPa)
552                  545
548                  538
563                  549
559                  541
549                  536
555                  542
n1 = 6               n2 = 6

Assume that the population variances are equal, and test


at α = 0.05 if the two methods produce significantly different
results.

Solution :

Step 1. State the Hypotheses

H0 (null): 𝜇1 = 𝜇2 (The two welding methods produce the


same mean tensile strength)

Ha (alternative) : 𝜇1 ≠ 𝜇2 ( The mean tensile strength are


different)

Step 2. Compute the Test statistics

We will use the formula for the t-statistic with pooled
variance.

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

where $s_p^2$ is the pooled variance

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

Method A Tensile     Method B Tensile
Strength (MPa)       Strength (MPa)
552                  545
548                  538
563                  549
559                  541
549                  536
555                  542
∑A = 3,326           ∑B = 3,251

x̄A = 554.33          x̄B = 541.83

Find the squared deviation from the mean

Method A                                Method B
x1     (x1 – x̄A)    (x1 – x̄A)²        x2     (x2 – x̄B)    (x2 – x̄B)²
552    -2.33         5.43               545     3.17         10.04
548    -6.33         40.07              538    -3.83         14.67
563     8.67         75.17              549     7.17         51.41
559     4.67         21.81              541    -0.83         0.69
549    -5.33         28.41              536    -5.83         33.99
555     0.67         0.45               542     0.17         0.03
Total                171.33                                  110.83

$$s_1^2 = \frac{\sum (x - \bar{x})^2}{n-1} = \frac{171.33}{6-1} = 34.266$$

$$s_1 = 5.854$$

$$s_2^2 = \frac{\sum (x - \bar{x})^2}{n-1} = \frac{110.83}{6-1} = 22.166$$

$$s_2 = 4.708$$

Compute the pooled variance using the formula

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

$$s_p^2 = \frac{(6-1)(34.266) + (6-1)(22.166)}{6+6-2} = \frac{171.33 + 110.83}{10} = 28.216$$

Compute the t-statistic for the two-sample test using:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{554.33 - 541.83}{\sqrt{28.216 \left(\frac{1}{6} + \frac{1}{6}\right)}}$$

$$t = \frac{12.5}{3.067} = 4.076$$

Step 3. Determine the critical value

Degrees of Freedom, df = n1 + n2 – 2 = 6 + 6 – 2 = 10

From the t-table, the critical value at α = 0.05 (two-tailed) and df = 10:

tcritical = ± 2.228

Step 4. Decision Rule:

Since the computed t-value 4.076 > 2.228, we reject the null
hypothesis.

Step 5. Conclusion

There is sufficient evidence at the 5% level of significance to


conclude that the two welding methods produce significantly
different tensile strengths.
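
The pooled-variance t-test can be checked from the raw data. This is a minimal sketch using `statistics.variance` (the sample variance, dividing by n – 1); the critical value 2.228 is copied from the t-table step above because the standard library does not provide a t-distribution. The code carries full precision, so the last digits may differ slightly from a hand computation that rounds intermediate values.

```python
from math import sqrt
from statistics import mean, variance

method_a = [552, 548, 563, 559, 549, 555]  # tensile strength, MPa
method_b = [545, 538, 549, 541, 536, 542]
T_CRITICAL = 2.228  # from the t-table: two-tailed, alpha = 0.05, df = 10

n1, n2 = len(method_a), len(method_b)
s1_sq, s2_sq = variance(method_a), variance(method_b)

# Pooled variance weights each sample variance by its degrees of freedom
pooled_var = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
t_stat = (mean(method_a) - mean(method_b)) / standard_error

reject_h0 = abs(t_stat) > T_CRITICAL
print(round(pooled_var, 3), round(t_stat, 3), reject_h0)
```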

Analysis of Variance (ANOVA) in Hypothesis Testing

Analysis of Variance (ANOVA) is a statistical method


used to compare the means of three or more independent
groups to determine if at least one group mean is significantly
different from the others.

In engineering, ANOVA is commonly used in quality


control, process optimization, and design of experiments to
analyze how different factors or treatments affect outcomes.

ANOVA is used when you want to compare three or


more group means; the data are quantitative (numerical) and
collected from independent samples, and the populations are
assumed to be normally distributed with equal variances.

Types of ANOVA

Type: One-Way ANOVA
Purpose: Tests differences among means of one factor (independent variable)
Example in Engineering: Testing if the mean strength of a material differs across three curing temperatures

Type: Two-Way ANOVA
Purpose: Tests the effects of two factors and their interaction
Example in Engineering: Studying how machine type and operator skill affect production output

Type: Repeated Measures ANOVA
Purpose: Used when the same subjects are tested under different conditions
Example in Engineering: Measuring engine emissions before, during, and after a modification

For most undergraduate engineering courses, One-Way ANOVA is the


primary focus.

One-Way ANOVA

Let’s say we want to compare the mean breaking


strength of a wire across 3 different manufacturers.

The core idea is to partition the total variability in the data


into two parts:

• Between-group variability: How different the group means


are from the overall mean
• Within-group variability: The variability among observations
within each group

Total Variation = Variation Between Groups + Variation Within Groups

We then form an F-ratio

$$F = \frac{\text{Mean Square Between Groups (MSB)}}{\text{Mean Square Within Groups (MSW)}}$$

If the F-ratio is large, it suggests that the group means are not all
equal, indicating a statistically significant difference.

Example :

A materials engineer wants to know whether the mean


tensile strength (in MPa) of a metal specimen differs when
subjected to three different heat treatment temperatures.
Four specimens are tested at each temperature.

The Tensile strength data (MPa)

Temperature A Temperature B Temperature C


72 80 77
75 82 76
78 79 78
74 81 75

Test at α = 0.05 whether there is a statistically significant


difference in mean tensile strength among the three
temperatures.

Solution:

Step 1. State the Hypothesis

Null hypothesis(Ho) : All group means are equal

𝜇𝐴 = 𝜇𝐵 = 𝜇𝐶

Alternative hypothesis (Ha): At least one group mean is


different

Step 2: Compute Group Means and Grand Mean

Mean of each temperature group:

$$\bar{x}_A = \frac{72+75+78+74}{4} = 74.75$$

$$\bar{x}_B = \frac{80+82+79+81}{4} = 80.50$$

$$\bar{x}_C = \frac{77+76+78+75}{4} = 76.50$$

Grand Mean:

$$\bar{x} = \frac{72+75+78+74+80+82+79+81+77+76+78+75}{12} = \frac{927}{12} = 77.25$$

Step 3: Compute Sum of Squares

Between-Groups Sum of Squares (SSB):

$$SSB = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2 = 4\left[(74.75 - 77.25)^2 + (80.50 - 77.25)^2 + (76.50 - 77.25)^2\right]$$

= 4[(–2.5)² + (3.25)² + (–0.75)²]

SSB = 4(17.375) = 69.5

Within Groups Sum of Squares (SSW)

Compute the squared deviations within each group

For A: (72 – 74.75)² + (75 – 74.75)² + (78 – 74.75)² + (74 – 74.75)²

= 7.5625 + 0.0625 + 10.5625 + 0.5625 = 18.75

For B: (80 – 80.5)² + (82 – 80.5)² + (79 – 80.5)² + (81 – 80.5)²

= 0.25 + 2.25 + 2.25 + 0.25 = 5.00

For C: (77 – 76.5)² + (76 – 76.5)² + (78 – 76.5)² + (75 – 76.5)²

= 0.25 + 0.25 + 2.25 + 2.25 = 5.00

SSW = 18.75 + 5.00 + 5.00 = 28.75

Total Sum of Squares (SST):

SST = SSB + SSW = 69.5 + 28.75

SST = 98.25

Step 4: Degrees of Freedom

Between groups dfB = k – 1 = 3-1 = 2


Within groups dfw = N – k = 12 – 3 = 9
Total dft = N – 1 = 12-1 = 11

Step 5: Mean Squares

Mean Square Between (MSB)

$$MSB = \frac{SSB}{df_B} = \frac{69.5}{2} = 34.75$$

Mean Square Within (MSW)

$$MSW = \frac{SSW}{df_W} = \frac{28.75}{9} = 3.19$$

Step 6: Compute the F-Statistic

$$F = \frac{MSB}{MSW} = \frac{34.75}{3.19} = 10.88$$

Step 7: Decision Rule

Degree of Freedom: dfB = 2, dfw = 9

At 𝛼 = 0.05, the critical F (from F-distribution table) is


approximately Fcritical =4.256

Since Fcomputed = 10.88 is greater than Fcritical= 4.256, we have to


reject the null hypothesis.

Step 8: Conclusion

There is sufficient evidence at the 5% significance level to


conclude that not all mean tensile strengths are equal; such that, heat
treatment temperature has a statistically significant effect on tensile
strength.

ANOVA Summary Table

Source of Variation    SS       df    MS       F
Between Groups         69.50    2     34.75    10.88
Within Groups          28.75    9     3.19
Total                  98.25    11
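
The one-way ANOVA table can be rebuilt from the raw data in a few lines. This is a sketch with illustrative variable names; the critical value 4.256 is copied from the F-table step above, since the standard library has no F-distribution.

```python
from statistics import mean

# Tensile strength (MPa) at three heat treatment temperatures
groups = {
    "A": [72, 75, 78, 74],
    "B": [80, 82, 79, 81],
    "C": [77, 76, 78, 75],
}
F_CRITICAL = 4.256  # from the F-table: alpha = 0.05, df = (2, 9)

all_values = [x for g in groups.values() for x in g]
grand_mean = mean(all_values)

# Between-groups: spread of the group means around the grand mean
ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
# Within-groups: spread of observations around their own group mean
ssw = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)
msb = ssb / df_between
msw = ssw / df_within
f_stat = msb / msw

reject_h0 = f_stat > F_CRITICAL
print(ssb, ssw, round(msb, 2), round(msw, 2), round(f_stat, 2), reject_h0)
```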

Chi- Square Test in Hypothesis Testing

The Chi-square (χ²) test is a non-parametric statistical


method used to determine whether there is a significant
difference between observed frequencies and expected
frequencies in categorical data. It is especially useful when
dealing with qualitative (nominal or ordinal) variables
rather than numerical measurements.

In engineering applications, the Chi-square test can help


evaluate:

• Goodness-of-Fit: How well an observed frequency


distribution matches a theoretical or expected distribution.
This is used to determine if the distribution of a categorical
variable matches a hypothesized distribution. For instance:
testing whether defects in manufactured parts occur equally
across different machine shifts.

Formula:

$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$

Where: Oi = observed frequency for category i
Ei = expected frequency for category i

• Test for Independence: Whether two categorical variables


are associated or independent. For instance: testing
whether the type of welding method is independent of defect
occurrence.

Formula:

$$\chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

Where: Oij = Observed frequency in cell i, j of the
contingency table

$$E_{ij} = \frac{\text{Row total} \times \text{Column total}}{\text{Grand total}}$$

Example: Goodness-of-fit

A quality engineer wants to know if defects are equally


distributed across three production lines, at 5% level of
significance.

Production lines Observed defects(O) Expected defects


A 18 20
B 22 20
C 20 20

Solution :

Step 1. State the hypothesis

Null hypothesis (Ho): The defects are equally distributed


among the three production lines.

Alternative hypothesis(Ha): The defects are not equally


distributed among the three
production lines.

Step 2. Set the significance level.

𝜶 = 0.05

Step 3. Compute the expected frequencies

The problem states that the defects are expected to
be equally distributed. Since there is a total of 60 observed
defects, the expected frequency for each line is 60 / 3 lines = 20 (refer to
the table above).

Step 4. Calculate the Chi-square using the formula:

$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$

$$\chi^2 = \frac{(18-20)^2}{20} + \frac{(22-20)^2}{20} + \frac{(20-20)^2}{20} = \frac{4}{20} + \frac{4}{20} + 0$$

χ² = 0.40

Step 5. Determine the degree of freedom

df = k-1 = 3-1= 2

Step 6. Find the critical value

From the Chi-square distribution table with df =2 and


𝛼=0.05

χ²critical = 5.991

Step 7. Decision Rule

Since χ²computed = 0.40 is less than χ²critical = 5.991, fail to reject Ho.
There is no significant evidence that the defects are unequally
distributed among the three production lines.
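
The goodness-of-fit computation can be sketched as follows. The variable names are illustrative, and the critical value 5.991 is copied from the Chi-square table step above rather than computed.

```python
observed = [18, 22, 20]  # defects on production lines A, B, C
CHI2_CRITICAL = 5.991    # from the Chi-square table: alpha = 0.05, df = 2

# Under H0 the defects are spread equally across the lines
expected_each = sum(observed) / len(observed)
chi2 = sum((o - expected_each) ** 2 / expected_each for o in observed)
df = len(observed) - 1

reject_h0 = chi2 > CHI2_CRITICAL
print(expected_each, round(chi2, 2), df, reject_h0)
```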

Example: Test of Independence

An engineer investigates whether the type of welding
method is related to defect occurrence.

Welding method    Defective    Non-defective    Total
A                 12           38               50
B                 8            42               50
Total             20           80               100

Step 1. State the Hypothesis

Null hypothesis (Ho): The type of welding method is
independent of defect occurrence.

Alternative hypothesis (Ha): The type of welding method is not
independent of defect occurrence; that is, there is an association
between welding method and defect occurrence.

Step 2. Set the significance level.

𝛼 = 0.05

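Step 3. Compute the expected frequencies.

Using the formula:

$E = \frac{\text{Row total} \times \text{Column total}}{\text{Grand total}}$

For defective: $E = \frac{50 \times 20}{100} = 10$

For non-defective: $E = \frac{50 \times 80}{100} = 40$

Because both welding methods have a row total of 50, these expected
values apply to Method A and Method B alike.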
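The expected-frequency formula applies cell by cell, which is easy to express in code. The following sketch assumes the observed counts 12, 38, 8, and 42 that are used in the chi-square computation of this example; the function name `expected_frequencies` is illustrative.

```python
def expected_frequencies(table):
    """Return E_ij = (row total * column total) / grand total for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Rows: welding methods A and B. Columns: defective, non-defective.
observed = [[12, 38],
            [8, 42]]

expected = expected_frequencies(observed)
for method, row in zip("AB", expected):
    print(f"Method {method}: defective = {row[0]}, non-defective = {row[1]}")
```

The same function works for any r x c table, since it only relies on the row and column totals.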

Step 4. Calculate the Chi-square using the formula:

$\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$

$\chi^2 = \frac{(12-10)^2}{10} + \frac{(38-40)^2}{40} + \frac{(8-10)^2}{10} + \frac{(42-40)^2}{40}$

$\chi^2 = 0.4 + 0.1 + 0.4 + 0.1 = 1.0$
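Listing the contribution of each cell separately shows where the statistic comes from. This sketch takes the observed and expected counts from Steps 3 and 4 as given; the function name is illustrative.

```python
def chi_square_contributions(observed, expected):
    """Return the per-cell terms (O_ij - E_ij)^2 / E_ij as a nested list."""
    return [[(o - e) ** 2 / e for o, e in zip(o_row, e_row)]
            for o_row, e_row in zip(observed, expected)]


# Rows: welding methods A and B. Columns: defective, non-defective.
observed = [[12, 38], [8, 42]]
expected = [[10, 40], [10, 40]]

contributions = chi_square_contributions(observed, expected)
chi2 = sum(sum(row) for row in contributions)

for method, row in zip("AB", contributions):
    print(f"Method {method}: {[round(term, 2) for term in row]}")
print(f"chi-square = {chi2:.1f}")
```

In this table the defective cells contribute more than the non-defective cells because the same deviation of 2 is divided by a smaller expected count.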

Step 5. Determine the degrees of freedom

df = (r - 1)(c - 1), where r = rows, c = columns

df = (2 - 1)(2 - 1) = 1

Step 6. Find the critical value

From the Chi-square distribution table with df = 1 and
$\alpha = 0.05$, $\chi^2_{critical} = 3.841$

Step 7. Decision Rule

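Since $\chi^2_{computed} = 1.0$ is less than $\chi^2_{critical} = 3.841$, fail to
reject Ho. There is not enough evidence at the 5% level of significance
to conclude that defect occurrence is associated with welding method;
the data are consistent with welding method being independent of
defect occurrence.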
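The same decision can be reached with a p-value instead of a critical value. For df = 1 the chi-square variable is the square of a standard normal variable, so the upper-tail probability equals erfc(sqrt(x/2)). The sketch below uses that identity, which holds for df = 1 only; the function name is illustrative.

```python
import math


def chi2_p_value_df1(chi2):
    """Upper-tail probability P(X > chi2) for a chi-square variable with df = 1.

    Valid for df = 1 only: X = Z^2 with Z standard normal,
    so P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    if chi2 < 0:
        raise ValueError("the chi-square statistic cannot be negative")
    return math.erfc(math.sqrt(chi2 / 2.0))


alpha = 0.05
p_value = chi2_p_value_df1(1.0)   # statistic from the welding example
reject_h0 = p_value < alpha

print(f"p-value = {p_value:.4f}")
print(f"reject Ho at alpha = {alpha}: {reject_h0}")
```

A p-value larger than α leads to the same conclusion as a computed statistic smaller than the critical value.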

VII. REGRESSION ANALYSIS
Learning Outcomes:

By the end of this lesson, students should be able to:

1. Define the purpose of regression analysis in an engineering
context.
2. Differentiate between simple and multiple linear regression
models.
3. Compute the regression coefficients (intercept and slope) for a
simple linear regression from raw data.
4. Interpret the meaning of the slope, intercept, correlation
coefficient (r), and coefficient of determination (R2) in real
engineering scenarios.
5. Apply multiple linear regression to model a dependent
variable using two or more predictors.

7.1 Regression Analysis

In engineering, we often encounter situations where one
variable depends on another. Regression analysis is a
statistical method used to examine the relationship between
a dependent variable (response) and one or more
independent variables (predictors). This helps engineers
predict future values, understand the strength and direction
of relationships, optimize processes, and improve designs.

Examples in Engineering:

• Predicting tensile strength from material composition
• Estimating fuel consumption based on vehicle speed
• Relating temperature to electrical resistance in a
component

7.2 Types of Regression

• Simple Linear Regression. This involves one
independent variable and one dependent variable. A
mathematical equation that allows us to predict values
of the dependent variable from known values of one
independent variable is called a regression equation.

$\hat{y} = b_0 + b_1 x$

Where:

$\hat{y}$ = predicted value of y
$b_0$ = intercept
$b_1$ = slope
x = independent variable

• Multiple Linear Regression. This is an extension of
simple linear regression that allows the modeling of a
dependent variable using two or more independent
variables. In engineering applications, many factors
often influence a response simultaneously, and
analyzing them together provides a more realistic and
accurate model. The general form of a multiple linear
regression equation is

$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k$

where $\hat{y}$ is the predicted value of the dependent
variable, $b_0$ is the intercept, and $b_1, b_2, \dots, b_k$ are the
regression coefficients that represent the change in $\hat{y}$
for a one-unit change in the corresponding predictor
variable $x_1, x_2, \dots, x_k$, holding the other predictors
constant. This “holding other variables constant”
property is important because it enables engineers to
assess the individual effect of each factor while
controlling for the influence of the others. For example,
in predicting the heat loss from a pipe, multiple
regression can be used to analyze the combined effects
of temperature difference, insulation thickness, and pipe
diameter. The strength of the model is measured by the
coefficient of determination (R2), which indicates the
proportion of variation in the dependent variable
explained by all predictors together. Multiple regression
also provides p-values for each coefficient, allowing
hypothesis testing to determine which factors
significantly affect the response. When applied
correctly, multiple linear regression is a powerful tool for
engineering decision-making, enabling accurate
predictions, optimization of designs, and identification of
key process variables.
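One common way to obtain the coefficients of a multiple linear regression is to solve the normal equations (XᵀX)b = Xᵀy, where X has a leading column of ones for the intercept. The sketch below does this in plain Python with Gaussian elimination. The data are hypothetical and noise-free, generated from y = 1 + 2x1 + 3x2 purely so that the recovered coefficients can be checked; they are not measurements from the text, and the function names are illustrative. The sketch does not compute R2 or p-values.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so that row operations act on both sides at once.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_multiple_regression(xs, y):
    """Return least-squares coefficients [b0, b1, ..., bk]."""
    design = [[1.0] + list(row) for row in xs]  # leading 1 gives the intercept
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical predictors (x1, x2) and a noise-free response y = 1 + 2*x1 + 3*x2.
xs = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5), (6, 2)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in xs]

b0, b1, b2 = fit_multiple_regression(xs, y)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```

Here b1 is the change in the predicted response per unit of x1 with x2 held constant, which is the interpretation described above. With real, noisy data the coefficients would only approximate the underlying relationship.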

Example: Simple Linear Regression

An engineer measures the load (in kN) applied to a steel bar and
the resulting elongation (in mm). The goal is to develop a
predictive model.

Load (kN), x    Elongation (mm), y
10              0.21
20              0.45
30              0.69
40              0.92
50              1.15

Step 1. Compute the means

$\bar{x} = \frac{10+20+30+40+50}{5} = 30$

$\bar{y} = \frac{0.21+0.45+0.69+0.92+1.15}{5} = 0.684$

Step 2. Compute regression coefficients

Slope:

$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$

$b_1 = \frac{(10-30)(0.21-0.684)+(20-30)(0.45-0.684)+(30-30)(0.69-0.684)+(40-30)(0.92-0.684)+(50-30)(1.15-0.684)}{(10-30)^2+(20-30)^2+(30-30)^2+(40-30)^2+(50-30)^2}$

$b_1 = \frac{23.5}{1000} = 0.0235$

Intercept:

$b_0 = \bar{y} - b_1\bar{x} = 0.684 - 0.0235(30)$

$b_0 = -0.021$

So, the regression equation is:

$\hat{y} = -0.021 + 0.0235x$
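Steps 1 and 2 can be reproduced with a short script. This is a minimal sketch using the five load and elongation pairs from the example table; the function and variable names are illustrative.

```python
def fit_simple_regression(x, y):
    """Return (b0, b1) for the least-squares line y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


load_kn = [10, 20, 30, 40, 50]
elongation_mm = [0.21, 0.45, 0.69, 0.92, 1.15]

b0, b1 = fit_simple_regression(load_kn, elongation_mm)
print(f"slope b1 = {b1:.4f} mm per kN")
print(f"intercept b0 = {b0:.3f} mm")
```

The slope is the estimated increase in elongation, in mm, for each additional kN of load.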

Step 3. Strength of linear relationship

Correlation Coefficient (r): To compute r, we use the
Pearson correlation coefficient formula:

$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \cdot \sum (y_i - \bar{y})^2}}$

Means: $\bar{x} = 30$, $\bar{y} = 0.684$

$x_i$    $y_i$    $(x_i - \bar{x})$    $(y_i - \bar{y})$    $(x_i - \bar{x})^2$    $(y_i - \bar{y})^2$
10       0.21     -20                  -0.474               400                    0.224676
20       0.45     -10                  -0.234               100                    0.054756
30       0.69     0                    0.006                0                      0.000036
40       0.92     10                   0.236                100                    0.055696
50       1.15     20                   0.466                400                    0.217156
Sum                                                         1000                   0.55232

$\sum (x_i - \bar{x})(y_i - \bar{y}) = 23.5$

Applying the above formula for r:

$r = \frac{23.5}{\sqrt{(1000)(0.55232)}} = \frac{23.5}{\sqrt{552.32}} = \frac{23.5}{23.5015} = 0.99994$

Computing R2

For simple linear regression: $R^2 = r^2$

$R^2 = \frac{(23.5)^2}{552.32} = \frac{552.25}{552.32}$

$R^2 \approx 0.99987$

This means that about 99.987% of the variation in elongation
is explained by the variation in load.
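The correlation coefficient and R2 can be verified directly from the raw data. The sketch below applies the Pearson formula from Step 3 to the example values; the function name `pearson_r` is illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient computed from raw data."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


load_kn = [10, 20, 30, 40, 50]
elongation_mm = [0.21, 0.45, 0.69, 0.92, 1.15]

r = pearson_r(load_kn, elongation_mm)
r_squared = r ** 2   # valid as R^2 for simple linear regression only

print(f"r = {r:.5f}")
print(f"R^2 = {r_squared:.5f}")
```

Squaring the unrounded r avoids the small rounding difference that appears when the five-decimal value of r is squared by hand.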

Step 4. Engineering interpretation

• r close to +1: Very strong positive linear relationship.
This means that as load increases, elongation increases
proportionally.
• R2 near 1: The regression line fits the data almost perfectly.

Figure: Simple linear regression of elongation versus load. The red
line is the best-fit regression line $\hat{y} = -0.021 + 0.0235x$,
showing a strong positive linear relationship. The Pearson correlation
coefficient r ≈ 0.99994 quantifies the strength and direction of the
linear association, and R2 ≈ 0.99987 indicates that about 99.987% of
the variability in elongation is explained by the applied load.
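The fitted line can also be examined through its residuals, which gives R2 by a second route, R2 = 1 - SSE/SST. The sketch below takes the coefficients b0 = -0.021 and b1 = 0.0235 from the worked example as given. The prediction at 35 kN is an illustrative input chosen inside the measured load range; it is not a value from the text.

```python
load_kn = [10, 20, 30, 40, 50]
elongation_mm = [0.21, 0.45, 0.69, 0.92, 1.15]

# Coefficients taken from the worked example.
b0, b1 = -0.021, 0.0235


def predict(x):
    """Predicted elongation (mm) for a load x (kN) from the fitted line."""
    return b0 + b1 * x


# Residual e = y - y_hat for each observation.
residuals = [y - predict(x) for x, y in zip(load_kn, elongation_mm)]

y_bar = sum(elongation_mm) / len(elongation_mm)
sse = sum(e ** 2 for e in residuals)                  # unexplained variation
sst = sum((y - y_bar) ** 2 for y in elongation_mm)    # total variation
r_squared = 1 - sse / sst

print("residuals:", [round(e, 3) for e in residuals])
print(f"SSE = {sse:.5f}, SST = {sst:.5f}, R^2 = {r_squared:.5f}")
print(f"predicted elongation at 35 kN = {predict(35):.4f} mm")
```

Residuals that are small and show no obvious pattern support the use of a straight-line model over this load range. Predictions outside the measured range of 10 to 50 kN would be extrapolation and should be treated with caution.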

LEARNING EXERCISES
Chapter 1

1. In your own words, define statistics and explain why it is important in
everyday decision making.

2. Give three real-life examples where statistical analysis is applied in your
community.

3. Identify whether each statement refers to Descriptive or Inferential statistics.

a. A survey found that 70% of residents in a town own a motorcycle.

b. Based on a sample, predicting that the average monthly income of all
farmers is P10,500.

c. A graph showing the monthly rainfall for the past year.

d. A company tests a product in 3 stores and concludes it will sell well
nationwide.

4. Classify each as a Continuous, Discrete, Nominal, or Ordinal variable, and
create two examples of your own for each type of variable.

a. Body temperature of patients

b. Number of Engineering graduates in 2024

c. Religion of survey respondents

d. Clothing size (S, M, L, XL)

e. Rainfall in millimeters

5. Draw a simple flowchart categorizing variables into Quantitative/Qualitative
and their subtypes.

Chapter II

1. Match each scenario to the most appropriate data collection method

a. PSA conducts a national census.

b. A teacher records the behavior of students during group activities.

c. A food company tests a new recipe in two different cities before
launching nationwide.

d. Researchers gather the opinions of 8 farmers about a new irrigation
system.

e. Customers answer an online questionnaire about a store’s service.

2. From the following sources, identify if the data is Primary or Secondary:

a. Reading a newspaper article about an election

b. Interviewing a barangay captain about local projects.

c. Using PSA records for birth rates in a province.

d. Conducting your own traffic survey.

3. A researcher wants to know the opinion of residents on building a new
public market. There are 5,000 households in the municipality. If the
desired margin of error is 5%, compute the sample size using Slovin’s
formula.

4. Identify the sampling technique used in the following situations:

a. Selecting students by drawing lots.

b. Picking every 5th person from the list.

c. Selecting equal numbers of males and females from each college
department.

d. Choosing three barangays at random and surveying all households in
them.

e. Interviewing only friends and classmates.

5. Given the dataset below, construct a frequency distribution table using the
steps provided in the lesson.

15, 18, 20, 22, 25, 25, 27, 28, 30, 31

16, 18, 19, 21, 23, 24, 26, 28, 29, 32

6. Decide which type of graph (Bar graph, histogram, Pie chart, Line graph) is
most appropriate for each situation:

a. Showing the monthly electricity consumption of a household for 1 year.

b. Comparing the number of male and female students in intervals of 10.

c. Showing the percentage share of different types of transport used by
employees.

7. Using the data below, draw a bar graph.

Barangay     Number of Households
San Jose     120
Del Pilar    95
Mabini       85
San Roque    60

8. The scores of 40 students in a mathematics test are

45, 56, 67, 48, 90, 72, 65, 59, 82, 78

91, 66, 47, 64, 68, 74, 80, 87, 92, 55

60, 70, 62, 77, 85, 95, 40, 58, 63, 75

52, 61, 83, 69, 71, 88, 50, 57, 53, 81

Construct a histogram using an appropriate class interval.

9. The daily sales (in pesos) of a store for one week are:

Monday: 1,200
Tuesday: 1,450
Wednesday: 1,800
Thursday: 1,650
Friday: 2,000
Saturday: 2,200
Sunday: 1,750

Plot these data on a line graph and describe the trend.

Chapter III

1. A survey of monthly allowances (in pesos) of 40 college students produced
the following distribution:

Allowance Interval    Frequency (f)
500 - 999             6
1,000 - 1,499         9
1,500 - 1,999         12
2,000 - 2,499         8
2,500 - 2,999         5

a. Identify the modal class
b. Compute the mode using the formula

2. The following are the daily sales (in pesos) of a food stall over 7 days: 1,500;
1,800; 1,750; 2,000; 1,900; 1,600; 15,000.

a. Compute the mean sales
b. Compute the median sales
c. Which measure better represents the typical sales and why?

3. A researcher recorded the ages of 20 farmers in a rural community.

a. Organize the data into a grouped frequency table using a class
width of 5
b. Compute the mean, median and mode.
c. Interpret which measure best describes the central age of the
farmers.

4. Explain why the mean is more affected by outliers than the median. Give an
example.

5. In what situations is the mode the most appropriate measure of central
tendency? Provide two real-life examples.

Chapter IV

1. Define the following terms:
a. Sample space
b. Event
c. Mutually exclusive event
d. Independent event
e. Complementary event

2. List the sample space for each of the following:
a. Tossing 2 coins
b. Rolling a die
c. Drawing a card from a standard deck (no joker)

3. Determine if the following pairs of events are mutually exclusive,
independent, or complementary.
a. Drawing a heart and drawing a red card from a deck
b. Rolling a 4 and rolling an even number on a die
c. Event A: Tossing a head, Event B: Tossing a tail
d. Event A: Student passes, Event B: Student fails

4. A menu offers 3 types of burgers and 4 types of drinks. How many different
meals can be formed consisting of one burger and one drink?

5. A password is made of 4 letters followed by 2 digits. How many different
passwords are possible if:
a. Repetition is allowed
b. Repetition is not allowed
6. Find the number of permutations:
a. Arranging 4 students in a line
b. Choosing and ordering 3 books from a shelf of 6.

7. Find the number of combinations:
a. Choosing 3 students from a group of 5
b. Choosing 2 toppings from 6 available pizza toppings

8. A card is drawn from a standard 52-card deck. Find the probability of:
a. Drawing an Ace
b. Drawing a red card or a face card
c. Not drawing a spade

9. A bag contains 4 red balls and 6 blue balls. One ball is drawn:
a. What is the probability of drawing a red ball?
b. What is the probability of not drawing a red ball?
c. If two balls are drawn with replacement, what is the probability that
both are blue?

10. A die is rolled. Let A be the event of getting an odd number, and B be the
event of getting a number greater than 3. Find :
a. P(A), P(B)
b. P(A∩ B), P(A∪ B)

Chapter V

1. Two dice are rolled. Let the random variable x be the sum of the numbers
on the two dice.
a. List the sample space
b. Determine the possible values of x
c. Construct a probability distribution table for x = 2 to x = 12.
d. Verify if the distribution is valid

2. A coin is tossed 3 times. Let x be the number of heads.
a. List the sample space
b. Create the probability distribution of x
c. Compute E(x)

3. Given the following probability distribution of x:

x    P(x)
0    0.2
1    0.3
2    0.4
3    0.1

a. Construct the cumulative distribution function table
b. Find F(2)
c. Interpret what F(2) means in the context of probability.

4. The probability distribution of the number of defective bulbs in a box of 3 is:

x P(x)
0 0.5
1 0.3
2 0.1
3 0.1

a. Compute the expected value E(x)
b. Compute the variance and standard deviation of x.

5. A coin is tossed 5 times. What is the probability of getting:
a. Exactly 2 heads
b. At most 2 heads
c. At least 1 head

6. In a class, 60% of students bring their own calculator. If 8 students are
randomly selected, what is the probability that:
a. Exactly 5 bring a calculator
b. All 8 bring a calculator
c. None bring a calculator

7. A small bakery receives an average of 3 phone orders per hour. What is the
probability that:
a. Exactly 2 phone orders are received in one hour.
b. More than 2 phone orders are received
c. No orders are received in an hour.

8. A call center receives on average 5 complaints per day. Use the Poisson
distribution to compute the probability that:
a. Exactly 3 complaints are received
b. Fewer than 3 complaints are received
c. At least 1 complaint is received

Chapter VI

1. The average tensile strength of a type of steel rod is claimed to be
500 MPa. A sample of 36 rods has a mean strength of 492 MPa. The
population standard deviation is known to be 18 MPa.

a. At the 5% level of significance, test the claim that the average tensile
strength is still 500 MPa.
b. State the null and alternative hypotheses
c. Compute the test statistic
d. Draw your conclusion

2. A researcher believes that the average weekly working hours of engineers
is more than 40 hours. A sample of 10 engineers showed the following
working hours last week: 44, 42, 39, 46, 41, 43, 40, 44, 42
a. Use a 0.05 significance level to test the claim
b. State Ho and Ha
c. Calculate the sample mean and standard deviation.
d. Find the t-statistic and make a conclusion

3. Two types of fertilizer are being compared. The yields (in kg) of crops using
each fertilizer are shown below:

Fertilizer A: 45, 47, 50, 44, 46
Fertilizer B: 51, 53, 52, 54, 55

Assuming equal variances:
a. Test whether the two fertilizers result in different yields at the 0.01 level
of significance
b. Compute the pooled variance
c. Find the t-statistic and the critical value
d. State your decision

4. Three different machines are tested for the time (in minutes) it takes to
produce a certain component. The data are:
Machine A: 20, 22, 29
Machine B: 24, 23, 25
Machine C: 21, 20, 22
a. Perform the one-way ANOVA at α = 0.05
b. State the null and alternative hypotheses
c. Show the computation of SST, SSB and SSE
d. Draw a conclusion

5. A study is conducted to examine the relationship between gender and preference
for a new product.

Gender    Like    Neutral    Dislike
Male      20      10         5
Female    15      25         15

a. Test the independence of gender and product preference at
α = 0.05
b. State Ho and Ha
c. Compute the expected frequencies
d. Calculate the chi-square statistic and make a conclusion

Chapter VII

1. Given the following dataset:

Study Hours (x)    Exam Score (y)
2                  65
3                  70
5                  75
7                  85
8                  90

a. Compute the means, slope, and intercept
b. Write the regression equation
c. What does the slope mean in this context?
d. Predict the exam score of a student who studied for 6 hours.

2. Based on the regression equation from Problem 1:
a. Compute the predicted $\hat{y}$
b. Compute the residual $e = y - \hat{y}$
c. Calculate SSR, SSE, SST
d. Compute $R^2 = \frac{SSR}{SST}$
e. How well does the model explain the variability?
f. Is the model a good fit?

3. An engineer wants to predict the fuel consumption of a machine based on
the load (kg) applied and operating time (hrs.). Data from 10 trials are
collected.
a. Perform multiple linear regression
b. Interpret the model output
c. Predict fuel consumption when load = 100kg and time = 4 hours
d. Validate the model using residual analysis
