The Hong Kong University of Science and Technology
ISOM2500 Business Statistics (L1 & L2, Spring 2025)
Assignment 2
Due on 10 May 2025 at 11:59pm (HKT)
Read the following instructions carefully:
You are prohibited from using any generative artificial intelligence (AI) tools, e.g., ChatGPT
and DeepSeek, to produce any materials or content related to this assignment.
Show your calculation steps clearly to earn full credit in all six questions.
Numerical answers should be either exact or rounded to 4 decimal places.
This is a group assignment. Each member of a group should contribute equally. As stated in the
course syllabus, free-riding or irresponsible behavior may result in a lower individual
mark. Report any irresponsible or “free-riding” behavior of a group member to the TA by email,
with supporting evidence.
Each group is required to submit only ONE copy of the assignment on Canvas.
1. Click fraud has become a major concern as more and more companies advertise on the
Internet. When companies place an advertisement link on the Google webpage, they pay
Google a fee for every click on the link. This is fine when the click comes from a person who is
interested in buying a product or service from the company, but a click initiated by
a computer program pretending to be a customer artificially drives up the
advertiser’s cost.
Suppose that the population proportion of click fraud is 0.15. Consider a sample of 1,200
clicks from this population.
a. Find the mean and variance of the sampling distribution of the sample proportion.
b. Find the probability that the sample proportion is less than 0.145.
c. If the probability that the sample proportion is greater than 𝑘 is 0.5987, find the
value of 𝑘.
For the following parts, let us assume that we do not know the population proportion and
would like to estimate it.
d. Suppose 183 clicks from the sample of 1,200 clicks are identified as fraudulent.
Construct a 90% confidence interval for the population proportion.
e. If we want to estimate the population proportion to within ±0.01 at the 90%
confidence level, at least how many additional randomly selected clicks do we
need to collect?
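For reference, the large-sample formulas behind this kind of question can be sketched in Python using only the standard library. This is a minimal sketch, not a model answer: the function names and the inputs in the demo (p = 0.30, n = 500, and so on) are hypothetical and are not taken from the question, and the interval shown is the usual normal-approximation (Wald) interval, which is an assumption about the intended method.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def proportion_sampling_moments(p, n):
    """Mean and variance of the sample proportion: p and p(1 - p)/n."""
    return p, p * (1 - p) / n


def prob_below(c, p, n):
    """Normal approximation to P(sample proportion < c)."""
    mean, var = proportion_sampling_moments(p, n)
    return Z.cdf((c - mean) / sqrt(var))


def proportion_ci(successes, n, level):
    """Large-sample (Wald) confidence interval for a population proportion."""
    p_hat = successes / n
    z = Z.inv_cdf(1 - (1 - level) / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def sample_size_for_margin(p_guess, margin, level):
    """Smallest n whose margin of error is at most `margin`, given a guess for p."""
    z = Z.inv_cdf(1 - (1 - level) / 2)
    return ceil(z ** 2 * p_guess * (1 - p_guess) / margin ** 2)


# Demo with hypothetical inputs (not the assignment's numbers).
print(proportion_sampling_moments(0.30, 500))
print(prob_below(0.28, 0.30, 500))
print(proportion_ci(150, 500, 0.95))
print(sample_size_for_margin(0.5, 0.05, 0.95))
```

The required sample size is a total; when a question asks for additional observations, the observations already collected have to be subtracted from it.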
2. A researcher randomly interviews 800 customers about their waiting times for the next
bus. A 95% confidence interval for the population average waiting time, computed using an
assumed population standard deviation, is [1.5, 4.2] minutes. If the researcher wants to take
another sample and reduce the margin of error to at most 0.5 minutes at the 95% confidence
level, what is the minimum sample size?
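As a reminder of the relation at work here, the margin of error of a z-interval for a mean and the sample size it implies can be written as below. The symbols are generic and assumed for illustration (E for the margin of error, σ for the assumed population standard deviation, n for the sample size); they do not appear in the question itself.

```latex
E = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}
\quad\Longrightarrow\quad
n \;\ge\; \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^{2}
```

The margin of error of a reported interval is half its width, and the resulting n is rounded up to the next integer.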
3. A large software development firm recently relocated its facilities. Top management is
interested in fostering good relations with its new local community and has encouraged its
professional employees to engage in local service activities. The company believes that its
professionals should volunteer an average of more than 15 hours per month. If this is not
the case, the company may install an incentive program to increase community
involvement. Suppose 23 professional employees are randomly selected. This sample
yields a mean of 16.2 hours and a standard deviation of 1.8 hours.
a. Construct a 95% confidence interval for the average monthly hours of voluntary
service of the professionals. What assumption is necessary to construct this
interval?
b. Is it possible to use the interval in Part a to test whether an incentive program is
needed to increase community involvement at the 5% significance level?
If yes, please conduct the test with the interval; otherwise perform an appropriate
hypothesis test.
4. At a fitness center in a large residential estate, which is open to all residents, the manager has
recently surveyed a simple random sample of 𝑛 = 400 users of the center. Suppose she
obtained a sample mean age of 32.5 years and a sample standard deviation of 5 years from the
400 users. Based on the data, she would like to determine if the mean age 𝜇 of current users
is less than 34. If so, she will consider replacing some old equipment with new machines
that are more appealing to younger users.
a. What are the null and the alternative hypotheses of a statistical test to assist the
manager’s decision?
b. What is the p-value of the test in Part a? What is the conclusion?
c. Construct a 90% confidence interval for the mean age of users at this fitness center.
d. What can you conclude from your confidence interval in Part c?
5. To determine how the number of approved mortgages is affected by mortgage interest rates
(in %), an economist recorded the average mortgage interest rates and the numbers of
approved mortgages in a large country for the last 10 years. The data are listed below.
Interest rate (X)   8.5   7.8   7.6   7.5   8.0   8.4   8.8   8.9   8.5   8.0
Mortgages (Y)       115   111   185   201   206   167   155   117   133   150
a. Determine the least squares (LS) line for Y on X by showing all hand calculations.
b. Interpret the coefficient estimates of your LS line obtained in Part a.
c. Obtain an estimate of the average number of approved mortgages when the mortgage
interest rate is 11%. Do you think this is a reliable estimate? Explain why or why not.
d. Suppose we obtain an additional observation with X = 9.5 and Y = 200, and obtain a
new regression line by including it together with the original 10 observations.
Without fitting any equation, do you think the slope of this new regression line from
11 observations is smaller than the slope in Part a? Please explain your answer
briefly.
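The hand calculation of a least squares line reduces to a few sums, which the following minimal sketch makes explicit. The function name and the toy data are hypothetical and are not the data from this question; the code is meant only as a way to check the arithmetic of a hand computation.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least squares line of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of cross-products and squares about the means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Toy data (hypothetical, not the assignment's data).
toy_x = [1.0, 2.0, 3.0, 4.0]
toy_y = [2.0, 4.1, 5.9, 8.0]
b0, b1 = least_squares_line(toy_x, toy_y)
print(f"fitted line: y = {b0:.4f} + {b1:.4f} x")
```

Because the intercept is computed as the mean of y minus the slope times the mean of x, the fitted line always passes through the point of means.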
6. The Global Health Observatory (GHO) has tracked the health status of countries for
years to identify predictors of life expectancy (Y) in years. One
important predictor is the number of years of schooling (X). The following regression output
(with some numbers missing) is obtained by fitting a linear regression model for Y on X.
SUMMARY OUTPUT

Regression Statistics
Multiple R        0.81815943
R Square
Adj. R Square
Standard Error    4.57501316
Observations      173

            Coefficients   Standard Error   t Stat   P-value   Lower 95%     Upper 95%
Intercept   42.90159072    1.58699604                          39.76896535   46.03421609
X           2.228726885    0.11977942
a. Report the fitted equation.
b. Interpret the slope estimate.
c. Determine the coefficient of determination and discuss its meaning in the context of
this problem.
d. Determine if there is enough evidence at the 1% level to infer that the number of years of
schooling and life expectancy are linearly related.
e. Construct a 99% confidence interval for the slope parameter of the regression model
and repeat Part d.
f. Estimate the mean life expectancy of people who attended school for 12 years with
95% confidence. [Hint: the average number of years of schooling is 12.927, with a
sample SD of 2.912, for the above 173 countries in the regression output.]
g. Construct a 95% interval estimate for the life expectancy of a person who attended
school for 12 years.
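For reference, the standard simple-regression formulas for these two kinds of interval at a given value of X are sketched below. The notation is assumed for illustration and is not from the assignment: x_0 is the given value of X, ŷ_0 the fitted value at x_0, s_e the standard error of the regression, x̄ and s_x the sample mean and sample SD of X, and n the number of observations.

```latex
% Confidence interval for the mean response at x_0
\hat{y}_0 \;\pm\; t_{\alpha/2,\,n-2}\; s_e
\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{(n-1)\,s_x^2}}

% Prediction interval for an individual response at x_0
\hat{y}_0 \;\pm\; t_{\alpha/2,\,n-2}\; s_e
\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{(n-1)\,s_x^2}}
```

The two differ only by the extra 1 under the square root, which accounts for the variability of an individual observation around the regression line.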