
Hypothesis Testing Basics for Students

The document discusses hypothesis testing and comparing sample and population parameters. It introduces key concepts like the null and alternative hypotheses, types of errors, test statistics like z-test and t-test, and the basic steps of hypothesis testing. The z-test is used to compare the sample mean to the population mean when the population variance is known. The t-test is used when the population variance is unknown. Critical values and regions help determine whether to reject or fail to reject the null hypothesis.

Uploaded by

lexter14
Copyright
© Attribution Non-Commercial (BY-NC)

Basic Concepts of Hypothesis Testing

Introduction

A hypothesis is a statement or tentative theory that may or may not be true, but is initially assumed to be true until new evidence suggests otherwise. It may be proposed from a preliminary observation, a guess, or previous experience. In a hypothesis testing problem, the researcher has in mind a specific notion concerning the characteristics of the population under study before the sample data are gathered. The researcher then investigates the sample information to examine how consistent the data are with the hypothesis in question. If the sample information deviates much from the stated hypothesis, the researcher tends to disbelieve and reject the proposed statement.

Although the proposed statement may be true, any single sample is expected to differ slightly from the true characteristic of the population: because of sampling variation, a sample statistic will rarely have exactly the same value as the population parameter. Hence, differences between the sample information and the hypothesized population value might be due to chance. The statistical test provides the basis for deciding whether the differences between the sample observation and the hypothesized value could be due to sampling variation alone, or are large enough to make the proposed statement untenable.

Objectives

After completing this module the student is expected, but not limited, to:
1. state the null and alternative hypotheses of the problem concerned;
2. distinguish between directional and non-directional tests;
3. identify the types of error committed in hypothesis testing;
4. describe the basic steps in hypothesis testing.

Types of Hypothesis

Null hypothesis: denoted by Ho. It is the hypothesis of no difference and is usually formulated for the purpose of being rejected.

Alternative hypothesis: denoted by Ha or H1. This is the hypothesis that contradicts the null hypothesis. If the null hypothesis is rejected, the alternative is supported. The alternative hypothesis is the operational statement of the experimenter's research hypothesis.
math128a_prelimlec1

Types of Errors

Two types of errors can be committed in deciding to accept or reject the null hypothesis:
Type I error: committed if Ho is rejected when Ho is true
Type II error: committed if Ho is accepted when Ho is false

The table below summarizes the action taken and the type of error committed when one accepts or rejects a hypothesis.

Status of Hypothesis    Accept Ho           Reject Ho
Ho is true              Correct decision    Type I error
Ho is false             Type II error       Correct decision

One-sided and Two-sided Tests (Directional/Nondirectional Tests)

In a one-sided test, Ha specifies that the unknown population parameter is entirely above or entirely below the value specified by Ho. It is called a one-tailed or directional test. In a two-sided test, Ha specifies that the unknown population parameter can lie on either side of the value specified by Ho. It is called a two-tailed or nondirectional test.

The following are classified as one-tailed tests:
Ho: μ = 100    Ha: μ < 100
Ho: μ = 100    Ha: μ > 100
Ha is either entirely below 100 or entirely above 100.

The following are classified as two-tailed tests:
Ho: μ = 100    Ha: μ ≠ 100
Ho: μ1 = μ2    Ha: μ1 - μ2 ≠ 0
Ha can fall on either side of Ho.

Note: For a directional test the symbol in Ha is less than (<) or greater than (>), while for a non-directional test the symbol is not equal (≠).

Level of Significance

The probability of committing a type I error is called the level of significance of the test and is denoted by α (alpha); one minus the level of significance is called the confidence level. The probability of committing a type II error is denoted by β (beta), and (1 - β) is called the power of the test. The power indicates the ability of the test statistic to determine correctly that Ho is false and hence should be rejected.

Rather than compute the actual chance of committing a type I error, the researcher conventionally establishes the level of significance beforehand by considering the consequences of committing a type I error. The two most commonly used levels of significance are 0.05 and 0.01. At the 0.05 level, the researcher is willing to accept a 5% chance of rejecting Ho when it is actually true. At the 0.01 level, the researcher is willing to accept a 1% chance of rejecting Ho when it is actually true. If Ho is rejected at the 0.05 level, the result is usually labeled "significant"; otherwise the result is labeled "not significant". If Ho is rejected at the 0.01 level, the result is labeled "highly significant".

For a fixed sample size n, decreasing one type of error means increasing the other type of error. The only way to decrease both types of error simultaneously is to increase the sample size.

The Critical Region

The level of significance determines which values would be considered improbable or probable if the hypothesis were true. Thus, the range of possible values of the test statistic (its sampling distribution) is divided into two regions: the acceptance region (the probable values) and the rejection region (the improbable values). The size of both regions is completely specified by the level of significance. Under Ho, the acceptance region has probability (1 - α) and the critical region, or rejection region, has probability α. The experimenter will decide to reject the null hypothesis only if the probability, under Ho, of observing a value at least as extreme as the one observed is equal to or less than α.

The size of the critical region is determined by α, while its location is determined by the nature of the alternative hypothesis. The difference in the location of the critical region is what distinguishes a one-tailed test from a two-tailed test. The critical region is the set of all values of the test statistic that would cause the null hypothesis to be rejected.
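As a rough illustration of how the level of significance fixes the critical region, the sketch below computes critical values for a test statistic that follows the standard normal distribution. The function name `z_critical` and its `tails` parameter are illustrative choices, not part of the module; the code relies only on `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

def z_critical(alpha, tails=2):
    """Critical value of a standard normal test statistic.

    tails=1 puts the whole rejection region (area alpha) in one tail;
    tails=2 splits it into alpha/2 in each tail.
    """
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # The acceptance region ends where the upper-tail area equals alpha/tails.
    return NormalDist().inv_cdf(1 - alpha / tails)

for alpha in (0.05, 0.01):
    one = z_critical(alpha, tails=1)
    two = z_critical(alpha, tails=2)
    print(f"alpha={alpha}: one-tailed {one:.3f}, two-tailed {two:.3f}")
```

For a two-tailed test, Ho is rejected when the computed statistic falls below the negative critical value or above the positive one; for a one-tailed test only the side named by Ha applies. Printed tables often round the 0.01-level values slightly differently (for example 2.33 and 2.575).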

Basic Steps in Hypothesis Testing

The following steps should be observed to make sure that the reasoning is logical.
1. State Ho and Ha; decide what data to collect and under what conditions.
2. Specify the level of significance α and the sample size n.
3. Find the sampling distribution of the test statistic under the assumption that Ho is true.
4. Establish the critical region for the test statistic.
5. Compute the test statistic for the sample of size n.
6. Make the decision.

Note: Steps 4 and 5 may be interchanged.

MODULE 2: Comparing Sample and Population


Introduction
This module presents parametric statistical tests that may be used to test a hypothesis concerning a sample in relation to the population from which it was drawn. These procedures tell us whether a particular sample could have come from a specified or hypothesized population. The module covers testing for a significant difference between the sample mean and the population mean, both when the population variance is known and when it is unknown, and between the sample proportion and the population proportion.

Objectives

After completing this module you are expected, but not limited, to:
1. choose the appropriate test statistic in comparing a sample with a population;
2. know how to determine the critical values of the test statistic; and
3. draw conclusions based on the sample data.

Z-test Statistics

Suppose that X1, X2, . . . , Xn are sample observations from a normal population with unknown mean μ and known variance σ². The appropriate test statistic for comparing the sample mean with the population mean is

$$Z_c = \frac{\bar{X} - \mu_o}{\sigma / \sqrt{n}}$$

which is normally distributed with mean 0 and variance 1. The same test statistic can also be used when the population variance is unknown, as long as the sample size is large enough (n ≥ 30), because the unknown population variance can then be estimated reasonably accurately by the sample variance S². Hence, the test statistic is given by

$$Z_c = \frac{\bar{X} - \mu_o}{S / \sqrt{n}}$$

t-test Statistics

The population variance is generally unknown and is estimated by the variance of the random sample. When the sample size is small (n less than 30), the sampling variability of the sample variance is no longer negligible, and this must be taken into account when the population variance is estimated by the sample variance.

If the given observations X1, X2, . . . , Xn are a random sample from a normal distribution but the population variance is unknown, then the test statistic

$$t_c = \frac{\bar{X} - \mu_o}{S / \sqrt{n}}$$

has a t-distribution with (n - 1) degrees of freedom. A t-distribution is a symmetric probability distribution centered at zero; it looks similar to the normal distribution but is more variable (spread out). The t-distribution becomes more and more similar to the normal distribution as the number of degrees of freedom increases.

Z-test Statistic for Proportion

When the sample size is large, the distribution of the sample proportion $\hat{p}$ is approximately normal with mean p and variance pq/n, where q = 1 - p. Hence, the statistic

$$Z_c = \frac{\hat{p} - p_o}{\sqrt{p_o q_o / n}}, \qquad q_o = 1 - p_o$$

is approximately normally distributed with mean 0 and variance 1. This is the appropriate test statistic for comparing the sample proportion with the hypothesized population proportion.

Critical Values of Z-test Statistics

The critical values of the Z-test statistic at the 0.05 and 0.01 levels of significance for one-tailed and two-tailed tests:

Level of Significance    One-Tailed    Two-Tailed
0.05                     1.645         1.960
0.01                     2.330         2.575

Critical Values of T-test Statistics (see table given)

Hypothesis Testing Procedure

1. Ho: μ = μo or p = po
   Ha: μ ≠ μo or p ≠ po
       μ < μo or p < po
       μ > μo or p > po
2. Specify the level of significance and the sample size n
3. Test Statistics: Z-test or t-test
4. Critical Region: Reject Ho if
   tc > ttab for Ha: μ > μo; Zc > Ztab for Ha: μ > μo or p > po
   tc < -ttab for Ha: μ < μo; Zc < -Ztab for Ha: μ < μo or p < po
   |tc| > ttab for Ha: μ ≠ μo; |Zc| > Ztab for Ha: μ ≠ μo or p ≠ po
5. Computations: Compute the z-test/t-test statistic
6. Decision

Sample Problems

1. A study shows that the average score of the applicants who took the entrance examination was 45 with a standard deviation of 5.15. Is there reason to believe that the present examinees are better than the previous ones if a random sample of 36 applicants showed an average score of 47.34? Use 0.01 level of significance.

2. A high school principal claims that the average performance of his graduating class in math was 83. To test this claim, 25 students were randomly selected from the recent graduating class with the following results: 87, 85, 76, 83, 78, 90, 89, 85, 82, 77, 79, 76, 86, 83, 93, 88, 84, 76, 79, 85, 82, 81, 85, 84, 80. Test the principal's claim using 0.05 level of significance.

3. According to the NCR Department of Education, 25 percent of high school teachers had passed the English proficiency test. What conclusions would you draw if only 70 in a random sample of 250 teachers passed the recent English proficiency test? Use 0.01 level of significance.
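The computation step for the three sample problems can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are invented here, problem 1 is read as a one-tailed test, problems 2 and 3 are read as two-tailed tests, the Z critical values are the 2.330 and 2.575 entries of the table of critical values, and 2.064 is the two-tailed 0.05 critical value for 24 degrees of freedom from a standard t table.

```python
import math
import statistics

def z_mean(x_bar, mu_0, sigma, n):
    """Z statistic for a sample mean when the standard deviation is given."""
    return (x_bar - mu_0) / (sigma / math.sqrt(n))

def t_mean(data, mu_0):
    """t statistic for a sample mean, with S estimated from the data (n - 1 df)."""
    n = len(data)
    return (statistics.mean(data) - mu_0) / (statistics.stdev(data) / math.sqrt(n))

def z_proportion(x, n, p_0):
    """Z statistic for a sample proportion x/n against a hypothesized p_0."""
    p_hat = x / n
    return (p_hat - p_0) / math.sqrt(p_0 * (1 - p_0) / n)

# Problem 1: Ho: mu = 45 vs Ha: mu > 45, alpha = 0.01 (one-tailed, Ztab = 2.330)
z1 = z_mean(47.34, 45, 5.15, 36)
reject1 = z1 > 2.330

# Problem 2: Ho: mu = 83 vs Ha: mu != 83, alpha = 0.05 (two-tailed, df = 24)
scores = [87, 85, 76, 83, 78, 90, 89, 85, 82, 77, 79, 76, 86,
          83, 93, 88, 84, 76, 79, 85, 82, 81, 85, 84, 80]
t2 = t_mean(scores, 83)
reject2 = abs(t2) > 2.064  # assumed value from a standard t table

# Problem 3: Ho: p = 0.25 vs Ha: p != 0.25, alpha = 0.01 (two-tailed, Ztab = 2.575)
z3 = z_proportion(70, 250, 0.25)
reject3 = abs(z3) > 2.575

print(f"Problem 1: Zc = {z1:.3f}, reject Ho: {reject1}")
print(f"Problem 2: tc = {t2:.3f}, reject Ho: {reject2}")
print(f"Problem 3: Zc = {z3:.3f}, reject Ho: {reject3}")
```

Under these readings, only problem 1 leads to rejecting Ho; a different choice of tails in problem 3 would change the critical value but not the computed statistic.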

MODULE 3: Comparing Two Dependent Samples

Introduction

In comparing two dependent samples, the data are treated as paired values. Such data arise from before-and-after studies, from pairs of observations from two different populations, or from matching two subjects of similar characteristics to form matched pairs. These pairs of observations are compared directly with one another by using their observed differences. The purpose of using correlated or dependent samples is to remove the effect of uncontrolled factors that are not part of the study but might influence its outcome. Matching or pairing of observations helps assure the researcher that the observed differences between the two samples are due to the factors under study.

Objectives

After completing this module you are expected, but not limited, to:
1. select the appropriate test statistic in comparing two dependent or correlated samples;
2. conduct online data analysis;
3. draw conclusions based on two dependent or correlated samples.

Test Statistics

Testing the difference between two populations when dependent or correlated samples are used is almost the same as the t-test with a single sample in the previous module. The only difference is that with dependent samples we deal with the differences of the paired observed values rather than the original values. The t-test statistic is given by

$$t_c = \frac{(\bar{d} - \mu_d)\sqrt{n}}{S_d}$$

where
$\bar{d}$ = average observed difference
$\mu_d$ = hypothesized average difference
$S_d$ = standard deviation of the observed differences
n = number of paired values

and tc has the t-distribution with (n - 1) degrees of freedom.

Sample Problems

1. The following data represent the weights (in lbs.) of 10 obese participants before and after undergoing a two-week weight-reducing training program.

Participant    Before    After
1              275       250
2              250       215
3              235       218
4              212       198
5              215       200
6              263       245
7              289       260
8              258       250
9              215       202
10             249       230

Test the hypothesis that the mean weight loss of obese participants after the training program is 10 lbs. against the alternative that the weight loss is greater than 10 lbs. Use 0.01 level of significance.

2. A researcher wishes to determine if there is a systematic difference between the readings of two digital weighing scales. The following paired readings (in grams) were obtained:

Scale A    50.0    82.5    53.8    85.4    75.4    63.5    35.8    25.3
Scale B    49.9    82.7    53.8    85.3    75.4    63.7    35.7    24.9

Use 0.05 level of significance to test whether there is a significant difference between the readings of the two scales.
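A minimal sketch of the paired t-test computation, applied to the weight data of sample problem 1, is given below. The function name `paired_t` is made up for this illustration, the differences are taken as before minus after (so a positive value is a weight loss), and the critical value 2.821 is the one-tailed 0.01 value for 9 degrees of freedom from a standard t table.

```python
import math
import statistics

def paired_t(first, second, mu_d=0.0):
    """t statistic for paired samples: (d_bar - mu_d) * sqrt(n) / S_d."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)  # sample standard deviation, n - 1 divisor
    return d_bar, s_d, (d_bar - mu_d) * math.sqrt(n) / s_d

before = [275, 250, 235, 212, 215, 263, 289, 258, 215, 249]
after = [250, 215, 218, 198, 200, 245, 260, 250, 202, 230]

# Ho: mu_d = 10 vs Ha: mu_d > 10, alpha = 0.01, df = 9
d_bar, s_d, t_c = paired_t(before, after, mu_d=10)
reject_h0 = t_c > 2.821  # assumed value from a standard t table

print(f"d_bar = {d_bar:.2f}, S_d = {s_d:.3f}, tc = {t_c:.3f}, reject Ho: {reject_h0}")
```

The same function can be reused for the weighing-scale problem by passing the two rows of readings and leaving `mu_d` at zero, with a two-tailed critical value for 7 degrees of freedom.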

MODULE 4: Comparing Two Independent Samples

Introduction

Sometimes it is impossible to design a study or experiment that uses matched or related samples in comparing two populations, because it is difficult to find subjects or units with more or less the same characteristics before the intervention or application of treatments. The researcher may instead use two independent samples, obtained by randomly selecting samples from the two populations or by randomly assigning the subjects or units to the different groups or treatments. The two independent samples need not be of the same size. The parametric tests for analyzing data from two independent samples are the z-test and the t-test. These test statistics assume that the data in both samples are normally distributed and are usually measured on an interval scale.

Objectives

After completing this module, you are expected, but not limited, to:
1. choose the appropriate test statistic in comparing two independent samples;
2. conduct data analysis using an online statistical software;
3. draw conclusions based on two independent samples.

Z-test Statistics

Suppose that the observations X1, X2, . . . , Xn1 and Y1, Y2, . . . , Yn2 are random samples from two independent populations with known variances. To test the hypothesis that the two population means are equal, the appropriate test statistic is the Z-test. The formula of the test statistic is given by

$$Z_c = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

which is normally distributed with mean 0 and variance 1. However, if the population variances are unknown but the sample sizes are large, the unknown population variances can be estimated by their corresponding sample variances. Hence the Z-test statistic is given by

$$Z_c = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$

t-test Statistics (Equal Variance)

Hypothesis testing about the difference of means between two independent populations involves the t-distribution and the t-test statistic when the two samples are independent, the two populations are normally distributed, and the population variances are unknown but assumed to be equal. The test statistic

$$t_c = \frac{\bar{x}_1 - \bar{x}_2}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \qquad S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$

has a t-distribution with (n - 2) degrees of freedom, where n = n1 + n2.

Z-test Statistics for Proportions

Hypothesis testing about the difference of proportions between two independent populations involves the Z-test statistic. For instance, a researcher might be interested in whether there is a significant difference between the proportions of passers in elementary statistics and in algebra during the previous semester. The formula of the Z-test statistic is

$$Z_c = \frac{p_1 - p_2}{\sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

where
p1 = x1/n1
p2 = x2/n2
p = (x1 + x2)/n
n = n1 + n2
q = 1 - p

Data Layout

Sample No.                   X        Y
1                            x1       y1
2                            x2       y2
3                            x3       y3
.                            .        .
.                            xn1      yn2
Sample Mean                  x̄1       x̄2
Sample Standard Deviation    S1       S2
Sample Variance              S1²      S2²

Sample Problems

1. A sample study was made by the Office of Student Affairs of the weekly allowances of the students of Cavite State University. If 60 students in the main campus averaged 250.00 pesos with a standard deviation of 15.75, while 40 off-campus students averaged 263.65 with a standard deviation of 17.30, test at 0.05 level of significance whether the difference between these two sample means is significant.

2. A classroom teacher wishes to compare the performance of students in statistics using two methods of teaching. Two independent samples of sizes 15 and 10 were randomly selected. The following data have been obtained:

Method A Method B 85 83 76 85 78 83 78 90 82 89 81 87 86 84 75 83 77 79 83 82 76 84 89 78 87

Is there a significant difference between the performance of students in the two methods of teaching statistics? Use 0.05 level of significance.

3. Tony was successful in hitting the bull's eye in 55 out of 140 attempts, while Sony was successful in 33 of 90 attempts. At 0.05 level of significance, can we conclude that Tony is a better hitter than Sony?

Exercises

A. State the null and alternative hypotheses of the following problems:

1. A manufacturer of a certain brand of rice cereal claims that the average saturated fat content does not exceed 1.5 grams.

2. The soft drink dispenser of a fast-food center was just readjusted. The manager, wanting to know if the dispenser is really in good condition, got a sample of 50 cups filled by the dispenser. She would only classify the dispenser as in good condition if the average fill per cup of the dispenser is 8 ounces.

3. Audee suspects that male DLSU-D students spend less time studying compared to their female counterparts. She decided to conduct a study regarding the study habits of both male and female DLSU-D students. She intends to find out if the average time per day that a male DLSU-D student spends doing his schoolwork is less than the average time per day a female DLSU-D student spends doing her schoolwork.

4. A real estate agent claims that 60% of all private residences being built today are 3-bedroom homes.

5. A fitness buff reads about a new diet program. He wants to adopt it but, unfortunately, following the new diet program requires buying nutritious, low-calorie, yet expensive food. He thus randomly surveyed some of his friends and will adopt the new diet only if the percentage of people who claim that the new diet program works is greater than 60%.

B. Perform a complete hypothesis test for each of the following problems.

1. A manufacturer of sports equipment has developed a new synthetic fishing line that he claims has a mean breaking strength of 8 kg with a standard deviation of 0.5 kg. Test the hypothesis that μ = 8 kg against the alternative that μ ≠ 8 kg if a random sample of 50 lines is tested and found to have a mean breaking strength of 7.8 kg. Use a 0.01 level of significance.

2. The Edison Electric Institute has published figures on the annual number of kilowatt-hours expended by various home appliances. It is claimed that a vacuum cleaner expends an average of 46 kilowatt-hours per year. If a random sample of 12 homes included in a planned study indicates that vacuum cleaners expend an average of 42 kilowatt-hours per year with a standard deviation of 11.9 kilowatt-hours, does this suggest at 0.05 level of significance that vacuum cleaners expend, on the average, less than 46 kilowatt-hours annually? Assume the population of kilowatt-hours to be normal.

3. An electrical firm manufactures light bulbs that have a lifetime that is approximately normally distributed with a mean of 800 hours and a standard deviation of 40 hours. Test the hypothesis that μ = 800 hours against the alternative that μ ≠ 800 hours if a random sample of 30 bulbs has an average life of 788 hours.

4. A random sample of 64 bags of white cheddar popcorn weighed, on average, 5.23 oz with a standard deviation of 0.24 oz. Test the hypothesis that μ = 5.5 oz against the alternative hypothesis μ < 5.5 oz at 0.05 level of significance.

5. Test the hypothesis that the average content of containers of a particular lubricant is 10 liters if the contents of a random sample of 10 containers are 10.2, 9.7, 10.1, 10.3, 10.1, 9.8, 9.9, 10.4, 10.3, and 9.8 liters. Use a 0.01 level of significance and assume that the distribution of contents is normal.

6. The average height of females in the freshman class of a certain college has been 162.5 cm with a standard deviation of 6.9 cm. Is there reason to believe that there has been a change in the average height if a random sample of 50 females in the present freshman class has an average height of 165.2 cm?

7. It is claimed that an automobile is driven on the average more than 20000 km per year. To test this claim, a random sample of 100 automobile owners are asked to keep a record of the kilometers they travel. Would you agree with this claim if the random sample showed an average of 23500 km and a standard deviation of 3900 km?

8. Past experience indicates that the time for high school seniors to complete a standardized test is a normal random variable with a mean of 35 minutes. If a random sample of 20 high school seniors took an average of 33.1 minutes to complete this test with a standard deviation of 4.3 minutes, test the hypothesis at 0.05 level of significance that μ = 35 minutes against the alternative μ < 35 minutes.

9. A manufacturer claims that the average tensile strength of thread A exceeds the average tensile strength of thread B by at least 12 kg. To test his claim, 50 pieces of each type of thread are tested under similar conditions. Type A thread had an average tensile strength of 86.7 kg with a standard deviation of 6.28 kg, while Type B thread had an average tensile strength of 77.8 kg with a standard deviation of 5.61 kg. Test the manufacturer's claim using a 0.05 level of significance.

10. An experiment was performed to compare the abrasive wear of two different laminated materials. Twelve pieces of material 1 were tested by exposing each piece to a machine measuring wear. Ten pieces of material 2 were similarly tested. In each case, the depth of wear was observed. The samples of material 1 gave an average (coded) wear of 85 units with a sample standard deviation of 4, while the samples of material 2 gave an average (coded) wear of 81 units with a sample standard deviation of 5. Can we conclude at 0.05 level of significance that the abrasive wear of material 1 exceeds that of material 2 by more than 2 units? Assume the populations to be approximately normal with equal variances.
11. A random sample of size n1 = 25, taken from a normal population with a standard deviation of 5.2, has a mean of 81. A second random sample of size n2 = 36, taken from a normal population with a standard deviation of 3.2, has a mean of 76. Test the hypothesis that μ1 = μ2 against the alternative that μ1 ≠ μ2.

12. In a study conducted by the Department of Human Nutrition and Foods at Virginia Polytechnic Institute and State University, the following data on the comparison of sorbic acid residuals in parts per million in ham immediately after dipping in a sorbate solution and after 60 days of storage were recorded:

Sorbic acid residuals in ham
Slice    Before storage    After storage
1        224               116
2        270               96
3        400               239
4        444               329
5        590               437
6        660               597
7        1400              689
8        680               576

Assuming the populations to be normally distributed, is there sufficient evidence, at the 0.05 level of significance, to say that the length of storage influences sorbic acid residual concentrations?

13. According to published reports, practice under fatigued conditions distorts mechanisms that govern performance. An experiment was conducted using 15 college males who were trained to make a continuous horizontal right-left arm movement from a micro-switch to a barrier, knocking over the barrier coincident with the arrival of a clock sweephand at the 6 o'clock position. The absolute value of the difference between the time, in milliseconds, that it took to knock over the barrier and the time for the sweephand to reach the 6 o'clock position (500 msec) was recorded. Each participant performed the task five times under prefatigue and postfatigue conditions, and the sums of the absolute differences for the five performances were recorded as follows:

Absolute time differences (msec)
Subject    Prefatigue    Postfatigue
1          158           91
2          92            59
3          65            215
4          98            226
5          33            223
6          89            91
7          148           92
8          58            177
9          142           134
10         117           116
11         74            153
12         66            219
13         109           143
14         57            164
15         85            100

An increase in the mean absolute time differences when the task is performed under postfatigue conditions would support the claim that practice under fatigued conditions distorts mechanisms that govern performance. Assuming the populations to be normally distributed, test this claim.

14. A study was made to determine if the subject matter in a Statistics course is better understood when a lab constitutes part of the course. Students were randomly selected to participate in either a 3-semester-hour course without labs or a 4-semester-hour course with labs. In the section with labs 11 students made an average grade of 85 with a standard deviation of 4.7, and in the section without labs 17 students made an average grade of 79 with a standard deviation of 6.1. Would you say that the laboratory course increases the average grade by as much as 8 points? Assume the populations to be approximately normally distributed with equal variances.

15. A taxi company manager is trying to decide whether the use of radial tires instead of regular belted tires improves fuel economy. Twelve cars were equipped with radial tires and driven over a prescribed test course. Without changing drivers, the same cars were then equipped with regular belted tires and driven once again over the test course. The gasoline consumption, in kilometers per liter, was recorded as follows:

Kilometers per liter
Car    Radial tires    Belted tires
1      4.2             4.1
2      4.7             4.9
3      6.6             6.2
4      7.0             6.9
5      6.7             6.8
6      4.5             4.4
7      5.7             5.7
8      6.0             5.8
9      7.4             6.9
10     4.9             4.7
11     6.1             6.0
12     5.2             4.9

Can we conclude that cars equipped with radial tires give better fuel economy than those equipped with belted tires?

16. A large automobile manufacturing company is trying to decide whether to purchase Brand A or Brand B tires for its new models. To help arrive at a decision, an experiment is conducted using 12 of each brand. The tires are run until they wear out. The results are:

Brand    Mean        Standard Deviation
A        37900 km    5100 km
B        39800 km    5900 km

Test the hypothesis that there is no difference in the two brands of tires. Assume the populations to be approximately normally distributed with equal variances.

17. According to Chemical Engineering, an important property of fiber is its water absorbency. The average percent absorbency of 25 randomly selected pieces of cotton fiber was found to be 20 with a standard deviation of 1.5. A random sample of 25 pieces of acetate yielded an average percent of 12 with a standard deviation of 1.25. Is there strong evidence that the population mean percent absorbency for cotton fiber is significantly higher than the mean for acetate? Use 0.05 level of significance.

MODULE 5: Tests of Relationships and Association

Introduction

In many statistical investigations the main objective of the study is to determine whether there exists a significant relationship or association between two or more variables; the correlation coefficient is the appropriate statistical tool of measurement. Correlation analysis primarily tells us the magnitude or degree to which the two variables are related. It is useful in expressing how efficiently one variable predicts the value of another variable. It also tells us whether the variability of one variable indicates the variability of another variable. This module deals with the most common types of relationship between two variables with varying levels of measurement, such as interval, ordinal or nominal. It will also help in computing and interpreting the degree of relationship of these variables.

Objectives

After completing this module, you are expected, but not limited, to:
1. determine the appropriate tool to measure the degree of relationship between two variables;
2. ascertain the applicability of different correlation coefficients;
3. interpret the degree of relationship/association between two variables;
4. solve problems involving relationships/associations between variables.

Pearson Product Moment Correlation Coefficient

The most important and widely used measure of relationship between two quantitative variables (usually on an interval scale) is the Pearson Product Moment Correlation Coefficient. Its formula is given by

$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$

The Pearson Product Moment Correlation Coefficient can be easily obtained using a scientific calculator with an LR mode.

Data Layout

N        X       Y       XY        X²       Y²
1        X1      Y1      .         .        .
2        X2      Y2      .         .        .
.        .       .       .         .        .
n        Xn      Yn      .         .        .
Total    ΣXi     ΣYi     ΣXiYi     ΣXi²     ΣYi²

Interpretation of Correlation Coefficient

Coefficient         Interpretation
-1.00               perfect negative correlation
-0.76 to -0.99      very high negative correlation
-0.51 to -0.75      high negative correlation
-0.26 to -0.50      moderately small negative correlation
-0.01 to -0.25      very small negative correlation
0.00                no correlation
0.01 to 0.25        very small positive correlation
0.26 to 0.50        moderately small positive correlation
0.51 to 0.75        high positive correlation
0.76 to 0.99        very high positive correlation
1.00                perfect positive correlation

Sample Problem

A principal of a public high school wishes to investigate how well the entrance examination scores affect the grade point average of the freshmen students. The data of a random sample of 15 freshmen students are as follows:

Student    Entrance Score (X)    GPA (Y)    XY       X²       Y²
1          68                    85         5780     4624     7225
2          56                    80         4480     3136     6400
3          79                    85         6715     6241     7225
4          53                    79         4187     2809     6241
5          46                    86         3956     2116     7396
6          80                    87         6960     6400     7569
7          40                    78         3120     1600     6084
8          69                    83         5727     4761     6889
9          34                    76         2584     1156     5776
10         26                    75         1950     676      5625
11         76                    88         6688     5776     7744
12         85                    95         8075     7225     9025
13         52                    78         4056     2704     6084
14         30                    77         2310     900      5929
15         49                    81         3969     2401     6561
Total      843                   1233       70557    52525    101773

$$r = \frac{15(70557) - (843)(1233)}{\sqrt{\left[15(52525) - 843^2\right]\left[15(101773) - 1233^2\right]}} = 0.858083 \approx 0.858$$

Testing for the Significance of Pearson r

1. Ho: ρ = 0, or there is no significant relation between entrance scores and GPA
   Ha: ρ ≠ 0, or there is a significant relation between entrance scores and GPA
2. Level of significance α = 0.05 and sample size n = 15
3. Test Statistics: t-test with (n - 2) = 13 degrees of freedom
4. Critical Region: Reject Ho if |tc| > 2.160
5. Computations: Compute the t-test statistic using the formula

$$t_c = r\sqrt{\frac{n - 2}{1 - r^2}} = 0.858083\sqrt{\frac{13}{1 - 0.858083^2}} = 6.025$$

6. Decision: Since tc > 2.160, reject Ho and conclude that there is a significant relation between the entrance examination score and the grade point average of freshmen students.
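The correlation computation for the entrance-score data can be checked with a short script. This is a sketch, not the module's own procedure: the function names are invented for the illustration, `pearson_r` uses the computational form with the column totals ΣX, ΣY, ΣXY, ΣX² and ΣY², and the significance test uses the standard statistic t = r√((n - 2)/(1 - r²)) with n - 2 degrees of freedom and the critical value 2.160.

```python
import math

entrance = [68, 56, 79, 53, 46, 80, 40, 69, 34, 26, 76, 85, 52, 30, 49]
gpa = [85, 80, 85, 79, 86, 87, 78, 83, 76, 75, 88, 95, 78, 77, 81]

def pearson_r(x, y):
    """Pearson r from the totals of X, Y, XY, X^2 and Y^2."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

def t_for_r(r, n):
    """t statistic for testing zero correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

r = pearson_r(entrance, gpa)
t_c = t_for_r(r, len(entrance))
reject_h0 = abs(t_c) > 2.160  # two-tailed, alpha = 0.05, df = 13

print(f"r = {r:.6f}, tc = {t_c:.3f}, reject Ho: {reject_h0}")
```

Because the function works from raw data, it also exposes arithmetic slips in a hand-built XY column: the totals it uses are recomputed rather than copied from the table.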

Examples:

1. A study on the relationship between the supply and price of cellular phones in the Philippines was conducted. Twenty-four (24) quarterly records from DTI were analyzed using correlation analysis. The result showed a computed r = -0.9111102, and it was concluded that more supply of cellular phones would cause a high price in the market. Is the conclusion correct? If YES, EXPLAIN. If NO, what is the correct conclusion?

2. In production flow-shop problems, performance is often evaluated by minimum make-span, the total elapsed time from starting the first job on the first machine until the last job is completed on the last machine. For a particular flow-shop the make-span was evaluated with respect to the number of jobs to be done. Let the independent variable X denote the number of jobs and the dependent variable Y denote the make-span (in standardized units):

Number of jobs (x)    4       5       6       7      8      9      10     11      12      13
Make-span (y)         3.75    4.90    4.88    7.2    7.3    9.1    9.0    11.9    11.5    14.1

a. Determine the linear equation model and interpret the result.
b. Compute and interpret the correlation coefficient.
c. Compute the coefficient of determination and draw the necessary conclusion.

3. A study of the amount of rainfall (X) in an area and the quantity of air pollution removed (Y) gave the following data:

x    4.3    4.5    5.9    5.6    6.1    5.2    3.8    2.1    7.5
y    126    121    116    118    114    118    132    141    108

a. Find the equation of the regression line. What relationship exists between the two variables?
b. How much pollution will be removed if the amount of rainfall is 5.0?

4. In a study comparing engine size, measured in cubic inches of displacement, and miles per gallon estimates (MPG) for 15 compact automobiles, the data are as follows:

Engine size: 121 20 97 98 122 97 85 122 112 100 102 110 115 106 108
MPG: 30 31 34 27 29 34 38 32 28 32 25 30 29 28 32

a. Using MPG as the dependent variable, estimate the equation of the regression line. Interpret the results.
b. Compute and interpret the coefficient of determination.
c. What is the estimated mileage (MPG) when the engine size is 90?
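The regression questions above can be approached with a short least-squares computation. The module does not state the regression formulas, so the sketch below assumes the ordinary least-squares line y = a + bx, with slope b = Sxy/Sxx and intercept a = ȳ - b·x̄; the function name `least_squares` is illustrative. It is applied to the rainfall data of example 3.

```python
rainfall = [4.3, 4.5, 5.9, 5.6, 6.1, 5.2, 3.8, 2.1, 7.5]
removed = [126, 121, 116, 118, 114, 118, 132, 141, 108]

def least_squares(x, y):
    """Intercept a and slope b of the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and squares about the means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

a, b = least_squares(rainfall, removed)
predicted = a + b * 5.0  # estimated pollution removed when rainfall is 5.0

print(f"y = {a:.3f} + ({b:.3f})x; prediction at x = 5.0: {predicted:.2f}")
```

A negative slope here means that the fitted amount of pollution removed decreases as rainfall increases over the observed range; the same function applies to the make-span and engine-size data.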
