When is estimation used in relation to a hypothesis test

The variable is GPA. It is quantitative, so the parameters are means. Are college athletes more likely than nonathletes to receive academic advising? This question contains a claim that compares proportions in two populations: college athletes and college students who are not athletes. The variable is "Receive academic advising" (yes or no). The variable is categorical, so the parameters are proportions. In the case of testing a claim about a single population parameter, we compare it to a numeric value.

In the case of testing a claim about two population parameters, we compare them to each other. We already know that in inference we use a sample to draw a conclusion about a population. If the research question contains a claim about the population, we translate the claim into two related hypotheses. The null hypothesis is a hypothesis about the value of the parameter.

The null hypothesis relates to our work in Linking Probability to Statistical Inference where we drew a conclusion about a population parameter on the basis of the sampling distribution. We started with an assumption about the value of the parameter, then used a simulation to simulate the selection of random samples from a population with this parameter value. Or we used the parameter value in a mathematical model to describe the center and spread of the sampling distribution. The null hypothesis gives the value of the parameter that we will use to create the sampling distribution.
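The idea of building a sampling distribution from the null value can be sketched in a few lines of Python. This is only an illustration: the null proportion, sample size, and number of simulated samples are hypothetical values that do not come from the text, and the function name is made up for this example.

```python
import random
import statistics

def simulate_sample_proportions(p_null, n, num_samples, seed=0):
    """Simulate sample proportions from a population whose true
    proportion equals the null-hypothesis value p_null."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(num_samples):
        # Draw one random sample of size n and count the "successes".
        successes = sum(1 for _ in range(n) if rng.random() < p_null)
        proportions.append(successes / n)
    return proportions

# Hypothetical values, chosen only for illustration.
P_NULL = 0.5
N = 100
props = simulate_sample_proportions(P_NULL, N, num_samples=5000)

# Center and spread of the simulated sampling distribution.
center = statistics.mean(props)
spread = statistics.stdev(props)

# Mathematical model for comparison: center p, spread sqrt(p(1 - p) / n).
model_spread = (P_NULL * (1 - P_NULL) / N) ** 0.5

print(f"simulated: center = {center:.3f}, spread = {spread:.3f}")
print(f"model:     center = {P_NULL:.3f}, spread = {model_spread:.3f}")
```

Both routes should describe roughly the same distribution, which is why either the simulation or the mathematical model can be used once the null hypothesis supplies the parameter value.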

In this way, the null hypothesis states what we assume to be true about the population. The alternative hypothesis usually reflects the claim in the research question about the value of the parameter.

Here are the hypotheses for the research questions from the previous example. The null hypothesis is abbreviated H0. The alternative hypothesis is abbreviated Ha. When the research question contains a claim that compares two populations, the null hypothesis states that the parameters are equal. We revisit this idea in more depth later.

He looks at the table and sees that for 5 df (there are 6 classes, since there is an expected frequency for size 11 socks), only.

He decides that it would not be all that surprising if the players had a different distribution of sock sizes than the athletes who are currently buying Easy Bounce, since all of the players are women and many of the current customers are men. As a result, he uses the smaller. He starts by finding the expected frequency of size 6 socks by multiplying the relative frequency of size 6 in the population being produced by 97, the sample size.

He then realizes that he will have to do the same computation for the other five sizes, and quickly decides that a spreadsheet will make this much easier see Table 4. David performs his third step, computing his sample statistic, using the spreadsheet.
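The spreadsheet computation David describes can be approximated with a short Python sketch. Only the sample size of 97 and the six size classes come from the text; the relative frequencies and observed counts below are hypothetical placeholders, and the critical value assumes a 0.05 significance level with 5 degrees of freedom.

```python
# Hypothetical data: the relative frequencies and observed counts are
# invented for illustration. Only n = 97 and the six classes are from the text.
SAMPLE_SIZE = 97
sizes = [6, 7, 8, 9, 10, 11]
relative_freq = [0.06, 0.13, 0.22, 0.30, 0.26, 0.03]  # population being produced
observed = [10, 20, 30, 22, 12, 3]                    # counts in the sample

# Expected frequency for each size = relative frequency * sample size.
expected = [f * SAMPLE_SIZE for f in relative_freq]

# Chi-square statistic: sum over classes of (observed - expected)^2 / expected.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Degrees of freedom for a goodness-of-fit test: number of classes - 1.
df = len(sizes) - 1

# Critical value for df = 5 at alpha = 0.05 (assumed significance level),
# taken from a standard chi-square table.
CRITICAL_VALUE = 11.07
reject_null = chi_square > CRITICAL_VALUE

for size, o, e in zip(sizes, observed, expected):
    print(f"size {size}: observed = {o}, expected = {e:.2f}")
print(f"chi-square = {chi_square:.2f}, df = {df}, reject H0: {reject_null}")
```

A statistic above the critical value means the sample is unlikely to have come from a population with the manufactured mix of sizes.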

David has found that his sample data support the hypothesis that the distribution of sock sizes of the players is different from the distribution of sock sizes that are currently being manufactured. If Easy Bounce socks are successfully marketed to the BC college players, the mix of sizes manufactured will have to be altered. Now review what David has done to test whether the data in his sample support the hypothesis that the world is unsurprising and that the players have the same distribution of sock sizes as the manufacturer currently produces for other athletes.

Formally, David first wrote null and alternative hypotheses, describing the population his sample comes from in two different cases. The first case is the null hypothesis; this occurs if the players wear socks of the same sizes in the same proportions as the company is currently producing.

The second case is the alternative hypothesis; this occurs if the players wear different sizes. After he wrote his hypotheses, he found that there was a sampling distribution that statisticians knew about that would help him choose between them. Acting on this finding, David will include a different mix of sizes in the sample packages he sends to team coaches.

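As you learned in Chapter 3, sample proportions can be used to compute a statistic that has a known sampling distribution. Reviewing, the z-statistic is: z = (p - π) / sqrt(π(1 - π) / n), where p is the sample proportion, π is the hypothesized population proportion, and n is the sample size. If you look at the z-table, you can see that.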

The basic strategy is the same as that explained earlier in this chapter and followed in the goodness-of-fit example: (a) write two hypotheses, (b) find a sample statistic and sampling distribution that will let you develop a decision rule for choosing between the two hypotheses, and (c) compute your sample statistic and choose the hypothesis supported by the data.

Foothill did not have the right machinery to sew on the embroidered patches and contracted out the sewing. Kevin writes his hypotheses, remembering that Foothill will be making a decision about spending a fair amount of money based on what he finds.

When writing his hypotheses, Kevin knows that if his sample has a proportion of decorated socks well below. He only wants to say the data support the alternative if the sample proportion is well above.

To include the low values in the null hypothesis and only the high values in the alternative, he uses a one-tail test, judging that the data support the alternative only if his z-score is in the upper tail. He will conclude that the machinery should be bought only if his z-statistic is too large to have easily come from the sampling distribution drawn from a population with a proportion of.
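A one-tail decision rule of this kind can be sketched in Python as follows. The hypothesized proportion, sample counts, and significance level are hypothetical stand-ins, since the values Kevin actually used are not shown here, and the function name is invented for this example.

```python
from math import sqrt
from statistics import NormalDist

def one_tail_z_test(successes, n, p_null, alpha):
    """Upper-tail z-test for a single proportion.

    Returns the sample z-score, the critical z for the chosen alpha,
    and whether the data support the alternative (z in the upper tail).
    """
    p_hat = successes / n
    # Standard error uses the null proportion, because the sampling
    # distribution is built assuming the null hypothesis is true.
    standard_error = sqrt(p_null * (1 - p_null) / n)
    z = (p_hat - p_null) / standard_error
    critical_z = NormalDist().inv_cdf(1 - alpha)
    return z, critical_z, z > critical_z

# Hypothetical numbers, for illustration only.
z, critical_z, supports_alternative = one_tail_z_test(
    successes=120, n=300, p_null=0.30, alpha=0.05
)
print(f"z = {z:.2f}, critical z = {critical_z:.3f}")
print(f"data support the alternative: {supports_alternative}")
```

Because only the upper tail counts as evidence for the alternative, a sample proportion far below the hypothesized value would leave the null hypothesis standing, which matches the way the hypotheses are written.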

Kevin will accept Ha only if his z is large and positive. Checking the bottom line of the t-table, Kevin sees that. If his sample z is greater than Using the data the salespeople collected, Kevin finds the proportion of the sample that is decorated:. Figure 4. Because his sample calculated z-score is larger than Kevin can feel comfortable making the decision to buy the embroidery and sewing machinery.

We also use hypothesis testing when we deal with categorical variables.

Categorical variables are associated with categorical data. For instance, gender is a categorical variable, as it can be classified into two or more categories. In business, and predominantly in marketing, we want to determine on which factor(s) customers base their preference for one type of product over others.

If it does, she will explore the idea of charging different prices for dishes popular with different age groups. The sales manager has collected data on sales of different dishes over the last six months, along with the approximate age of the customers, and divided the customers into three categories.

Table 4. The underlying test for this contingency table is known as the chi-square test. Then we calculate the expected frequency for each cell of the above table, with i rows and j columns, using the following formula: E_ij = (total of row i × total of column j) / n, where n is the total sample size. This chi-square distribution will have (i - 1)(j - 1) degrees of freedom. One technical condition for this test is that the expected frequency for each of the cells must not be less than 5.

The expected frequency, E_ij, is found by multiplying the total of row i by the total of column j, and then dividing this amount by the total sample size. For each of the expected frequencies, we select the associated row total for the age group, multiply it by the total of the corresponding column, then divide it by the total sample size. Now we use the calculated expected frequencies and the observed frequencies to compute the chi-square test statistic: χ² = Σ (O_ij - E_ij)² / E_ij, summed over all cells, where O_ij is the observed frequency in cell (i, j). We computed the sample test statistic as To find out the exact cut-off point from the chi-square table, you can enter the alpha level of.
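The test of independence described above can be sketched in Python. The observed counts are hypothetical (three age groups by three dishes, invented for illustration), and the critical value assumes a 0.05 significance level with 4 degrees of freedom.

```python
# Hypothetical contingency table: rows are age groups, columns are dishes.
# These counts are invented for illustration and are not from the text.
observed = [
    [30, 20, 10],
    [20, 30, 20],
    [10, 20, 40],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency for each cell: row total * column total / sample size.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Technical condition: no expected frequency below 5.
condition_met = all(e >= 5 for row in expected for e in row)

# Chi-square statistic: sum over all cells of (O - E)^2 / E.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Critical value for df = 4 at alpha = 0.05 (assumed significance level),
# taken from a standard chi-square table.
CRITICAL_VALUE = 9.488
reject_null = chi_square > CRITICAL_VALUE

print(f"chi-square = {chi_square:.2f}, df = {df}")
print(f"condition met: {condition_met}, reject H0: {reject_null}")
```

Rejecting the null hypothesis here would mean the data support a dependency between age group and dish ordered.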

This template contains two sheets; it will plot the chi-square distribution for this example and will automatically show the exact cut-off point. The result indicates that our sample data supported the alternative hypothesis. Based on this outcome, the owner may differentiate price based on these different age groups.

Using the test of independence, the owner may also go further to find out if such dependency exists among any other pairs of categorical data. This time, she may want to collect data for the selected age groups at different locations of her restaurant in British Columbia. The results of this test will reveal more information about the types of customers these restaurants attract at different locations. Depending on the availability of data, such statistical analysis can also be carried out to help determine an improved pricing policy for different groups in different locations, at different times of day, or on different days of the week.

There are two variables of interest: 1 height in inches and 2 weight in pounds. Both are quantitative variables. The parameter of interest is the correlation between these two variables. We are not given a specific correlation to test.

We are being asked to estimate the strength of the correlation. The appropriate procedure here is a confidence interval for a correlation. Research question: Are the majority of registered voters planning to vote in the next presidential election?

The parameter that is being tested here is a single proportion. We have one group: registered voters. This is a specific parameter that we are testing. The appropriate procedure here is a hypothesis test for a single proportion. We are comparing them in terms of average i. The appropriate procedure here is a hypothesis test for the difference in two means.

Research question: On average, how much taller are adult male giraffes compared to adult female giraffes? There are two groups: males and females. The response variable is height, which is quantitative. We are not given a specific parameter to test; instead, we are asked to estimate "how much" taller males are than females.

The appropriate procedure is a confidence interval for the difference in two means. The appropriate procedure is a hypothesis test for the difference in two proportions.
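For an estimation question such as the giraffe example, the output is an interval rather than a decision. The Python sketch below uses a large-sample normal approximation, which is a simplification chosen here to keep the example dependency-free (a t-based interval is the more usual choice for small samples). All summary statistics are hypothetical, and the function name is invented for this example.

```python
from math import sqrt
from statistics import NormalDist

def diff_in_means_ci(mean1, sd1, n1, mean2, sd2, n2, confidence=0.95):
    """Large-sample confidence interval for the difference in two means
    (group 1 minus group 2), using a normal approximation."""
    diff = mean1 - mean2
    standard_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    # Two-sided multiplier, e.g. about 1.96 for 95% confidence.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * standard_error
    return diff - margin, diff + margin

# Hypothetical summary statistics (heights in feet), for illustration only.
low, high = diff_in_means_ci(
    mean1=17.0, sd1=1.5, n1=60,   # adult males
    mean2=14.5, sd2=1.2, n2=60,   # adult females
)
print(f"95% CI for the difference in mean height: ({low:.2f}, {high:.2f})")
```

The interval answers "how much" directly. A hypothesis test on the same data would answer only whether a difference exists.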

Research question: Is there a relationship between outdoor temperature in Fahrenheit and coffee sales in cups per day? There are two variables here: (1) temperature in Fahrenheit and (2) cups of coffee sold in a day.


