Mathematics 215: Introduction to Statistics

Study Guide

Unit 5: Self-Test Answer Key

Show all your work and keep your calculations to four decimal places, unless otherwise stated.

  1. Circle True (T) or False (F) for each of the following:
    1. T   F   (Answer: True)
      Testing an alternative hypothesis that the mean of the first population is less than the mean of the second population is the same as: $\mu_1 - \mu_2 < 0$.
    2. T   F   (Answer: False)
      When conducting a test of hypothesis involving two population means, if both random samples exceed 30 and the population standard deviations are unknown, then the standard normal distribution can be used to find the critical values.
    3. T   F   (Answer: True)
      When conducting a test of hypothesis involving two population means, if both random samples exceed 30 and both population variances are unknown but equal, then the pooled standard deviation is used in the computation of the test statistic.
    4. T   F   (Answer: False)
      The paired or matched sample test is appropriate to use when the two randomly selected samples are independent.
    5. T   F   (Answer: True)
      In a hypothesis test for independence with a contingency table, the alternative hypothesis is that the two variables are related.
    6. T   F   (Answer: True)
      A hypothesis test where the null hypothesis is that three or more population means are equal is called ANOVA.
  2. Economic research shows that in any given month, the unemployment rate of college graduates is significantly lower than the unemployment rate of high-school graduates. The unemployment rate refers to the proportion of graduates registered as unemployed in any given month. In a random sample of 1,200 college graduates, 60 were unemployed, and in a random sample of 1,000 high-school graduates (no college), 64 were unemployed.

    At the 1% significance level, do the samples provide sufficient evidence to conclude that college graduates have a lower unemployment rate than high-school graduates? Use the p-value approach and show all key steps.

    Solution:

    Given $n_1 = 1200$ and $n_2 = 1000$,

    the sample proportion of college graduates unemployed is $\hat{p}_1 = \frac{60}{1200} = 0.05$, and the sample proportion of high-school graduates unemployed is $\hat{p}_2 = \frac{64}{1000} = 0.064$.

    Step 1: State the null hypothesis ($H_0$) and the alternative hypothesis ($H_1$).

    $H_0: p_1 - p_2 = 0$ ($p_1$ is the same as $p_2$)

    $H_1: p_1 - p_2 < 0$ ($p_1$ is less than $p_2$)

    Step 2: Select the distribution to use.

    Select the $z$ distribution because the sample sizes are large: $n_1\hat{p}_1 = 60$, $n_1\hat{q}_1 = 1140$, $n_2\hat{p}_2 = 64$, and $n_2\hat{q}_2 = 936$ all exceed 5.

    Step 3: Calculate the p-value.

    First, calculate the test statistic:

    $\bar{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{60 + 64}{1200 + 1000} = \frac{124}{2200} = 0.0564$

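    $s_{\hat{p}_1 - \hat{p}_2} = \sqrt{\bar{p}\bar{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{(0.0564)(0.9436)\left(\frac{1}{1200} + \frac{1}{1000}\right)} = \sqrt{0.0000976} = 0.0099$, where $\bar{q} = 1 - \bar{p} = 0.9436$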

    $z_{\text{observed}} = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{s_{\hat{p}_1 - \hat{p}_2}} = \frac{(0.05 - 0.064) - 0}{0.0099} = \frac{-0.0140}{0.0099}$

    $z_{\text{observed}} = -1.4141$

    Then compute the p-value, which is the area under the standard normal curve to the left of $z = -1.4141$: p-value $= P(Z < -1.41) = 0.0793$.

    Step 4: Make a decision.

    Since the p-value of 0.0793 exceeds $\alpha = 0.01$, do not reject $H_0$.

    We cannot conclude that the proportion of college graduates who are unemployed is significantly less than the proportion of high-school graduates who are unemployed.
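    As a cross-check on the hand calculation, here is a minimal Python sketch of the same pooled two-proportion $z$ test. It uses only the standard library, with `statistics.NormalDist` supplying the normal CDF. The function name `two_proportion_z` is illustrative, not from any course software. Because the sketch keeps full precision instead of rounding the standard error to 0.0099, it gives $z \approx -1.418$ and a p-value near 0.078. The decision at $\alpha = 0.01$ is unchanged.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Left-tailed pooled two-proportion z test (H1: p1 - p2 < 0)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)                  # pooled sample proportion
    se = sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    z = ((p1_hat - p2_hat) - 0) / se               # hypothesized p1 - p2 = 0
    p_value = NormalDist().cdf(z)                  # area to the left of z
    return z, p_value


z, p_value = two_proportion_z(60, 1200, 64, 1000)
print(f"z = {z:.4f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < 0.01 else "do not reject H0")
```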

  3. A heart specialist wants to see if she can lower the cholesterol levels in 6 patients by enrolling these people in a rigorous six-month exercise program. The cholesterol levels for each of the patients before and after taking the exercise program are shown in the table below.

    At $\alpha = 0.05$, did the cholesterol level decrease on average after taking the exercise program? Using the critical value approach, conduct the appropriate test of hypothesis. Show all key steps. Assume that the population of cholesterol level differences is normally distributed.

    Patient   1     2     3     4     5     6
    Before    243   216   214   222   206   219
    After     215   202   198   195   204   213

    Solution:

    This is a paired-samples problem for the mean difference: the two samples are dependent because each patient is measured both before and after the program.

    Let d = (cholesterol level after program) – (cholesterol level before program)

    To find $\bar{d}$, $s_d$, and $s_{\bar{d}}$, set up the following table:

    After program (A)   Before program (B)   Difference $d = A - B$   $d^2$
    215                 243                  -28                      784
    202                 216                  -14                      196
    198                 214                  -16                      256
    195                 222                  -27                      729
    204                 206                   -2                        4
    213                 219                   -6                       36
    Totals                                   $\Sigma d = -93$         $\Sigma d^2 = 2005$

    $\bar{d} = \frac{\Sigma d}{n} = \frac{-93}{6} = -15.5$

    $s_d = \sqrt{\frac{\Sigma d^2 - \frac{(\Sigma d)^2}{n}}{n - 1}} = \sqrt{\frac{2005 - \frac{(-93)^2}{6}}{5}} = 10.6160$

    $s_{\bar{d}} = \frac{s_d}{\sqrt{n}} = \frac{10.6160}{\sqrt{6}} = 4.3339$

    The hypothesis test:

    Step 1: State the null hypothesis ($H_0$) and the alternative hypothesis ($H_1$). We are testing whether After < Before, that is, whether (After - Before) < 0, so:

    $H_0: \mu_d \geq 0$ (cholesterol level did not decrease)

    $H_1: \mu_d < 0$ (cholesterol level decreased)

    Step 2: Select the distribution to use.

    Since the population of differences is normally distributed and its standard deviation is unknown, the $t$ distribution is used.

    Step 3: Determine the rejection and non-rejection regions.

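    Since $H_1$ contains “<”, this is a left-tailed test with a rejection region of area $\alpha = 0.05$.

    The degrees of freedom are $df = n - 1 = 6 - 1 = 5$, so the critical value is $-t_{0.05,\,5} = -2.015$. Reject $H_0$ if $t_{\text{observed}} < -2.015$.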

    Step 4: Calculate the test statistic.

    $t_{\text{observed}} = \frac{\bar{d} - \mu_d}{s_{\bar{d}}} = \frac{-15.5 - 0}{4.3339} = -3.5765$

    Step 5: Make a decision.

    Since $t_{\text{observed}} = -3.5765$ falls in the rejection region, we reject $H_0$. Conclusion: The mean cholesterol level decreased after the exercise program.
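    The paired-difference calculation can be sketched in Python with the standard `statistics` module. `statistics.stdev` uses the $n - 1$ divisor, matching $s_d$ above. Python's standard library has no $t$ distribution, so the critical value $-2.015$ is hard-coded from a $t$ table. Without intermediate rounding, the statistic comes out at about $-3.5764$, a hair away from the hand value.

```python
from math import sqrt
from statistics import mean, stdev

before = [243, 216, 214, 222, 206, 219]
after = [215, 202, 198, 195, 204, 213]

# d = A - B for each patient (a negative d means the level dropped)
d = [a - b for a, b in zip(after, before)]
n = len(d)

d_bar = mean(d)                 # sample mean difference
s_d = stdev(d)                  # sample standard deviation (n - 1 divisor)
s_d_bar = s_d / sqrt(n)         # standard error of d_bar
t_obs = (d_bar - 0) / s_d_bar   # hypothesized mu_d = 0 under H0

t_crit = -2.015                 # table value t(0.05, df = 5), left tail
print(d, d_bar, round(s_d, 4), round(s_d_bar, 4), round(t_obs, 4))
print("reject H0" if t_obs < t_crit else "do not reject H0")
```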

  4. A government health care agency reported, based on a random sample of 16 women who have health insurance, that insured women spend on average 2.3 days in the hospital for a routine childbirth. Based on a separate independent random sample of 16 women who do not have health insurance, uninsured women reportedly spend on average 1.9 days in the hospital for routine childbirth. The standard deviation of the first sample is equal to 0.6 day, and the standard deviation of the second sample is 0.3 day.

    At $\alpha = 0.01$, test the claim that the mean hospital stay is the same for insured and uninsured women. Assume both samples come from normal populations with equal variances. Show all key steps in using the critical value approach.

    Solution:

    Let $\mu_1$ = population mean hospital stay for insured women.

    Let $\mu_2$ = population mean hospital stay for uninsured women.

    Step 1: State the null hypothesis ($H_0$) and the alternative hypothesis ($H_1$).

    $H_0: \mu_1 - \mu_2 = 0$ (mean hospital stay is the same for both populations)

    $H_1: \mu_1 - \mu_2 \neq 0$ (mean hospital stay differs between the two populations)

    Step 2: Select the distribution to use.

    Since the populations are normally distributed with population standard deviations unknown, the t distribution is used.

    Step 3: Determine the rejection and non-rejection regions.

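    Since $H_1$ contains “≠”, this is a two-tailed test with the combined area of both rejection regions equal to 0.01 (0.005 in each tail). The degrees of freedom are $df = n_1 + n_2 - 2 = 16 + 16 - 2 = 30$, so the critical values are $\pm t_{0.005,\,30} = \pm 2.750$.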

    Step 4: Calculate the value of the test statistic.

    $t_{\text{observed}} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}}$, where $\bar{x}_1 = 2.3$, $\bar{x}_2 = 1.9$, and $\mu_1 - \mu_2 = 0$, assuming $H_0$ is true.

    $s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(16 - 1)(0.6)^2 + (16 - 1)(0.3)^2}{16 + 16 - 2}} = \sqrt{\frac{5.4 + 1.35}{30}} = 0.4743$

    $s_{\bar{x}_1 - \bar{x}_2} = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 0.4743\sqrt{\frac{1}{16} + \frac{1}{16}} = (0.4743)(0.3536) = 0.1677$

    $t_{\text{observed}} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}} = \frac{(2.3 - 1.9) - 0}{0.1677} = 2.3852$

    Step 5: Make a decision.

    Since $t_{\text{observed}} = 2.3852$ falls in the non-rejection region, do not reject $H_0$. We cannot conclude that there is a significant difference in the mean hospital stay of insured women versus uninsured women.
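    A minimal Python sketch of the pooled-variance $t$ statistic follows. It assumes equal population variances, as the problem states. The helper name `pooled_t` is hypothetical, and the two-tailed critical value 2.750 is hard-coded from a $t$ table rather than computed.

```python
from math import sqrt


def pooled_t(x1, s1, n1, x2, s2, n2):
    """Pooled-variance two-sample t statistic, assuming equal population variances."""
    df = n1 + n2 - 2
    s_p = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)  # pooled standard deviation
    se = s_p * sqrt(1 / n1 + 1 / n2)                         # SE of x1_bar - x2_bar
    t = ((x1 - x2) - 0) / se                                 # H0: mu1 - mu2 = 0
    return t, df, s_p, se


t_obs, df, s_p, se = pooled_t(2.3, 0.6, 16, 1.9, 0.3, 16)
t_crit = 2.750  # table value t(0.005, df = 30) for a two-tailed alpha of 0.01
print(f"df = {df}, s_p = {s_p:.4f}, se = {se:.4f}, t = {t_obs:.4f}")
print("reject H0" if abs(t_obs) > t_crit else "do not reject H0")
```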

  5. To test the effectiveness of a new drug, a pharmaceutical manufacturer randomly selected 100 patients and cross-classified the results in the following table. At $\alpha = 0.10$, can the researcher conclude that the effectiveness of the drug is related to gender? Show all the key steps of the critical value approach.

              Effective   Not effective
    Female    35          15
    Male      20          30

    Solution:

    Step 1: State the null hypothesis ($H_0$) and the alternative hypothesis ($H_1$).

    $H_0$: The effectiveness of the drug is independent of gender.

    $H_1$: The effectiveness of the drug is related to gender.

    Step 2: Select the $\chi^2$ distribution for tests involving the relationship between two categorical variables.

    Step 3: Determine the rejection and non-rejection regions.

    Since this is a test for independence, it is a right-tailed test with $\alpha = 0.10$ and $df = (R - 1)(C - 1) = (2 - 1)(2 - 1) = 1$, where $R$ = the number of rows and $C$ = the number of columns in the contingency table given above.

    $n$ = sample size = 100

    The critical value is $\chi^2_{0.10,\,1} = 2.706$.

    Step 4: Calculate the value of the test statistic.

    $\chi^2_{\text{observed}} = \sum \frac{(O - E)^2}{E}$, which can be found using a tabular method as follows:

    Cell                       Observed (O)   Expected (E)   $(O - E)^2/E$
    Female and Effective       35             27.50          2.0455
    Female and Not effective   15             22.50          2.5000
    Male and Effective         20             27.50          2.0455
    Male and Not effective     30             22.50          2.5000
    Total                                                    9.0910 = $\chi^2_{\text{observed}}$

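    Note that $E = \frac{(\text{row total})(\text{column total})}{\text{sample size}}$. For example, for Female and Effective, $E = \frac{(50)(55)}{100} = 27.50$.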

    Step 5: Make a decision.

    Reject $H_0$ because $\chi^2_{\text{observed}} = 9.0910$ exceeds the critical value 2.706 and so falls in the rejection region.

    Conclusion: The effectiveness of the drug is related to gender.
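    The expected-count formula and the $\chi^2$ sum translate directly into a short Python loop. This is a sketch using plain lists, and the critical value 2.706 is hard-coded from a chi-square table. Because each cell is not rounded, the statistic is about 9.0909 rather than 9.0910.

```python
observed = [
    [35, 15],  # Female: effective, not effective
    [20, 30],  # Male:   effective, not effective
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n  # expected count under independence
        chi2 += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)
chi2_crit = 2.706  # table value chi-square(0.10, df = 1)
print(f"chi2 = {chi2:.4f}, df = {df}")
print("reject H0" if chi2 > chi2_crit else "do not reject H0")
```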

  6. A nutritionist wishes to see whether there is any difference in the mean weight loss of individuals following one of three special diets. Individuals are randomly assigned to three groups and placed on the diet for 10 weeks. The weight losses (in kg) are shown in the table below. At $\alpha = 0.05$, can the nutritionist conclude that there is a difference in the mean weight loss between the three diets? Given that all the necessary assumptions are satisfied (three normally distributed populations, etc.), use the critical value method and show all key steps.

    Protein-based diet   Carbohydrates-based diet   Fats-based diet
    2                    5                          3
    3                    6                          2
    3                    5                          2
    1                    7                          4
                         4
                         3

    Solution:

    To test the difference in the mean weight loss between diets, follow the steps below.

    Category (type of diet)           Protein-based diet   Carbohydrates-based diet   Fats-based diet
    Weight loss in kilograms          2                    5                          3
                                      3                    6                          2
                                      3                    5                          2
                                      1                    7                          4
                                                           4
                                                           3
    Total weight loss in each group   $T_1 = 9$            $T_2 = 30$                 $T_3 = 11$
    Size of each sample               $n_1 = 4$            $n_2 = 6$                  $n_3 = 4$

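    $k$ = number of groups = 3, $n$ = total sample size = 14 individuals

    $\Sigma x = T_1 + T_2 + T_3 = 9 + 30 + 11 = 50$ = total weight loss across all three diets

    $\Sigma x^2 = 2^2 + 5^2 + 3^2 + \dots + 4^2 = 216$

    Between-samples sum of squares:
    $SSB = \left(\frac{9^2}{4} + \frac{30^2}{6} + \frac{11^2}{4}\right) - \frac{50^2}{14} = 200.50 - 178.5714 = 21.9286$

    Within-samples sum of squares:
    $SSW = \Sigma x^2 - \left(\frac{T_1^2}{n_1} + \frac{T_2^2}{n_2} + \frac{T_3^2}{n_3}\right) = 216 - 200.50 = 15.50$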

    Step 1: State the null hypothesis ($H_0$) and the alternative hypothesis ($H_1$).

    $H_0: \mu_1 = \mu_2 = \mu_3$

    H 1 : Not all three population means are equal.

    Step 2: Select the distribution to use.

    Because we are comparing the means of three normally distributed populations, we use the F distribution.

    Step 3: Determine the rejection and non-rejection regions.

    Since this is a one-way ANOVA test, it is a right-tailed test with $\alpha = 0.05$, $df_{\text{numerator}} = k - 1 = 3 - 1 = 2$, and $df_{\text{denominator}} = n - k = 14 - 3 = 11$. The critical value is $F_{0.05;\,2,\,11} = 3.98$.

    Step 4: Calculate the value of the test statistic.

    $F_{\text{observed}} = \frac{MSB}{MSW} = \frac{SSB/(k - 1)}{SSW/(n - k)} = \frac{21.9286/2}{15.50/11} = \frac{10.9643}{1.4091} = 7.7811$

    Step 5: Make a decision.

    Since $F_{\text{observed}} = 7.7811$ falls in the rejection region, we reject $H_0$. There is a difference in the mean weight loss among the three diets.
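    The shortcut formulas for $SSB$ and $SSW$ used above can be checked with a short Python sketch. The dictionary keys are only labels chosen for illustration. $F_{0.05;\,2,\,11} = 3.98$ is hard-coded from an F table, since the standard library has no F distribution.

```python
groups = {
    "protein": [2, 3, 3, 1],
    "carbohydrates": [5, 6, 5, 7, 4, 3],
    "fats": [3, 2, 2, 4],
}

k = len(groups)
all_x = [x for g in groups.values() for x in g]
n = len(all_x)
sum_x = sum(all_x)
sum_x2 = sum(x * x for x in all_x)

# Sum of T_i^2 / n_i over the groups (T_i = group total, n_i = group size)
between_term = sum(sum(g) ** 2 / len(g) for g in groups.values())

ssb = between_term - sum_x ** 2 / n  # between-samples sum of squares
ssw = sum_x2 - between_term          # within-samples sum of squares
msb = ssb / (k - 1)
msw = ssw / (n - k)
f_obs = msb / msw

f_crit = 3.98  # table value F(0.05; 2, 11)
print(f"SSB = {ssb:.4f}, SSW = {ssw:.4f}, F = {f_obs:.4f}")
print("reject H0" if f_obs > f_crit else "do not reject H0")
```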

  7. A large coffee-house chain wants to determine if its regular customers have a preference in the type of music that is played over the coffee houses’ speaker systems. A random sample of 100 regular customers is selected and the number of customers who prefer each type of music is recorded in the table below. At the 2.5% level of significance, can you reject the hypothesis that there is no difference in customer preference between the four types of music? Show all key steps for the critical value approach.

    Music type Country Rock Jazz Classical
    Number of customers 30 30 20 20

    Solution:

    This random sample may be classified as a multinomial experiment. To test whether the observed frequencies for an experiment follow a certain pattern or theoretical distribution, you use a goodness-of-fit test. Use the critical value approach.

    To test the hypothesis, follow the steps below.

    Step 1: State the null hypothesis ($H_0$) and the alternative hypothesis ($H_1$).

    $H_0$: There is no difference in customer preference between the four types of music.

    $H_1$: There is a difference in customer preference between the four types of music.

    Step 2: Select the distribution to use.

    Select the $\chi^2$ distribution for goodness-of-fit tests.

    Step 3: Determine the rejection and non-rejection regions.

    Since this is a goodness-of-fit test, it is a right-tailed test with $\alpha = 0.025$ and $df = k - 1 = 4 - 1 = 3$, where $k$ = number of categories of music. The critical value is $\chi^2_{0.025,\,3} = 9.348$.

    Step 4: Calculate the value of the test statistic.

    Under $H_0$, $p = \frac{1}{4} = 0.25$ is the probability that a randomly selected customer prefers any given type of music.

    $n$ = sample size = 100 customers

    The expected frequency for each category is $E = np = 100 \times 0.25 = 25$.

    $\chi^2_{\text{observed}} = \sum \frac{(O - E)^2}{E} = 4$, determined as follows:

    Category    Observed (O)   Expected (E)   $(O - E)^2/E$
    Country     30             25             1
    Rock        30             25             1
    Jazz        20             25             1
    Classical   20             25             1
    Total                                     4 = $\chi^2_{\text{observed}}$

    Step 5: Make a decision.

    Do not reject $H_0$ because $\chi^2_{\text{observed}} = 4$ falls in the non-rejection region.

    Conclusion: We cannot conclude that there is a difference in customer preference between the four types of music.
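    A minimal Python sketch of the same goodness-of-fit computation follows. It assumes, as $H_0$ does here, that all four music types are equally preferred. The critical value 9.348 is hard-coded from a chi-square table.

```python
observed = {"Country": 30, "Rock": 30, "Jazz": 20, "Classical": 20}
n = sum(observed.values())
p = 1 / len(observed)   # H0: each music type is equally preferred
expected = n * p        # same expected count for every category

chi2 = sum((o - expected) ** 2 / expected for o in observed.values())
df = len(observed) - 1
chi2_crit = 9.348       # table value chi-square(0.025, df = 3)
print(f"E = {expected}, chi2 = {chi2:.4f}, df = {df}")
print("reject H0" if chi2 > chi2_crit else "do not reject H0")
```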

  8. A random sample of the lifetimes of 24 watches manufactured under the same brand name displayed a standard deviation of 3.5 months. At the 1% level of significance, test the hypothesis that the population standard deviation of the lifetimes of this brand name of watch is less than 4 months. Show all key steps using the critical value method. Assume that the lifetimes of all the watches manufactured under this same brand name are normally distributed.

    Solution:

    Step 1: State the null hypothesis ($H_0$) and the alternative hypothesis ($H_1$).

    $H_0: \sigma \geq 4$

    $H_1: \sigma < 4$

    Step 2: Select the distribution to use.

    Since the population is normally distributed, we use the chi-square distribution to test a hypothesis about the population variance. A claim about $\sigma$ is equivalent to a claim about $\sigma^2$.

    Step 3: Determine the rejection and non-rejection regions.

    Since $H_1$ contains “<”, the rejection region is in the left tail of the chi-square distribution, with an area of 0.01 and $df = 24 - 1 = 23$. The critical value is $\chi^2_{0.99,\,23} = 10.196$ (the value with area 0.99 to its right).

    Step 4: Calculate the value of the test statistic.

    $\chi^2_{\text{observed}} = \frac{(n - 1)s^2}{\sigma^2} = \frac{(23)(12.25)}{16} = 17.6094$, where $s^2 = 3.5^2 = 12.25$ and $\sigma^2 = 4^2 = 16$.

    Step 5: Make a decision.

    Since the test statistic falls in the non-rejection region, we do not reject $H_0$. There is not enough evidence at the 1% level to conclude that the standard deviation of the watch lifetimes is less than 4 months.
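    For completeness, here is a Python sketch of the test statistic for a single variance. The left-tail critical value 10.196 is hard-coded from a chi-square table. This is a left-tailed test, so $H_0$ would be rejected only if the statistic fell below that value.

```python
n, s, sigma0 = 24, 3.5, 4.0
df = n - 1
chi2 = df * s**2 / sigma0**2  # (n - 1) s^2 / sigma0^2
chi2_crit = 10.196            # table value with area 0.99 to its right, df = 23
print(f"chi2 = {chi2:.4f}, df = {df}")
print("reject H0" if chi2 < chi2_crit else "do not reject H0")
```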