Mathematics 215: Introduction to Statistics

Study Guide

Unit 5: Self-Test Answer Key

Show all your work and keep your calculations to four decimal places, unless otherwise stated.

Circle True (T) or False (F) for each of the following:
1. Answer: True
  
  Testing an alternative hypothesis that the mean of the first population is less than the mean of the second population is the same as testing $μ_{1} - μ_{2} < 0$.
2. Answer: False
  
  When conducting a test of hypothesis involving two population means, if both sample sizes exceed 30 and the population standard deviations are unknown, then the standard normal distribution can be used to find the critical values.
3. Answer: True
  
  When conducting a test of hypothesis involving two population means, if both sample sizes exceed 30 and both population variances are unknown but equal, then the pooled standard deviation is used in the computation of the test statistic.
4. Answer: False
  
  The paired or matched sample test is appropriate to use when the two randomly selected samples are independent.
5. Answer: True
  
  In a hypothesis test for independence with a contingency table, the alternative hypothesis is that the two variables are related.
6. Answer: True
  
  A hypothesis test where the null hypothesis is that three or more population means are equal is called ANOVA.
Economic research shows that in any given month, the unemployment rate of college graduates is significantly lower than the unemployment rate of high-school graduates. The unemployment rate refers to the proportion of graduates registered as unemployed in any given month. In a random sample of 1,200 college graduates, 60 were unemployed; and in a random sample of 1,000 high-school graduates (no college), 64 were unemployed.

At the 1% significance level, do the samples provide sufficient evidence to conclude that college graduates have a lower unemployment rate than high-school graduates? Use the $p$-value approach and show all key steps.

Solution:

Given $n_{1} = 1200$ and $n_{2} = 1000$ ,

the sample proportion of college graduates unemployed is ${\hat{p}}_{1} = \frac{60}{1200} = 0.05$ , and the sample proportion of high-school graduates unemployed is ${\hat{p}}_{2} = \frac{64}{1000} = 0.064$ .

Step 1: State the null hypothesis ( $H_{0}$ ) and the alternative hypothesis ( $H_{1}$ )

$H_{0}$ : $p_{1} - p_{2} = 0$ ( $p_{1}$ is the same as $p_{2}$ )

$H_{1}$ : $p_{1} - p_{2} < 0$ ( $p_{1}$ is less than $p_{2}$ )

Step 2: Select the distribution to use.

Select the $z$ (standard normal) distribution, as both samples are large: $n_{1} {\hat{p}}_{1} = 60$, $n_{1} {\hat{q}}_{1} = 1140$, $n_{2} {\hat{p}}_{2} = 64$, and $n_{2} {\hat{q}}_{2} = 936$ all exceed 5.

Step 3: Calculate the $p$-value.

First, calculate the test statistic:

$\begin{array}{rl} \bar{p} & = \frac{x_{1} + x_{2}}{n_{1} + n_{2}} \\ & = \frac{60 + 64}{1200 + 1000} \\ & = \frac{124}{2200} \\ & = 0.0564 \end{array}$

$\begin{array}{rl} s_{{\hat{p}}_{1} - {\hat{p}}_{2}} & = \sqrt{\bar{p} \bar{q} \left(\frac{1}{n_{1}} + \frac{1}{n_{2}}\right)} \\ & = \sqrt{(0.0564) (0.9436) \left(\frac{1}{1200} + \frac{1}{1000}\right)} \\ & = \sqrt{0.00009757} \\ & = 0.0099 \end{array}$

$\begin{array}{rl} z_{observed} & = \frac{({\hat{p}}_{1} - {\hat{p}}_{2}) - (p_{1} - p_{2})}{s_{{\hat{p}}_{1} - {\hat{p}}_{2}}} \\ & = \frac{(0.05 - 0.064) - (0)}{0.0099} \\ & = \frac{- 0.0140}{0.0099} \end{array}$

$z_{observed} = - 1.4141$

Then compute the $p$-value, which is the area to the left of $z = - 1.4141$ under the standard normal curve: $p$-value $= 0.0793$.

Step 4: Make a decision.

Since the $p$-value of 0.0793 exceeds $α = 0.01$, do not reject $H_{0}$.

We cannot conclude that the proportion of college graduates who are unemployed is significantly less than the proportion of high-school graduates who are unemployed.
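
The calculation above can be checked with a short Python sketch. It uses only the standard library; the function name `two_proportion_z` is illustrative and not part of the course material. Because the sketch does not round intermediate values, it gives $z$ of about $-1.418$ and a $p$-value of about $0.078$, slightly different from the hand calculation, but the decision is the same.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Return (z, left-tail p-value) for H1: p1 - p2 < 0."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)            # pooled sample proportion
    se = sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se               # hypothesized difference is 0
    return z, NormalDist().cdf(z)            # area to the left of z


z, p_value = two_proportion_z(60, 1200, 64, 1000)
alpha = 0.01
print(f"z = {z:.4f}, p-value = {p_value:.4f}, reject H0: {p_value < alpha}")
```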

A heart specialist wants to see if she can lower the cholesterol levels in 6 patients by enrolling these people in a rigorous six-month exercise program. The cholesterol levels for each of the patients before and after taking the exercise program are shown in the table below.

At $α = 0.05$ , did the cholesterol level decrease on average after taking the exercise program? Using the critical value approach, conduct the appropriate test of hypothesis. Show all key steps. Assume that the population of cholesterol level differences is normally distributed.

Patient	1	2	3	4	5	6
Before	243	216	214	222	206	219
After	215	202	198	195	204	213

Solution:

This is a mean of paired samples problem, where the two samples are dependent.

Let $d$ = (cholesterol level after program) – (cholesterol level before program)

To find $\bar{d}$ and $s_{\bar{d}}$, set up the following table:

Cholesterol level after program (A)	Cholesterol level before program (B)	Difference $d = A - B$	$d^{2}$
215	243	$- 28$	$784$
202	216	$- 14$	$196$
198	214	$- 16$	$256$
195	222	$- 27$	$729$
204	206	$- 2$	$4$
213	219	$- 6$	$36$
		$\sum d = - 93$	$\sum d^{2} = 2005$

$\bar{d} = \frac{\sum d}{n} = \frac{- 93}{6} = - 15.5$

$\begin{array}{rl} s_{d} & = \sqrt{\frac{\sum d^{2} - \frac{{(\sum d)}^{2}}{n}}{n - 1}} \\ & = \sqrt{\frac{2005 - \frac{{(- 93)}^{2}}{6}}{5}} \\ & = 10.6160 \end{array}$

$s_{\bar{d}} = \frac{s_{d}}{\sqrt{n}} = \frac{10.6160}{\sqrt{6}} = 4.3339$

The hypothesis test:

Step 1: State the null hypothesis ( $H_{0}$ ) and the alternative hypothesis ( $H_{1}$ ). Test: (After $<$ Before) or (After – Before) $< 0$ , so:

$H_{0}$ : $μ_{d} \geq 0$ (cholesterol level did not decrease)

$H_{1}$ : $μ_{d} < 0$ (cholesterol level decreased)

Step 2: Select the distribution to use.

Since the population is normally distributed with the population standard deviation unknown, the $t$ distribution is used.

Step 3: Determine the rejection and non-rejection regions.

Since $H_{1}$ contains “ $<$ ”, this is a one-tailed test with the left tail having an area of $α = 0.05$ .

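The degrees of freedom are $df = n - 1 = 6 - 1 = 5$. From the $t$ distribution table, the critical value is $t = - 2.015$, so the rejection region is $t_{observed} < - 2.015$.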

Step 4: Calculate the test statistic.

$t_{observed} = \frac{\bar{d} - μ_{d}}{s_{\bar{d}}} = \frac{- 15.5 - 0}{4.3339} = - 3.5765$

Step 5: Make a decision.

Since $t_{observed} = - 3.5765$ is less than the critical value of $- 2.015$, it falls in the rejection region, and we reject $H_{0}$. Conclusion: At $α = 0.05$, there is sufficient evidence that the mean cholesterol level decreased after the exercise program.
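
As a cross-check on the paired-samples arithmetic, here is a minimal Python sketch using the standard library. The variable names are illustrative, and the critical value is hard-coded from a $t$ table (an assumption of the sketch, since the standard library has no $t$ distribution). Without intermediate rounding, the test statistic agrees with the hand calculation to about three decimal places.

```python
from math import sqrt
from statistics import mean, stdev

before = [243, 216, 214, 222, 206, 219]
after = [215, 202, 198, 195, 204, 213]

d = [a - b for a, b in zip(after, before)]   # d = after - before
n = len(d)
d_bar = mean(d)
s_d = stdev(d)                 # sample standard deviation (n - 1 divisor)
se = s_d / sqrt(n)             # standard error of the mean difference
t_observed = (d_bar - 0) / se  # hypothesized mean difference is 0

T_CRITICAL = -2.015            # assumed: t table, df = 5, left-tail area 0.05
reject_h0 = t_observed < T_CRITICAL
print(f"d_bar = {d_bar}, s_d = {s_d:.4f}, t = {t_observed:.4f}, reject H0: {reject_h0}")
```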

A government health care agency reported, based on a random sample of 16 women who have health insurance, that insured women spend on average 2.3 days in the hospital for a routine childbirth. Based on a separate independent random sample of 16 women who do not have health insurance, uninsured women reportedly spend on average 1.9 days in the hospital for routine childbirth. The standard deviation of the first sample is equal to 0.6 day, and the standard deviation of the second sample is 0.3 day.

At $α = 0.01$ , test the claim that the mean hospital stay is the same for insured and uninsured women. Assume both samples come from normal populations with equal variances. Show all key steps in using the critical value approach.

Solution:

Let $μ_{1}$ = population mean hospital stay for insured women.

Let $μ_{2}$ = population mean hospital stay for uninsured women.

Step 1: State the null hypothesis ( $H_{0}$ ) and the alternative hypothesis ( $H_{1}$ ).

$H_{0}$ : $μ_{1} - μ_{2} = 0$ (mean hospital stay is the same for both populations)

$H_{1}$ : $μ_{1} - μ_{2} \neq 0$ (mean hospital stay differs between both populations)

Step 2: Select the distribution to use.

Since the populations are normally distributed with population standard deviations unknown, the $t$ distribution is used.

Step 3: Determine the rejection and non-rejection regions.

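Since $H_{1}$ contains “ $\neq$ ”, this is a two-tailed test with the combined areas of both rejection regions equal to 0.01 (0.005 in each tail). The degrees of freedom are $df = n_{1} + n_{2} - 2 = 16 + 16 - 2 = 30$. From the $t$ distribution table, the critical values are $t = \pm 2.750$, so the rejection region is $t_{observed} < - 2.750$ or $t_{observed} > 2.750$.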

Step 4: Calculate the value of the test statistic.

$t_{observed} = \frac{({\bar{x}}_{1} - {\bar{x}}_{2}) - (μ_{1} - μ_{2})}{s_{{\bar{x}}_{1} - {\bar{x}}_{2}}}$, where ${\bar{x}}_{1} = 2.3$, ${\bar{x}}_{2} = 1.9$, and $(μ_{1} - μ_{2}) = 0$, assuming $H_{0}$ is true.

$\begin{array}{rl} s_{p} & = \sqrt{\frac{(n_{1} - 1) s_{1}^{2} + (n_{2} - 1) s_{2}^{2}}{n_{1} + n_{2} - 2}} \\ & = \sqrt{\frac{(16 - 1) {(0.6)}^{2} + (16 - 1) {(0.3)}^{2}}{16 + 16 - 2}} \\ & = \sqrt{\frac{5.4 + 1.35}{30}} \\ & = 0.4743 \end{array}$

$\begin{array}{rl} s_{{\bar{x}}_{1} - {\bar{x}}_{2}} & = s_{p} \sqrt{\frac{1}{n_{1}} + \frac{1}{n_{2}}} \\ & = 0.4743 \sqrt{\frac{1}{16} + \frac{1}{16}} \\ & = (0.4743) (0.3536) \\ & = 0.1677 \end{array}$

$\begin{array}{rl} t_{observed} & = \frac{({\bar{x}}_{1} - {\bar{x}}_{2}) - (μ_{1} - μ_{2})}{s_{{\bar{x}}_{1} - {\bar{x}}_{2}}} \\ & = \frac{(2.3 - 1.9) - (0)}{0.1677} \\ & = 2.3852 \end{array}$

Step 5: Make a decision.

Since $t_{observed} = 2.3852$ falls in the non-rejection region, do not reject $H_{0}$ . We cannot conclude that there is a significant difference in the mean hospital stay of insured women versus uninsured women.
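
The pooled-variance computation can be sketched in Python as follows. The function name `pooled_t` is illustrative, and the critical value is hard-coded from a $t$ table (an assumption of the sketch). Without intermediate rounding, the statistic comes out near $2.3851$, which differs from the hand calculation only in the fourth decimal place.

```python
from math import sqrt


def pooled_t(x1_bar, s1, n1, x2_bar, s2, n2):
    """Return (t, df) for the equal-variance two-sample t test of H0: mu1 - mu2 = 0."""
    df = n1 + n2 - 2
    s_p = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)  # pooled standard deviation
    se = s_p * sqrt(1 / n1 + 1 / n2)                        # std. error of x1_bar - x2_bar
    return (x1_bar - x2_bar) / se, df


t_observed, df = pooled_t(2.3, 0.6, 16, 1.9, 0.3, 16)

T_CRITICAL = 2.750   # assumed: t table, df = 30, area 0.005 in each tail
reject_h0 = abs(t_observed) > T_CRITICAL
print(f"t = {t_observed:.4f}, df = {df}, reject H0: {reject_h0}")
```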

To test the effectiveness of a new drug, a pharmaceutical manufacturer randomly selected 100 patients and cross-classified the results in the following table. At $α = 0.10$ , can the researcher conclude that the effectiveness of the drug is related to gender? Show all the key steps of the critical value approach.

	Effective	Not effective
Female	35	15
Male	20	30

Solution:

Step 1: State the null hypothesis ( $H_{0}$ ) and alternative hypothesis ( $H_{1}$ ).

$H_{0}$ : The effectiveness of the drug is independent of gender.

$H_{1}$ : The effectiveness of the drug is related to gender.

Step 2: Select the $χ^{2}$ distribution for tests involving the relation between two categorical variables.

Step 3: Determine the rejection and non-rejection regions.

Since this is a test for independence, it is a right-tailed test with $α = 0.10$ and $df = (R - 1) (C - 1) = (2 - 1) (2 - 1) = 1$, where $R$ = the number of rows and $C$ = the number of columns in the contingency table given above.

$n$ = sample size = $100$

From the $χ^{2}$ distribution table, the critical value is $2.706$, so the rejection region is $χ^{2}_{observed} > 2.706$.

Step 4: Calculate the value of the test statistic.

$χ^{2}_{observed} = \sum \frac{{(O - E)}^{2}}{E}$, which can be found using a tabular method as follows:

	Observed (O)	Expected (E)	${(O - E)}^{2} ⁄ E$
Female and Effective	35	27.50	$2.0455$
Female and Ineffective	15	22.50	$2.50$
Male and Effective	20	27.50	$2.0455$
Male and Ineffective	30	22.50	$2.50$
			$χ^{2}_{observed} = 9.0910$

Note that $E = \frac{(\text{row total}) × (\text{column total})}{\text{sample size}}$

Step 5: Make a decision.

Reject $H_{0}$, as $χ^{2}_{observed} = 9.0910$ exceeds the critical value of $2.706$ and falls in the rejection region.

Conclusion: The effectiveness of the drug is related to gender.
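
The expected frequencies and the test statistic in the table above can be reproduced with a brief Python sketch. The variable names are illustrative; the critical value $2.706$ is the one given in the solution.

```python
observed = [[35, 15],   # Female: effective, not effective
            [20, 30]]   # Male:   effective, not effective

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2_observed = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n   # expected frequency for the cell
        chi2_observed += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)
CHI2_CRITICAL = 2.706   # from the solution: df = 1, right-tail area 0.10
reject_h0 = chi2_observed > CHI2_CRITICAL
print(f"chi-square = {chi2_observed:.4f}, df = {df}, reject H0: {reject_h0}")
```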

A nutritionist wishes to see whether there is any difference in the mean weight loss of individuals following one of three special diets. Individuals are randomly assigned to three groups and placed on the diet for 10 weeks. The weight losses (in kg) are shown in the table below. At $α = 0.05$ , can the nutritionist conclude that there is a difference in the mean weight loss between the three diets? Given that all the necessary assumptions are satisfied (three normally distributed populations, etc.), use the critical value method and show all key steps.

Protein-based diet	Carbohydrates-based diet	Fats-based diet
2	5	3
3	6	2
3	5	2
1	7	4
	4
	3

Solution:

To test the difference in the mean weight loss between diets, follow the steps below.

Category (type of diet)	Protein-based diet	Carbohydrates-based diet	Fats-based diet
Weight loss in kilograms	2	5	3
	3	6	2
	3	5	2
	1	7	4
		4
		3
Total weight loss in each group	$T_{1} = 9$	$T_{2} = 30$	$T_{3} = 11$
Size of each sample	$n_{1} = 4$	$n_{2} = 6$	$n_{3} = 4$

$k$ = number of groups = $3$ and $n$ = total sample size = $14$ individuals

$\sum x = T_{1} + T_{2} + T_{3} = 9 + 30 + 11 = 50 =$ total weight loss across all three diets

$\sum x^{2} = {(2)}^{2} + {(5)}^{2} + {(3)}^{2} + \cdots + {(4)}^{2} = 216$

Between-samples sum of squares:

$\begin{array}{rl} SSB & = \left(\frac{9^{2}}{4} + \frac{30^{2}}{6} + \frac{11^{2}}{4}\right) - \left(\frac{50^{2}}{14}\right) \\ & = 200.50 - 178.5714 \\ & = 21.9286 \end{array}$

Within-samples sum of squares $(S S W) = 216 - 200.50 = 15.50$

Step 1: State the null hypothesis ( $H_{0}$ ) and the alternative hypothesis ( $H_{1}$ ).

$H_{0}$ : $μ_{1} = μ_{2} = μ_{3}$

$H_{1}$ : Not all three population means are equal.

Step 2: Select the distribution to use.

Because we are comparing the means of three normally distributed populations, we use the $F$ distribution.

Step 3: Determine the rejection and non-rejection regions.

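Since this is a one-way ANOVA test, it is a right-tailed test with $α = 0.05$, $df$ (numerator) $= k - 1 = 3 - 1 = 2$, and $df$ (denominator) $= n - k = 14 - 3 = 11$. From the $F$ distribution table, the critical value is $F = 3.98$, so the rejection region is $F_{observed} > 3.98$.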

Step 4: Calculate the value of the test statistic.

$\begin{array}{rl} F_{observed} & = \frac{MSB}{MSW} \\ & = \frac{\frac{SSB}{(k - 1)}}{\frac{SSW}{(n - k)}} \\ & = \frac{\frac{21.9286}{2}}{\frac{15.50}{11}} \\ & = \frac{10.9643}{1.4091} \\ & = 7.7811 \end{array}$

Step 5: Make a decision.

Since $F_{observed}$ falls in the rejection region, we reject $H_{0}$ . There is a difference in the mean weight loss between the three diets.
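
The sums of squares and the $F$ statistic can be verified with a short Python sketch that follows the same shortcut formulas as the solution. The variable names are illustrative, and the critical value is hard-coded from an $F$ table (an assumption of the sketch).

```python
groups = {
    "protein": [2, 3, 3, 1],
    "carbohydrates": [5, 6, 5, 7, 4, 3],
    "fats": [3, 2, 2, 4],
}

k = len(groups)                                           # number of groups
n = sum(len(g) for g in groups.values())                  # total sample size
sum_x = sum(sum(g) for g in groups.values())              # grand total
sum_x2 = sum(x * x for g in groups.values() for x in g)   # sum of squared values
sum_t2_over_n = sum(sum(g) ** 2 / len(g) for g in groups.values())

ssb = sum_t2_over_n - sum_x**2 / n    # between-samples sum of squares
ssw = sum_x2 - sum_t2_over_n          # within-samples sum of squares
f_observed = (ssb / (k - 1)) / (ssw / (n - k))

F_CRITICAL = 3.98   # assumed: F table, df = (2, 11), right-tail area 0.05
reject_h0 = f_observed > F_CRITICAL
print(f"SSB = {ssb:.4f}, SSW = {ssw:.4f}, F = {f_observed:.4f}, reject H0: {reject_h0}")
```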

A large coffee-house chain wants to determine if its regular customers have a preference in the type of music that is played over the coffee houses’ speaker systems. A random sample of 100 regular customers is selected and the number of customers who prefer each type of music is recorded in the table below. At the 2.5% level of significance, can you reject the hypothesis that there is no difference in customer preference between the four types of music? Show all key steps for the critical value approach.

Music type	Country	Rock	Jazz	Classical
Number of customers	30	30	20	20

Solution:

This random sample may be classified as a multinomial experiment. To test whether the observed frequencies for an experiment follow a certain pattern or theoretical distribution, you use a goodness-of-fit test. Use the critical value approach.

To test the hypothesis, follow the steps below.

Step 1: State the null hypothesis ( $H_{0}$ ) and the alternative hypothesis ( $H_{1}$ ).

$H_{0}$ : There is no difference in customer preference between the four types of music.

$H_{1}$ : There is a difference in customer preference between the four types of music.

Step 2: Select the distribution to use.

Select the $χ^{2}$ distribution for goodness-of-fit tests.

Step 3: Determine the rejection and non-rejection regions.

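Since this is a goodness-of-fit test, it is a right-tailed test with $α = 0.025$ and $df = k - 1 = 4 - 1 = 3$, where $k$ = number of categories of music. From the $χ^{2}$ distribution table, the critical value is $9.348$, so the rejection region is $χ^{2}_{observed} > 9.348$.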

Step 4: Calculate the value of the test statistic.

$p = \frac{1}{4} = 0.25$ is the probability, under $H_{0}$, that a randomly selected customer prefers any given type of music.

$n$ = sample size = $100$ customers

The expected value is $E = n p = 100 × 0.25 = 25$ for each of the categories.

$χ^{2}_{observed} = \sum \frac{{(O - E)}^{2}}{E} = 4$, determined as follows:

Category	Observed (O)	Expected (E)	${(O - E)}^{2} ⁄ E$
Country	30	25	$1$
Rock	30	25	$1$
Jazz	20	25	$1$
Classical	20	25	$1$
			$χ^{2}_{observed} = 4$

Step 5: Make a decision.

Do not reject $H_{0}$, as $χ^{2}_{observed} = 4$ is less than the critical value of $9.348$ and falls in the non-rejection region.

Conclusion: We cannot conclude that there is a difference in customer preference between the four types of music.
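
A minimal Python sketch of the goodness-of-fit calculation, assuming equal expected frequencies under $H_{0}$ as in the solution. The variable names are illustrative, and the critical value is hard-coded from a chi-square table (an assumption of the sketch).

```python
observed = {"Country": 30, "Rock": 30, "Jazz": 20, "Classical": 20}

n = sum(observed.values())   # sample size
k = len(observed)            # number of categories
expected = n / k             # equal preference under H0: E = n * (1 / k)

chi2_observed = sum((o - expected) ** 2 / expected for o in observed.values())

CHI2_CRITICAL = 9.348        # assumed: chi-square table, df = 3, right-tail area 0.025
reject_h0 = chi2_observed > CHI2_CRITICAL
print(f"E = {expected}, chi-square = {chi2_observed:.4f}, reject H0: {reject_h0}")
```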

A random sample of the lifetimes of 24 watches manufactured under the same brand name displayed a standard deviation of 3.5 months. At the 1% level of significance, test the hypothesis that the population standard deviation of the lifetimes of this brand name of watch is less than 4 months. Show all key steps using the critical value method. Assume that the lifetimes of all the watches manufactured under this same brand name are normally distributed.

Solution:

Step 1: State the null hypothesis ( $H_{0}$ ) and the alternative hypothesis ( $H_{1}$ ).

$H_{0}$ : $σ \geq 4$

$H_{1}$ : $σ < 4$

Step 2: Select the distribution to use.

Since the population is normally distributed, we use the chi-square distribution to test a hypothesis about the population variance (and hence the population standard deviation).

Step 3: Determine the rejection and non-rejection regions.

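Since $H_{1}$ contains “ $<$ ”, the rejection region is on the left side of the chi-square distribution, with an area of $0.01$ and $df = 24 - 1 = 23$. From the $χ^{2}$ distribution table, the critical value is $10.196$, so the rejection region is $χ^{2}_{observed} < 10.196$.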

Step 4: Calculate the value of the test statistic.

$\begin{array}{rl} χ^{2}_{observed} & = \frac{(n - 1) s^{2}}{σ^{2}} \\ & = \frac{(23) (12.25)}{16} \\ & = 17.6094 \end{array}$

Step 5: Make a decision.

Since the test statistic $χ^{2}_{observed} = 17.6094$ exceeds the critical value of $10.196$, it falls in the non-rejection region, and we do not reject $H_{0}$. There is insufficient evidence that the standard deviation of the watch lifetimes is less than $4$ months.
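
The test statistic for the standard deviation can be checked with a few lines of Python. The function name `variance_chi2` is illustrative, and the critical value is hard-coded from a chi-square table (an assumption of the sketch).

```python
def variance_chi2(n, s, sigma0):
    """Chi-square statistic for testing a population standard deviation against sigma0."""
    return (n - 1) * s**2 / sigma0**2


chi2_observed = variance_chi2(n=24, s=3.5, sigma0=4)

CHI2_CRITICAL = 10.196   # assumed: chi-square table, df = 23, left-tail area 0.01
reject_h0 = chi2_observed < CHI2_CRITICAL   # left-tailed test
print(f"chi-square = {chi2_observed:.4f}, reject H0: {reject_h0}")
```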
