Mathematics 215: Introduction to Statistics

Study Guide

Unit 6: Self-Test Answer Key

Show all your work and keep your calculations to four decimal places, unless otherwise stated.

  1. Circle True (T) or False (F) for each of the following:
    1. T / F (correct answer: False)
      In regression analysis, the variable that is being explained is called the independent variable.
    2. T / F (correct answer: True)
      In the estimated regression equation \(\hat{y} = a + bx\), \(b\) is the slope of the regression line.
    3. T / F (correct answer: False)
      The number of degrees of freedom for a simple linear regression model is \((n - 1)\).
    4. T / F (correct answer: True)
      The coefficient of determination measures the proportion of the variation in \(y\) explained by the variation in \(x\).
    5. T / F (correct answer: True)
      When conducting a test of hypothesis regarding \(B\) in the simple linear regression model, the null hypothesis of \(B = 0\) states that \(x\) will not be a significant predictor of \(y\).
    6. T / F (correct answer: True)
      In correlation analysis, if there is a perfect positive correlation between \(x\) and \(y\), then the value of \(r\), the correlation coefficient, equals 1.
  2. A fast food chain would like to predict the number of breakfast burgers that an individual will consume per month, based on the price charged per burger. A random sampling of the breakfast burger consumption of six individuals results in the data recorded in the following table.

    Price per burger ($)                   1.50  3.00  2.50  4.00  2.00  1.00
    Number of burgers consumed per month     10     6     8     3     9    12
    1. Which variable is the independent variable, \(x\), and which is the dependent variable, \(y\)?

      Solution:

      The variable that is to be predicted, number of breakfast burgers consumed per month, is the dependent variable (\(y\)). The other variable, price (the basis used to predict burgers consumed), is the independent variable (\(x\)).

    2. Find the equation of the least squares regression line. Keep all your work to 4 decimal places.

      Solution:

      To find \(a\) and \(b\) in the estimated regression equation \(\hat{y} = a + bx\), as well as the answers to subsequent questions, we suggest forming the following table to work out \(\sum x\), \(\sum y\), \(SS_{xx}\), \(SS_{xy}\), and \(SS_{yy}\).

      x = price per burger; y = number of burgers consumed per month

           x       y     x^2     y^2     xy
         1.5      10    2.25     100     15
           3       6       9      36     18
         2.5       8    6.25      64     20
           4       3      16       9     12
           2       9       4      81     18
           1      12       1     144     12
      Σx = 14   Σy = 48   Σx^2 = 38.5   Σy^2 = 434   Σxy = 95

      n = number of individuals involved in the study = sample size = 6

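      \[ \bar{x} = \frac{\sum x}{n} = \frac{14}{6} = 2.3333, \quad \bar{y} = \frac{\sum y}{n} = \frac{48}{6} = 8 \]

      \[ SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 38.5 - \frac{14^2}{6} = 38.5 - 32.6667 = 5.8333 \]

      \[ SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 95 - \frac{(14)(48)}{6} = 95 - 112 = -17 \]

      \[ SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 434 - \frac{48^2}{6} = 434 - 384 = 50 \]

      \[ b = \text{slope} = \frac{SS_{xy}}{SS_{xx}} = \frac{-17}{5.8333} = -2.9143 \]

      \[ a = \text{intercept} = \bar{y} - b\bar{x} = 8 - (-2.9143)(2.3333) = 14.7999 \]

      Regression equation (line): \(\hat{y} = a + bx\), or \(\hat{y} = 14.7999 - 2.9143x\)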
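
The hand calculation above can be checked with a short script. The following is a minimal Python sketch, not part of the course materials, and the variable names are illustrative. It works at full floating-point precision instead of rounding each step to four decimal places, so the intercept comes out at about 14.8 rather than 14.7999.

```python
# Data from the table in this problem.
x = [1.50, 3.00, 2.50, 4.00, 2.00, 1.00]   # price per burger
y = [10, 6, 8, 3, 9, 12]                   # burgers consumed per month
n = len(x)

sum_x, sum_y = sum(x), sum(y)

# Sums of squares and cross products, using the shortcut formulas.
ss_xx = sum(v * v for v in x) - sum_x ** 2 / n
ss_xy = sum(p * q for p, q in zip(x, y)) - sum_x * sum_y / n
ss_yy = sum(v * v for v in y) - sum_y ** 2 / n

b = ss_xy / ss_xx                  # slope
a = sum_y / n - b * (sum_x / n)    # intercept = y-bar minus b times x-bar

print("SS_xx, SS_xy, SS_yy:", round(ss_xx, 4), round(ss_xy, 4), round(ss_yy, 4))
print("b, a:", round(b, 4), round(a, 4))
```

The negative sign of the slope comes directly from the negative cross product: higher prices are paired with lower consumption in this sample.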

    3. Construct a scatter diagram for the sample data for the six randomly selected individuals. The price is to be measured on the horizontal axis, and the number of burgers consumed is to be indicated on the vertical axis. Plot the estimated regression line on the same set of axes. In plotting the regression line, compute \(\hat{y}\) for \(x = 1\) and then for \(x = 4\).

      Solution:

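      When \(x = 1\), \(\hat{y} = 14.7999 - 2.9143(1) = 11.8856\)

      When \(x = 4\), \(\hat{y} = 14.7999 - 2.9143(4) = 3.1427\)

      [Figure: scatter diagram of the six data points, with price on the horizontal axis and number of burgers consumed on the vertical axis, and the estimated regression line]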

    4. Based on the results from part 3 above, would you hypothesize a positive or negative linear relation between the price and the number of burgers consumed?

      Solution:

      Since the regression line is downward-sloping, I hypothesize a negative linear relation between the price and the number of burgers consumed.

    5. Interpret the meaning of the slope of the estimated regression line.

      Solution:

      The slope \(b = -2.9143\) suggests that for every one-dollar increase in price, the number of burgers consumed per month is estimated to decrease by 2.9143, on average.

    6. Compute the coefficient of determination and interpret your answer. Keep your work to 4 decimal places.

      Solution:

      The coefficient of determination \(r^2 = \frac{b\,SS_{xy}}{SS_{yy}} = \frac{(-2.9143)(-17)}{50} = 0.9909\).

      The coefficient of determination of 0.9909 means that 99.09% of the total variation in monthly burger consumption is explained by the variation in the burger price.
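
As a quick check, the coefficient of determination can be computed two ways. This is a minimal Python sketch with illustrative variable names; it takes the rounded slope (which is negative) and the sums of squares from the solution above as inputs, and the second formula, \(SS_{xy}^2 / (SS_{xx}\,SS_{yy})\), is an algebraically equivalent form added here for comparison.

```python
# Rounded values taken from the worked solution (the slope is negative).
ss_xx, ss_xy, ss_yy = 5.8333, -17.0, 50.0
b = -2.9143

# Form used in the solution: r^2 = b * SS_xy / SS_yy
r_squared = b * ss_xy / ss_yy

# Equivalent form, since b = SS_xy / SS_xx
r_squared_alt = ss_xy ** 2 / (ss_xx * ss_yy)

print(round(r_squared, 4), round(r_squared_alt, 4))
```

Because both factors in the numerator are negative, the product is positive, as any proportion of explained variation must be.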

    7. Test at the 1% significance level whether B (the population regression line slope) is negative. Keep your work to 4 decimal places.

      Solution:

      Step 1: State the null hypothesis (\(H_0\)) and the alternative hypothesis (\(H_1\)).

      \(H_0\): \(B = 0\)

      \(H_1\): \(B < 0\) (the population regression line slope is negative)

      Step 2: Select the distribution to use.

      Use the t distribution.

      Step 3: Determine the rejection and non-rejection regions.

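      Since \(H_1\) contains “\(<\)”, this is a one-tailed test with the left tail having an area of \(\alpha = 0.01\).

      The degrees of freedom are \(df = n - 2 = 6 - 2 = 4\).

      [Figure: t distribution curve with the rejection region, of area 0.01, in the left tail]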

      Step 4: Calculate the value of the test statistic. Keep all your work to 4 decimal places.

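      \(t_{\text{observed}} = \frac{b - B}{s_b}\), where \(b = -2.9143\) and \(B = 0\), assuming \(H_0\) is true.

      \(s_b = \frac{s_e}{\sqrt{SS_{xx}}}\), where
      \(s_e = \sqrt{\frac{SS_{yy} - b\,SS_{xy}}{n - 2}} = \sqrt{\frac{50 - (-2.9143)(-17)}{4}} = 0.3379\), so
      \(s_b = \frac{0.3379}{\sqrt{5.8333}} = 0.1399\), and so
      \(t_{\text{observed}} = \frac{b - B}{s_b} = \frac{-2.9143 - 0}{0.1399} = -20.8313\).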

      Step 5: Make a decision.

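      Since \(t_{\text{observed}} = -20.8313\) is less than the critical value of \(t = -3.747\) (from the t distribution table, with \(df = 4\) and a left-tail area of 0.01), it falls in the rejection region, so we reject \(H_0\).
      We conclude that there is significant evidence from the study to indicate that the slope of the population regression line is negative.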
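
The five steps above can be sketched in Python as follows. This is an illustrative check only: the variable names are made up for the example, the inputs are the rounded values from the solution to this problem, and the critical value of -3.747 is an assumption read from a standard t table (df = 4, left-tail area 0.01) rather than computed. Because the script does not round intermediate results, its output differs slightly from the hand calculation in the later decimal places.

```python
import math

n = 6
ss_xx, ss_xy, ss_yy = 5.8333, -17.0, 50.0   # rounded values from the solution
b = -2.9143                                 # sample slope (negative)
B0 = 0.0                                    # slope under the null hypothesis
t_critical = -3.747                         # assumed t-table value: df = 4, left tail 0.01

s_e = math.sqrt((ss_yy - b * ss_xy) / (n - 2))   # standard deviation of errors
s_b = s_e / math.sqrt(ss_xx)                     # standard deviation of b
t_observed = (b - B0) / s_b

# Left-tailed test: reject H0 when the statistic is below the critical value.
reject_h0 = t_observed < t_critical

print(round(s_e, 4), round(s_b, 4), round(t_observed, 4), reject_h0)
```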

    8. Construct a 99% confidence interval for B . Keep your work to 4 decimal places.

      Solution:

      The formula for the confidence interval for \(B\) is \(b \pm t\,s_b\), where \(b = -2.9143\).

      \(s_b = 0.1399\) from part 7 above.

      To find \(t\), \(df = n - 2 = 6 - 2 = 4\), and the right tail area is equal to \(\frac{0.01}{2} = 0.005\), as shown:

      [Figure: t distribution curve with an area of 0.005 in each tail]

      So, \(t = 4.604\). The 99% confidence interval is

      \(-2.9143 \pm 4.604(0.1399) = -2.9143 \pm 0.6441 = -3.5584\) to \(-2.2702\).
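
The interval arithmetic can be reproduced with a few lines of Python. This is a minimal sketch with illustrative variable names; the slope (which is negative), its standard deviation, and the t value of 4.604 are taken from the solution above.

```python
b = -2.9143      # sample slope (negative)
s_b = 0.1399     # standard deviation of b
t_value = 4.604  # t for df = 4 and an area of 0.005 in each tail

margin = t_value * s_b
lower, upper = b - margin, b + margin

print(round(margin, 4), round(lower, 4), round(upper, 4))
```

Both endpoints are negative, so the interval excludes zero, which agrees with rejecting \(B = 0\) in the preceding test.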

    9. Predict (using a point estimate) the number of burgers consumed per month if the price per burger is set to $3.50. Keep all your work to 4 decimal places.

      Solution:

      The estimated regression equation is \(\hat{y} = 14.7999 - 2.9143x\).

      Point estimate: \(\hat{y} = 14.7999 - 2.9143(3.50) = 14.7999 - 10.2001 = 4.5998\) burgers per month

  3. A sleep therapist is examining the relationship between a person’s age and their hours of sleep on a typical weeknight. The therapist selects eight people at random. These people’s ages are paired with their hours of sleep in the following table.

    Age (x) 70 24 36 42 65 58 29 50
    Hours of sleep (y) 11  5  7  8 11 11  6  8

    In answering the questions below, you may use the following sums, sums of squares and cross products, and standard deviation of errors. Keep all your work to 4 decimal places.

    n = sample size = number of individuals participating in the study = 8

    \(\sum x = 374\), \(\sum y = 67\), \(SS_{xx} = 1981.5\), \(SS_{yy} = 39.875\), \(SS_{xy} = 272.75\), \(s_e = 0.6215\)

    1. Find the least squares regression line \(\hat{y} = a + bx\).
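
The sums and the sums of squares and cross products supplied above can be reproduced from the data table. The following is a minimal Python sketch with illustrative variable names, using the same shortcut formulas as in question 2.

```python
# Data from the table in this problem.
ages = [70, 24, 36, 42, 65, 58, 29, 50]    # x
hours = [11, 5, 7, 8, 11, 11, 6, 8]        # y
n = len(ages)

sum_x, sum_y = sum(ages), sum(hours)

ss_xx = sum(v * v for v in ages) - sum_x ** 2 / n
ss_yy = sum(v * v for v in hours) - sum_y ** 2 / n
ss_xy = sum(p * q for p, q in zip(ages, hours)) - sum_x * sum_y / n

print(sum_x, sum_y, ss_xx, ss_yy, ss_xy)
```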

      Solution:

      \[ b = \text{slope} = \frac{SS_{xy}}{SS_{xx}} = \frac{272.75}{1981.5} = 0.1377 \]

      \[ a = \text{intercept} = \bar{y} - b\bar{x} = 8.375 - (0.1377)(46.75) = 1.9375 \]

      Note that the mean of the x-values in the data set is \(\frac{374}{8} = 46.75\), and the mean of the y-values in the data set is \(\frac{67}{8} = 8.375\).

      Regression equation: \(\hat{y} = 1.9375 + 0.1377x\)

    2. Compute the simple linear correlation coefficient, \(r\). Interpret your answer.

      Solution:

      \[ r = \frac{SS_{xy}}{\sqrt{(SS_{xx})(SS_{yy})}} = \frac{272.75}{\sqrt{(1981.5)(39.875)}} = \frac{272.75}{281.0913} = 0.9703 \]

      Since the linear correlation coefficient is close to +1, there is a strong positive linear correlation between a person’s age and their hours of sleep on a typical weeknight. In other words, in this sample, the older the person, the more hours they tend to sleep on a typical weeknight.
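
The correlation coefficient follows directly from the three sums of squares and cross products given for this question. A minimal Python sketch, with illustrative variable names:

```python
import math

ss_xx, ss_yy, ss_xy = 1981.5, 39.875, 272.75   # values given in the question

r = ss_xy / math.sqrt(ss_xx * ss_yy)

print(round(r, 4))
```

The sign of \(r\) always matches the sign of \(SS_{xy}\), because the denominator is a square root and therefore positive.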

    3. Using the 1% significance level, test whether the population linear correlation coefficient is different from zero.

      Solution:

      Step 1: State the null hypothesis (\(H_0\)) and the alternative hypothesis (\(H_1\)).

      \(H_0\): \(\rho = 0\)

      \(H_1\): \(\rho \neq 0\) (population linear correlation coefficient differs from zero)

      Step 2: Select the distribution to use. The t distribution is to be used.

      Step 3: Determine the rejection and non-rejection regions.

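      Since \(H_1\) contains “\(\neq\)”, this is a two-tailed test with each tail having an area of \(\frac{\alpha}{2} = \frac{0.01}{2} = 0.005\).

      The degrees of freedom are \(df = n - 2 = 8 - 2 = 6\).

      [Figure: t distribution curve with a rejection region of area 0.005 in each tail]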

      Step 4: Calculate the value of the test statistic.

      \[ t_{\text{observed}} = r\sqrt{\frac{n - 2}{1 - r^2}} = 0.9703\sqrt{\frac{8 - 2}{1 - 0.9703^2}} = 0.9703\sqrt{\frac{6}{0.0585}} = 9.8266 \]

      Step 5: Make a decision.

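      Since \(t_{\text{observed}} = 9.8266\) is greater than the critical value of \(t = 3.707\) (from the t distribution table, with \(df = 6\) and an area of 0.005 in each tail), it falls in the rejection region, so we reject \(H_0\).

      Conclusion: At the 1% significance level, there is evidence to indicate that the population linear correlation coefficient is different from zero.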
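
The test statistic and decision can be sketched in Python as follows. The variable names are illustrative, \(r\) is the rounded value from the solution, and the critical value of 3.707 is an assumption read from a standard t table (df = 6, area 0.005 in each tail). The script keeps full precision for \(1 - r^2\), so its statistic differs from the hand calculation in the third decimal place.

```python
import math

n = 8
r = 0.9703            # sample correlation coefficient (rounded)
t_critical = 3.707    # assumed t-table value: df = 6, area 0.005 in each tail

t_observed = r * math.sqrt((n - 2) / (1 - r ** 2))

# Two-tailed test: reject H0 when the statistic lies in either tail.
reject_h0 = abs(t_observed) > t_critical

print(round(t_observed, 4), reject_h0)
```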

    4. What percentage of the total variation in weeknight hours of sleep occurs because of the variation in age?

      Solution:

      This question is asking you to find the coefficient of determination, which is \(r^2\). Since we know from part 2 above that \(r = 0.9703\), the coefficient of determination equals \(0.9703^2 = 0.9415\).

      This means that 94.15% of the variation in the weeknight hours of sleep can be explained by the variation in age.

    5. Construct a 90% prediction interval for the weeknight hours of sleep for a 40-year-old individual.

      Solution:

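      The formula for the prediction interval is \(\hat{y} \pm t\,s_{\hat{y}_p}\),
      where \(\hat{y} = 1.9375 + 0.1377(40) = 7.4455\).

      To find \(t\):
      \(df = 8 - 2 = 6\), and the right tail area \(= \frac{0.10}{2} = 0.05\), as shown:

      [Figure: t distribution curve with an area of 0.05 in each tail]

      So, \(t = 1.943\).

      Since
      \[ s_{\hat{y}_p} = s_e\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{SS_{xx}}} = 0.6215\sqrt{1 + \frac{1}{8} + \frac{(40 - 46.75)^2}{1981.5}} = 0.6215\sqrt{1 + 0.125 + 0.0230} = 0.6215(1.0714) = 0.6659, \]

      the prediction interval, \(\hat{y} \pm t\,s_{\hat{y}_p}\), is

      \[ 7.4455 \pm 1.943(0.6659) = 7.4455 \pm 1.2938 = 6.1517 \text{ to } 8.7393. \]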
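
The prediction interval can be reproduced with a short script. This is a minimal Python sketch with illustrative variable names; all inputs (the regression coefficients, \(s_e\), \(\bar{x}\), \(SS_{xx}\), and the t value of 1.943) are the rounded values used in the solution above. Intermediate results are not rounded, so the endpoints may differ from the hand calculation in the fourth decimal place.

```python
import math

n = 8
a, b = 1.9375, 0.1377   # regression coefficients from part 1
s_e = 0.6215            # standard deviation of errors (given)
x_bar = 46.75           # mean age
ss_xx = 1981.5
x0 = 40                 # age for which the prediction is made
t_value = 1.943         # t for df = 6 and an area of 0.05 in each tail

y_hat = a + b * x0
s_pred = s_e * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / ss_xx)
margin = t_value * s_pred
lower, upper = y_hat - margin, y_hat + margin

print(round(y_hat, 4), round(s_pred, 4), round(lower, 4), round(upper, 4))
```

The leading 1 under the square root is what distinguishes a prediction interval for an individual from a confidence interval for the mean of \(y\); it accounts for the scatter of individual observations around the regression line.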