Mathematics 215: Introduction to Statistics

Study Guide

Unit 6: Bivariate Analysis

In Unit 5, we considered tests of independence used to examine the relationship between two qualitative (categorical) variables. This unit focuses on bivariate analysis, which typically results in statistical inferences about the relationship between two quantitative variables.

Unit 6 describes two methods of analyzing bivariate data: regression analysis and correlation analysis.

In our discussion of regression analysis, we examine the simple linear regression model, where the estimated relation between two variables takes the form of a linear equation. The regression equation is often used to predict one variable (the dependent variable) based on the values of the other variable (the independent variable). In addition, the regression equation can be used to estimate the magnitude of change in the dependent variable resulting from a given change in the independent variable.
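If you have access to software and want to double-check hand calculations, the two uses of the regression equation described above can be sketched in Python. This is a minimal illustration only: the data are hypothetical, the function name `fit_line` is an assumption of this sketch, and the code uses the standard least squares formulas b = SS_xy / SS_xx and a = ȳ − b·x̄.

```python
def fit_line(x, y):
    """Return (a, b) for the least squares line y-hat = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = ss_xy / ss_xx        # slope
    a = y_bar - b * x_bar    # y-intercept
    return a, b

# Hypothetical sample data (not taken from the textbook)
x = [1, 2, 3, 4, 5]   # independent variable
y = [2, 4, 5, 4, 5]   # dependent variable

a, b = fit_line(x, y)

# Use 1: predict the dependent variable for a given x inside the observed range
y_hat = a + b * 3.5

# Use 2: estimate the change in y that accompanies a 2-unit increase in x
change_in_y = b * 2

print(f"y-hat = {a:.4f} + {b:.4f}x")
print(f"predicted y at x = 3.5: {y_hat:.4f}")
print(f"estimated change in y for a 2-unit increase in x: {change_in_y:.4f}")
```

The prediction is made at a value of x inside the range of the sample data; predictions far outside that range are less trustworthy.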

Correlation analysis focuses on the strength and direction of the linear relationship between two variables. A strong positive correlation means that an increase in the value of one variable tends to be accompanied by an increase in the value of the second variable. A strong negative correlation means that an increase in the value of one variable tends to be accompanied by a decrease in the value of the second variable. Correlation analysis does not provide any information about the size of the change in one variable that accompanies a specific change in the other variable.
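The last point can be illustrated with a short Python sketch. The data are hypothetical and the helper names (`sums_of_squares`, `correlation`, `slope`) are assumptions of this sketch. Multiplying every y value by 10 changes the slope of the regression line tenfold but leaves the correlation coefficient unchanged, which is why r alone says nothing about the size of the change in y.

```python
import math

def sums_of_squares(x, y):
    """Return (SS_xx, SS_yy, SS_xy) computed from deviations about the means."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return ss_xx, ss_yy, ss_xy

def correlation(x, y):
    """Linear correlation coefficient r = SS_xy / sqrt(SS_xx * SS_yy)."""
    ss_xx, ss_yy, ss_xy = sums_of_squares(x, y)
    return ss_xy / math.sqrt(ss_xx * ss_yy)

def slope(x, y):
    """Least squares slope b = SS_xy / SS_xx."""
    ss_xx, _, ss_xy = sums_of_squares(x, y)
    return ss_xy / ss_xx

# Hypothetical sample data (not taken from the textbook)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_times_10 = [10 * yi for yi in y]   # same pattern, different units

r, r_rescaled = correlation(x, y), correlation(x, y_times_10)
b, b_rescaled = slope(x, y), slope(x, y_times_10)

print(f"r = {r:.4f}, r after rescaling y = {r_rescaled:.4f}")
print(f"b = {b:.4f}, b after rescaling y = {b_rescaled:.4f}")
```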

Correlation and regression analysis have been used to make inferences in many different real-world situations. For years, medical researchers have noted the strong positive correlation between the frequency of cigarette smoking and the incidence of cancer. College counselors have emphasized the negative correlation between the hours that students work in a job (outside of their studies) and the grades that students can expect to achieve. Economists use regression analysis to predict the nation’s annual production rate based on annual consumer spending levels. Real-estate boards employ regression models to predict real-estate prices. Criminologists apply regression techniques to estimate the change in the crime rate based on changes in the unemployment rate.

Unit 6 of MATH 215 consists of the following sections:

  • 6-1 Simple Linear Regression Analysis
  • 6-2 Standard Deviation of Random Errors and the Coefficient of Determination
  • 6-3 Inferences About the Slope of the Simple Linear Regression Model, B
  • 6-4 Linear Correlation
  • 6-5 Applying Correlation and Regression

The unit also contains a self-test. When you have completed the material for this unit, including the self-test, complete Assignment 6 (the final assignment).

Section 6-1: Simple Linear Regression Analysis

Outcomes

After completing the readings and exercises for this section, you should be able to do the following:

  1. define, and use in context, the following key terms:
    • simple regression
    • linear regression
    • equation of a regression model
    • population parameters of the simple linear regression model, A and B
    • random error term
    • estimated values of A and B, namely a and b, respectively
    • estimated regression model
    • predicted value of y
    • scatter diagram
  2. construct a scatter diagram based on sample data.
  3. find the estimated regression model (equation), given sample data.
  4. interpret the values of a and b based on the estimated regression model.
  5. plot the regression line.
  6. compute the error of prediction, e, for a given value of x.
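As an optional check on outcome 6, the error of prediction e = y − ŷ can be sketched in Python. The data are hypothetical, and the coefficients a = 2.2 and b = 0.6 are the least squares values for these particular data, entered directly so that the sketch stays short.

```python
# Hypothetical sample data (not taken from the textbook)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Least squares coefficients for the data above (y-hat = 2.2 + 0.6x)
a, b = 2.2, 0.6

# Predicted value of y for each observed x
y_hat = [a + b * xi for xi in x]

# Error of prediction: observed y minus predicted y
errors = [yi - yh for yi, yh in zip(y, y_hat)]

for xi, yi, yh, e in zip(x, y, y_hat, errors):
    print(f"x = {xi}: y = {yi}, y-hat = {yh:.4f}, e = {e:.4f}")

# For a least squares line, the errors sum to zero (up to rounding)
print(f"sum of errors = {sum(errors):.4f}")
```

A positive e means the line under-predicts that observation, and a negative e means it over-predicts.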

Reading

Read the following sections in Chapter 13 of the textbook:

  • Chapter 13 Introduction
  • Section 13.1

Be prepared to read the material in Chapter 13 at least twice—the first time for a general overview of topics, and the second time to concentrate on the terms and examples presented. Return to these sections when you need to review these topics.

Supplementary Video Resources

These videos provide alternative explanations and further exploration of the concepts and techniques presented in Section 13.1 of the textbook.

Exercises

Complete the following exercises from Chapter 13 of the textbook (page numbers are for the downloadable eText):

  • Exercises 13.11, 13.15, 13.17, 13.19, and 13.21 on page 516

Show your work as you develop your answers.

Solutions are provided in the Student Solutions Manual for Chapter 13 (interactive textbook) and on page AN18 in the Answers to Selected Odd-Numbered Exercises section (downloadable eText).

Remember, it is very important that you make a concerted effort to answer each question independently before you refer to the solutions. If your answers differ from those provided and you cannot understand why, contact your tutor for assistance.

Section 6-2: Standard Deviation of Random Errors and the Coefficient of Determination

Outcomes

After completing the readings and exercises for this section, you should be able to do the following:

  1. define, and use in context, the following key terms:
    • standard deviation of errors
    • coefficient of determination
  2. compute the standard deviation of errors.
  3. compute the coefficient of determination, and interpret your answer.
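The two computations in outcomes 2 and 3 can be sketched in Python as an optional check. The data are hypothetical. The sketch assumes the usual simple linear regression formulas SSE = SS_yy − b·SS_xy, s_e = √(SSE / (n − 2)), and r² = b·SS_xy / SS_yy; confirm them against Section 13.2 of the textbook before relying on the output.

```python
import math

# Hypothetical sample data (not taken from the textbook)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_yy = sum((yi - y_bar) ** 2 for yi in y)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b = ss_xy / ss_xx                 # least squares slope

# Standard deviation of errors, with n - 2 degrees of freedom
sse = ss_yy - b * ss_xy           # error sum of squares
s_e = math.sqrt(sse / (n - 2))

# Coefficient of determination: share of the variation in y explained by x
r_squared = b * ss_xy / ss_yy

print(f"s_e = {s_e:.4f}")
print(f"r^2 = {r_squared:.4f}")
```

For these data, r² = 0.6 would be read as "about 60% of the total variation in y is explained by the regression on x."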

Reading

Read Section 13.2 in Chapter 13 of the textbook.

Supplementary Video Resources

These videos provide alternative explanations and further exploration of the concepts and techniques presented in Section 13.2 of the textbook.

Exercises

Complete the following exercises from Chapter 13 of the textbook (page numbers are for the downloadable eText):

  • Exercises 13.29, 13.31, and 13.33 on page 523

Solutions are provided in the Student Solutions Manual for Chapter 13 (interactive textbook) and on page AN18 in the Answers to Selected Odd-Numbered Exercises section (downloadable eText).

Section 6-3: Inferences About the Slope of the Simple Linear Regression Model, B

Outcomes

After completing the readings and exercises for this section, you should be able to do the following:

  1. define, and use in context, the term “sampling distribution of b.”
  2. construct a confidence interval for B.
  3. conduct tests of hypotheses about B when H₀ is B = 0.
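Outcomes 2 and 3 can be sketched in Python as an optional check. The data are hypothetical, and the critical value 3.182 is read from a t distribution table (df = n − 2 = 3, area 0.025 in each tail) rather than computed. The sketch assumes the standard formulas s_b = s_e / √SS_xx, the interval b ± t·s_b, and the test statistic t = (b − B) / s_b.

```python
import math

# Hypothetical sample data (not taken from the textbook)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_yy = sum((yi - y_bar) ** 2 for yi in y)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b = ss_xy / ss_xx
s_e = math.sqrt((ss_yy - b * ss_xy) / (n - 2))
s_b = s_e / math.sqrt(ss_xx)      # standard deviation of b

# Critical value from the t table: df = 3, 0.025 in each tail (95% confidence)
t_crit = 3.182

# 95% confidence interval for B
lower, upper = b - t_crit * s_b, b + t_crit * s_b

# Two-tailed test of H0: B = 0 against H1: B != 0 at the 5% significance level
t_stat = (b - 0) / s_b
reject_h0 = abs(t_stat) > t_crit

print(f"95% CI for B: ({lower:.4f}, {upper:.4f})")
print(f"t = {t_stat:.4f}, reject H0: {reject_h0}")
```

With only five observations the interval includes zero and the test does not reject H₀, so the two procedures lead to the same conclusion here.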

Reading

Read Section 13.3 in Chapter 13 of the textbook.

Note: Omit the subsection titled “Using the p-Value to Make a Decision” on pages 526–527.

Supplementary Video Resources

These videos provide alternative explanations and further exploration of the concepts and techniques presented in Section 13.3 of the textbook.

Exercises

Complete the following exercises from Chapter 13 of the textbook (page numbers are for the downloadable eText):

  • Exercises 13.37, 13.39, 13.41, and 13.43 on pages 527–528

Solutions are provided in the Student Solutions Manual for Chapter 13 (interactive textbook) and on page AN18 in the Answers to Selected Odd-Numbered Exercises section (downloadable eText).

Section 6-4: Linear Correlation

Outcomes

After completing the readings and exercises for this section, you should be able to do the following:

  1. define, and use in context, the following key terms:
    • linear correlation
    • positive linear correlation, negative linear correlation, and zero linear correlation
    • perfect positive linear correlation
    • perfect negative linear correlation
  2. compute the linear correlation coefficient, given population or sample data, and interpret your answer.
  3. conduct tests of hypotheses about the population linear correlation coefficient.
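Outcomes 2 and 3 can be sketched in Python as an optional check. The data are hypothetical, and the critical value 3.182 comes from a t distribution table (df = n − 2 = 3, area 0.025 in each tail). The sketch assumes the standard formulas r = SS_xy / √(SS_xx·SS_yy) and t = r·√((n − 2) / (1 − r²)) for testing whether the population correlation coefficient is zero.

```python
import math

# Hypothetical sample data (not taken from the textbook)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_yy = sum((yi - y_bar) ** 2 for yi in y)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Sample linear correlation coefficient (always between -1 and 1)
r = ss_xy / math.sqrt(ss_xx * ss_yy)

# Test statistic for H0: population correlation coefficient = 0
t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))

# Critical value from the t table: df = 3, 0.025 in each tail (5% two-tailed test)
t_crit = 3.182
reject_h0 = abs(t_stat) > t_crit

print(f"r = {r:.4f}")
print(f"t = {t_stat:.4f}, reject H0: {reject_h0}")
```

In simple linear regression this t value equals the t value for testing B = 0 on the same data, so the two tests always agree.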

Reading

Read Section 13.4 in Chapter 13 of the textbook.

Note: Omit the subsection titled “Using the p-Value to Make a Decision” on page 532.

Supplementary Video Resources

These videos provide alternative explanations and further exploration of the concepts and techniques presented in Section 13.4 of the textbook.

Exercises

Complete the following exercises from Chapter 13 of the textbook (page numbers are for the downloadable eText):

  • Exercises 13.45, 13.47, 13.49, 13.51, 13.53, 13.55, 13.57, and 13.59 on pages 532–533

    Solutions are provided in the Student Solutions Manual for Chapter 13 (interactive textbook) and on page AN18 in the Answers to Selected Odd-Numbered Exercises section (downloadable eText).

Section 6-5: Applying Correlation and Regression

Outcomes

After completing the readings and exercises for this section, you should be able to do the following:

  1. define, and use in context, the term “prediction interval.”
  2. demonstrate an understanding of the nature of the linear relation between two variables by
    1. computing and interpreting the correlation coefficient and the coefficient of determination
    2. testing hypotheses relating to the population correlation coefficient
    3. determining the least squares regression equation and interpreting the coefficients a and b
    4. plotting the scatter diagram and the regression line
    5. constructing confidence intervals for B
    6. testing hypotheses related to B
    7. computing a point estimate of the dependent variable, given a value for the independent variable
  3. use the regression model to construct confidence intervals for estimating the mean value of y, given a value of x.
  4. use the regression model to construct confidence intervals for predicting a particular value of y, given a value of x.
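The difference between outcomes 3 and 4 can be sketched in Python as an optional check. The data are hypothetical, and the critical value 3.182 comes from a t distribution table (df = n − 2 = 3, area 0.025 in each tail). The sketch assumes the standard formulas: the standard deviation for estimating the mean value of y at x₀ is s_e·√(1/n + (x₀ − x̄)²/SS_xx), and for predicting a particular value of y it is s_e·√(1 + 1/n + (x₀ − x̄)²/SS_xx).

```python
import math

# Hypothetical sample data (not taken from the textbook)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_yy = sum((yi - y_bar) ** 2 for yi in y)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b = ss_xy / ss_xx
a = y_bar - b * x_bar
s_e = math.sqrt((ss_yy - b * ss_xy) / (n - 2))

x0 = 4                       # value of x at which to estimate or predict
y_hat0 = a + b * x0          # point estimate

# Spread of the estimate of the MEAN of y at x0
s_mean = s_e * math.sqrt(1 / n + (x0 - x_bar) ** 2 / ss_xx)
# Spread of the prediction of a PARTICULAR y at x0 (note the extra "1 +")
s_pred = s_e * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / ss_xx)

t_crit = 3.182               # t table: df = 3, 0.025 in each tail (95%)
mean_margin = t_crit * s_mean
pred_margin = t_crit * s_pred

print(f"point estimate at x = {x0}: {y_hat0:.4f}")
print(f"95% CI for mean y:       {y_hat0 - mean_margin:.4f} to {y_hat0 + mean_margin:.4f}")
print(f"95% prediction interval: {y_hat0 - pred_margin:.4f} to {y_hat0 + pred_margin:.4f}")
```

The prediction interval is always wider than the confidence interval for the mean at the same x₀, because a single observation varies around the mean as well.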

Reading

Read the following sections in Chapter 13 of the textbook:

  • Section 13.5

    Note: Omit the section titled “Using the p-Value to Make a Decision” for the hypothesis test on B (page 537) and the hypothesis test on the linear correlation coefficient (page 538).

  • Section 13.6

Supplementary Video Resources

These videos provide alternative explanations and further exploration of the concepts and techniques presented in the assigned textbook readings.

Video Related to Section 13.5

Video Related to Section 13.6

Exercises

  1. Complete the following exercises from Chapter 13 of the textbook (page numbers are for the downloadable eText):
    • Exercises 13.63 and 13.65 on page 539
    • Exercises 13.69 and 13.71 on page 543
  2. Complete Questions 1–15 of the Self-Review Test for Chapter 13 (pages 547–548 of the downloadable eText).

    Solutions are provided in the Student Solutions Manual for Chapter 13 (interactive textbook) and on pages AN18 and AN19 in the Answers to Selected Odd-Numbered Exercises section (downloadable eText).

  3. Complete the Unit 6 Self-Test below.

Note: At the end of each chapter of the textbook, there are instructions for how to complete the statistical calculations, graphs, and processes for that chapter using a TI-84 calculator, Microsoft Excel, and Minitab. You are not required to use a TI-84 calculator or to learn these statistical software programs for MATH 215. However, if you happen to have access to this calculator or these applications, you may use them to double-check your work.

Note that you are not permitted to use a TI-84 calculator, Microsoft Excel, or Minitab on the midterm or the final exam for this course. The only calculator you are allowed to bring into the exam room is the Texas Instruments TI-30Xa Scientific Calculator. You should familiarize yourself with its functionality now so that you can complete the calculations as required on the assignments and exams.

See the Calculators section of the Course Orientation for more information.

Optional Extra Practice

For extra practice with the material presented in this section, you can complete the following questions and exercises, for which the solutions are provided in the textbook:

  1. Any odd-numbered chapter-section practice questions that are not assigned above.
  2. The odd-numbered Supplementary Exercises and Advanced Exercises at the end of Chapter 13 (pages 545–547 of the downloadable eText).

Assignment 6

Once you have completed the Unit 6 Self-Test below, complete Assignment 6. You can access the assignment in the Assessment section of the course home page. Once you have completed the assignment, submit it to your tutor for marking using the drop box on the page for Assignment 6.

Unit 6 Self-Test

The self-test questions are shown here for your information. Download the Unit 6 Self-Test document and write out your answers. Show all your work and keep your calculations to four decimal places, unless otherwise stated. You can access the solutions to this self-test on the course home page.

  1. Circle True (T) or False (F) for each of the following:
    1. T  F  In regression analysis, the variable that is being explained is called the independent variable.
    2. T  F  In the estimated regression equation ŷ = a + bx, b is the slope of the regression line.
    3. T  F  The number of degrees of freedom for a simple linear regression model is (n − 1).
    4. T  F  The coefficient of variation measures the proportion of the variation in y explained by the variation in x.
    5. T  F  When conducting a test of hypothesis regarding B in the simple linear regression model, the null hypothesis of “B = 0” states that x will not be a significant predictor of y.
    6. T  F  In correlation analysis, if there is a perfect positive correlation between x and y, then the value of r, the correlation coefficient, equals 1.
  2. A fast food chain would like to predict the number of breakfast burgers that an individual will consume per month, based on the price charged per burger. A random sampling of the breakfast burger consumption of six individuals results in the data recorded in the following table.

    Price ($)                             1.50  3.00  2.50  4.00  2.00  1.00
    Number of burgers consumed per month    10     6     8     3     9    12

    1. Which variable is the independent variable, x, and which is the dependent variable, y?
    2. Find the equation of the least squares regression line. Keep all your work to 4 decimal places.
    3. Construct a scatter diagram for the sample data for the six randomly selected individuals. The price is to be measured on the horizontal axis, and the number of burgers consumed is to be indicated on the vertical axis. Plot the estimated regression line on the same set of axes. In plotting the regression line, compute ŷ for x = 1 and then for x = 4.

    4. Based on the results from part 3 above, would you hypothesize a positive or negative linear relation between the price and the number of burgers consumed?
    5. Interpret the meaning of the slope of the estimated regression line.
    6. Compute the coefficient of determination and interpret your answer. Keep your work to 4 decimal places.
    7. Test at the 1% significance level whether B (the population regression line slope) is negative. Keep your work to 4 decimal places.
    8. Construct a 99% confidence interval for B. Keep your work to 4 decimal places.
    9. Predict (using a point estimate) the number of burgers consumed per month if the price per burger is set to $3.50. Keep all your work to 4 decimal places.
  3. A sleep therapist is examining the relationship between a person’s age and their hours of sleep on a typical weeknight. The therapist selects eight people at random. These people’s ages are paired with their hours of sleep in the following table.

    Age (x)             70  24  36  42  65  58  29  50
    Hours of sleep (y)  11   5   7   8  11  11   6   8

    In answering the following, you may use the following sums of squares and cross products. Keep all your work to 4 decimal places.

    Σx = 374, Σy = 67, SS_xx = 1981.5, SS_yy = 39.875, SS_xy = 272.75, s_e = 0.6215

    1. Find the least squares regression line ŷ = a + bx.
    2. Compute the simple linear correlation coefficient, r. Interpret your answer.
    3. Using the 1% significance level, test whether the population linear correlation coefficient is different from zero.
    4. What percentage of the total variation in weeknight hours of sleep occurs because of the variation in age?
    5. Construct a 90% prediction interval for the weeknight hours of sleep for a 40-year-old individual.
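
If you have access to software, the sums supplied in Question 3 can be double-checked against the table with a short Python sketch. It reproduces only the given sums and sums of squares, not the answers to the question. The variable names are assumptions of this sketch, and the shortcut formulas used (for example, SS_xx = Σx² − (Σx)²/n) are the usual computational forms.

```python
# Data from the table in Question 3
age = [70, 24, 36, 42, 65, 58, 29, 50]      # x
sleep = [11, 5, 7, 8, 11, 11, 6, 8]         # y
n = len(age)

sum_x = sum(age)
sum_y = sum(sleep)

# Shortcut (computational) formulas for the sums of squares and cross products
ss_xx = sum(v ** 2 for v in age) - sum_x ** 2 / n
ss_yy = sum(v ** 2 for v in sleep) - sum_y ** 2 / n
ss_xy = sum(a * s for a, s in zip(age, sleep)) - sum_x * sum_y / n

print(sum_x, sum_y, ss_xx, ss_yy, ss_xy)
# prints: 374 67 1981.5 39.875 272.75
```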