Mathematics 215: Introduction to Statistics

Study Guide

Unit 1: Self-Test Answer Key

Show all your work and keep your calculations to four decimal places.

  1. The following short survey was completed by nine randomly selected regular, paying customers who shop at Savemore, a large supermarket located on the outskirts of a large city.

    1. Please indicate your area of residence: _____ Rural  _____Urban
    2. Please indicate your annual family income before taxes (nearest $1,000): $__________
    3. Please estimate the number of visits you make to Savemore in a typical month: ______.

    The survey responses of the nine randomly selected, regular customers are summarized in the table below.

    Customer last name   Residence   Annual income ($000’s)   Monthly visits
    Jackson              rural       85                       10
    Wong                 urban       65                        4
    Manderson            urban       60                       22
    Miles                rural       66                        8
    Selenas              urban       78                        7
    Goldman              urban       75                        5
    Jang                 urban       72                        9
    Jones                rural       64                       24
    Khan                 urban       68                        4

    1. Referring to the Savemore survey, fill in the blanks for each question below.
      1. The name “Jackson” would be referred to as an element.
      2. The first income value of “85” would be referred to as an observation.
      3. How many variables does the Savemore short survey focus on? Three.
      4. The responses to the “residence” question would be classed as qualitative.
      5. The responses to the “income” question would be classed as quantitative.
      6. The responses to the “visits” question would be classed as discrete.
      7. The responses to the “income” question would be classed as continuous.
      8. Data collected from the Savemore survey would be called cross-section data.
      9. The measure of central tendency most appropriate for summarizing the “residence” question responses would be the mode [as it is qualitative].
      10. If Savemore plans to use the nine customer survey responses to make decisions regarding ALL Savemore customers, this would be an example of inferential statistics.
      11. Would mean or median be the better measure of central tendency for the responses to the “visits” data? Explain.

        Answer: Median. The two unusually large visit counts, 22 and 24, make the data positively skewed. In a skewed distribution the median is resistant to such extreme values, whereas the mean is pulled toward them.

      12. Suppose the population distribution of incomes related to survey question 2 (above) has a mean of $70,000 and a standard deviation of $20,000. If the shape of the population distribution is unknown, at least \( 1 - \frac{1}{k^2} = 1 - \frac{1}{4} = \frac{3}{4} = 75\% \) of the population earns between $30,000 and $110,000.

        Further explanation: This answer is based on Chebyshev’s theorem and the fact that the interval $30,000 to $110,000 extends exactly 2 standard deviations on either side of the mean of $70,000, so \( k = 2 \). Chebyshev’s theorem gives a lower bound that holds for a distribution of any shape.
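A minimal Python sketch of the Chebyshev calculation above. The function name `chebyshev_lower_bound` is an illustrative assumption, not part of the course material; it simply evaluates \( 1 - 1/k^2 \) after finding k from the interval.

```python
def chebyshev_lower_bound(k: float) -> float:
    """Minimum proportion of ANY distribution lying within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem is only informative for k > 1")
    return 1 - 1 / k**2

mean, sd = 70_000, 20_000          # population mean and standard deviation of income
lower, upper = 30_000, 110_000     # interval of interest, symmetric about the mean
k = (upper - mean) / sd            # (110,000 - 70,000) / 20,000 = 2
print(f"k = {k}, at least {chebyshev_lower_bound(k):.0%}")  # k = 2.0, at least 75%
```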

      13. Suppose the population distribution of incomes related to survey question 2 (above) has a mean of $70,000 and a standard deviation of $20,000. If the shape of the population distribution is bell-shaped, approximately 95% of the population earns between $30,000 and $110,000.

        Further explanation: This answer is based on the Empirical rule and the fact that the interval $30,000 to $110,000 extends exactly 2 standard deviations on either side of the mean of $70,000.
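The Empirical rule's 68%, 95% and 99.7% figures are rounded values for a normal distribution. As a sketch, assuming "bell-shaped" is modelled as exactly normal, the exact proportion within k standard deviations is \( \operatorname{erf}(k/\sqrt{2}) \), which Python's standard `math.erf` can evaluate (the helper name `normal_within_k_sd` is hypothetical):

```python
import math

def normal_within_k_sd(k: float) -> float:
    """Exact proportion of a normal distribution within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(normal_within_k_sd(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

Compared with Chebyshev's "at least 75%" for k = 2, knowing the shape raises the figure to about 95%.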

      14. Provide two possible reasons why Savemore did not use a census survey to collect information from its customers:

        Two possible reasons: (1) a census survey is very expensive; and (2) it may prove almost impossible to get a response from every customer unless data are collected over a very long period, which is impractical.

      15. If Savemore selected the sample of customers in a way that ensured each customer had an equal chance of being selected, this would be called a simple random sample.
      16. Suppose that Savemore plans to estimate the true average annual income (before taxes) for ALL its regular paying customers, based on computing the average annual income of the sample of nine regular paying customers surveyed. The difference between the sample mean income and the population mean income, which arises because the sample is only a subset of the population, is called sampling error.
      17. If Savemore selected the sample of customers in a way that ensured that three customers were randomly selected from rural residences and six customers were randomly selected from urban residences, this would be an example of a stratified random sample.
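To contrast the two sampling designs in items 15 and 17, here is a minimal Python sketch. The customer IDs and stratum sizes (150 rural, 300 urban) are entirely hypothetical; they are chosen so that a 3 : 6 split is proportional to the strata. `random.Random.sample` draws without replacement, giving every member of the list it is called on an equal chance of selection.

```python
import random

# Hypothetical sampling frame (NOT real Savemore data): IDs tagged by residence.
rural = [f"R{i:03d}" for i in range(1, 151)]   # 150 hypothetical rural customers
urban = [f"U{i:03d}" for i in range(1, 301)]   # 300 hypothetical urban customers

rng = random.Random(215)  # fixed seed so the sketch is reproducible

# Simple random sample: one draw of 9 from the combined frame,
# so every customer has the same chance of selection.
srs = rng.sample(rural + urban, 9)

# Stratified random sample: a separate simple random sample within each stratum.
stratified = rng.sample(rural, 3) + rng.sample(urban, 6)

print(srs)
print(stratified)
```

The simple random sample may contain any mix of rural and urban customers; the stratified design fixes that mix in advance.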
    2. Referring to the Savemore survey responses, determine the median customer monthly visits.

      Solution:

      Step 1: Order the data:

      4 4 5 7 8 9 10 22 24

      Step 2: With n = 9 values, the median is the \( \frac{n+1}{2} = 5 \)th ordered value: Median = 8 visits
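The two steps above (order the data, take the middle value) can be sketched in Python as follows; the helper name `median` is our own, and the even-n branch is included only for completeness.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values when n is even."""
    ordered = sorted(values)          # Step 1: order the data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # Step 2 (odd n): the ((n + 1) / 2)th ordered value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

visits = [10, 4, 22, 8, 7, 5, 9, 24, 4]   # monthly visits from the survey table
print(median(visits))  # 8
```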

    3. Referring to the Savemore survey responses, determine the mean customer monthly visits. By comparing the mean with the median customer monthly visits, determine if the distribution of the visits is positively or negatively skewed. Explain. In computing the mean, keep your work to four decimals.

      Solution:

      x     x²
      10    100
      4     16
      22    484
      8     64
      7     49
      5     25
      9     81
      24    576
      4     16
      Σx = 93   Σx² = 1411

      Sample mean: \( \bar{x} = \frac{\sum x}{n} = \frac{93}{9} = 10.3333 \)

      Since the mean (10.3333) is greater than the median (8), the distribution of the visits is positively skewed: the two large values (22 and 24) pull the mean upward, so most of the data (7 of the 9 values) lie below the mean.
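A short Python sketch of the column sums, the mean, and the mean-versus-median comparison used above. The variable names are illustrative; the data are the nine visit counts from the survey table.

```python
visits = [10, 4, 22, 8, 7, 5, 9, 24, 4]  # monthly visits from the survey table

n = len(visits)
sum_x = sum(visits)                     # Σx
sum_x2 = sum(x * x for x in visits)     # Σx², reused by the short-cut standard deviation
mean = sum_x / n
median = sorted(visits)[n // 2]         # n is odd, so this is the middle ordered value

if mean > median:
    shape = "positively skewed"
elif mean < median:
    shape = "negatively skewed"
else:
    shape = "no skew indicated by this comparison"
print(sum_x, sum_x2, round(mean, 4), median, shape)  # 93 1411 10.3333 8 positively skewed
```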

    4. Calculate the standard deviation for the customer monthly visits. Use the short-cut method. Keep your work to four decimals.

      Solution:

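      The calculation below uses the column sums from the previous part (Σx = 93 and Σx² = 1411):

      Sample standard deviation: \( s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}} = \sqrt{\frac{1411 - \frac{93^2}{9}}{9 - 1}} = \sqrt{\frac{1411 - 961}{8}} = \sqrt{\frac{450}{8}} = \sqrt{56.25} = 7.5 \)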

    5. Referring to the Savemore survey responses, which variable, income or visits, exhibits greater relative variability? Show the appropriate calculations. Hint: The standard deviation of the nine annual incomes equals 7.8899.

      Solution:

      The coefficient of variation (CV) measures relative variability as follows:

      \( CV_{\text{visits}} = \frac{\text{sample standard deviation}}{\text{sample mean}} \times 100 = \frac{7.5}{10.3333} \times 100 = 72.5809\% \)

      To find the CV for income, we first find the mean income:

      \( \bar{x} = \frac{\sum x}{n} = \frac{633}{9} = 70.3333 \) (000’s)

      \( CV_{\text{income}} = \frac{\text{sample standard deviation}}{\text{sample mean}} \times 100 = \frac{7.8899}{70.3333} \times 100 = 11.2179\% \)

      Conclusion: The visits variable exhibits greater relative variability (72.5809% versus 11.2179%).
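A minimal Python sketch of the CV comparison, using the standard `statistics.stdev` (sample standard deviation) and `statistics.mean`. Because it works from unrounded means and standard deviations, the fourth decimal can differ slightly from the hand calculation (about 72.5806% and 11.2178% here); the conclusion is unchanged.

```python
import statistics

def coefficient_of_variation(values):
    """Sample coefficient of variation, in percent: (s / x̄) × 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100

visits = [10, 4, 22, 8, 7, 5, 9, 24, 4]
income = [85, 65, 60, 66, 78, 75, 72, 64, 68]   # annual income, $000's

for name, data in (("visits", visits), ("income", income)):
    print(name, round(coefficient_of_variation(data), 4))
```

The CV is unit-free, which is what allows visits (counts) and income (thousands of dollars) to be compared directly.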

    6. Compute the first quartile (Q1) and third quartile (Q3) for customer monthly visits. Interpret your answers.

      Solution:

      Step 1: Order the data:

      4 4 5 7 8 9 10 22 24

      Step 2: Find Q2 = median = middle value = 8 visits

      Step 3: Find Q1, which is the median of all the values less than Q2:

      4 4 5 7

      \( Q_1 = \frac{4 + 5}{2} = 4.5 \), which means that approximately 1⁄4 or 25% of the customers make fewer than 4.5 visits to Savemore per month.

      Step 4: Find Q3, which is the median of all the values greater than Q2:

      9 10 22 24

      \( Q_3 = \frac{10 + 22}{2} = 16 \), which means that approximately 3⁄4 or 75% of the customers make fewer than 16 visits to Savemore per month.
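The median-of-halves method in Steps 1 to 4 can be sketched in Python as below. For odd n the median itself is excluded from both halves, as in the steps above. The helper names are our own. Python's `statistics.quantiles` uses interpolation rules that happen to agree on this data set but can give different quartiles for other data.

```python
def median(ordered):
    """Median of an already-sorted list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Q1, Q2, Q3 by the median-of-halves method (median excluded from the halves when n is odd)."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]          # values below the median
    upper = ordered[(n + 1) // 2 :]    # values above the median
    return median(lower), median(ordered), median(upper)

visits = [10, 4, 22, 8, 7, 5, 9, 24, 4]
q1, q2, q3 = quartiles(visits)
print(q1, q2, q3)  # 4.5 8 16.0
```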

    7. Calculate the interquartile range ( IQR ) for monthly customer visits. Interpret your answer.

      Solution:

      \( IQR = Q_3 - Q_1 = 16 - 4.5 = 11.5 \)

      The middle 50% of the customers’ monthly visits span a range of 11.5 visits. The IQR measures the variation (dispersion) of the middle 50% of the values in a distribution: the larger the IQR, the more variable the data.

    8. Calculate the percentile rank of the customer who typically makes 10 visits per month. Interpret your answer.

      Solution:

      Step 1: Order the data:

      4 4 5 7 8 9 10 22 24

      Step 2: Find the percentile rank:

      Since 6 of the 9 visit values are less than 10, the percentile rank is \( \frac{6}{9} \times 100 = 66.6667\% \). This means that approximately 67% of the customers make fewer than 10 visits per month.
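A one-function Python sketch of the percentile rank as defined in this answer (the percentage of values strictly less than the given value). The function name is hypothetical, and other textbooks use slightly different definitions, for example counting half of any ties.

```python
def percentile_rank(values, x):
    """Percentage of values strictly less than x (the definition used in this answer)."""
    below = sum(1 for v in values if v < x)
    return below / len(values) * 100

visits = [10, 4, 22, 8, 7, 5, 9, 24, 4]
print(round(percentile_rank(visits, 10), 4))  # 66.6667
```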

    9. Construct a stem-and-leaf display for the annual customer income data. Display the leaves in ascending order.

      Solution:

      Order the data:

      60 64 65 66 68 72 75 78 85

      Stem-and-Leaf Display for Income (000’s)

      Stem | Leaf
      6    | 0 4 5 6 8
      7    | 2 5 8
      8    | 5

      Key: 6 | 0 represents an annual income of 60 (000’s), that is, $60,000.
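A small Python sketch that builds the same display, assuming two-digit whole-number incomes (in $000's) so that the tens digit is the stem and the units digit is the leaf. The function name is hypothetical, and stems with no leaves are simply omitted.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Stem-and-leaf lines: tens digit as stem, units digits as leaves in ascending order."""
    leaves = defaultdict(list)
    for v in sorted(values):               # sorting first keeps each stem's leaves ascending
        leaves[v // 10].append(v % 10)
    return [f"{stem} | {' '.join(str(d) for d in leaves[stem])}" for stem in sorted(leaves)]

income = [85, 65, 60, 66, 78, 75, 72, 64, 68]   # annual income, $000's
for line in stem_and_leaf(income):
    print(line)
# 6 | 0 4 5 6 8
# 7 | 2 5 8
# 8 | 5
```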
    10. Sketch a box-and-whisker plot for the annual customer family income data. In your sketch, indicate all the quartiles and the minimum and maximum values. Using the appropriate computations, determine if there are mild outliers.

      Solution:

      Box-and-Whisker Plot for Income (000’s)

      Based on the ordered data (60 64 65 66 68 72 75 78 85): the minimum is 60; Q1 is 64.5; Q2 (the median) is 68; Q3 is 76.5; the maximum is 85.

      The data is positively skewed, as the median is to the left of the center of the box, and the right whisker is longer than the left whisker.

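      \( IQR = Q_3 - Q_1 = 76.5 - 64.5 = 12 \), and \( 1.5 \times IQR = 1.5 \times 12 = 18 \).
      The lower inner fence is \( Q_1 - 18 = 64.5 - 18 = 46.5 \).
      The upper inner fence is \( Q_3 + 18 = 76.5 + 18 = 94.5 \).

      Since no data value is below 46.5 or above 94.5, there are no mild outliers (and therefore no extreme outliers).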
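A Python sketch of the five-number summary and the 1.5 × IQR inner-fence test, using the same median-of-halves quartile method as the answer above (other quartile conventions can shift Q1 and Q3 slightly). Variable names are illustrative.

```python
def median(ordered):
    """Median of an already-sorted list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

income = sorted([85, 65, 60, 66, 78, 75, 72, 64, 68])  # annual income, $000's
n = len(income)
q1 = median(income[: n // 2])            # median of the lower half
q3 = median(income[(n + 1) // 2 :])      # median of the upper half
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr             # inner fences
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in income if x < lower_fence or x > upper_fence]

print(income[0], q1, median(income), q3, income[-1])  # 60 64.5 68 76.5 85
print(iqr, lower_fence, upper_fence, outliers)        # 12.0 46.5 94.5 []
```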

  2. A government healthcare clinic that serves older adults is trying to decide whether to open up a clinic location in a new subdivision. The clinic surveyed a random sample of 20 homeowners and recorded their ages as follows:

    81 77 56 44 86
    65 84 74 72 71
    79 63 83 82 58
    52 41 68 67 88
    1. Construct a frequency distribution for the ages above, using a lower limit for the first class of 40 and a class width of 10. In your distribution, include the class limits, the class boundaries, the class midpoints, the frequency, the relative frequency and the cumulative frequency.

      Solution:

      Age (class limits)   Class boundaries   Class midpoint   Frequency   Relative frequency   Cumulative frequency
      40 – 49              39.5 – 49.5        44.5             2           0.10                  2
      50 – 59              49.5 – 59.5        54.5             3           0.15                  5
      60 – 69              59.5 – 69.5        64.5             4           0.20                  9
      70 – 79              69.5 – 79.5        74.5             5           0.25                 14
      80 – 89              79.5 – 89.5        84.5             6           0.30                 20
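The frequency distribution above can be rebuilt from the raw ages with a short Python sketch. It assumes whole-number ages, so each class boundary sits 0.5 below the lower limit and 0.5 above the upper limit; the constant names are our own.

```python
ages = [81, 77, 56, 44, 86, 65, 84, 74, 72, 71,
        79, 63, 83, 82, 58, 52, 41, 68, 67, 88]

FIRST_LOWER_LIMIT = 40  # lower limit of the first class (given)
WIDTH = 10              # class width (given)
N_CLASSES = 5           # enough classes to cover the maximum age, 88

rows = []
cumulative = 0
for i in range(N_CLASSES):
    lo = FIRST_LOWER_LIMIT + i * WIDTH
    hi = lo + WIDTH - 1                          # class limits, e.g. 40 - 49
    freq = sum(1 for a in ages if lo <= a <= hi)
    cumulative += freq
    rows.append((lo, hi, lo - 0.5, hi + 0.5,     # limits and boundaries
                 (lo + hi) / 2,                  # class midpoint
                 freq, freq / len(ages), cumulative))

for lo, hi, lb, ub, mid, f, rel, cum in rows:
    print(f"{lo}-{hi}  {lb}-{ub}  {mid}  {f}  {rel:.2f}  {cum}")
```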
    2. Construct a percentage histogram for the frequency distribution above. Use a ruler and the grid provided below.

      Solution:

      Percentage Histogram of Ages (vertical axis: Percent Frequency)

    3. Is the distribution of the ages for the 20 homeowners described above skewed? If so, in which direction is the skew? Does the skewness of the age data support the decision to open a new clinic location in this subdivision? Explain.

      Solution:

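      The ages are negatively skewed: the class frequencies rise steadily from the lowest class to the highest (2, 3, 4, 5, 6), so the longer tail is on the left, toward younger ages. This suggests that most of the ages are above the average, which supports the decision to locate a clinic serving older adults in this subdivision.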
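The answer reasons from the shape of the grouped distribution. As a numerical cross-check, this Python sketch compares the mean and median of the 20 raw (ungrouped) ages with the standard `statistics` module; a mean below the median is consistent with negative skew.

```python
import statistics

ages = [81, 77, 56, 44, 86, 65, 84, 74, 72, 71,
        79, 63, 83, 82, 58, 52, 41, 68, 67, 88]

mean_age = statistics.mean(ages)
median_age = statistics.median(ages)
share_above_mean = sum(a > mean_age for a in ages) / len(ages)

print(mean_age, median_age, share_above_mean)  # 69.55 71.5 0.55
```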

    4. Construct an ogive for the frequency distribution above. Use a ruler and the grid provided below. Hint: An ogive is a curve that shows the cumulative frequencies. To draw it, plot a point above the upper boundary of each class at a height equal to that class’s cumulative frequency, then join the points with straight lines.

      Ogive for Variable Age

    5. What percentage of the twenty ages are below 60?

      Solution:

      Based on the Cumulative Frequency column in the frequency table above, or based on the ogive above, 5 or 5⁄20 = 25% of the ages are below 60.

    6. What percentage of the twenty ages are 70 or above?

      Solution:

      Based on the Cumulative Frequency column in the frequency table above, or based on the ogive above, 9 of the ages are below 70, which means 11 or 11⁄20 = 55% of the ages are 70 or above.
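Both percentages can also be checked directly against the 20 raw ages with a brief Python sketch (variable names are illustrative):

```python
ages = [81, 77, 56, 44, 86, 65, 84, 74, 72, 71,
        79, 63, 83, 82, 58, 52, 41, 68, 67, 88]

n = len(ages)
below_60 = sum(a < 60 for a in ages)      # the 40-49 and 50-59 classes
at_least_70 = sum(a >= 70 for a in ages)  # the 70-79 and 80-89 classes

print(f"{below_60}/{n} = {below_60 / n:.0%} below 60")      # 5/20 = 25% below 60
print(f"{at_least_70}/{n} = {at_least_70 / n:.0%} aged 70 or above")  # 11/20 = 55% aged 70 or above
```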

  3. The table below describes the commute times for all 30 employees working at a car dealership.

    Commute time (minutes)   Frequency (number of employees)
     1 – 15                  10
    16 – 30                   9
    31 – 45                   8
    46 – 60                   2
    61 – 75                   1
    1. Would you classify the 30 commute times as sample or population data? Explain.

      Solution:

      The 30 commute times are classed as population data, as this data includes ALL thirty employees of the car dealership.

    2. Compute the mean, standard deviation and variance for the commute times displayed in the distribution above. Use the short-cut method when computing the variance and the standard deviation.

      Solution:

      Frequency (f)   Midpoint (m)   m × f       m² × f
      10               8              80           640
       9              23             207         4,761
       8              38             304        11,552
       2              53             106         5,618
       1              68              68         4,624
      N = 30                         Σmf = 765   Σm²f = 27,195

      Population mean: \( \mu = \frac{\sum mf}{N} = \frac{765}{30} = 25.5 \)

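      Population standard deviation: \( \sigma = \sqrt{\frac{\sum m^2 f - \frac{(\sum mf)^2}{N}}{N}} = \sqrt{\frac{27195 - \frac{765^2}{30}}{30}} = \sqrt{\frac{27195 - 19507.50}{30}} = \sqrt{\frac{7687.50}{30}} = \sqrt{256.25} = 16.0078 \)

      Population variance: \( \sigma^2 = \frac{7687.50}{30} = 256.25 \) (squaring the rounded standard deviation, \( 16.0078^2 \), gives 256.2497 because of rounding).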
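A Python sketch of the grouped-data short-cut calculation. Each class is represented by its midpoint, and the sums are divided by N (not N − 1) because the 30 commute times are population data. The list layout `(lower limit, upper limit, frequency)` is our own choice.

```python
import math

# (lower limit, upper limit, frequency) for each commute-time class, in minutes
classes = [(1, 15, 10), (16, 30, 9), (31, 45, 8), (46, 60, 2), (61, 75, 1)]

N = sum(f for _, _, f in classes)
sum_mf = sum((lo + hi) / 2 * f for lo, hi, f in classes)          # Σmf
sum_m2f = sum(((lo + hi) / 2) ** 2 * f for lo, hi, f in classes)  # Σm²f

mu = sum_mf / N
variance = (sum_m2f - sum_mf**2 / N) / N   # population variance: divide by N
sigma = math.sqrt(variance)

print(N, sum_mf, sum_m2f)              # 30 765.0 27195.0
print(mu, variance, round(sigma, 4))   # 25.5 256.25 16.0078
```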

    3. Based on your observation of the skewness of the distribution of the commute times, is the median commute time smaller or larger than the mean commute time? Explain.

      Solution:

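      The commute times are positively skewed: the frequencies decrease as commute time increases (10, 9, 8, 2, 1), so the distribution has a long right tail. The few long commutes pull the mean upward, so the median commute time is smaller than the mean commute time.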

  4. The Golf Depot sold all 100 KPOW-SUPRA golf sets at different prices during the 2018 golf season, as follows.

    Month in 2018   Number of golf sets (w)   Price per golf set (x)   xw
    May             35                        $1,200                   $42,000
    June            25                        $1,100                   $27,500
    July            20                        $1,000                   $20,000
    August          15                          $900                   $13,500
    September        5                          $800                    $4,000
    Sums            Σw = 100                                           Σxw = $107,000

    Compute the overall average price (per golf set) for the 100 KPOW-SUPRA golf sets sold during the 2018 golf season.

    Solution:

    Find the weighted mean, where x = price and w = number of golf sets:

    \( \text{Weighted mean} = \frac{\sum xw}{\sum w} = \frac{107{,}000}{100} = \$1{,}070 \)
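A brief Python sketch of the weighted mean, with each monthly price weighted by the number of sets sold at that price. For contrast it also computes the unweighted average of the five prices, which ignores how many sets sold at each price.

```python
# (w = number of golf sets sold, x = price per set) for May through September 2018
sales = [(35, 1200), (25, 1100), (20, 1000), (15, 900), (5, 800)]

total_w = sum(w for w, _ in sales)          # Σw
total_xw = sum(w * x for w, x in sales)     # Σxw
weighted_mean = total_xw / total_w

simple_mean = sum(x for _, x in sales) / len(sales)  # unweighted average of the 5 prices

print(total_w, total_xw, weighted_mean, simple_mean)  # 100 107000 1070.0 1000.0
```

The weighted mean ($1,070) is higher than the simple average ($1,000) because more sets sold in the higher-priced early months.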

  5. The mean annual income of all five of the doctors employed in a small medical center is $260,000. The annual incomes of three of these five doctors are $210,000, $250,000 and $275,000. Find the annual income of the fourth and fifth doctors, assuming these two doctors make the same annual income. Hint: In getting started with your solution, let the annual income of the fourth doctor be $X.

    Solution:

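    \( \text{Mean income} = 260{,}000 = \frac{210{,}000 + 250{,}000 + 275{,}000 + X + X}{5} \)

    \( 260{,}000 = \frac{735{,}000 + 2X}{5} \)

    \( 260{,}000 \times 5 = 735{,}000 + 2X \)

    \( 1{,}300{,}000 = 735{,}000 + 2X \)

    \( 565{,}000 = 2X \)

    \( X = \frac{565{,}000}{2} \)

    \( X = 282{,}500 \)

    The fourth and fifth doctors each earn $282,500 per year.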
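The algebra above amounts to "total of all incomes minus the known incomes, shared equally by the unknown doctors". A short Python sketch (variable names are illustrative):

```python
known = [210_000, 250_000, 275_000]   # incomes of the three known doctors
n_doctors, mean_income = 5, 260_000
n_unknown = n_doctors - len(known)    # the two doctors who share the unknown income X

# Total income = n × mean, so 2X = 5 × 260,000 - 735,000
x = (n_doctors * mean_income - sum(known)) / n_unknown
print(x)  # 282500.0
```

Substituting back, (735,000 + 2 × 282,500) / 5 = 1,300,000 / 5 = 260,000, which confirms the answer.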