Chapter 7: A Tale of Two Variables, Continued (Days 48-49)
Today in Stats, we cover the next six pages of Chapter 7, on bivariate data and scatterplots. On these pages, the students learn how to make a scatterplot on their TI calculators. They also learn about correlation, including the correlation coefficient r. Fortunately, there aren't many differences between the TI-83 and TI-84 when it comes to scatterplots. There are no TI-84 surprises like the ones we saw in the previous chapter on Normal distributions.
I've heard of the correlation coefficient r before, but this is the first time I've ever seen the actual formula for calculating it: r = (Σ z_x z_y)/(n - 1), where z_x and z_y are the standardized values of x and y (that is, the values converted to z-scores), and the sum runs over all n data points. I like how the text explains it -- standardizing the values centers the plot at the origin. Then if there is a positive correlation, for most points z_x and z_y have the same sign, so their product is positive -- and for a negative correlation, z_x and z_y tend to have opposite signs, so their product is negative.
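For anyone who wants to see the formula in action, here is a small Python sketch that computes r exactly the way the text describes: standardize each list, multiply the z-scores pairwise, add, and divide by n - 1. It assumes the sample standard deviation (dividing by n - 1), which is what the TI calculators report as Sx. The function name and the data are made up for illustration.

```python
from statistics import mean, stdev


def correlation_from_z_scores(xs, ys):
    """Hypothetical helper: r = (sum of z_x * z_y) / (n - 1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample SDs, dividing by n - 1
    # Standardize: how many SDs each value sits above or below its mean.
    z_x = [(x - x_bar) / s_x for x in xs]
    z_y = [(y - y_bar) / s_y for y in ys]
    # Each product is positive when the two z-scores share a sign.
    return sum(a * b for a, b in zip(z_x, z_y)) / (n - 1)


# Made-up data, just to exercise the formula.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(correlation_from_z_scores(xs, ys), 4))  # 0.7746
```

On Python 3.10 or later, statistics.correlation(xs, ys) should give the same value, which makes a handy cross-check.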
What throws me off is the statement that if the points lie on a straight line, then there is a perfect linear correlation between the variables and so r = 1 (or -1 if it's a perfect negative correlation). But it's not at all obvious to me that if the points are collinear, the sum in the numerator of that expression must be exactly equal to n - 1 -- in order for the equation to become r = (n - 1)/(n - 1) = 1.
For example, if the slope of that line happens to be 1, then there might be a point for which z_x and z_y are both 2 (that is, two standard deviations above the mean). Then z_x z_y would equal 4 -- and if there are too many terms equal to 4, the sum would exceed n - 1. But a moment's thought suggests why this shouldn't happen -- we know that in a Normal distribution, about 95% of the data lies within two SDs of the mean, with only about 5% of the data at 2 SDs or beyond. So if the data are roughly Normal, we expect only about 5% of the terms in the sum to be 4 or greater. And even fewer terms are going to be 9 or greater (for 3 SDs or beyond). Ah -- now it sounds reasonable for the sum never to exceed n - 1. (An actual proof that r = 1 or -1 for perfect linear correlation might be interesting to see.)
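Here is one way that proof might go. This argument isn't from the text, and it assumes the sample standard deviation divides by n - 1 (the same convention as the formula for r). The key fact is that the squared z-scores always add up to exactly n - 1, whether or not the data are Normal, so any point with z = 2 is automatically balanced by points closer to the mean.

```latex
s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2
\quad\Longrightarrow\quad
\sum_{i=1}^{n} z_{x,i}^2 = \sum_{i=1}^{n}\frac{(x_i-\bar{x})^2}{s_x^2} = n-1

\text{If } y_i = a + b\,x_i \ (b \neq 0), \text{ then } \bar{y} = a + b\,\bar{x} \text{ and } s_y = |b|\,s_x, \text{ so}
\quad z_{y,i} = \frac{b\,(x_i-\bar{x})}{|b|\,s_x} = \pm z_{x,i}

r = \frac{\sum_{i=1}^{n} z_{x,i}\,z_{y,i}}{n-1} = \pm\frac{\sum_{i=1}^{n} z_{x,i}^2}{n-1} = \pm\frac{n-1}{n-1} = \pm 1
```

The sign is + when the slope b is positive and - when it is negative, which matches r = 1 and r = -1.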
Meanwhile, in Calculus class we reach Section 3.4, on the Chain Rule. This is one of the most important lessons in the text -- luckily there are no absences today. Going back and forth between notes on paper and markers on whiteboard, we get through most of the examples in the section. The one girl who was absent on Friday says that she'll make up the quiz after school tomorrow. Another girl -- the one who scored only 69% -- tells me that she also wants to be there to retake the quiz. So once again, it appears that those quiz corrections aren't going to happen -- low scorers will just retake the quiz along with the absent students.
Oh, so what did I do about the Rapoport-style Exit Pass today? Well, I ask the students to find the slope of the tangent to y = (x^2 + 5)^(3/2) + (2x - 3)^(1/2) at x = 2. Notice that the derivative of the first term works out to 18 at x = 2 -- apparently it's easier to make 18 the answer to a Chain Rule problem than it was yesterday, before we had the Chain Rule. As I mentioned yesterday, I could have simply made the second term x in order to add 1, but instead I found another Chain Rule term whose derivative at x = 2 is 1. Therefore the final answer is 19, since today's date is the nineteenth.
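To double-check the Exit Pass arithmetic, here is a short Python sketch: it differentiates each term by the Chain Rule and then compares against a central-difference estimate of the slope. The function names are mine, chosen for illustration.

```python
import math


def f(x):
    """Exit Pass function: y = (x^2 + 5)^(3/2) + (2x - 3)^(1/2)."""
    return (x**2 + 5) ** 1.5 + math.sqrt(2 * x - 3)


def f_prime(x):
    """Derivative by the Chain Rule, term by term (valid for x > 3/2)."""
    # d/dx (x^2 + 5)^(3/2) = (3/2)(x^2 + 5)^(1/2) * 2x = 3x * sqrt(x^2 + 5)
    first = 3 * x * math.sqrt(x**2 + 5)
    # d/dx (2x - 3)^(1/2) = (1/2)(2x - 3)^(-1/2) * 2 = 1 / sqrt(2x - 3)
    second = 1 / math.sqrt(2 * x - 3)
    return first + second


def numerical_slope(func, x, h=1e-5):
    """Central-difference estimate of the slope, as an independent check."""
    return (func(x + h) - func(x - h)) / (2 * h)


print(f_prime(2))                        # 19.0  (18 from the first term, 1 from the second)
print(round(numerical_slope(f, 2), 4))   # close to 19
```

At x = 2 the first term gives 3 * 2 * sqrt(9) = 18 and the second gives 1/sqrt(1) = 1, so the two checks should agree on 19.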