This week we were reviewing for the first unit test in AP Statistics, and I stumbled upon an in-class activity that highlights the difference between bar charts and histograms, particularly in the confusing situation where categorical data is masquerading as quantitative data. Plus it grew completely organically from our need to organize our review questions.
To prepare for our unit review, the homework I assigned was for students to browse through the 40 homework problems at the end of the unit and pick three that they particularly wanted to go through in class. This seemed like a good idea until the next morning, when I was getting ready for class and realized how hard it was going to be to have a whole class reach consensus on which problems to go over. In fact, it was going to be a challenge to even record which problems were most popular. I thought about doing a Google poll, but that required more setup than I had time for. So I just went to the board and drew this axis (painstakingly recreated because I didn’t think to take a picture at the time):
Then I told the students to go to the board and mark each problem they had chosen with a dot, making a dot plot while I took attendance. I figured that would show me which problems students wanted to review. First three rules of dealing with data: make a picture, make a picture, make a picture. This is what they drew:
I was trying to decide what the cut-score should be for a problem. Clearly I didn’t have time to go over all of the problems that students wanted to review, and equally clearly, numbers 26 and 18 were must-dos. But I wondered how many requests would justify spending class time on a problem, and started thinking about the statistics of the situation.
While thinking about this, I asked the class, “What does the distribution look like?” Short silence, followed by some tentative “Multi-modal” and “Skewed” answers. So I asked another question: “What kind of data is this?” (Yes, I know. Grammar. But that’s what I asked.)
The class confidently described it as quantitative. One student then looked dubious, and said, “Wait, no, it’s categorical.” Some argument ensued. So I suggested that instead of going over 26 and 18, maybe we should just average them and go over 22. They immediately saw that the problem number carried no mathematical information, and that the number was just an identifier in this case.
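For readers who want to see the same point off the whiteboard, here is a minimal Python sketch. The `picks` list is made-up data (four hypothetical students, three picks each), not the class's actual dots. It only illustrates that arithmetic on the labels is possible but meaningless, while counting the labels is the operation that makes sense.

```python
from collections import Counter

# Hypothetical picks (made-up data): four students, three problem numbers each.
picks = [26, 18, 31, 26, 18, 5, 26, 18, 31, 26, 12, 7]

# Treating the labels as quantities "works" arithmetically but means nothing:
label_average = (26 + 18) / 2
print(label_average)  # 22.0, which just names a different problem

# Treating the labels as categories: count how often each one appears.
tally = Counter(picks)
top_two = tally.most_common(2)
print(top_two)  # [(26, 4), (18, 3)]
```

The tally is what a bar chart (or the dot plot on the board) displays: one bar per label, with the order of the labels carrying no numerical meaning.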
I asked, “But can we make a histogram? Sometimes a dot plot is just a discrete histogram, but not in this case. Can we use these same data and make a proper histogram?”
It took a while, but a few of the students suggested that the horizontal axis could be “number of students requesting the problem,” and then we were off. We made a table of values:
And plotted the histogram on our calculators:
Then we could describe the shape of the distribution (unimodal and strongly skewed right). We calculated the Min (zero), Q1 (zero), Median (zero!) and Q3 (two), and got an IQR of 2, which puts the upper outlier fence at Q3 + 1.5 × IQR = 5. That gave us two outliers, at 9 and 6 requests, which corresponded to questions 26 and 18 respectively. I said, “Great, then we only have to go over two questions!”
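The calculator steps above can be roughly replayed in Python. This is a sketch under stated assumptions, not the class's real table: the `requests` tallies are invented so that the summary matches the values quoted above (only the 9 and 6 for problems 26 and 18 come from the post), quartiles use the median-of-halves convention usually taught in AP Statistics, and outliers are flagged with the usual 1.5 × IQR rule.

```python
from collections import Counter
from statistics import median

NUM_PROBLEMS = 40  # problems at the end of the unit, per the post

# Hypothetical tallies (problem number -> requests). Made-up values, chosen only
# so the summary matches the numbers quoted above; unlisted problems got zero.
requests = {26: 9, 18: 6, 31: 5, 3: 1, 7: 1, 9: 1, 12: 1, 15: 1,
            20: 2, 22: 2, 24: 2, 28: 2, 33: 3, 35: 3, 37: 4, 39: 4}

# The quantitative variable: number of requests for each of the 40 problems.
counts = [requests.get(p, 0) for p in range(1, NUM_PROBLEMS + 1)]

def five_number_summary(values):
    """Min, Q1, median, Q3, max using median-of-halves (needs len >= 2)."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[-half:]  # for odd n, the middle value is left out of both halves
    return data[0], median(lower), median(data), median(upper), data[-1]

minimum, q1, med, q3, maximum = five_number_summary(counts)
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr  # assumed 1.5 x IQR outlier rule

# Problems whose request count lies above the fence.
outliers = {p: n for p, n in requests.items() if n > upper_fence}

# Histogram data: how many problems received 0, 1, 2, ... requests.
histogram = Counter(counts)

print(minimum, q1, med, q3, maximum)
print(upper_fence, outliers)
print(sorted(histogram.items()))
```

Note that `histogram` keeps only how many problems got each request count. The problem numbers survive in `requests`, not in the histogram itself.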
After a few seconds of silence, one student timidly put up her hand and said, “I know it’s not an outlier, but can we go over number 31 also?” (Sometimes they don’t realize when I’m joking….)
- While going over question 26, I erased half of our dot plot, which gave us a great illustration of the information you lose when making a histogram; once I erased the dot plot, I didn’t have a record of the problem numbers anymore. Doh!
- When we looked at our summary statistics (1-var stats on our table of values in the TI-84), we discovered a mistake in our initial graph. Post in the comments if you see what I did wrong!
Bonus side note: My principal dropped in to observe my class while we were doing this. He stopped by at the end of the day to tell me that watching this activity had been a highlight of his day. Stats FTW!