Michael Pershan boldly posted *How I Teach Probability* recently. I have struggled mightily with the teaching of probability over the years, so I took his video as an invitation to share and discuss.

The following is a copy/paste of an email I sent him (with, perhaps, some light editing).

I have been working on a longer, better edited version of the basic idea that I lay out here. But I’m working on tons of other stuff, too, so I cannot promise that the other one will ever see the light of day.

So here you are. Enjoy.

—

An important bit of honesty first: I am crap at teaching probability. Many reasons, mostly having to do with the ephemerality and abstraction of the topic, in contrast to the cold, hard demonstrable reality of (say) fractions.

With that said, I seem to have been making progress with an approach that adds one important feature to the sort of work you show in your video.

Your video has (1) a scenario, (2) guesses, (3) small group data collection, and (4) large group data collection.

To that I add (2.5) explicit discussion of students’ probability models that inform their guesses, and then (5) some related follow-up activities.

After collecting guesses, I ask someone in class to describe WHY they said what they said. My job, then, is to press for details and to capture what they are thinking as carefully and accurately as possible on the board.

Once I have that, I ask for someone to describe a different way of thinking about it. Even if it is only slightly different, we get it on the board. My writing isn’t about me dispensing wisdom; it is about making a permanent record of a student’s model in enough detail that we will be able to test it later. For anything even moderately complicated (such as rolling two dice and considering their sum), I am disappointed if we don’t get at least four different models.

Now the data collection doesn’t just tell us who guessed closest; it can rule out at least some of our models.

As an example, in your scenario I would expect something like this:

**Model 1:** Zero multiples of three and one multiple of three are equally likely, so I’ll bet on either one. There are only four possibilities: 0, 1, 2 and 3 multiples of three, so the probability of each of these outcomes is 1/4.

**Model 2:** There is more than one way to get 1 multiple of 3. Our model should account for that. There are three ways to get 1 multiple of 3 (on Die1, Die2 or Die3), three ways to get 2 multiples of 3, and only 1 way each to get 0 or 3 multiples of 3. That’s eight possibilities, so the probability of getting 1 multiple of 3 is 3/8, while 0 multiples is 1/8.

**Model 3:** Each die can come up either “Mo3” or “NotMo3”. Getting all “NotMo3” has probability 1/8 (1/2 × 1/2 × 1/2), while getting one “Mo3” and 2 “NotMo3” is also 1/8, but there are 3 ways to do it, so it’s 3/8 altogether.

**Et cetera**

As many different ways of thinking about it as my students have, I will dutifully record. I encourage argument, as each new argument suggests a new model, which we can test.

Now before we roll those dice, we set up a way to test these models. In my experience, I have to devise that test. My students are not sophisticated enough to think this way yet.

In your example, and with these models, I might identify an important difference to be that Model 1 predicts equal numbers of 0 multiples of 3 and 1 multiple of 3, while Models 2 and 3 predict three times as many 1s as 0s. Hopefully we also have a model that predicts something in between.
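To make that difference concrete before any dice are rolled, here is a minimal sketch that writes the models down as exact fractions and asks each one how many 1s it expects per 0. The dictionary and function names are my own, chosen for illustration, and Models 2 and 3 share one entry because they assign the same probabilities.

```python
from fractions import Fraction

# Model 1: the four outcomes (0, 1, 2 or 3 multiples of three) are equally likely.
model_1 = {k: Fraction(1, 4) for k in range(4)}

# Models 2 and 3: eight equally weighted cases, with 1, 3, 3 and 1 ways
# to get 0, 1, 2 and 3 multiples of three respectively.
model_2_and_3 = {
    0: Fraction(1, 8),
    1: Fraction(3, 8),
    2: Fraction(3, 8),
    3: Fraction(1, 8),
}


def ratio_of_ones_to_zeros(model):
    """How many times as often the model expects 1 multiple of 3 as 0 multiples."""
    return model[1] / model[0]


for name, model in [("Model 1", model_1), ("Models 2 and 3", model_2_and_3)]:
    print(name, "predicts", ratio_of_ones_to_zeros(model), "ones for every zero")
```

Any additional student model can be added as another dictionary, as long as its four probabilities sum to 1.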

So now as we roll, we are not just looking at which outcome happens more often, but at relative frequency. Because *the model that better predicts the relative frequency that actually happens has got to be the better model*.
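For anyone who wants to rehearse the test without a classroom of dice, the comparison can be sketched in a few lines of Python. This is only an illustration, and it rests on assumptions that are mine rather than the scenario's: three standard six-sided dice, and a simple "total gap" score (the sum of absolute differences between observed and predicted relative frequencies) standing in for whatever comparison a class settles on. The function names are hypothetical.

```python
import random
from collections import Counter

# Predicted probabilities of seeing 0, 1, 2 or 3 multiples of three.
# Models 2 and 3 assign the same numbers, so they share one entry.
MODELS = {
    "Model 1": {0: 1 / 4, 1: 1 / 4, 2: 1 / 4, 3: 1 / 4},
    "Models 2 and 3": {0: 1 / 8, 1: 3 / 8, 2: 3 / 8, 3: 1 / 8},
}


def count_multiples_of_three(roll):
    """Count the dice in one roll that show a multiple of three."""
    return sum(1 for face in roll if face % 3 == 0)


def simulate(num_rolls, num_dice=3, seed=None):
    """Roll the dice num_rolls times; return the relative frequency of each count."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(num_rolls):
        roll = [rng.randint(1, 6) for _ in range(num_dice)]  # assumes six-sided dice
        counts[count_multiples_of_three(roll)] += 1
    return {k: counts[k] / num_rolls for k in range(num_dice + 1)}


def total_gap(observed, predicted):
    """Sum of absolute differences between observed and predicted frequencies."""
    return sum(abs(observed[k] - predicted[k]) for k in predicted)


def report(num_rolls=10_000, seed=None):
    """Print the observed frequencies and each model's gap from them."""
    observed = simulate(num_rolls, seed=seed)
    print("observed:", observed)
    for name, predicted in MODELS.items():
        print(name, "gap:", round(total_gap(observed, predicted), 3))
```

Calling `report()` prints the observed relative frequencies and one gap per model; the smaller the gap, the better that model predicted what actually happened. A class could also compare the observed ratio of 1s to 0s directly against each model's predicted ratio.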

Much more to say, examples to offer, etc.