
An approach to teaching statistics to 8th graders

There are two main obstacles students must overcome to learn statistics: one mathematical, one philosophical. The math is somewhat difficult, and will be new to a high schooler. What’s more, philosophically, it is rarely obvious what it means to discover a true pattern or underlying cause, nor how to separate the general pattern from the random accident, the pattern from the variation. This philosophical confusion (cause and effect, essence and accident) exists in the back of even the greatest minds. Accepting and dealing with it is at the heart of the best research: seeing what is and is not captured in the formulas of the day. But it is a lot to ask of the young (or the old) who are trying to understand a statistical technique while at the same time trying to understand the subject of the statistical analysis. For young students, especially the good ones, the issue of general and specific will compound the difficulty of the experiment and of the math. Thus, I’ll try to teach statistics with a problem or two where the distinction between essential cause and random variation is uncommonly clear.

A good case to get around the philosophical issue is gambling with crooked dice. I show the class a pair of normal-looking dice and a caliper, and demonstrate that the dice are not square; virtually every store-bought die is slightly off-square, so finding an uneven pair is easy. After checking with my caliper, students will readily accept that these dice are crooked, and that someone who knows how they are crooked will have an unfair advantage: after enough throws, someone who knows the degree of crookedness will win more often than those who do not. Students will also accept that there is a degree of randomness in the throw, so that any pair of dice will look pretty fair if you don’t gamble with them too long. I can then use statistics to see which faces show up most, and justify the whole study of statistics as a way to deal with a world where the dice are loaded by God, and you don’t have a caliper, or any more-direct way of checking them. The unevenness of the dice is the underlying pattern; the random part, in this case, is in the throw; and you want to use statistics to grasp them both.

Two important numbers to understand when trying to use statistics are the average and the standard deviation. For an honest die, you’d expect each face to show up at an average rate of 1/6 = 0.1667. But throw a die a thousand times and you’ll find that hardly any of the faces show up at exactly the average rate of 1/6, though the average of all the face averages will still be 1/6. We will call that grand average x-bar = 1/6, and we will call the average for a specific face Xi-bar, where i is one, two, three, four, five, or six.

There is also a standard deviation, SD, which relates to how often you expect one face to turn up more than the next. SD = √SD², where SD² is defined by the following formula:

SD² = 1/n ∑(xi – x-bar)²

Let’s pick some face of the die, say 3. I’ll give a value of 1 if we throw that number and 0 if we do not. For an honest die, x-bar = 1/6; that is to say, 1 out of 6 throws will land on the number 3, giving a value of 1, and the others won’t. In this situation, SD² = 1/n ∑(xi – x-bar)² = (1/6)(5/6)² + (5/6)(1/6)² = 25/216 + 5/216 = 5/36 ≈ 0.1389. Taking the square root, SD ≈ 0.373. We now calculate the standard error. For honest dice, you expect that for every face, on average,

SE = Xi-bar – x-bar = ± SD √(1/n).

By the time you’ve thrown 10,000 throws, √(1/n) = 1/100, and you expect an error on the order of 0.0037. This is to say that you expect each face to show up at a rate between about 0.1630 and 0.1704. In point of fact, you will likely find that at least one face of your dice shows up a lot more often than this, or a lot less often. The extent to which you see that is the extent to which your dice are crooked. If you throw someone’s dice enough, you can find out how crooked they are, and you can then use this information to beat the house. That, more or less, is the purpose of science, by the way: you want to beat the house; you want to live a life where you do better than you would by random chance.
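To make the arithmetic concrete, here is a minimal Python sketch (not from the original post) that throws a simulated honest die and compares the observed frequency of one face with the expected 1/6 ± SD·√(1/n) band. The function name and the random seed are my own illustrative choices.

```python
import random

def face_frequencies(n_throws, face=3, seed=1):
    """Throw an honest die n_throws times; return the observed
    frequency of `face` (value 1 on a hit, 0 otherwise, averaged)."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_throws) if rng.randint(1, 6) == face)
    return hits / n_throws

p = 1 / 6                      # x-bar, the expected frequency
sd = (p * (1 - p)) ** 0.5      # SD = sqrt(5/36), about 0.373
n = 10_000
se = sd * (1 / n) ** 0.5       # expected error, about 0.0037 at 10,000 throws

observed = face_frequencies(n)
print(f"observed {observed:.4f}, expected {p:.4f} ± {se:.4f}")
```

For a crooked die, replace `rng.randint(1, 6)` with a loaded choice (e.g. `rng.choices(range(1, 7), weights=...)`) and watch the observed frequency drift outside the band.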

As a less-mathematical way to look at the same thing — understanding statistics — I suggest we consider a crooked coin toss with only two outcomes, heads and tails. Not that I have a crooked coin, but your job, as before, is to figure out whether the coin is crooked and, if so, how crooked. The same problem appears in political polling before a major election: how do you figure out who will win between Mr. Head and Ms. Tail from a sample of only a few voters? For an honest coin, or an even election, each toss has a 50-50 chance of heads, or of Mr. Head. If you toss twice, there is a 25% chance of two heads, a 25% chance of two tails, and a 50% chance of one of each. That’s because there are four equally likely outcomes, and two of them give one head and one tail.


Pascal’s triangle

You can systematize this with Pascal’s triangle, shown at left, which shows the various outcomes of a series of coin tosses and the number of ways each can be arrived at. Thus, for example, we see that by the time you’ve tossed the coin 6 times, or polled 6 people, there are 2⁶ = 64 distinct outcomes, of which only 20 (about 1/3) give the expected even result: 3 heads and 3 tails. There is only 1 way to get all heads and 1 way to get all tails. While an honest coin is unlikely to come up all heads or all tails after six tosses, more often than not an honest coin will not come up with exactly half heads: 44 out of the 64 possible outcomes have more heads than tails, or more tails than heads — with an honest coin.
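The counting above can be checked with a few lines of Python; `pascal_row` is an illustrative helper name, not something from the post.

```python
from math import comb

def pascal_row(n):
    """Row n of Pascal's triangle: ways to get 0..n heads in n tosses."""
    return [comb(n, k) for k in range(n + 1)]

row6 = pascal_row(6)       # [1, 6, 15, 20, 15, 6, 1]
total = sum(row6)          # 2**6 = 64 distinct outcomes
even = row6[3]             # 20 ways to get exactly 3 heads and 3 tails
uneven = total - even      # 44 outcomes with more heads than tails, or vice versa
print(row6, total, even, uneven)
```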

Similarly, a poll of an even election will not likely come up even. This is something that confuses many political savants. The lack of an even result after relatively few tosses (or phone calls) should not convince us that the coin is crooked, or that the election has a clear winner. On the other hand, there is only a 1/32 chance (2/64) of getting all heads or all tails. If you call 6 people and all claim to support Mr. Head, it is likely that Mr. Head is the true favorite, to a confidence of about 3% = 1/32. In sports, it’s not uncommon for one side to win 6 out of 6 times. If that happens, there is a good chance of a real underlying cause, e.g. that one team really is better than the other.

And now we get to how significant is significant. If you tossed 4 heads and 2 tails out of 6 tosses, we can accept that this is not significant, because there are 15 ways to get this outcome (or 30, if you also include 2 heads and 4 tails) and only 20 ways to get the even outcome of 3-3. But what if you tossed 5 heads and one tail? In that case the ratio is 6/20, and the odds of this being significant are better; similarly if you called potential voters and found 5 Head supporters and 1 for Tail. What do you do? I suggest you take the ratio as 12/20: the number of ways to reach this outcome, counting both directions, divided by the number of ways to reach the most probable outcome. Since 12/20 = 60%, you could say there is a 60% chance that this result is random, and a 40% chance that it is significant. Statisticians call such a result “suggestive,” at slightly over 1 standard deviation. A standard deviation, also known as σ (sigma), is a minimal standard of significance: it is met when the one-tailed ratio is less than 1/2 of the most likely value. In this case, where 6 tosses come in as 5 and 1, the one-tailed ratio is 6/20. Since 6/20 is less than 1/2, we meet this very minimal standard for “suggestive.” A more normative standard is 5%. Clearly 6/20 does not meet that standard, but 1/20 does; for you to conclude that the coin is likely fixed after only 6 tosses, all 6 have to come up heads, or all tails.
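Both counts can be computed directly. A short sketch follows; the exact binomial tail is standard, while `ratio_to_mode` implements the ratio-to-the-most-likely-outcome heuristic described above (the function names are illustrative):

```python
from math import comb

def tail_prob(n, k):
    """Chance an honest coin gives k or more heads in n tosses."""
    return sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n

def ratio_to_mode(n, k):
    """Ways to get k heads plus its mirror (n-k heads), relative to
    the number of ways to get the most likely, even split."""
    mode = comb(n, n // 2)
    return (comb(n, k) + comb(n, n - k)) / mode

print(tail_prob(6, 6))      # 1/64, about 1.6%, for all six heads
print(ratio_to_mode(6, 5))  # (6 + 6)/20 = 0.6, the 60% in the text
```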


From xkcd. It’s typical in science to say that <5% chances, p < .05, are significant. If things don’t quite come out that way, you redo.

If you graph the possibilities from a large Pascal’s triangle, they will resemble a bell curve; in many real cases (not all) your experimental data’s variation will also resemble this bell curve. From a larger Pascal’s triangle, or a large bell curve, you will find that the 5% value occurs at about σ = 2, that is, at about twice the distance from the average as where σ = 1. Generally speaking, the number of observations you need is proportional to the square of the difference you are looking for. Thus, if you think a two-headed coin is in use, it will take only 6 or 7 observations; if you think the die is loaded by 10%, it will take some 600 throws showing that side to prove it.
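The square-law rule of thumb can be turned into a rough calculator. The sketch below assumes a coin (SD = 1/2 for its 1-or-0 outcome) and a 2σ standard; the function name and constants are illustrative, a back-of-the-envelope estimate rather than a precise power calculation:

```python
from math import ceil

def throws_needed(bias, z=2.0):
    """Rough sample size to detect a coin loaded to (0.5 + bias) heads
    at about the z-sigma level (z = 2 is roughly the 5% standard):
    n ~ (z * SD / bias)^2, with SD = 0.5 for a fair coin."""
    sd = 0.5
    return ceil((z * sd / bias) ** 2)

print(throws_needed(0.5))    # an all-heads coin: ~4 throws by this crude rule
print(throws_needed(0.05))   # a 55% coin: ~400 throws; 100x more for 1/10 the bias
```

The point is the scaling: halving the bias you hunt for quadruples the number of observations you need.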

In many (most) experiments, you cannot easily use Pascal’s triangle to get sigma, σ. For example, if you want to see whether 8th graders are taller than 7th graders, you might measure the heights of people in both classes and take the average of each, but you might wonder what sigma is, so you can tell whether the difference is significant or just random variation. The classic mathematical approach is to calculate sigma as the square root of the average of the squared difference of the data from the average. Thus, if the average is <h> = ∑h/N, where h is the height of a student and N is the number of students, we can say that σ = √(∑(<h> – h)²/N). This formula is found in most books. Significance is usually specified as 2 sigma, or some close variation. As convenient as this is, my preference is for this graphical version. It also shows whether the data is normal — an important consideration.
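The formulas for <h> and σ translate directly to code. The class heights below are made-up numbers for illustration:

```python
def mean_and_sigma(heights):
    """<h> = sum(h)/N and sigma = sqrt(sum((<h> - h)^2)/N)."""
    n = len(heights)
    avg = sum(heights) / n
    sigma = (sum((avg - h) ** 2 for h in heights) / n) ** 0.5
    return avg, sigma

# hypothetical class heights in cm
seventh = [150, 152, 155, 149, 158, 153]
eighth = [156, 160, 158, 154, 162, 157]
m7, s7 = mean_and_sigma(seventh)
m8, s8 = mean_and_sigma(eighth)
print(f"7th: {m7:.1f} ± {s7:.1f} cm;  8th: {m8:.1f} ± {s8:.1f} cm")
```

If the gap between the two averages is a couple of sigmas or more, the difference is worth taking seriously; if it is a fraction of a sigma, it is probably just variation.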

If you find the data is not normal, you may decide to break it into sub-groups. E.g., if you look at the heights of 7th and 8th graders and find the distribution is not normal, you may be better off looking at the heights of the girls and the boys separately. You can then compare those two subgroups to see whether, perhaps, only the boys are still growing, or only the girls. One should not pick a hypothesis and then test it, but collect the data first and let the data determine the analysis. This was the method of Sherlock Holmes, a very worthwhile read.

Another good trick for statistics is the linear regression. If you are trying to show that music improves concentration, see whether more music improves it more: you want to find a linear relationship, or at least a plausible curved one. Generally, there is a relationship if the correlation coefficient, r = ∑(x – <x>)(y – <y>) / √(∑(x – <x>)² ∑(y – <y>)²), is 0.9 or so. A discredited study where the author should have used regressions but did not, and should have reported sub-groups but did not, involved cancer and genetically modified foods. The author found that cancer increased in one sub-group, and publicized that finding, but didn’t mention that cancer didn’t increase in nearby sub-groups given different doses, and decreased in another nearby sub-group. By not including the sub-groups, and not doing a regression, the author misled people for 2 years, perhaps out of a misguided attempt to help. Don’t do that.
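As a sketch, here is the correlation coefficient computed from scratch on made-up dose-response data (the numbers are purely illustrative):

```python
def correlation(xs, ys):
    """Pearson r = sum((x-<x>)(y-<y>)) / sqrt(sum((x-<x>)^2) * sum((y-<y>)^2))."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# hypothetical dose-response data: does more of x give more of y?
x = [0, 1, 2, 3, 4, 5]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8]
print(f"r = {correlation(x, y):.3f}")  # close to 1 suggests a real linear relation
```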

Dr. Robert E. Buxbaum, June 5-7, 2015. Lack of trust in statistics, or of understanding of statistical formulas should not be taken as a sign of stupidity, or a symptom of ADHD. A fine book on the misuse of statistics and its pitfalls is called “How to Lie with Statistics.” Most of the examples come from advertising.

Why random experimental design is better

In a previous post I claimed that, to do good research, you want to arrange experiments so there is no pre-hypothesis of how the results will turn out. As that post was long, I said nothing direct about how such experiments should be organized, and only alluded to my preference: experiments should be organized at randomly chosen conditions within the area of interest. The alternatives, shown below, are to do experiments at the cardinal points in the space, or at the corner extremes: the Wilson Box and the Taguchi design of experiments (DoE), respectively. Doing experiments at these points implies a sort of expectation of the outcome, generally that results will be linearly, orthogonally related to causes; in such cases, the extreme values are the most telling. Sorry to say, this usually isn’t how experimental data falls out.

First experimental test points according to a Wilson Box, a Taguchi, and a random experimental design. The Wilson box and Taguchi are OK choices if you know or suspect that there are no significant non-linear interactions, and where experiments can be done at these extreme points. Random is the way nature works; and I suspect that’s best — it’s certainly easiest.

The first test-points for experiments according to the Wilson Box method and Taguchi method of experimental designs are shown on the left and center of the figure above, along with a randomly chosen set of experimental conditions on the right. Taguchi experiments are the most popular choice nowadays, especially in Japan, but as Taguchi himself points out, this approach works best if there are “few interactions between variables, and if only a few variables contribute significantly.” Wilson Box experimental choices help if there is a parabolic effect from at least one parameter, but are fairly unsuited to cases with strong cross-interactions.

Perhaps the main problems with doing experiments at extreme or cardinal points are that these experiments are usually harder to do than those at random points, and that the results from these difficult tests generally tell you nothing you didn’t know or suspect from the start. The minimum concentration is usually zero, and the minimum temperature is usually one where reactions are too slow to matter. When you test at the minimum-minimum point, you expect to find nothing, and generally that’s what you find. In the data sets shown above, it will not be uncommon that the two minimum W-B data points, and the 3 minimum Taguchi data points, show no measurable result at all.

Randomly selected experimental conditions are the experimental equivalent of Monte Carlo simulation, and are the method evolution uses. Set out the space of possible compositions, morphologies, and test conditions as with the other methods, and perhaps plot them on graph paper. Now toss darts at the paper to pick a few compositions and sets of conditions to test, and do a few experiments. Because nature is rarely linear, you are likely to find better results and more interesting phenomena than at any of the extremes. After the first few experiments, when you think you understand how things work, you can pick experimental points that target an optimum extreme point, or that visit a more interesting or representative survey of the possibilities. In any case, you’ll quickly get a sense of how things work and how successful the experimental program will be. If nothing works at all, you may want to cancel the program early; if things work really well, you’ll want to expand it. With random experimental points, you do fewer worthless experiments, and you can easily increase or decrease the number of experiments in the program as funding and time allow.
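The dart-toss can be automated. A minimal sketch, with a made-up condition space (the variable names and ranges are illustrative):

```python
import random

def random_design(bounds, n_points, seed=0):
    """Pick n_points random test conditions within the given space.
    bounds: dict mapping each variable to its (low, high) range."""
    rng = random.Random(seed)
    return [{var: rng.uniform(lo, hi) for var, (lo, hi) in bounds.items()}
            for _ in range(n_points)]

# hypothetical reaction-condition space
space = {"temperature_C": (20, 200), "concentration_M": (0.1, 2.0)}
for pt in random_design(space, 4):
    print({k: round(v, 2) for k, v in pt.items()})
```

Adding a variable costs nothing here; each new dimension is just one more entry in `bounds`, with no combinatorial explosion of required test points.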

Consider the simple case of choosing a composition for gunpowder. The composition itself involves only 3 or 4 components, but there is also morphology to consider, including the gross structure and the fine structure (degree of grinding). Instead of picking experiments at the extreme compositions (100% salt-peter, 0% salt-peter, grinding to sub-micron size, etc.), as with Taguchi, a random methodology picks random, easily do-able conditions: 20% sulfur and 40% salt-peter, say. These compositions will be easier to ignite, and the results are likely to be more relevant to the project goals.

The advantages of random testing get bigger the more variables and levels you need to test. Testing 9 variables at 3 levels each takes 27 Taguchi points, but only 16 or so if the experimental points are randomly chosen. To test whether the behavior is linear, you can use the results from your first 7 or 8 randomly chosen experiments, derive the vector that gives the steepest improvement in n-dimensional space (a weighted sum of all the improvement vectors), and then do another experiment as far along in the direction of that vector as you think reasonable. If your result at this point is better than at any point you’ve visited, you’re well on your way to determining the conditions of optimal operation. That’s a lot faster than starting with 27 hard-to-do experiments. What’s more, if you don’t find an optimum, congratulate yourself: you’ve just discovered a non-linear behavior, something that would be easy to overlook with the Taguchi or Wilson Box methodologies.
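One simple way to implement the weighted sum of improvement vectors is sketched below. The weighting scheme (improvement over the worst point, times the step away from it) is an illustrative choice, not a unique prescription, and the conditions and yields are made up:

```python
def improvement_vector(points, results):
    """Estimate the direction of steepest improvement from a few random
    experiments: weight each step away from the worst point by how much
    it improved the result, sum, and normalize to a unit vector."""
    worst = min(range(len(results)), key=lambda i: results[i])
    dims = len(points[0])
    vec = [0.0] * dims
    for p, r in zip(points, results):
        gain = r - results[worst]
        for d in range(dims):
            vec[d] += gain * (p[d] - points[worst][d])
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]

# hypothetical: 4 random (x, y) conditions and their measured yields
pts = [(0.2, 0.8), (0.6, 0.3), (0.9, 0.7), (0.4, 0.5)]
yields = [3.1, 5.0, 6.8, 4.2]
print(improvement_vector(pts, yields))  # unit vector pointing uphill
```

The next experiment then goes some distance along this vector from your best point; if the result there isn’t better, you’ve found non-linearity.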

The basic idea is one Sherlock Holmes pointed out: “It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts” (A Scandal in Bohemia), and “Life is infinitely stranger than anything which the mind of man could invent” (A Case of Identity).

Robert E. Buxbaum, September 11, 2013. A nice description of the Wilson Box method is presented in Perry’s Handbook (6th ed). Since I had trouble finding a free, on-line description, I linked to a paper by someone using it to test ingredient choices in baked bread. Here’s a link for more info about random experimental choice, from the University of Michigan, Chemical Engineering dept. Here’s a joke on the misuse of statistics, and a link regarding the Taguchi Methodology. Finally, here’s a pointless joke on irrational numbers, that I posted for pi-day.