Category Archives: math

Fractal power laws and radioactive waste decay

Here’s a fairly simple model for nuclear reactor decay heat versus time. It’s based on a fractal model I came up with for dealing with the statistics of crime, fires, etc. The starting point was to notice that radioactive waste is typically a mixture of isotopes with different decay times and different decay heats. I then came to suspect that there would be a general fractal relation, and that this relation would hold as the elements of the mixed waste decayed to more stable, less radioactive products. After looking a bit, it seems that the fractal time characteristic is time to the 1/4 power, that is

heat output = H° exp(-a t^(1/4)).

Here H° is the heat output rate at time t = 0, and “a” is a characteristic of the waste. Different waste mixes will have different values of this decay characteristic.

If nuclear waste consisted of one isotope and one decay path, the number of atoms decaying per day would decrease exponentially with time to the power of 1. If there were only one daughter product produced, and it were non-radioactive, the heat output of a sample would also decay with time to the power of 1. Thus, heat output would equal H° exp(-at), and a plot of the log of the decay heat against linear time would be a straight line — you could plot it all conveniently on semi-log paper.

But nuclear waste generally consists of many radioactive components with different half-lives, and these components decay into other radioactive isotopes, all of which have half-lives that vary by quite a lot. The result is that a semi-log plot is rarely helpful. Some people therefore plot radioactivity on a log-log plot, typically including a curve for each major isotope and decay mode. I find these plots hardly useful. They are certainly impossible to extrapolate. What I’d like to propose instead is a fractal variation of the original semi-log plot: a plot of the log of the heat rate against a fractal time. As shown below, the use of time to the 1/4 power seems to be helpful. The plot is similar to a fractal decay model that I’d developed for crimes and fires a few weeks ago.
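For readers who want to try this on their own numbers, here is a minimal Python sketch of the plotting recipe (my own illustration, not anything from the NRC guide): plot the log of the decay heat against time to the 1/4 power, and read H° and “a” off a straight-line fit.

```python
# A sketch of the fractal semi-log plot: ln(heat) vs t^(1/4).
# If the proposed law H = H0*exp(-a*t^(1/4)) holds, the points fall on a line.
import numpy as np
import matplotlib.pyplot as plt

def fractal_decay_fit(t_years, heat_watts, show_plot=True):
    """Fit ln(heat) = ln(H0) - a*t**0.25 and return (H0, a)."""
    x = np.asarray(t_years, dtype=float) ** 0.25
    y = np.log(np.asarray(heat_watts, dtype=float))
    slope, intercept = np.polyfit(x, y, 1)      # straight-line fit
    if show_plot:
        plt.plot(x, y, "o", label="data")
        plt.plot(x, slope * x + intercept, "-", label="fit")
        plt.xlabel("time^(1/4)  (years^(1/4))")
        plt.ylabel("ln(decay heat)")
        plt.legend()
        plt.show()
    return np.exp(intercept), -slope            # H0, and a in H = H0*exp(-a*t**0.25)
```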

After-heat of nuclear fuel rods used to generate 20 kW/kg U; top graph 35 MW-days/kg U, bottom graph 20 MW-days/kg U. Data from US NRC Regulatory Guide 3.54 - Spent Fuel Heat Generation in an Independent Spent Fuel Storage Installation, rev. 1, 1999. http://www.nrc.gov/reading-rm/doc-collections/reg-guides/fuels-materials/rg/03-054/ A typical reactor has 200,000 kg of uranium.

A plausible justification for this fractal semi-log plot is to observe that the half-lives of daughter isotopes relate to those of the parent isotopes. Unless I find that someone else has come up with this sort of plot or analysis before, I’ll call it after myself: a Buxbaum-Mandelbrot plot. Why not?

Nuclear power is attractive because it is a lot more energy dense than any normal fuel. Still, the graph at right illustrates the problem of radioactive waste. With nuclear, you generate about 35 MW-days of energy per kg of uranium. This is enough to power an average US home for 8 years, but it produces 1 kg of radioactive waste. Even after 81 years the waste is generating about 1/2 W of decay heat. It should be easier to handle and store the 1 kg of spent uranium than to deal with the many tons of coal-smoke produced when 35 MW-days of electricity is made from coal. Still, there is reason to worry about the decay heat.

I’ve made a similar plot of the decay heat of a fusion reactor, see below. Fusion looks better in this regard. A fission-based nuclear reactor to power 1/2 of Detroit would hold some 200,000 kg of uranium that would be replaced every 5 years. Even 81 years after removal, the after-heat would be about 100 kW, and that’s a lot.

After-heat of a 4000 MWth fusion reactor built from niobium-1% zirconium (Nb-1%Zr, a fairly common high-temperature engineering material of construction); from the UWMAC III Report. The after-heat is far less than with normal uranium fission.

The plot of the after-heat of a similar-power fusion reactor (right) shows a far greater slope, but the same time to the 1/4 power dependence. The heat output drops from 1 MW at 3 weeks to only 100 W after 1 year, and far less than 1 W after 81 years. Nuclear fusion is still a few years off, but the plot at left shows the advantages fairly clearly, I think.
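As a back-of-envelope check of my own, using only the two fusion numbers quoted above (1 MW at 3 weeks and 100 W at 1 year), you can solve for “a” and extrapolate to 81 years:

```python
import math

t1, H1 = 3.0 / 52, 1.0e6    # 3 weeks, in years; heat in watts
t2, H2 = 1.0, 100.0         # 1 year; watts

# H = H0*exp(-a*t**0.25), so ln(H1/H2) = a*(t2**0.25 - t1**0.25)
a = math.log(H1 / H2) / (t2**0.25 - t1**0.25)    # about 18 per year^(1/4)
H0 = H1 * math.exp(a * t1**0.25)

H81 = H0 * math.exp(-a * 81**0.25)
print(f"a = {a:.1f}, after-heat at 81 years = {H81:.1e} W")   # far below 1 W
```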

This plot was really designed to look at the statistics of crime, fires, and the need for servers / checkout people.

Dr. R.E. Buxbaum, January 2, 2014, edited Aug 30, 2022. *A final, final thought about theory from Yogi Berra: “In theory, it matches reality.”

Near-Poisson statistics: how many police – firemen for a small city?

In a previous post, I dealt with the nearly-normal statistics of common things, like river crests, and explained why 100 year floods come more often than once every hundred years. As is not uncommon, the data was sort-of like a normal distribution, but deviated at the tail (the fantastic tail of the abnormal distribution). But now I’d like to present my take on a sort of statistics that (I think) should be used for the common problem of uncommon events: car crashes, fires, epidemics, wars…

Normally the mathematics used for these processes is Poisson statistics, and occasionally exponential statistics. I think these approaches lead to incorrect conclusions when applied to real-world cases of interest, e.g. choosing the size of a police force or fire department of a small town that rarely sees any crime or fire. This is relevant to Oak Park Michigan (where I live). I’ll show you how it’s treated by Poisson, and will then suggest a simpler way that’s more relevant.

First, consider an idealized version of Oak Park, Michigan (a semi-true version until the 1980s): the town had a small police department and a small fire department that saw only occasional crimes or fires, all of which required only 2 or 4 people respectively. Let’s imagine that the likelihood of having one small fire at a given time is x = 5%, and that of having a violent crime is y = 5% (it was 6% in 2011). A police department will need to have 2 policemen on call at all times, but will want 4 on the 0.25% chance that there are two simultaneous crimes (.05 x .05 = .0025); the fire department will want 8 souls on call at all times for the same reason. Either department will use the rest of their officers’ time dealing with training, paperwork, investigations of less-immediate cases, care of equipment, and visiting schools, but this number on call is needed for immediate response. As there are 8760 hours per year and police and fire workers only work about 2000 hours each, you’ll need at least 4.4 times this many officers. We’ll add some more for administration and sick-day relief, and predict a total staff of 20 police and 40 firemen. This is, more or less, what it was in the 1980s.

If each fire or violent crime took 3 hours (1/8 of a day), you’ll find that the entire on-call staff was busy 7.3 times per year (8x365x.0025 = 7.3), or a bit more since there is likely a seasonal effect, and since fires and violent crimes don’t fall into neat time slots. Having 3 fires or violent crimes simultaneously was very rare — and for those rare times, you could call on nearby communities, or do triage.
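The arithmetic behind these staffing numbers is simple enough to spell out in a few lines of Python (a sketch of my own, just restating the numbers above):

```python
p_fire = p_crime = 0.05            # chance of one event in any given 3-hour slot
p_two_crimes = p_crime * p_crime   # 0.0025: the chance you need the full 4-man call-out
slots_per_year = 8 * 365           # 3-hour slots in a year

print(slots_per_year * p_two_crimes)   # about 7.3 all-hands call-outs per year
print(8760 / 2000)                     # about 4.4 officers hired per on-call slot
```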

In response to austerity (towns always overspend in the good times, and come up short later), Oak Park realized it could use fewer employees if it combined the police and fire departments into an entity renamed “Public Safety.” With 45-55 employees assigned to combined police/fire duty they’d still be able to handle the few violent crimes and fires. The sum of these events occurs 10% of the time, and we can apply the sort of statistics above to suggest that about 91% of the time there will be neither a fire nor a violent crime; about 9% of the time there will be one or more fires or violent crimes (there is a 5% chance for each, but also a chance that 2 happen simultaneously). At least two events will occur 0.9% of the time (2 fires, 2 crimes, or one of each), and there will be 3 or more events .09% of the time, or twice per year. The combined force allowed fewer responders since it was only rarely that 4 events happened simultaneously, and some of those were 4 crimes or 3 crimes and a fire — events that needed fewer responders. Your only real worry was when you had 3 fires, something that should happen once every 3 years or so, an acceptable risk at the time.

Before going to what caused this model of police and fire service to break down as Oak Park got bigger, I should explain Poisson statistics, exponential statistics, and power law/fractal statistics. The only type of statistics taught for dealing with crime like this is Poisson statistics, a type that works well when the events happen so suddenly and pass so briefly that we can claim to be interested only in how often we will see multiples of them in a period of time. The Poisson distribution formula is P = r^k e^(-r)/k!, where P is the probability of having some number of events, r is the total number of events divided by the total number of periods, and k is the number of events we are interested in.

Using the data above for a period-time of 3 hours, we can say that r = .1, and the likelihood of zero, one, or two events beginning in a given 3-hour period is 90.4%, 9.04% and 0.45% respectively. These numbers are reasonable in terms of when events happen, but they are irrelevant to the problem anyone is really interested in: what resources are needed to come to the aid of the victims. That’s the problem with Poisson statistics: it treats something that no one cares about (when the things start), and under-predicts the important things, like how often you’ll have multiple events in progress. For 4 events, Poisson statistics predicts it happens only .00037% of the time — true enough, but irrelevant in terms of how often multiple teams are needed out on the job. We need four teams no matter whether the 4 events began in a single 3-hour period or in close succession in two adjoining periods. The events take time to deal with, and the time overlaps.
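For those who want to check the Poisson numbers above, here is a short sketch (my own, not part of the original analysis) that evaluates P = r^k e^(-r)/k! for r = 0.1:

```python
from math import exp, factorial

r = 0.1                                    # average events per 3-hour period
for k in range(5):
    P = r**k * exp(-r) / factorial(k)
    print(f"P({k} events begin in a period) = {P:.6f}")
# prints ~0.9048, 0.0905, 0.0045, 0.00015, 0.0000038 (i.e. 0.00038% for k = 4)
```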

The way I’d dealt with these events, above, suggests a power law approach. In this case, each likelihood was 1/10 the previous, and the probability P = .9 x 10^(-k). This is called power law statistics. I’ve never seen it taught, though it appears very briefly in Wikipedia. Those who like math can re-write the above relation as log10 P = log10 .9 - k.

One can generalize the above so that, for example, the decay rate can be 1/8 and not 1/10 (that is, the chance of having k+1 events is 1/8 that of having k events). In this case, we could say that P = 7/8 x 8^(-k), or more generally that log10 P = log10 A - kβ. Here k is the number of teams required at any time, β is a free variable, and A = 1 - 10^(-β) because the sum of all probabilities has to equal 100%.
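As a check on the pre-exponential (my own derivation, just the sum of a geometric series), the requirement that the probabilities add to 100% is what fixes A:

```latex
\sum_{k=0}^{\infty} P(k) \;=\; \sum_{k=0}^{\infty} A\,10^{-\beta k}
\;=\; \frac{A}{1-10^{-\beta}} \;=\; 1
\quad\Longrightarrow\quad A \;=\; 1-10^{-\beta}.
```

For β = 1 this gives A = 0.9, matching the factor-of-10 decay above; for a decay ratio of 1/8, A = 7/8.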

In college math, when behaviors like this appear, they are incorrectly translated into differential form to create “exponential statistics.” One begins by saying ∂P/∂k = -βP, where β = .9 as before, or remains some free-floating term. Everything looks fine until we integrate and set the total to 100%. We find that P = (1/λ) e^(-kλ) for k ≥ 0. This looks the same as before except that the pre-exponential always comes out wrong. In the above, the chance of having 0 events turns out to be 111%. Exponential statistics has the advantage (or disadvantage) that we find a non-zero possibility of having 1/100 of a fire, or 3.14159 crimes at a given time. We assign excessive likelihoods to fractional events and end up predicting artificially low likelihoods for the discrete events we are interested in; that is the price of using a calculus that assumes continuity in a world where there is none. Discrete math is better than calculus here.

I now wish to generalize the power law statistics to something similar but more robust. I’ll call my development fractal statistics (there’s already a section called fractal statistics on Wikipedia, but it’s really power-law statistics; mine will be different). Fractals were championed by Benoit B. Mandelbrot (whose middle initial, according to the old joke, stood for Benoit B. Mandelbrot). Many random processes look fractal, e.g. the stock market. Before going there, I’d like to recall that the motivation for all this is figuring out how many people to hire for a police/fire force; we are not interested in any other irrelevant factoid, like how many calls of a certain type come in during a period of time.

To choose the size of the force, let’s estimate how many times per year some number of people are needed simultaneously now that the city has bigger buildings and is seeing a few larger fires and crimes. Let’s assume that the larger fires and crimes occur only .05% of the time but might require 15 officers or more. Being prepared for even one event of this size will require expanding the force to about 80 men, 50% more than we have today, but we find that this expansion isn’t enough to cover the 0.0025% of the time when we will have two such major events simultaneously. That would require a 160-man fire-squad, and we still could not deal with two major fires and a simultaneous assault, or with a strike, or a lot of people who take sick at the same time.

To treat this situation mathematically, we’ll say that the number of times per year when a certain number of people are needed relates to the number of people through a simple modification of the power law statistics. Thus: log10 N = A - βθ, where A and β are constants, N is the number of times per year that some number of officers are needed, and θ is the number of officers needed. To solve for the constants, plot the experimental values on a semi-log scale, and find the best straight line: -β is the slope and A is the intercept. If the line is really straight, you are now done, and I would say that the fractal order is 1. But from the above discussion, I don’t expect this line to be straight. Rather, I expect it to curve upward at high θ: there will be a tail where you require a higher number of officers. One might be tempted to fix this by adding a higher-order term in θ, but that will cause problems at very high θ. Thus, I’d suggest a fractal fix.

My fractal modification of the equation above is the following: log10 N = A - βθ^w, where A and β are similar to the power law coefficients and w is the fractal order of the decay, a coefficient that I expect to be slightly less than 1. To solve for the coefficients, pick a value of w, and find the best fits for A and β as before. The right value of w is the one that results in the straightest line fit. The equation above does not look quite like anything I’ve seen, or anything like the one shown in Wikipedia under the heading of fractal statistics, but I believe it to be correct — or at least useful.
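Here is a rough Python sketch of that fitting recipe (my own illustration; the θ and N numbers below are made-up placeholders, to be replaced by a city’s actual call records): scan over w, fit a straight line of log10 N against θ^w, and keep the w that gives the straightest line.

```python
import numpy as np

theta = np.array([2, 4, 6, 8, 10, 15], dtype=float)   # officers needed (placeholder values)
N = np.array([200, 20, 4, 1.5, 0.8, 0.3])             # times per year (placeholder values)

best = None
for w in np.arange(0.5, 1.01, 0.05):
    x = theta**w
    slope, intercept = np.polyfit(x, np.log10(N), 1)
    resid = np.log10(N) - (slope * x + intercept)
    ss = float(np.sum(resid**2))                       # how far from a straight line
    if best is None or ss < best[0]:
        best = (ss, w, intercept, -slope)

ss, w, A, beta = best
print(f"fractal order w = {w:.2f}, A = {A:.2f}, beta = {beta:.3f}")
```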

To treat this politically is more difficult than treating it mathematically. I suspect we will have to combine our police and fire departments with those of surrounding towns, and this will likely require our city to revert to a pure police department and a pure fire department. We can’t expect other cities’ specialists to work with our generalists particularly well. It may also mean payments to other cities, plus (perhaps) standardizing salaries and staffing. This should save money for Oak Park and should provide better service, as specialists tend to do their jobs better than generalists (they also tend to be safer). But the change goes against the desire (need) of our local politicians to hand out favors of money and jobs to their friends. Keeping a non-specialized force costs lives as well as money, but that doesn’t mean we’re likely to change soon.

Robert E. Buxbaum, December 6, 2013. My two previous posts are on how to climb a ladder safely, and on mustaches in WWII: mustache men do things, and those with similar mustache styles get along best.

Ab Normal Statistics and joke

The normal distribution of observation data looks sort of like a ghost. A Distribution that really looks like a ghost is scary.

It’s funny because …. the normal distribution curve looks sort-of like a ghost. It’s also funny because it would be possible to imagine data being distributed like the ghost, and most people would be totally clue-less as to how to deal with data like that — abnormal statistics. They’d find it scary and would likely try to ignore the problem. When faced with a statistics problem, most people just hope that the data is normal; they then use standard mathematical methods with a calculator or simulation package and hope for the best.

Take the following example: you’re interested in buying a house near a river. You’d like to analyze river flood data to know your risks. How high will the river rise in 100 years, or 1000? Or perhaps you would like to analyze wind data to know how strong to make a sculpture so it does not blow down. Your first thought is to use the normal distribution math in your college statistics book. This looks awfully daunting (it doesn’t have to be) and may be wrong, but it’s all you’ve got.

The normal distribution graph is considered normal, in part, because it’s fairly common to find that measured data deviates from the average in this way. Also, this distribution can be derived from the mathematics of an idealized view of the world, where any variation derives from multiple small errors around a common norm, and not from some single, giant issue. It’s not clear this is a realistic assumption in most cases, but it is comforting. I’ll show you how to do the common math as it’s normally done, and then how to do it better and quicker with no math at all, and without those assumptions.

Let’s say you want to know the hundred-year maximum flood-height of a river near your house. You don’t want to wait 100 years, so you measure the maximum flood height every year over five years, say, and use statistics. Let’s say you measure 8 feet, 6 feet, 3 feet (a drought year), 5 feet, and 7 feet.

The “normal” approach (pardon the pun) is to take a quick look at the data, and see that it is sort-of normal (many people don’t bother). One now takes the average, calculated here as (8+6+3+5+7)/5 = 5.8 feet. About half the time the flood waters should be higher than this (a good researcher would check this; many do not). You now calculate the standard deviation for your data, a measure of the width of the ghost, generally using a spreadsheet. The formula for the standard deviation of a sample is s = √{[(8-5.8)² + (6-5.8)² + (3-5.8)² + (5-5.8)² + (7-5.8)²]/4} = 1.92. The use of 4 here in the denominator instead of 5 is called Bessel’s correction – it reflects the fact that a standard deviation is meaningless if there is only one data point.

For normal data, the one hundred year maximum height of the river (the 1% maximum) is the average height plus 2.2 times the deviation; in this case, 5.8 + 2.2 x 1.92 = 10.0 feet. If your house is any higher than this you should expect few troubles in a century. But is this confidence warranted? You could build on stilts or further from the river, but you don’t want to go too far. How far is too far?
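The spreadsheet arithmetic above fits in a few lines of Python (a sketch of my own; the 2.2 multiplier is the one used in this post):

```python
heights = [8, 6, 3, 5, 7]                 # yearly flood maxima, in feet
mean = sum(heights) / len(heights)        # 5.8 ft
var = sum((h - mean)**2 for h in heights) / (len(heights) - 1)   # Bessel's n-1
s = var**0.5                              # about 1.92 ft
print(mean, s, mean + 2.2 * s)            # 5.8, 1.92, and the ~10.0 ft estimate
```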

So let’s do this better. We can do better, with less math, through the use of probability paper. As with any good science, we begin with data, not with assumptions such as that the data is normal. Arrange the river-height data in a list from highest to lowest (or lowest to highest), and plot the values in this order on your probability paper as shown below. That is, on paper where likelihoods from .01% to 99.99% are arranged along the bottom (the x axis), and your other numbers, in this case the river heights, are the y values listed at the left. Graph paper of this sort is sold in university book stores; you can also get jpeg versions on line, but they don’t look as nice.

Probability plot of the maximum river height over 5 years. If the data suggests a straight line, as here, the data is reasonably normal. Extrapolating to 99% suggests the 100-year flood height would be 9.5 to 10.2 feet, and that it is 99.99% unlikely to reach 11 feet. That’s once in 10,000 years, other things being equal.

For the x axis values of the 5 data points above, I’ve taken the likelihood to be the middle of its percentile. Since there are 5 data points, each point is taken to represent its own 20 percentile; the middles appear at 10%, 30%, 50%, etc. I’ve plotted the highest value (8 feet) at the 10% point on the x axis, that being the middle of the upper 20%. I then plotted the second highest (7 feet) at 30%, the middle of the second 20%; the third, 6 feet, at 50%; the fourth at 70%; and the drought-year maximum (3 feet) at 90%. When done, I judge whether a reasonably straight line would describe the data. In this case, a line through the data looks reasonably straight, suggesting a fairly normal distribution of river heights. I notice that, if anything, the heights drop off at the left, suggesting that really high river levels are less likely than normal. The points will also have to drop off at the right since a negative river height is impossible. Thus my river heights describe a version of the ghost distribution in the cartoon above. This is a welcome finding since it suggests that really high flood levels are unlikely. If the data were non-normal, curving the other way, we’d want to build our house higher than a normal distribution would suggest.

You can now find the 100-year flood height from the graph above without going through any of the math. Just draw your best line through the data, and look where it crosses the 1% value on your graph (that’s two major lines from the left in the graph above — you may have to expand your view to see the little 1% at top). My extrapolation suggests the hundred-year flood maximum will be somewhere between about 9.5 feet and 10.2 feet, depending on how I choose my line. This prediction is a little lower than we calculated above, and was done graphically, without the need for a spreadsheet or math. What’s more, our prediction is more accurate, since we were in a position to evaluate the normality of the data and thus able to fit the extrapolation line accordingly. There are several ways to handle extreme curvature in the line, but all involve fitting the curve some way. Most weather data is curved, e.g. normal against a fractal, I think, and this affects your predictions. You might expect to have an ice age in 10,000 years.
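If you don’t have probability paper handy, the same recipe can be done numerically. Here is a sketch of mine (using scipy, not something from the original post) that assigns the mid-percentile plotting positions above, converts them to normal z-values, fits a line, and reads off the 1% level:

```python
import numpy as np
from scipy.stats import norm

heights = np.array([8, 6, 3, 5, 7], dtype=float)   # the five yearly maxima, in feet
sorted_h = np.sort(heights)[::-1]                  # highest first: 8, 7, 6, 5, 3
n = len(sorted_h)
exceedance = (np.arange(n) + 0.5) / n              # 0.10, 0.30, 0.50, 0.70, 0.90
z = norm.ppf(1 - exceedance)                       # the "probability paper" x axis

slope, intercept = np.polyfit(z, sorted_h, 1)      # the straight line through the data
z_1pct = norm.ppf(0.99)                            # the hundred-year (1%) point
print(f"hundred-year flood estimate: {intercept + slope*z_1pct:.1f} feet")
```

A least-squares line lands near the top of the 9.5 to 10.2 foot range read off the graph by eye; drawing the line yourself lets you weight the tail behavior, which is the point of the method.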

The standard deviation we calculated above is related to a quality standard called six sigma — something you may have heard of. If we had a lot of parts we were making, for example, we might expect to find that the size deviation varies from a target according to a normal distribution. We call this variation σ, the Greek version of s. If your production is such that the upper spec is 2.2 standard deviations from the norm, 99% of your product will be within spec; good, but not great. If you’ve got six sigmas there is one-in-a-billion confidence of meeting the spec, other things being equal. Some companies (like Starbucks) aim for this low variation, a six sigma confidence of being within spec. That is, they aim for total product uniformity in the belief that uniformity is the same as quality. There are several problems with this thinking, in my opinion. The average is rarely an optimum, and you want to have a rational theory for acceptable variation boundaries. Still, uniformity is a popular metric in quality management, and companies that use it are better off than those that do nothing. At REB Research, we like to employ the quality methods of W. Edwards Deming; we assume non-normality and aim for an optimum (that’s subject matter for a further essay). If you want help with statistics, or a quality engineering project, contact us.
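A quick numerical check of those two tail numbers (my sketch; needs scipy):

```python
from scipy.stats import norm

print(norm.sf(2.2))   # about 0.014: roughly 99% of product inside a one-sided 2.2-sigma limit
print(norm.sf(6.0))   # about 1e-9: the one-in-a-billion six-sigma tail, other things being equal
```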

I’ve also been meaning to write about the phrase “other things being equal,” ceteris paribus in Latin. All this math only makes sense so long as the general parameters don’t change much. Your home won’t flood so long as they don’t build a new mall up river from you with runoff into the river, and so long as the dam doesn’t break. If these are concerns (and they should be) you still need to use statistics and probability paper, but you will now have to use other data, like on the likelihood of malls going up, or of dams breaking. When you input this other data, you will find the probability curve is not normal, but typically has a long tail (when the dam breaks, the water goes up by a lot). That’s outside of standard statistical analysis, but it’s why those hundred-year floods come a lot more often than once in 100 years. I’ve noticed that, even at Starbucks, more than 1 in 1,000,000,000 cups of coffee come out wrong. Even in analyzing a common snafu like this, you still use probability paper, though. It may be “situation normal,” but the distribution curve it describes has an abnormal tail.

by Dr. Robert E. Buxbaum, November 6, 2013. This is my second statistics post/ joke, by the way. The first one dealt with bombs on airplanes — well, take a look.

Calculus is taught wrong, and is often wrong

The high point of most people’s college math is The Calculus. Typically this is a weeder course that separates the science-minded students from the rest. It determines which students are admitted to medical and engineering courses, and which will be directed to English or communications — majors from which they can hope to become lawyers, bankers, politicians, and spokespeople (the generally distrusted). While calculus is very useful to know, my sense is that it is taught poorly: it is built up on a year of unnecessary pre-calculus and on several shady assumptions that were not necessary for the development, and that are not generally true in the physical world. The material is presented in a way that confuses and turns off many of the top students — often the ones most attached to the reality of life.

The most untenable assumption in calculus teaching, in my opinion, is that the world involves continuous functions. That is, for example, that at every instant in time an object has one position only, and that its motion from point to point is continuous, defining a slow-changing quantity called velocity. That is, every x value defines one and only one y value, and there is never more than a small change in y in the limit of a small change in x. Does the world work this way? Some parts do, others do not. Commodity prices are not really defined except at the moment of sale, and can jump significantly between two sales a micro-second apart. Objects do not really have one position, in the quantum sense, at any time, but are spread out, sometimes occupying several positions, and sometimes jumping between positions without ever occupying the space in-between.

These are annoying facts, but calculus works just fine in a discontinuous world — and I believe that a discontinuous calculus is easier to teach and understand too. Consider the fundamental law of calculus. This states that, for a continuous function, the integral of the derivative equals the function itself (nearly incomprehensible, no?). Now consider the same law taught for a discontinuous group of changes: the sum of the changes that take place over a period equals the total change. This statement is more general, since it applies to discrete and continuous functions, and it’s easier to teach. Any idiot can see that this is true. By contrast, it takes weeks of hard thinking to see that the integral of all the derivatives equals the function — and then it takes more years to be exposed to delta functions and realize that the statement is still true for discrete change. Why don’t we teach so that people will understand? Teach discrete first and then smooth as a special case where the discrete changes happen at a slow rate. Is calculus taught this way to make us look smart, or because we want this to be a weeder course?
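The discrete statement really is that easy to check; here is a tiny illustration of my own, with no limits needed:

```python
positions = [3.0, 3.5, 2.0, 7.0, 6.2]                        # any sequence of values
changes = [b - a for a, b in zip(positions, positions[1:])]  # the individual changes
print(sum(changes))                      # 3.2
print(positions[-1] - positions[0])      # 3.2 -- the sum of the changes is the total change
```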

Because most students are not introduced to discrete change, they are in a very poor position to understand, or model, activities that are discrete, like climate change or heart rate. Climate only makes sense year to year, as day-to-day behavior is mostly affected by seasons, weather, and day vs night. We really want to model the big picture and leave out the noise by considering each day or year as a whole, keeping track of the average temperature for noon on September 21, for example. Similarly with heart rate: the rate has no meaning if measured every microsecond; its only meaning is as a measure of the time between beats. If we taught calculus in terms of discrete functions, our students would be in a better place to deal with these things, and in a better place to deal with totally discontinuous behaviors, like chaos and fractals, important phenomena when dealing with economics, for example.

A fundamental truth of quantum mechanics is that there is no defined speed and position of an object at any given time. Students accept this, but (because they are used to continuous change) they come to wonder how it is that over time energy is conserved. It’s simple: quantum motion involves gross, discrete changes in position that leave energy conserved by the end, but where an item goes from here to there without ever having to be in the middle. This helps explain the old joke about Heisenberg and his car.

Calculus-based physics is taught in terms of limits and the mean value theorem: that if x is the position of a thing at any time t, then the derivative of these positions, the velocity, will approach ∆x/∆t more and more closely as ∆x and ∆t become more tightly defined. When this is found to be untrue in a quantum sense, the remnant of the belief in it hinders students when they try to solve real-world problems. Normal physics is the limit of quantum physics because velocity is really a macroscopic ratio of difference in position divided by macroscopic difference in time. Because of this, it is obvious that the sum of these differences is the total distance traveled, even when summed over many simultaneous paths. Green’s theorem, a feature of electromagnetism, becomes similarly obvious: the sum effect of a field of changes is the total change. It’s only confusing if you try to take the limits to find the exact values of these change rates in some infinitesimal space.

This idea is also helpful in finance, likely a chaotic and fractal system. Finance is not continuous: just because a stock price moved from $1 to $2 per share in one day does not mean that the price was ever $1.50 per share. While there is probably no small change in sales rate caused by a 1¢ change in sales price at any given time, this does not mean you won’t find it useful to consider the relation between the price and the sales of a product. Though the details may be untrue, the price-demand curve is still a very useful (if unjustified) abstraction.

This is not to say that there are not some real-world things that are functions and continuous, but believing that they are, just because the calculus is useful in describing them, can blind you to some important insights, e.g. of phenomena where the butterfly effect predominates. That is, where an insignificant change in one place (a butterfly wing in China) seems to result in a major change elsewhere (e.g. a hurricane in New York). Recognizing that some conclusions follow from non-continuous math may help students recognize places where some parts of basic calculus apply, while others do not.

Dr. Robert Buxbaum (my thanks to Dr. John Klein for showing me discrete calculus).

Why random experimental design is better

In a previous post I claimed that, to do good research, you want to arrange experiments so there is no pre-hypothesis of how the results will turn out. As the post was long, I said nothing direct on how such experiments should be organized, but only alluded to my preference: experiments should be organized at randomly chosen conditions within the area of interest. The alternative, shown below, is that experiments should be done at the cardinal points in the space, or at corner extremes: the Wilson Box and Taguchi designs of experiments (DoE), respectively. Doing experiments at these points implies a sort of expectation of the outcome; generally that results will be linearly and orthogonally related to causes; in such cases, the extreme values are the most telling. Sorry to say, this usually isn’t how experimental data will fall out.

First experimental test points according to a Wilson Box, a Taguchi, and a random experimental design. The Wilson box and Taguchi are OK choices if you know or suspect that there are no significant non-linear interactions, and where experiments can be done at these extreme points. Random is the way nature works; and I suspect that’s best — it’s certainly easiest.

The first test-points for experiments according to the Wilson Box method and Taguchi method of experimental designs are shown on the left and center of the figure above, along with a randomly chosen set of experimental conditions on the right. Taguchi experiments are the most popular choice nowadays, especially in Japan, but as Taguchi himself points out, this approach works best if there are “few interactions between variables, and if only a few variables contribute significantly.” Wilson Box experimental choices help if there is a parabolic effect from at least one parameter, but are fairly unsuited to cases with strong cross-interactions.

Perhaps the main problem with doing experiments at extreme or cardinal points is that these experiments are usually harder than at random points, and that the results from these difficult tests generally tell you nothing you didn’t know or suspect from the start. The minimum concentration is usually zero, and the minimum temperature is usually one where reactions are too slow to matter. When you test at the minimum-minimum point, you expect to find nothing, and generally that’s what you find. In the data sets shown above, it will not be uncommon that the two minimum W-B data points, and the 3 minimum Taguchi data points, will show no measurable result at all.

Randomly selected experimental conditions are the experimental equivalent of Monte Carlo simulation, and are the method evolution uses. Set out the space of possible compositions, morphologies, and test conditions as with the other methods, and perhaps plot them on graph paper. Now, toss darts at the paper to pick a few compositions and sets of conditions to test, and do a few experiments. Because nature is rarely linear, you are likely to find better results and more interesting phenomena than at any of the extremes. After the first few experiments, when you think you understand how things work, you can pick experimental points that target an optimum extreme point, or that visit a more interesting or representative survey of the possibilities. In any case, you’ll quickly get a sense of how things work, and how successful the experimental program will be. If nothing works at all, you may want to cancel the program early; if things work really well you’ll want to expand it. With random experimental points you do fewer worthless experiments, and you can easily increase or decrease the number of experiments in the program as funding and time allow.
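Here is a minimal Python sketch of the dart-throwing step (my own illustration; the factor names and ranges are hypothetical stand-ins, not a recommended recipe):

```python
import random

factor_ranges = {                      # hypothetical factors and ranges
    "saltpeter_fraction": (0.2, 0.8),
    "sulfur_fraction": (0.05, 0.3),
    "grind_size_microns": (5, 500),
}

def random_design(n_runs, ranges, seed=0):
    """Draw n_runs test conditions uniformly at random within the stated ranges."""
    rng = random.Random(seed)
    return [{name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
            for _ in range(n_runs)]

for run in random_design(5, factor_ranges):
    print(run)
```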

Consider the simple case of choosing a composition for gunpowder. The composition itself involves only 3 or 4 components, but there is also morphology to consider, including the gross structure and fine structure (degree of grinding). Instead of picking experiments at the extreme compositions: 100% salt-peter, 0% salt-peter, grinding to sub-micron size, etc., as with Taguchi, a random methodology is to pick random, easily do-able conditions: 20% S and 40% salt-peter, say. These compositions will be easier to ignite, and the results are likely to be more relevant to the project goals.

The advantages of random testing get bigger the more variables and levels you need to test. Testing 9 variables at 3 levels each takes 27 Taguchi points, but only 16 or so if the experimental points are randomly chosen. To test whether the behavior is linear, you can use the results from your first 7 or 8 randomly chosen experiments, derive the vector that gives the steepest improvement in n-dimensional space (a weighted sum of all the improvement vectors), and then do another experimental point that’s as far along in the direction of that vector as you think reasonable. If your result at this point is better than at any point you’ve visited, you’re well on your way to determining the conditions of optimal operation. That’s a lot faster than starting with 27 hard-to-do experiments. What’s more, if you don’t find an optimum, congratulate yourself: you’ve just discovered a non-linear behavior, something that would be easy to overlook with Taguchi or Wilson Box methodologies.
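The steepest-improvement step can be sketched in a few lines too (my own illustration; the settings and results below are made-up stand-ins for your first several random runs):

```python
import numpy as np

X = np.array([[0.30, 0.10], [0.60, 0.20], [0.50, 0.05], [0.25, 0.25],
              [0.70, 0.15], [0.40, 0.30], [0.55, 0.22]])   # factor settings (placeholders)
y = np.array([2.1, 3.4, 2.8, 2.0, 3.9, 2.6, 3.5])          # measured results (placeholders)

A = np.column_stack([np.ones(len(X)), X])      # fit y = b0 + b1*x1 + b2*x2
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
gradient = coef[1:]                            # direction of steepest improvement

best = X[np.argmax(y)]                         # start from the best point so far
step = 0.2                                     # how far along the vector to venture
next_point = best + step * gradient / np.linalg.norm(gradient)
print("next experiment at:", next_point)
```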

The basic idea is one Sherlock Holmes pointed out more than once: “It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts” (Scandal in Bohemia). “Life is infinitely stranger than anything which the mind of man could invent” (Case of Identity).

Robert E. Buxbaum, September 11, 2013. A nice description of the Wilson Box method is presented in Perry’s Handbook (6th ed). Since I had trouble finding a free, on-line description, I linked to a paper by someone using it to test ingredient choices in baked bread. Here’s a link for more info about random experimental choice, from the University of Michigan, Chemical Engineering dept. Here’s a joke on the misuse of statistics, and a link regarding the Taguchi Methodology. Finally, here’s a pointless joke on irrational numbers, that I posted for pi-day.

The Scientific Method isn’t the method of scientists

A linchpin of middle school and high-school education is teaching ‘the scientific method.’ This is the method, students are led to believe, that scientists use to determine Truths, facts, and laws of nature. Scientists, students are told, start with a hypothesis of how things work or should work; they then devise a set of predictions based on deductive reasoning from these hypotheses, and perform some critical experiments to test the hypothesis and determine if it is true (experimentum crucis in Latin). Sorry to say, this is a path to error, and not the method that scientists use. The real method involves a few more steps, and follows a different order and path. It instead follows the path that Sherlock Holmes uses to crack a case.

The actual method of Holmes, and of science, is to avoid beginning with a hypothesis. Isaac Newton claimed: “I never make hypotheses.” Instead, as best we can tell, Newton, like most scientists, first gathered as much experimental evidence on a subject as possible before trying to concoct any explanation. As Holmes says (Study in Scarlet): “It is a capital mistake to theorize before you have all the evidence. It biases the judgment.”

Holmes barely tolerates those who hypothesize before they have all the data: “It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.” (Scandal in Bohemia).

Then there is the goal of science. It is not the goal of science to confirm some theory, model, or hypothesis; every theory probably has some limited area where it’s true. The goal for any real-life scientific investigation is the desire to explain something specific and out of the ordinary, or to do something cool. Similarly, with Sherlock Holmes, the start of the investigation is the arrival of a client with a specific, unusual need – one that seems a bit outside of the normal routine. Similarly, the scientist wants to do something: build a bigger bridge, understand global warming or how DNA directs genetics, make better gunpowder, cure a disease, or Rule the World (mad scientists favor this). Once there is a fixed goal, it is the goal that should direct the next steps: it directs the collection of data, and focuses the mind on the wide variety of types of solution. As Holmes says, “it’s wise to make one’s self aware of the potential existence of multiple hypotheses, so that one eventually may choose one that fits most or all of the facts as they become known.” It’s only when there is no goal that any path will do.

In gathering experimental data (evidence), most scientists spend months in the less-fashionable sections of the library, looking at the experimental methods and observations of others, generally from many countries, collecting any scrap that seems reasonably related to the goal at hand. I used 3″ x 5″ cards to catalog this data and the references. From many books and articles, one extracts enough diversity of data to be able to look for patterns and to begin to apply inductive logic. “The little things are infinitely the most important” (Case of Identity). You have to look for patterns in the data you collect. Holmes does not explain how he looks for patterns, but this skill is innate in most people to a greater or lesser extent. A nice, set approach to inductive logic is called the Baconian Method; it would be nice to see schools teach it. If the author is still alive, a scientist will try to contact him or her to clarify things. In every Sherlock Holmes mystery, Holmes does the same and is always rewarded. There is always some key fact or observation that this turns up: key information unknown to the original client.

Based on the facts collected, one begins to create the framework for a variety of mathematical models: mathematics is always involved, but these models should be pretty flexible. Often the result is a tree of related mathematical models, each highlighting some different issue, process, or problem. One then may begin to prune the tree, trying to fit the known data (facts and numbers collected) into a mathematical picture of relevant parts of this tree. There usually won’t be quite enough for a full picture, but a fair amount of progress can usually be had with the application of statistics, calculus, physics, and chemistry. These are the key skills one learns in college, but usually the high-schooler and middle-schooler has not learned them very well at all. If they’ve learned math and physics, they’ve not learned it in a way to apply it to something new, quite yet (it helps to read the accounts of real scientists here — e.g. The Double Helix by J. Watson).

Usually one tries to do some experiments at this stage. Holmes might visit a ship or test a poison, and a scientist might go off to his, equally-smelly, laboratory. The experiments done there are rarely experimenta crucis where one can say they’ve determined the truth of a single hypothesis. Rather, one wants to eliminate some hypotheses and collect data to be used to evaluate others. An answer generally requires that you have both a numerical expectation and that you’ve eliminated all reasonable explanations but one. As Holmes says often, e.g. in The Sign of Four, “when you have excluded the impossible, whatever remains, however improbable, must be the truth.” The middle part of a scientific investigation generally involves these practical experiments to prune the tree of possibilities and determine the coefficients of relevant terms in the mathematical model: the weight or capacity of a bridge of a certain design, the likely effect of CO2 on global temperature, the dose response of a drug, or the temperature and burn rate of different gunpowder mixes. Though not mentioned by Holmes, it is critically important in science to aim for observations that have numbers attached.

The destruction of false aspects and models is a very important part of any study. Francis Bacon calls this act destruction of idols of the mind, and it includes many parts: destroying commonly held presuppositions, avoiding personal preferences, avoiding the tendency to see a closer relationship than can be justified, etc.

In science, one eliminates the impossible through the use of numbers and math, generally based on your laboratory observations. When you attempt to fit the numbers associated with your observations to the various possible models, some will take the data well, some poorly, and some will not fit the data at all. Apply the deductive reasoning that is taught in schools: logical, Boolean, step by step; if some aspect of a model does not fit, it is likely the model is wrong. If we have shown that all men are mortal, and we are comfortable that Socrates is a man, then it is far better to conclude that Socrates is mortal than to conclude that all men but Socrates are mortal (Occam’s razor). This is the sort of reasoning that computers are really good at (better than humans, actually). It all rests on the inductive pattern search (similarities and differences) that we started with, and very often we find we are missing a piece, e.g. we still need to determine that all men are indeed mortal, or that Socrates is a man. It’s back to the lab; this is why PhDs often take 5-6 years, and not the 3-4 that one hopes for at the start.

More often than not we find we have a theory or two (or three), but not quite all the pieces in place to get to our goal (whatever that was), but at least there’s a clearer path, and often more than one. Since science is goal oriented, we’re likely to find a more efficient path than we first thought. E.g. instead of proving that all men are mortal, show it to be true of Greek men, that is, for all two-legged, fairly hairless beings who speak Greek. All we must show is that few Greeks live beyond 130 years, and that Socrates is one of them.

Putting numerical values on the mathematical relationship is a critical step in all science, as is the use of models — mathematical and otherwise. The path to measuring the life expectancy of Greeks will generally involve looking at a sample population. A scientist calls this a model. He will analyze this model using a statistical model of average and standard deviation and will derive his or her conclusions from there. It is only now that you have a hypothesis, but it’s still based on a model. In health experiments the model is typically a sample of animals (experiments on people are often illegal and take too long). For bridge experiments one uses small wood or metal models; and for chemical experiments, one uses small samples. Numbers and ratios are the key to making these models relevant in the real world. A hypothesis of this sort, backed by numbers, is publishable, and is as far as you can go when dealing with the past (e.g. why Germany lost WW2, or why the dinosaurs died off), but the gold standard of science is predictability. Thus, while we are confident that Socrates is definitely mortal, we’re not 100% certain that global warming is real — in fact, it seems to have stopped though CO2 levels are rising. To be 100% sure you’re right about global warming we have to make predictions, e.g. that the temperature will have risen 7 degrees in the last 14 years (it has not), or Al Gore’s prediction that the sea will rise 8 meters by 2106 (this seems unlikely at the current time). This is not to blame the scientists whose predictions don’t pan out: “We balance probabilities and choose the most likely. It is the scientific use of the imagination” (Hound of the Baskervilles). The hope is that everything matches; but sometimes we must look for an alternative; that’s happened rarely in my research, but it’s happened.

You are now at the conclusion of the scientific process. In fiction, this is where the criminal is led away in chains (or not, as with “The Woman,” “The Adventure of the Yellow Face,” or “The Blue Carbuncle” where Holmes lets the criminal go free — “It’s Christmas”). For most research the conclusion includes writing a good research paper: “Nothing clears up a case so much as stating it to another person” (Memoirs). For a PhD, this is followed by the search for a good job. For a commercial researcher, it’s a new product or product improvement. For the mad scientist, that conclusion is the goal: taking over the world and enslaving the population (or not; typically the scientist is thwarted by some detail!). But for the professor or professional research scientist, the goal is never quite reached; it’s a stepping stone to a grant application to do further work, and from there to tenure. In the case of the Socrates mortality work, the scientist might ask for money to go from country to country, measuring life-spans to demonstrate that all philosophers are mortal. This isn’t as pointless and self-serving as it seems. Follow-up work is easier than the first work since you’ve already got half of it done, and you sometimes find something interesting, e.g. about diet and life-span, or diseases, etc. I did some 70 papers when I was a professor, some on diet and lifespan.

One should avoid making some horribly bad logical conclusion at the end, by the way. It always seems to happen that the mad scientist is thwarted at the end; the greatest criminal masterminds are tripped up by some last-minute flaw. Similarly, the scientist must not make that last misstep. “One should always look for a possible alternative, and provide against it” (Adventure of Black Peter). Just because you’ve demonstrated that iodine kills germs, and you know that germs cause disease, please don’t conclude that drinking iodine will cure your disease. That’s the sort of science mistake that was common in the middle ages, and shows up far too often today. In the last steps, as in the first, follow the inductive and quantitative methods of Paracelsus to the end: look for numbers (not a Holmes quote), check how quantity and location affect things. In the case of antiseptics, Paracelsus noticed that only external cleaning helped and that the help was dose sensitive.

As an example in the 20th century, don’t just conclude that, because bullets kill, removing the bullets is a good idea. It is likely that the trauma and infection of removing the bullet is what killed Lincoln, Garfield, and McKinley. Theodore Roosevelt was shot too, but decided to leave his bullet where it was, noticing that many shot animals and soldiers lived for years with bullets in them; and Roosevelt lived for 8 more years. Don’t make these last-minute missteps: though it’s logical to think that removing guns will reduce crime, the evidence does not support that. Don’t let a leap of bad deduction at the end ruin a line of good science. “A few flies make the ointment rancid,” said Solomon. Here’s how to do statistics on data that’s taken randomly.

Dr. Robert E. Buxbaum, scientist and Holmes fan wrote this, Sept 2, 2013. My thanks to Lou Manzione, a friend from college and grad school, who suggested I reread all of Holmes early in my PhD work, and to Wikiquote, a wonderful site where I found the Holmes quotes; the Solomon quote I knew, and the others I made up.

Slowing Cancer with Fish and Unhealth Food

Some 25 years ago, while still a chemical engineering professor at Michigan State University, I did some statistical work for a group in the Physiology department on the relationship between diet and cancer. The research involved giving cancer to groups of rats and feeding them different diets of the same calorie intake to see which promoted or slowed the disease. It had been determined that low-calorie diets slowed cancer growth, and were good for longevity in general, while overweight rats died young (true in humans too, by the way, though there’s a limit and starvation will kill you).

The group found that fish oil was generally good for you, but they found that there were several unhealthy foods that slowed cancer growth in rats. The statistics were clouded by the fact that cancer growth rates are not normally distributed, and I was brought in to help untangle the observations.

With help from probability paper (a favorite trick of mine), I confirmed that healthy rats fared better on healthy diets, but cancerous rats did better with some unhealth food. Sick or well, all rats did best with fish oil, and all rats did pretty well with olive oil, but the cancerous rats did better with lard or palm oil (normally an unhealthy diet) and very poorly with corn oil or canola, oils that are normally healthful. The results are published in several articles in the journals “Cancer” and “Cancer Research.”

Among vitamins, they found something similar (this was before I joined the group). Several anti-oxidizing vitamins, A, D and E, made things worse for cancerous rats while being good for healthy rats (and for people, in moderation). Moderation is key; too much of a good thing isn’t good, and a diet with too much fish oil promotes cancer.

What seems to be happening is that the cancer cells grow at the same rate with all of the equi-caloric diets, but that there was a difference in the rate of natural cancer cell death. More cancer cells died when the rats were fed junk-food oils than when they were fed a diet of corn oil or canola. Similarly, the reason anti-oxidizing vitamins hurt cancerous rats was that fewer cancer cells died when the rats were fed these vitamins. A working hypothesis is that the junk oils (and the fish oil) produced free radicals that did more damage to the cancer than to the rats. In healthy rats (and people), these free radicals are bad, promoting cell mutation, cell degradation, and sometimes cancer. But perhaps our bodies use these same free radicals to fight disease.

Larger amounts of vitamins A, D, and E hurt cancerous rats by removing the free radicals they normally use to fight the disease, or so our model went. Bad oils and fish oil in moderation, with calorie intake held constant, helped slow the cancer, by a presumed mechanism of adding a few more free radicals. Fish oil, it can be assumed, killed some healthy cells in the healthy rats too, but not enough to cause problems when taken in moderation. Even healthy people are often benefitted by poisons like sunlight, coffee, alcohol, and radiation.

At this point, a warning is in-order: Don’t rely on fish oil and lard as home remedies if you’ve got cancer. Rats are not people, and your calorie intake is not held artificially constant with no other treatments given. Get treated by a real doctor — he or she will use radiation and/ or real drugs, and those will form the right amount of free radicals, targeted to the right places. Our rats were given massive amounts of cancer and had no other treatment besides diet. Excess vitamin A has been shown to be bad for humans under treatment for lung cancer, and that’s perhaps because of the mechanism we imagine, or perhaps everything works by some other mechanism. However it works, a little fish in your diet is probably a good idea whether you are sick or well.

A simpler health trick, one that couldn’t hurt most Americans, is a lower-calorie diet, especially if combined with exercise. Dr. Mites, a colleague of mine in the department (now deceased at 90+), liked to say that, if exercise could be put into a pill, it would be the most prescribed drug in America. There are few things that would benefit most Americans more than (moderate) exercise. There was a sign in the physiology office, perhaps his doing: “If it’s physical, it’s therapy.”

Anyway, these are some useful things I learned as an associate professor in the physiology department at Michigan State. I ended up writing 30-35 physiology papers, e.g. on how cells crawl and on cell regulation through architecture; and I met a lot of cool people. Perhaps I’ll blog more about health, biology, the body, or about non-normal statistics and probability paper. Please tell me what you’re interested in, or give me some keen insights of your own.

Dr. Robert Buxbaum is a chemical engineer who mostly works in hydrogen. I’ve published some 75 technical papers, including two each in Science and Nature: fancy magazines that you’d normally have to pay for, but this blog is free. August 14, 2013.

Control engineer joke

What made the control engineer go crazy?

 

He got positive feedback.

Is funny because … it’s a double entendre, where both meanings are true: (1) control engineers very rarely get compliments (positive feedback); the aim of control is perfection, something that’s unachievable for a dynamic system (and hard to distinguish from near-perfection: the slope at a maximum is zero). Also, (2) systems go unstable if the control feedback is positive. This can happen if the controller is wired backwards, but more usually happens when the response is too fast or too extreme. Positive feedback pushes a system further into error, and the process either blows up or (more commonly) goes wildly chaotic, oscillating between two or more “strange attractor” states.
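
Here’s a toy sketch, in Python, of what I mean. It’s my own made-up example, not any real controller: a first-order process with a proportional controller. Flip the sign of the gain and you have positive feedback, which runs away; make the gain too large and even correctly signed feedback overshoots and oscillates.

# A toy sketch (mine, not any real control system) of why positive feedback blows up.
# Hypothetical process: a temperature T that drifts toward ambient, plus a controller input.

def simulate(gain, steps=50, setpoint=100.0):
    T = 20.0                            # starting temperature
    history = []
    for _ in range(steps):
        error = setpoint - T            # how far we are from where we want to be
        u = gain * error                # controller action; a negative gain is positive feedback
        T = T + 0.1 * (20.0 - T) + u    # process: drift toward ambient (20), plus the control action
        history.append(T)
    return history

stable = simulate(gain=0.5)     # negative feedback: settles, with the usual proportional-control offset
runaway = simulate(gain=-0.5)   # positive feedback: the error feeds on itself and T runs away
ringing = simulate(gain=2.5)    # over-aggressive feedback: overshoots and oscillates ever more wildly
print(stable[-1], runaway[-1], ringing[-1])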

It seems to me that hypnosis, control-freak love, and cult behaviors are the result of intentionally produced positive feedback. Palsies, economic cycles, and global warming are more likely the result of unintentional positive feedback. In each case, the behavior is oscillatory or chaotic.

The normal state of engineering is a lack of feedback. Perhaps this is good, because messed-up feedback leads to worse results. From xkcd.

Our brains give little reliable feedback on how well they work, but that may be better than strong, immediate feedback, as that could lead to bipolar instability. From xkcd. For more on this idea, see Science and Sanity, by Alfred Korzybski (mini YouTube).

Control engineers tend to be male (85%), married (80%), and happy (at least they claim to be happy). Perhaps they know that near-perfection is close enough for a complex system in a dynamic world, or that one is about as happy as one believes oneself to be. It also helps that control-engineer salaries are about $95,000/year, with excellent benefits and low employment turnover.

Here’s a chemical engineer joke I made up, and an older engineering joke. If you like, I’ll be happy to consult with you on the behavior of your processes.

By Dr. Robert E. Buxbaum, July 4, 2013

Another Quantum Joke, and Schrödinger’s waves derived

Quantum mechanics joke, from xkcd.


Is funny because … it’s a double entendre on the words grain (as in grainy) and waves, as in Schrödinger waves or the “amber waves of grain” of “America the Beautiful.” In Schrödinger’s view of the quantum world, everything seems to exist or move as a wave until you observe it, and then it always becomes a particle. The math to solve for the energy of things is simple, and thus the equation is useful, but it’s hard to understand why it works. For example, when you solve for the behavior of a particle (atom) in a double-slit experiment, you have to imagine that the particle behaves as an insubstantial wave traveling through both slits until it’s observed, and only then behaves as a completely solid particle.

Math equations can always be rewritten, though, and science works in the language of math. The different forms appear to have different meanings, but they don’t, since they make the same practical predictions. Because of this freedom of meaning (and some other things), science is the opposite of religion. Other mathematical formalisms for quantum mechanics may be more comforting, or less, but most avoid the wave-particle duality.

The first formalism was Heisenberg’s uncertainty version. At the end of this post, I show that it is mathematically identical to Schrödinger’s wave view. Heisenberg’s version showed up in two quantum jokes that I explained (beat into the ground), one about a lightbulb and one about Heisenberg in a car (the latter also explains why water is wet and why hydrogen diffuses through metals so quickly).

Yet another quantum formalism involves Feynman’s little diagrams. One assumes that matter follows every possible path (the multiple-universe view) and that time can go backwards. As a result, we expect that antimatter apples should fall up. Experiments are underway at CERN to test whether they do, and by next year we should finally know. Even if anti-apples don’t fall up, that won’t mean this formalism is wrong, BTW: mathematically identical forms make identical predictions, and we don’t understand gravity well in any of them.

Yet another identical formalism (my favorite) involves imagining that matter has a real and an imaginary part. In this formalism, the components move independently by diffusion, and as a result look like waves: exp(-it) = cos t - i sin t. You can’t observe the two parts independently, though, only the following product of the real and imaginary parts: (the real + imaginary part) x (the real – imaginary part), that is, ψψ*, the square of the magnitude. Slightly different math, same results, different ways of thinking of things.
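
Below is a small numerical sketch of this picture, my own illustration rather than anything from the formal literature: for a stationary state ψ(x,t) = f(x)·exp(-iωt), the real and imaginary parts each oscillate in time, but the observable product ψψ* never changes.

# A small numerical sketch (mine) of the real-plus-imaginary picture described above.
# For a stationary state psi(x,t) = f(x) * exp(-i*w*t), the real and imaginary parts
# oscillate in time, but the observable density psi * conj(psi) stays fixed.

import numpy as np

x = np.linspace(0.0, 1.0, 201)
f = np.sin(np.pi * x)              # spatial part: ground state of a particle in a box
w = 2.0 * np.pi                    # an arbitrary angular frequency

for t in (0.0, 0.25, 0.5):
    psi = f * np.exp(-1j * w * t)  # real part = f*cos(wt), imaginary part = -f*sin(wt)
    density = (psi * np.conj(psi)).real
    print(t, round(psi.real[100], 3), round(psi.imag[100], 3), round(density[100], 3))
# The real and imaginary columns swing with t; the density column does not move.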

Because of quantum mechanics, hydrogen diffuses very quickly in metals: in some metals, quicker than almost anything diffuses in water. This is the basis of REB Research metal membrane hydrogen purifiers, and it also causes hydrogen embrittlement (to be explained, perhaps, in some later post). All other elements go through metals much more slowly than hydrogen, allowing us to make hydrogen purifiers that are effectively 100% selective. Our membranes also separate different hydrogen isotopes from each other by quantum effects (big things tunnel more slowly). Among the uses for our hydrogen filters are gas chromatography, dynamo cooling, and reducing the likelihood of nuclear accidents.

Dr. Robert E. Buxbaum, June 18, 2013.

To see Schrödinger’s wave equation derived from Heisenberg for non-changing (time-independent) items, go here and note that, for a standing wave, there is a vibration in time though no net change. Start with a version of Heisenberg uncertainty: h = λp, where the uncertainty in length = wavelength = λ, and the uncertainty in momentum = momentum = p. The kinetic energy, KE = p²/2m, and KE + U(x) = E, where E is the total energy of the particle or atom, and U(x) is the potential energy, some function of position only. Thus, p = √(2m(E − U(x))). Assume that the particle can be described by a standing wave with a physical description, ψ, and an imaginary vibration you can’t ever see, exp(-iωt). And assume that time and space are completely separable — an OK assumption if you ignore gravity and if your potential fields move slowly relative to the speed of light. Now read the section, follow the derivation, and go through the worked problems. Most useful applications of QM can be derived using this time-independent version of Schrödinger’s wave equation.
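
For those who want the algebra spelled out, here is a minimal sketch of that derivation in LaTeX, using the same symbols as above. It is the standard textbook route, nothing novel:

% de Broglie / Heisenberg relation and a standing wave:
\lambda = \frac{h}{p}, \qquad \psi(x) \propto \sin\!\left(\frac{2\pi x}{\lambda}\right)
\;\Rightarrow\;
\frac{d^{2}\psi}{dx^{2}} = -\left(\frac{2\pi}{\lambda}\right)^{2}\psi = -\frac{p^{2}}{\hbar^{2}}\,\psi .

% Substitute p^{2} = 2m\,(E - U(x)) from KE + U(x) = E:
-\frac{\hbar^{2}}{2m}\,\frac{d^{2}\psi}{dx^{2}} + U(x)\,\psi = E\,\psi
\qquad \text{(the time-independent Schr\"odinger equation).}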

Musical Color and the Well Tempered Scale

by R. E. Buxbaum (the author of all these posts)

I first heard J. S. Bach’s Well-Tempered Clavier some 35 years ago and was struck by the different colors of the different scales. Some were dark and scary, others light and enjoyable. All of them worked, but each was distinct, though I could not figure out why. That Bach was able to write in all the keys without retuning was a key innovation of his. In his day, people tuned in fifths, a process that created gaps (called wolf intervals) that prevented useful composition in the affected keys.

We don’t know exactly how Bach tuned his instruments, as he had no scientific way to describe it; we can guess that it was more uniform than the temper produced by tuning in fifths, but it probably was not quite equally spaced. Nowadays, electronic keyboards are tuned to 12 equally spaced frequencies per octave through the use of frequency counters. Starting with the A above middle C, A4, tuned at 440 cycles/second (the note symphonies tune to), each note is set to vibrate at a frequency that is higher or lower than the one next to it by a factor of the twelfth root of two, 12√2 = 1.05946 (the wavelengths differ by the same factor). After 12 multiples of this size, the frequency and wavelength have doubled or halved, and there is an octave. This is called equal tempering.
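
The arithmetic is easy to check. Here’s a short Python sketch of my own that builds the twelve equal-tempered frequencies going up from A4 = 440 Hz and their wavelengths, assuming a sound speed of about 345 m/s (which reproduces the 0.784 m air column used below):

# A sketch (mine) of equal temper: twelve frequencies up from A4 = 440 Hz, with wavelengths.
SPEED_OF_SOUND = 345.0   # m/s, an assumed round number
A4 = 440.0               # Hz, the note symphonies tune to
NAMES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A"]

ratio = 2.0 ** (1.0 / 12.0)           # the twelfth root of two, 1.05946...
for i, name in enumerate(NAMES):
    freq = A4 * ratio ** i            # each half-step multiplies the frequency by 2^(1/12)
    wavelength = SPEED_OF_SOUND / freq
    print(f"{name:2s}  {freq:7.2f} Hz   {wavelength:.3f} m")
# After 12 steps the frequency is 880 Hz: the octave above, A5.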

Currently, many non-electric instruments are also tuned this way. Equal tempering avoids all wolf intervals, but makes each note equally ill-tempered. Any key can be transposed to another, but there are no pure harmonies because 12√2 is an irrational number (see joke). There is also no color or feel to any given key except that which has carried over historically in the listeners’ memory. It’s sad.

I’m going to speculate that J.S. Bach found, or favored, a way to tune instruments where all of the keys were usable and OK-sounding, but where some harmonies are more perfect than others. Necessarily, this means that some harmonies will be less perfect. There would be no wolf gaps so bad that Bach could not compose and transpose in every key, but since there is a difference, each key would retain a distinct color that J.S. Bach explored in his work — or so I’ll assume.

Pythagoras found that notes sound best together when the vibrating lengths are kept in a ratio of small numbers. Consider the tuning note, A4, the A above middle C; this note vibrates a column of air 0.784 meters long, about 2.5 feet, a bit longer than an oboe. The octave notes for A4 are called A3 and A5. They vibrate columns of air 2x as long and 1/2 as long as the original. They’re called octaves because they’re eight white keys away from A4. Keyboards add 5 black notes per octave, so octaves are always 12 keys apart. Based on Pythagoras, a reasonable presumption is that J.S. Bach tuned every non-octave note so that it vibrates an air column stepped roughly by the equal-tuning ratio, 12√2 = 1.05946, but with the wavelength adjusted, in some cases, to make ratios of small whole numbers with the wavelength for A4.
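
That presumption is easy to test numerically. Here’s a quick sketch of my own that compares each equal-tempered step to the closest ratio of small whole numbers (denominators up to 8, an arbitrary cutoff); the near-misses it finds, 9/8, 5/4, 4/3, 3/2 and 5/3, are exactly the ratios used below:

# A quick sketch (mine): for each equal-tempered step, find the nearest small-whole-number ratio.
from fractions import Fraction

for halfsteps in range(13):
    et_ratio = 2.0 ** (halfsteps / 12.0)             # equal-tempered wavelength (or frequency) ratio
    best = Fraction(et_ratio).limit_denominator(8)   # nearest fraction with denominator 8 or less
    error = 100.0 * (float(best) / et_ratio - 1.0)   # how far the pure ratio misses, in percent
    print(f"{halfsteps:2d} half-steps: {et_ratio:.4f}  is close to  {best}  ({error:+.2f}%)")
# Two half-steps give 1.1225, close to 9/8; four give 1.2599, close to 5/4; five give 1.3348,
# close to 4/3; seven give 1.4983, close to 3/2; and nine give 1.6818, close to 5/3.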

Aside from octaves, the most pleasant harmonies are with notes whose wavelength is 3/2 as long as the original, or 2/3 as long. The best harmonies with A4 (0.784 m) will be with notes whose wavelengths are (3/2)*0.784 m and (2/3)*0.784 m. The first of these is called D3 and the other is E4. A4 combines with D3 to make a chord called D major, the so-called “key of glory.” The Hallelujah chorus, Beethoven’s 9th (Ode to Joy), and Mahler’s Titan are in this key. Scriabin believed that D major had a unique color, gold, suggesting that the pure ratios were retained.

A combines with E (plus a black note, C#) to make a chord called A major. Songs in this key sound (to my ear) robust, cheerful and somewhat pompous. Here, in A major, are Dancing Queen by ABBA, Lady Madonna by the Beatles, and the Prelude and Fugue in A major by J.S. Bach. Scriabin believed that A major was green.

A4 also combines with E and a new white note, C3, to make a chord called A minor. Since E4 and E3 vibrate at 2/3 and 4/3 the wavelength of A4 respectively, I’ll speculate that Bach tuned C3 near 5/3 the wavelength of A4: 5/3*0.784 m = 1.307 m. Tuned this way, the ratio of wavelengths in the A-minor chord is 3:4:5. Songs in A minor tend to be edgy and sort of sad: Stairway to Heaven, Für Elise, Songs in A Minor sung by Alicia Keys, and PDQ Bach’s Fugue in A minor. I’m going to speculate that Bach tuned this note to 1.312 m (or thereabouts), roughly half-way between the wavelength of the pure ratio and that of equal temper.

The notes D3 and E3 will not sound particularly good together. In both pure ratios and equal temper, their wavelengths are in a ratio of 3/2 to 4/3, that is, a ratio of 9 to 8. This can be a tensional transition, but it does not provide a satisfying resolution, at least to my western ears.

Now for the other white notes. The next white key down from A4 is G3, two half-tones longer in wavelength. For equal tuning, we’d expect this note to vibrate a column of air 1.05946² = 1.1225 times longer than A4’s. The most similar ratio of small whole numbers is 9/8 = 1.1250, a ratio we’d already generated between D and E. As a result, we may expect that Bach tuned G3 to a wavelength of 9/8*0.784 m = 0.88 meters.

For equal tuning, the next white note, F3, will vibrate an air column 1.05946⁴ = 1.259 times as long as the A4 column. Tuned this way, the wavelength for F3 is 1.259*0.784 = 0.988 m. Alternately, since 1.259 is close to 5/4 = 1.25, it is reasonable to tune F3 as (5/4)*0.784 = 0.980 m. I’ll speculate that he split the difference: 0.984 m. F, A, and C combine to make a good harmony called the F-major chord. The most popular pieces in F major sound woozy and not quite settled, in my opinion, perhaps because of the oddness of the F tuning. See, e.g., the Jeopardy theme song, “My Sweet Lord,” Come Together (Beatles), and Beethoven’s Pastoral Symphony (Movement 1, “Awakening of cheerful feelings upon arrival in the country”). Scriabin saw F major as bright blue.

We’ve only one more white note to go in this octave: B, the other tension note to A4. Since the wavelength for G3 was 9/8 as long as that for A4, we can expect the wavelength for B to be 8/9 as long (or, an octave lower, 16/9 as long). This B will be dissonant with A, but it will go well with E3 and E4, as these were 4/3 and 2/3 of A4 respectively. Tuned this way, the lower B vibrates a column about 1.40 m long, and B and E vibrate at wavelengths in a ratio of 4 to 3. When B, in any octave, is combined with E, it forms part of an E chord (E major or E minor); for E major it’s combined with a black key, G-sharp (G#). In the German notation J.S. Bach used, B-natural is called “H” (and B-flat is called “B”), allowing him to spell out his name in his music. In the sequence B-A-C-H, the B (B-flat) moving to A creates tension; the C then harmonizes with the A; and the final H (B-natural) sits a half-step from both the C and the opening B-flat, so the line never quite settles. Here’s how it works on cello; it’s not bad, but there is no grand resolution. The Promenade from “Pictures at an Exhibition” is in E.

The black notes go in the larger gaps between the white notes, and there is a traditional confusion over how to tune them. One can tune the black notes by equal temper (multiples of 2^(1/12)), or set them exactly in the spaces between the white notes, or tune them to any alternate set of ratios. A popular set of ratios is found in “Just temper.” The black note 6 half-steps from A4 (D#) will have a wavelength of 0.784*2^(6/12) = √2*0.784 m = 1.109 m. Since √2 = 1.414, and this is about 1.4 = 7/5, the Just-temper method is to tune D# to 1.4*0.784 m = 1.098 m. If one takes this route, the other black notes (F#3 and C#3) will be tuned to ratios of 6/5 and 8/5 times 0.784 m respectively. It’s possible that J.S. Bach tuned his notes by Just temper, but I suspect not. I suspect that Bach tuned these notes to fall in between Just temper and Equal temper, as I’ve shown below; his D#3 might have vibrated at about 1.104 m, half-way between the two. I would not be surprised if jazz musicians tuned their black notes more closely to the fifths of Just temper: 5/5, 6/5, 7/5, 8/5 (and 9/5?), because jazz uses the black notes more, and you generally want your main chords to sound in tune. Then again, maybe not. Jimi Hendrix picked the harmony of D# with A (“Diabolus,” the devil’s interval) for his Purple Haze; it’s also used for European police sirens.
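
Below is a back-of-envelope sketch, mine rather than the actual table that follows, of this “split the difference” guess: for each note whose ratio is discussed above, take the equal-tempered wavelength, the just-temper (small-whole-number) wavelength, and the point half-way between them.

# A back-of-envelope sketch (mine) of the "split the difference" well temper speculated on above.
A4 = 0.784   # meters; the A4 air column used throughout

# (note name, half-steps below A4, just-temper ratio to A4), using the ratios given in the text
notes = [
    ("G",  2, 9/8), ("F#", 3, 6/5), ("F", 4, 5/4), ("E", 5, 4/3), ("D#", 6, 7/5),
    ("D",  7, 3/2), ("C#", 8, 8/5), ("C", 9, 5/3), ("B", 10, 16/9),
]

print("note  equal(m)  just(m)  guess(m)")
for name, halfsteps, just in notes:
    equal = A4 * 2 ** (halfsteps / 12)   # equal temper: each half-step stretches the column by 2^(1/12)
    pure = A4 * just                     # just temper: a small-whole-number ratio to A4
    guess = (equal + pure) / 2           # half-way between, the guessed Bach-like well temper
    print(f"{name:4s} {equal:8.3f} {pure:8.3f} {guess:9.3f}")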

To my ear, this modified equal temper is more beautiful and interesting than the equal temperament of today’s electronic keyboards. In either temper, music plays in all keys, but with an unequal temper each key is distinct and beautiful in its own way. Tuning is engineering, I think, rather than math or art. In math, things have to be perfect; in art, they have to be interesting; and in engineering, they have to work. Engineering tends to be beautiful in its own way. Generally, though, engineering is not perfect.


Summary of air column wave-lengths, measured in meters, and as a ratio to that for A4. Just Tempering, Equal Tempering, and my best guess of J.S. Bach’s Well Tempered scale.

R.E. Buxbaum, May 20, 2013 (edited Sept. 23, 2013) — I’m not very musical, but my children are.