if everyone agrees, something is wrong

I thought I’d try to semi-derive and explain a remarkable mathematical paper that was published last month in The Proceedings of the Royal Society A (see full paper here). The paper demonstrates that too much agreement about a thing is counter-indicative of the thing being true. Unless an observation is blindingly obvious, near 100% agreement suggests there is a hidden flaw or conspiracy, perhaps unknown to the observers. This paper has broad application, but I thought the presentation was too confusing for most people to make use of, even those with a background in mathematics, science, or engineering. And the popular press versions didn’t even try to be useful. So here’s my shot:

Figure 2 from the original paper. For a method that is 80% accurate, you get your maximum reliability at the third to fifth witness. Beyond that, more agreement suggests a flaw in the people or procedure.


I will discuss only one specific application, the second one mentioned in the paper: crime (read the paper for others). Let’s say there’s been a crime with several witnesses. The police line up a half-dozen equal (?) suspects and show them to the first witness. Let’s say the first witness points to one of the suspects. The police will not arrest on this alone because they know that people correctly identify suspects only about 40% of the time, and incorrectly identify perhaps 10% of the time (they say they don’t know or can’t remember the remaining 50% of the time). The original paper includes the actual fractions here; they’re similar. Since the witness pointed to someone, you already know he/she isn’t among the 50% who don’t know. But you don’t know if this witness is among the 40% who identify right or the 10% who identify wrong. Our confidence that this is the criminal is thus .4/(.4 + .1) = .8, or 80%.

Now you bring in the second witness. If this person identifies the same suspect, your confidence increases, to roughly (.4)^2/(.4^2 + .1^2) = .941, or 94.1%. This is enough to make an arrest, but let’s say you have ten witnesses in all, and every one identifies this same person. You might first think that this must be the guy, with a confidence of (.4)^10/(.4^10 + .1^10) = 99.9999%, but then you wonder how unlikely it is to find ten people who identify correctly when, as we mentioned, each person has only a 40% chance. The chance of all ten witnesses identifying a suspect right is small: (.4)^10 = .000105, or about 0.01%. This fraction is smaller than the likelihood of having a crooked cop or a screwed-up line-up (only one suspect had the right jacket, say). If crooked cops and systemic errors show up 1% of the time, and point to the correct fellow only 15% of these times, we find that the chance of being right if ten out of ten agree is (.0015 + (.4)^10)/(.01 + .4^10 + .1^10) = .16, or 16%. Total agreement on guilt suggests the fellow is innocent!
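For anyone who wants to check the arithmetic, here is a minimal Python sketch of the two calculations in this paragraph. The function and parameter names are my own, not the paper's, and the formulas are the simplified ones used in this post (they do not, for instance, re-weight the witness terms by the chance that no systemic failure occurred).

```python
def confidence_no_failure(n, p_right=0.4, p_wrong=0.1):
    """Chance the suspect is guilty when n witnesses all pick him,
    ignoring any possibility of a rigged or botched line-up."""
    return p_right**n / (p_right**n + p_wrong**n)


def confidence_with_failure(n, p_right=0.4, p_wrong=0.1,
                            p_fail=0.01, p_fail_right=0.15):
    """Same question, using this post's simplified formula:
    p_fail is the assumed rate of systemic failure, and p_fail_right
    is the assumed share of those failures that still point to the
    right person."""
    numerator = p_fail * p_fail_right + p_right**n
    denominator = p_fail + p_right**n + p_wrong**n
    return numerator / denominator


if __name__ == "__main__":
    for n in (1, 2, 10):
        print(n,
              round(confidence_no_failure(n), 6),
              round(confidence_with_failure(n), 4))
```

With one or two witnesses the first function gives .8 and about .941. With ten unanimous witnesses the second function comes out at roughly 0.16 as a fraction, so the 1% systemic-failure term dominates once the witness term has shrunk to about .0001.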

The graph above, the second in the paper, presents a generalization of the math I just presented: n identical tests of 80% accuracy and three different likelihoods of systemic failure. If this systemic failure rate is 1% and the chance of the error pointing right or wrong is 50/50, the chance of being right is P = (.005 + .4^n)/(.01 + .4^n + .1^n); this is the red curve in the graph above. The authors find you get your maximum reliability when there are two to four agreeing witnesses.
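The shape of that curve is easy to tabulate. The sketch below evaluates the formula from this paragraph for one to ten agreeing witnesses and reports where it peaks. The names are mine, and the 1% failure rate and 50/50 split are the assumptions stated above, not values fitted to real data.

```python
def reliability(n, p_right=0.4, p_wrong=0.1,
                p_fail=0.01, fail_right_share=0.5):
    """P = (p_fail*share + p_right^n) / (p_fail + p_right^n + p_wrong^n),
    the simplified formula from this post for n agreeing witnesses."""
    return ((p_fail * fail_right_share + p_right**n)
            / (p_fail + p_right**n + p_wrong**n))


# Tabulate the curve for 1..10 unanimous witnesses.
curve = {n: reliability(n) for n in range(1, 11)}

# The number of agreeing witnesses at which reliability is highest.
best_n = max(curve, key=curve.get)

if __name__ == "__main__":
    for n, p in curve.items():
        print(f"{n:2d} agreeing witnesses: P = {p:.3f}")
    print("peak at n =", best_n)
```

Under these assumed numbers the curve rises from about .79 at one witness to a peak of .92 at three, then falls back toward .5 as the systemic-failure term takes over. Changing p_fail or fail_right_share moves the peak, which is the point of the three curves in the paper's figure.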

Confidence of guilt as related to the number of judges that agree and your confidence in the integrity of the judges.


The Royal Society article went on to approve of a feature of Jewish capital-punishment law. In Jewish law, capital cases are tried by 23 judges. To convict, a supermajority (13) must find the defendant guilty, but if all 23 judges agree on guilt, the court pronounces the defendant innocent (see chart, or an anecdote about Justice Antonin Scalia). My suspicion, by the way, is that more than 1% of judges and police are crooked or inept, and that the same applies to scientific analysis of mental diseases, like diagnosing ADHD or autism, and to predictions about stocks or climate change. (Do 98% of scientists really agree independently?) Perhaps there are so many people in US prisons because of excessive agreement and inaccurate witnesses, e.g., Rubin Carter. I suspect the agreement among climate experts is a similar sham.

Robert Buxbaum, March 11, 2016. Here are some thoughts on how to do science right. Here is some climate data: can you spot a clear pattern of man-made change?
