Social science is irreproducible, drug tests nonreplicable, and stove studies ignore confounders.

Efforts to replicate the results of the most prominent studies in health and social science have found them largely irreproducible, with the worst replicability appearing in cancer drug research. The figure below, from “The Reproducibility Project in Cancer Biology,” Errington et al. 2021, compares the reported effects in 50 cancer drug experiments from 23 papers with the results from repeated versions of the same experiments, looking at a total of 158 effects.

Graph comparing the original, published effect of a cancer drug with the replication effect. The units are whatever units were used in the original study, percent, or risk ratio, etc. From “Investigating the replicability of preclinical cancer biology,”
Timothy M Errington et al. Center for Open Science, United States; Stanford University, Dec 7, 2021, https://doi.org/10.7554/eLife.71601.

Virtually none of the drugs were found to work as originally reported. Those below the dotted, horizontal line behaved the opposite way in the replication studies. About half, those shown in pink, showed no significant effect. Of those that showed positive behavior as originally published, most show about half the activity, with two drugs that now appear to be far more active. A favorite website of mine, Retraction Watch, is filled with retractions of articles on these drugs.

The general lack of replicability has been called a crisis. It was first seen in the social sciences, e.g. the figure below from this article in Science, 2015. Psychology research is bad enough that Nobel Laureate Daniel Kahneman came to disown most of the conclusions in his book, “Thinking, Fast and Slow.” The experiments that underlie his major sections don’t replicate. Take, for example, social priming. Classic studies had claimed that, if you take a group of students and have them fill out surveys with words about the aged or the flag, they will then walk slower from the survey room or stand longer near a flag. All efforts to reproduce these studies have failed. We now think they are not true. The problem here is that much of education and social engineering is based on such studies. Public policy too. The lack of replicability throws doubt on much of what modern society thinks and does. We like to have experts we can trust; we now have experts we can’t.

From “Estimating the reproducibility of psychological science,” Science, 2015. Social science replication is better than cancer drug replication: about 35% of the classic social science studies replicate to some reasonable extent.

Are gas stoves dangerous? This 2022 environmental study said they are, claiming with 95% confidence that they are responsible for 12.7% of childhood asthma. I doubt the study will be reproducible for reasons I’ll detail below, but for now it’s science, and it may soon be law.

Part of the replication problem is that researchers have been found to lie. They fudge data or eliminate undesirable results, some more, some less. A few are honest, but the journals don’t bother checking. Some researchers convince themselves that they are doing the world a favor, but many seem money-motivated. A foundational study on Alzheimer’s was faked outright. The authors doctored photos using Photoshop, and used the fake results to justify approval of non-working, expensive drugs. The researchers got $1B in NIH funding too. I’d want to see the researchers jailed, long term: it’s grand larceny and a serious violation of trust.

Another cause of this replication crisis, one that particularly hurt Daniel Kahneman’s book, is that many social science researchers do statistically illegitimate studies on populations that are vastly too small to give reliable results. Then, they only publish the results they like. The graph of z-values shown below suggests this is common, at least in some journals, including “Personality and Social Psychology Bulletin”. The large fraction of results at ≥95% confidence suggests that researchers don’t publish the 90-95% of their work that doesn’t fit the desired hypothesis. While there has been no detailed analysis of all the social science research, it’s clear that this method was used to show that GMO grains caused cancer. The researcher did many small studies, and only published the one study where GMOs appeared to cause cancer. I review the GMO study here.

From Ulrich Schimmack, ReplicationIndex.com, January, 2023, https://replicationindex.com/2023/01/08/which-social-psychologists-can-you-trust/. If you really want to get into this he is a great resource.

The chart at left shows Z-scores, where Z = ∆X √n/σ. A Z-score above 1.96 generally indicates significance, p < .05. Notice that almost all the studies have Z-scores just over 1.96; that is, almost all the studies proved their hypothesis at 95% confidence. That makes it seem that the researchers were very lucky, near prescient. But it’s clear from the distribution that there were a lot of studies that were done but never shown to the public. That is a lot of data that was thrown out, either by the researchers or by the publishers. If all the data were published, you’d expect to see a bell curve. Instead the Z values form a tiny bit of a bell curve, just the tail end. The implication is that these studies with Z > 1.96 suggest far less than 95% confidence. This then shows up in the results being only 25% reproducible. It’s been suggested that you should not throw out all the results in the journal, just look for Z-scores of 3.6 or more. That leaves you with the top 23%, and these should have a good chance of being reproducible. The top graph somewhat supports this, but it’s not that simple.
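To make the selection effect concrete, here is a minimal simulation sketch. It is my own illustration, not taken from Schimmack's analysis, and every number in it is an assumption: a small true effect of 0.2σ, 20 subjects per study, 5000 studies, and the conventional two-sided cutoff of Z > 1.96 for p < .05. Each simulated study computes Z = ∆X √n/σ, only the "significant" ones get published, and each published study is then repeated once with the same design.

```python
import math
import random

Z_CRIT = 1.96  # conventional two-sided cutoff for p < .05


def z_score(sample, sigma=1.0):
    """Z = (mean difference) * sqrt(n) / sigma."""
    n = len(sample)
    mean = sum(sample) / n
    return mean * math.sqrt(n) / sigma


def run_study(true_effect, n, rng):
    """Draw n observations (sigma = 1) around the true effect and return Z."""
    sample = [rng.gauss(true_effect, 1.0) for _ in range(n)]
    return z_score(sample)


def simulate(true_effect=0.2, n=20, studies=5000, seed=1):
    """All parameter values are hypothetical, chosen only for illustration."""
    rng = random.Random(seed)
    originals = [run_study(true_effect, n, rng) for _ in range(studies)]
    # Publication filter: only results past the cutoff are ever seen.
    published = [z for z in originals if z > Z_CRIT]
    # Repeat each published study once, with the same design.
    replications = [run_study(true_effect, n, rng) for _ in published]
    replicated = sum(1 for z in replications if z > Z_CRIT)
    return {
        "published_fraction": len(published) / studies,
        "replication_rate": replicated / len(published),
        "min_published_z": min(published),
    }


if __name__ == "__main__":
    for key, value in simulate().items():
        print(f"{key}: {value:.3f}")
```

Under these assumptions the expected Z is about 0.9, so only around one study in seven should clear the cutoff, and the published ones should replicate at about that same rate, nowhere near 95%. Every published Z sits above 1.96, which is the truncated tail of the bell curve described above.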

Another classic way to cook the books, as it were, and make irreproducible studies provide the results you seek is to ignore “confounders.” This leads to errors that mistake association for causation. As an example, it’s observed that people taking aspirin have more heart attacks than those who do not, but the confounder is that aspirin is prescribed to those with heart problems; the aspirin actually helps, but appears to hurt. In the case of stoves, it seems likely that poorer, sicker people own gas stoves, and that they live in older, moldy homes, and cook more at home, frying onions, etc. These are confounders that the study, to my reading, ignores. They could easily be the reason that gas stove owners are exposed to more asthma triggers than the rich folks who own electric, induction stoves. If you confuse association with causation, you seem to find that owning the wrong stove causes you to be poor and sick with a moldy home. I suspect that the stove study will not replicate if they correct for the confounders.
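A toy simulation shows how a confounder can manufacture an association out of nothing. This is a sketch with made-up numbers, not data from the stove study: I assume 30% of homes are moldy, that moldy homes are more likely to have gas stoves (70% versus 30%), and that asthma depends on mold alone (15% versus 5%). The stove has no effect at all in this model. The function names are my own.

```python
import random


def simulate_households(n=200_000, seed=2):
    """Return (moldy, gas, asthma) tuples. All probabilities are hypothetical."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        moldy = rng.random() < 0.3                          # the confounder
        gas = rng.random() < (0.7 if moldy else 0.3)        # mold predicts stove type
        asthma = rng.random() < (0.15 if moldy else 0.05)   # mold alone drives asthma
        rows.append((moldy, gas, asthma))
    return rows


def asthma_rate(rows, gas, moldy=None):
    """Asthma rate for one stove type, optionally within one mold stratum."""
    selected = [r for r in rows
                if r[1] == gas and (moldy is None or r[0] == moldy)]
    return sum(r[2] for r in selected) / len(selected)


def risk_ratio(rows, moldy=None):
    """Asthma risk with gas divided by asthma risk without gas."""
    return asthma_rate(rows, True, moldy) / asthma_rate(rows, False, moldy)


if __name__ == "__main__":
    rows = simulate_households()
    print("raw risk ratio:       ", round(risk_ratio(rows), 2))
    print("moldy homes only:     ", round(risk_ratio(rows, moldy=True), 2))
    print("non-moldy homes only: ", round(risk_ratio(rows, moldy=False), 2))
```

With these invented probabilities the raw comparison should show gas stove homes with roughly 1.5 times the asthma rate, while within each stratum (moldy or not) the ratio should sit near 1. Comparing like with like is the simplest form of correcting for a confounder; a real study would need measured data on housing, income, and so on.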

I’d like to recommend a book, hardly mathematical, “How to Lie with Statistics” by Darrell Huff ($8.99 on Amazon). I read it in high school. It gives you a sense of what to look out for. I should also mention Dr. Anthony Fauci. He has been going around to campuses saying we should have zero tolerance for those who deny science, particularly health science. Given that so much of health science research is nonreplicable, I’d recommend questioning all of it. Here is a classic clip from the 1973 movie, ‘Sleeper’, where a health food expert wakes up in 2173 to discover that health science has changed.

Robert Buxbaum, February 7, 2023.

12 thoughts on “Social science is irreproducible, drug tests nonreplicable, and stove studies ignore confounders.”

  1. Pingback: Every food causes cancer, and cures it, research shows. | REB Research Blog

  2. Pingback: Vaccines barely work, lockdowns may have made it worse. | REB Research Blog

  3. Pingback: Vaccines barely worked, lockdowns may have made it worse. | REB Research Blog

  4. Pingback: Monday Newsfeed: And You Can Take That to the Bank! | The Universal Spectator

  5. PM

    I wanted to compliment the author of this post for tying together nicely various datapoints that demonstrate the crisis.

    I believe we are close to a tipping point where the reproducibility crisis is common and accepted knowledge, at least in the academy, but are much further away from the critical mass of sentiment necessary to generate sufficient momentum for fundamental reform. As I mentioned in my previous comment, (I think) it’s a matter of incentives.

  6. therandomtexan

    Even well-organized and transparent studies that show “significant” effects should be taken with a grain of salt. Reproducibility is essential, and should be confirmed with careful systematic reviews and one or more transparent meta-analyses. All too many recent research findings are “science by press release,” rather than carefully reproduced results. Remember, science is never settled.

    1. Joe Smith

      I don’t know if it’s the same study you’re referring to, but one of the studies of gas stoves that supposedly shows they are a health hazard was conducted by surrounding the area of the stove with thick, ceiling-to-floor plastic curtains, running the stove full blast for 90 minutes with no ventilation inside the enclosure and then measuring the air quality inside the enclosure. So this proves that if you totally seal off your kitchen and have no fan, it’s going to suck for your health; fortunately, most of us are smarter than that. That’s outright rigging the study to get exactly the results you want, if you ask me.

  7. kamas716

    I remember having to take not only statistics in college, but also ethics. Either many of these researchers didn’t, or it didn’t stick. I think the worst part, at least for me, is the loss of confidence in “experts” that this generates in the public. Fauci and Walensky are probably the two biggest, most notable, of recent events. There are many institutions, particularly government ones, that have lost the trust of vast portions of the public that they won’t ever get back.

    Falsifying results for monetary gain should result in criminal fraud convictions. Unless it’s a very large or notable case (like Theranos) I doubt we’ll see much of that.

  8. T Hershel Gardin

    I grant you that there is much fraud today in published “science”. Certainly, the adage, “follow the money” applies here. However, I focus on the education system during the past couple or more decades, especially the concept of “value free” education. It is precisely what leads to all that has been fudged during the same time period. However, it seems to me that earlier science (including social) was not as rife with non-replicable results. As a retired social scientist, I can point to findings that remain as replicable today as they appeared 50+ years ago. Some examples:
    Operant and classical conditioning of humans
    Seating position impact on cooperation and competition
    The power of conformity, especially in unequal power situations

    All of the above examples and many more have been reproduced from generation to generation by honest, competent researchers since they were first observed.

    1. R.E. Buxbaum Post author

      Things are worse than you think. Conditioning studies have almost entirely failed to be replicable, for example. It tripped up Dr. Kahneman, among many others. I think I linked to his retraction comments.

    2. PM

      The replicable results you cited seem to track closely with common sense.

      Can anyone offer some good social science examples that are robust, replicable, but completely counterintuitive?

      Really the alarm bells in our hallowed halls of erudition should be ringing about the replicability crisis. As it stands hoi polloi haven’t savvied to the game (yet), and, as we saw with the slavish, dogmatic aping of Dr Fao Chi’s recondite pronouncements, especially by worshippers of teh scienz™, even purportedly “well-educated” people by and large haven’t the faintest clue there is a crisis unless they are directly involved in research, of course excepting your lordship, good reader. The Gell-Mann Amnesia Effect rules the gullible reactions to the latest headlines blaring about some random “study that found X” which therefore entails global inferences and sweeping societal changes; no need to understand the details, thank you, let’s get cracking. Oh, and the study also proves the truth of my Weltanschauung beyond all doubt, natch.

      But eventually the rogue scientists who go off reservation will penetrate the thick skulls of the general public and tip them off to the generalized, wide scale, massive fraud being perpetrated on them by the scientific establishment, as huge meaty chunks of GDP are tossed into its maw with the understanding of tight professional standards and the expectation of concrete pay off in the form of increased knowledge, greater productivity, higher standard of living, etc. As any lover will tell you, trust, once lost, is hard to win back.

      What would help? Hey, I dunno, ask an expert, lol! But Siriusly, we have a long way to go (8.7 ly, lol!) in reining in abuses of the scientific method that have become commonplace in the academy. Journals should adopt minimum standards. Studies should be designed with a statistician. Data and hypotheses should be rigorously reported. Raw data and computer code should be publicly accessible. Individual fields should adopt particularized ‘gold standard’ methodologies. Independent confirmation of results should be built into studies. Repositories of verified results should be compiled in publicly accessible data banks. But the academy, as currently instantiated, has a number of economic and professional incentives that work against needed reforms.

      I feelz like part of the problem is a blurring between the sciences and the humanities, where out of science envy or to lend a patina of authori-tay to one’s conclusions, they are dressed up in the form and language of the scientific method.

      But more important than any particular remedy is that the incentives for scientists and institutions are properly aligned with/for the production of reproducible results. Have we learned nothing from the operation of markets? I know teh scienz™ is supposed to be above the petty, coarse world of private interests, but nothing in human affairs really is. Capitalism vanquishes all comers because it harnesses self-interest for wider benefit. Communism fails because it depends on human perfectibility and men rising above self-interest. Let’s not pretend that science exists in some special rarefied dimension where human frailties don’t exist.

      But I digress…
