The decline of statistical science?

As part of his thrashing of Sam Harris recently, Jackson Lears made extensive reference to a statistical phenomenon known as the “decline effect,” in which seemingly solid and significant experimental results appear simply to melt away with replication and the passage of time.

While Lears’s obvious pleasure in reporting the effect comes from his equally obvious desire to discredit “objective” or “factual” science, the effect is also a serious concern to scientists, especially to those medical researchers and behavioural scientists whose work relies most heavily on statistical analysis. If results fade with time, how real were they in the first place?

While the decline effect has been around since the term was coined in the 1930s, it returned to prominence with the publication of Jonah Lehrer’s “The Truth Wears Off” in The New Yorker last December. In that article, Lehrer reported on the persistence of the “decline effect” in the work of Jonathan Schooler and other prominent researchers. Lehrer clearly states what’s at stake:

The test of replicability, as it’s known, is the foundation of modern research. Replicability is how the community enforces itself. It’s a safeguard for the creep of subjectivity.

As Lears noted, Lehrer spells out the methodological concerns raised by the statistical fading that defines the decline effect:

For many scientists, the effect is especially troubling because of what it exposes about the scientific process. If replication is what separates the rigor of science from the squishiness of pseudoscience, where do we put all these rigorously validated findings that can no longer be proved?

The most frequently proposed explanation is the obvious one: regression to the mean. That is, early results appear significant because they sit near one extreme of a normal distribution. As time passes and the experiment is repeated, the first results prove to be not typical but aberrational. What made the initial results stand out was not so much a real effect as their deviation from an as yet unknown and more modest mean.
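Regression to the mean is easy to see in a toy simulation. The sketch below is only an illustration under stated assumptions: the true effect size, the noise level, the number of studies, and the "top 5%" cutoff are all invented for the example and come from neither Lehrer nor Schooler. It picks out the most striking first results from a large pool of noisy studies, then repeats each of those studies once with fresh noise.

```python
import random
import statistics


def simulate_regression_to_mean(true_effect=0.2, noise_sd=0.5,
                                n_studies=10_000, top_fraction=0.05, seed=0):
    """Return (mean striking first result, mean replication of those studies).

    All parameter values are illustrative assumptions, not empirical figures.
    """
    rng = random.Random(seed)
    # Each study's first estimate is the modest true effect plus random noise.
    initial = [rng.gauss(true_effect, noise_sd) for _ in range(n_studies)]
    # Keep only the most striking first results (the upper tail).
    cutoff = sorted(initial)[int(n_studies * (1 - top_fraction))]
    striking = [x for x in initial if x >= cutoff]
    # Repeat each striking study once, with fresh and independent noise.
    replication = [rng.gauss(true_effect, noise_sd) for _ in striking]
    return statistics.mean(striking), statistics.mean(replication)


first, second = simulate_regression_to_mean()
print(f"striking first results: {first:.2f}")
print(f"their replications:     {second:.2f}")
```

Nothing about the underlying effect changes between the two rounds. The replications land near the modest true mean simply because the first round was selected for being extreme.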

Schooler isn’t so sure: according to statistical principles, declines as steep as those now being reported everywhere should be exceedingly rare. If they’re not rare, if they’re instead quite common, can regression to the mean really explain the effect?

Others point to a form of publication bias, in which striking results are more likely to be published initially, followed by confirmatory replications as journals participate in a kind of “bandwagon” effect. Studies of publication patterns indicate that the vast majority of published studies show the effect the research looks for. As early as 1959, statistical analysis of published studies showed that 97% of them reported positive results. Part of the decline problem is this “confirmation bias” (a conceptual problem discussed here recently in a different context in both “Forget the science — I still don’t believe it” and “What makes us misread information?”).

Schooler believes that part of the problem is poor experimental design, with flaws that go uncaught in a publishing field so crowded that a third of studies are not only never replicated but never even cited. His proposed solution is an open and comprehensive publication database, in which both publication bias and the existence of contradictory studies can be assessed.

Lehrer notes that while such a database might rein in publication bias, it can’t affect the disruptive impact of “sheer randomness” — the appearance of unexplainable “noise” in experimental results. Randomness — the influence of unselected, undetected, or unexplained variables — can lead to dramatic results, and these are more likely to be published than more pedestrian outcomes. The distortions caused by the combination of randomness and publication bias can be large and persistent.
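The way randomness and publication bias combine can also be sketched in a few lines. This is a hypothetical model, not anything taken from Lehrer's article: it assumes a small true effect, small studies whose estimates are normally distributed around that effect, and a journal that "publishes" only estimates that clear a conventional one-sided significance threshold. Every number in it is an assumption chosen for illustration.

```python
import math
import random
import statistics


def simulate_publication_filter(true_effect=0.1, n_per_study=25,
                                n_studies=5_000, z_crit=1.96, seed=1):
    """Compare the average estimate of all studies with the published ones.

    Assumes unit-variance observations, so each study's estimate has
    standard error 1 / sqrt(n_per_study). All values are illustrative.
    """
    rng = random.Random(seed)
    se = 1.0 / math.sqrt(n_per_study)
    all_estimates, published = [], []
    for _ in range(n_studies):
        # A study's estimate: the true effect plus sampling noise.
        estimate = rng.gauss(true_effect, se)
        all_estimates.append(estimate)
        # Hypothetical filter: only "significant" positive results get published.
        if estimate / se > z_crit:
            published.append(estimate)
    return {
        "all": statistics.fmean(all_estimates),
        "published": statistics.fmean(published),
        "share_published": len(published) / n_studies,
    }


result = simulate_publication_filter()
print(f"mean of all studies:       {result['all']:.2f}")
print(f"mean of published studies: {result['published']:.2f}")
print(f"share published:           {result['share_published']:.1%}")
```

Under these assumptions, the published record overstates the effect because a study can only pass the filter when noise has pushed its estimate well above the true value. Later, larger studies would then look like a decline.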

As Lehrer notes:

Such anomalies demonstrate the slipperiness of empiricism. … The decline effect is troubling because it reminds us how difficult it is to prove anything. We like to pretend that our experiments define the truth for us. But that’s often not the case.

In his attack on Harris, Lears leaned heavily on the decline effect as an indicator that “there may be disturbing limitations to the scientific method, at least in the statistically based behavioral sciences.”

Surely it’s not news to most that there are limitations to human knowledge, including limits to our methodologies for discovering that knowledge in the first place. Even in the “hard” sciences, results can confirm, change, and even recreate conceptual models. We study what we can perceive, which we then explain in terms we can conceive. The physical world has no responsibility actually to correspond to either our perceptions or conceptions.

Rather than being dramatic evidence that relativism is right and rationalism is wrong, these truths are in one important way irrelevant. If we can never fully describe reality or completely explain its physical processes, that certainly doesn’t mean that the search for as much truth as we can get should be abandoned. On some levels, the relativist’s anti-empirical stance resembles the gloating of a child who cries gleefully that you’re not right, so he is!

In The Grand Design, Stephen Hawking described our scientific knowledge as a kind of well-informed “model-making.” (One attempt to explain the metaphysics of Hawking’s idea of a “model-dependent” science can be found here.) These models exist in our minds, and go as far as they can to describe the current state not of the world, but of our understanding of it. It’s not absolute truth, but it is truth as far as we can know it, and that is most often good enough. As many have pointed out, “science works.” If celestial bodies don’t really consist of what we think they do, or don’t really act the way we think they do, our probes somehow find Neptune nonetheless. If molecules and DNA don’t really work the way we think they do, aspirin somehow relieves our headaches and hybrid roses somehow come out the way we expect. Our “truth” is in some ways relative (what else could it be?), but our airplanes fly and our smartphones ring.

That our understanding is limited is certainly humbling, but that we have the tools to seek understanding at all is just as certainly ennobling.


2 thoughts on “The decline of statistical science?”

  1. Is there a hint in the decline problem that we function better in advanced engineering than in sociology? If the problem is more prominent in some fields, that may reflect strengths and weaknesses in aspects of our makeup. That would be interesting because one result could be that we are using our weakest tools to understand our worst difficulties, ourselves.

  2. Nicely put in the last sentence!

    The decline problem is more prominent in fields that make statistical rather than mechanical measurements. One reason that bridges stay up is that the lengths of their metal pieces do not vary widely from one day to the next. This suggests a superior measurement methodology, at the very least.
