The problem with science is that so much of it simply isn’t. Last summer, the Open Science Collaboration announced that it had tried to replicate one hundred published psychology experiments sampled from three of the most prestigious journals in the field. Scientific claims rest on the idea that experiments repeated under nearly identical conditions ought to yield approximately the same results, but until very recently, very few had bothered to check in a systematic way whether this was actually the case. The OSC was the biggest attempt yet to check a field’s results, and the most shocking. In many cases, they had used original experimental materials, and sometimes even performed the experiments under the guidance of the original researchers. Of the studies that had originally reported positive results, an astonishing 65 percent failed to show statistical significance on replication, and many of the remainder showed greatly reduced effect sizes.
Their findings made the news, and quickly became a club with which to bash the social sciences. But the problem isn’t just with psychology. There’s an unspoken rule in the pharmaceutical industry that half of all academic biomedical research will ultimately prove false, and in 2011 a group of researchers at Bayer decided to test it. Looking at sixty-seven recent drug discovery projects based on preclinical cancer biology research, they found that in more than 75 percent of cases the published data did not match up with their in-house attempts to replicate. These were not studies published in fly-by-night oncology journals, but blockbuster research featured in Science, Nature, Cell, and the like. The Bayer researchers were drowning in bad studies, and it was to this, in part, that they attributed the mysteriously declining yields of drug pipelines. Perhaps so many of these new drugs fail to have an effect because the basic research on which their development was based isn’t valid.
When a study fails to replicate, there are two possible interpretations. The first is that, unbeknownst to the investigators, there was a real difference in experimental setup between the original investigation and the failed replication. These are colloquially referred to as “wallpaper effects,” the joke being that the experiment was affected by the color of the wallpaper in the room. This is the happiest possible explanation for failure to reproduce: It means that both experiments have revealed facts about the universe, and we now have the opportunity to learn what the difference was between them and to incorporate a new and subtler distinction into our theories.
The other interpretation is that the original finding was false. Unfortunately, an ingenious statistical argument shows that this second interpretation is far more likely. First articulated by John Ioannidis, a professor at Stanford University’s School of Medicine, this argument proceeds by a simple application of Bayesian statistics. Suppose that there are a hundred and one stones in a certain field. One of them has a diamond inside it, and, luckily, you have a diamond-detecting device that advertises 99 percent accuracy. After an hour or so of moving the device around, examining each stone in turn, suddenly alarms flash and sirens wail while the device is pointed at a promising-looking stone. What is the probability that the stone contains a diamond?
Most would say that if the device advertises 99 percent accuracy, then there is a 99 percent chance that the device is correctly discerning a diamond, and a 1 percent chance that it has given a false positive reading. But consider: Of the one hundred and one stones in the field, only one truly contains a diamond. Granted, our machine has a very high probability of correctly declaring it to be a diamond. But there are many more diamond-free stones, and while the machine only has a 1 percent chance of falsely declaring each of them to be a diamond, there are a hundred of them. So if we were to wave the detector over every stone in the field, it would, on average, sound twice—once for the real diamond, and once when a false reading was triggered by one of the hundred ordinary stones. If we know only that the alarm has sounded, these two possibilities are roughly equally probable, giving us an approximately 50 percent chance that the stone really contains a diamond.
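To make the arithmetic explicit, here is a minimal sketch of that calculation in Python, using Bayes' theorem with the numbers from the example (one diamond among a hundred and one stones, a detector that is right 99 percent of the time). The function name is mine, chosen only for illustration.

```python
# A minimal sketch of the diamond-detector arithmetic above, via Bayes' theorem.
# The numbers (101 stones, one diamond, "99 percent accuracy") come from the
# example; the function name is just for illustration.

def posterior_diamond(prior, sensitivity, false_positive_rate):
    """P(stone contains a diamond | detector sounds)."""
    p_alarm = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_alarm

prior = 1 / 101             # one diamond among a hundred and one stones
sensitivity = 0.99          # detector sounds for a real diamond 99% of the time
false_positive_rate = 0.01  # ...and falsely sounds for 1% of ordinary stones

print(posterior_diamond(prior, sensitivity, false_positive_rate))
# ~0.497 -- despite the "99 percent accurate" label, an alarm is roughly a coin flip.
```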
This is a simplified version of the argument that Ioannidis applies to the process of science itself. The stones in the field are the set of all possible testable hypotheses, the diamond is a hypothesized connection or effect that happens to be true, and the diamond-detecting device is the scientific method. A tremendous amount depends on the proportion of possible hypotheses which turn out to be true, and on the accuracy with which an experiment can discern truth from falsehood. Ioannidis shows that for a wide variety of scientific settings and fields, the values of these two parameters are not at all favorable. For instance, consider a team of molecular biologists investigating whether a mutation in one of the countless thousands of human genes is linked to an increased risk of Alzheimer’s. The probability of a randomly selected mutation in a randomly selected gene having precisely that effect is quite low, so just as with the stones in the field, a positive finding is more likely than not to be spurious—unless the experiment is unbelievably successful at sorting the wheat from the chaff. Indeed, Ioannidis finds that in many cases, approaching even 50 percent true positives requires unimaginable accuracy. Hence the eye-catching title of his paper: “Why Most Published Research Findings Are False.”
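Ioannidis frames this as the positive predictive value of a research finding: the chance that a "significant" result reflects a real effect, given the prior odds of the hypothesis, the study's power, and its false positive rate. The sketch below works through that arithmetic with illustrative numbers; the prior odds, significance threshold, and power used here are assumptions for the sake of the example, not figures taken from his paper.

```python
# A rough sketch of the positive-predictive-value arithmetic behind Ioannidis's
# argument. The parameter values below are illustrative assumptions, not
# figures taken from his paper.

def ppv(prior, power, alpha):
    """Probability that a statistically significant finding is actually true."""
    true_positives = power * prior          # true effects the study detects
    false_positives = alpha * (1 - prior)   # null effects that pass the threshold anyway
    return true_positives / (true_positives + false_positives)

# Gene-hunting scenario: suppose only 1 in 1,000 candidate mutations truly
# raises Alzheimer's risk, tested at the conventional 5% significance level
# with respectable 80% power.
print(ppv(prior=1 / 1000, power=0.80, alpha=0.05))   # ~0.016

# Even a far more generous prior leaves most positives suspect:
print(ppv(prior=1 / 10, power=0.80, alpha=0.05))     # ~0.64
```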
[snip]

But, and there is no putting it nicely, deliberate fraud is far more widespread than the scientific establishment is generally willing to admit. One way we know that there's a great deal of fraud occurring is that if you phrase your question the right way, scientists will confess to it. In a survey of two thousand research psychologists conducted in 2011, over half admitted outright to selectively reporting those experiments that gave the result they were after. Then the investigators asked respondents to estimate, anonymously, how many of their fellow scientists had engaged in fraudulent behavior, and promised them that the more accurate their guesses, the larger the contribution that would be made to the charity of their choice. Through several rounds of anonymous guessing, refined using the number of scientists who would admit to their own fraud and other indirect measurements, the investigators concluded that around 10 percent of research psychologists have engaged in outright falsification of data, and more than half have engaged in less brazen but still fraudulent behavior, such as reporting that a result was statistically significant when it was not, or deciding between two different data analysis techniques after looking at the results of each and choosing the more favorable.
Many forms of statistical falsification are devilishly difficult to catch, or close enough to a genuine judgment call to provide plausible deniability. Data analysis is very much an art, and one that affords even its most scrupulous practitioners a wide degree of latitude. Which of these two statistical tests, both applicable to this situation, should be used? Should a subpopulation of the research sample with some common criterion be picked out and reanalyzed as if it were the totality? Which of the hundreds of coincident factors measured should be controlled for, and how? The same freedom that empowers a statistician to pick a true signal out of the noise also enables a dishonest scientist to manufacture nearly any result he or she wishes. Cajoling statistical significance where in reality there is none, a practice commonly known as “p-hacking,” is particularly easy to accomplish and difficult to detect on a case-by-case basis. And since the vast majority of studies still do not report their raw data along with their findings, there is often nothing to re-analyze and check even if there were volunteers with the time and inclination to do so.
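How easy is this manufacturing? The small simulation below, written under assumed and simplified conditions (no real effect anywhere, eight arbitrary analysis variants per experiment, keep the best p-value), illustrates the point: the nominal 5 percent false positive rate balloons to roughly a third. It is a sketch of the general mechanism, not a reconstruction of any actual study.

```python
# An illustrative simulation of p-hacking: with no real effect anywhere,
# analyzing each experiment several different ways and keeping the best
# p-value inflates the nominal 5% false positive rate to roughly a third.
# The setup (8 analysis variants, 40 subjects per group) is an assumption
# for the sake of the example.

import random
from math import erf, sqrt
from statistics import mean

def p_value_two_sample(a, b):
    """Approximate two-sided p-value for a difference in means (normal approximation)."""
    var_a = sum((x - mean(a)) ** 2 for x in a) / (len(a) - 1)
    var_b = sum((x - mean(b)) ** 2 for x in b) / (len(b) - 1)
    z = (mean(a) - mean(b)) / sqrt(var_a / len(a) + var_b / len(b))
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

random.seed(0)
trials, n, analyses = 2000, 40, 8
hacked_hits = 0
for _ in range(trials):
    # Null world: "treatment" and "control" are drawn from the same distribution,
    # so any significant difference is a false positive.
    best_p = min(
        p_value_two_sample(
            [random.gauss(0, 1) for _ in range(n)],
            [random.gauss(0, 1) for _ in range(n)],
        )
        for _ in range(analyses)  # try 8 analyses, keep only the most favorable
    )
    hacked_hits += best_p < 0.05

print(hacked_hits / trials)  # ~0.33 rather than the advertised 0.05
```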
One creative attempt to estimate how widespread such dishonesty really is involves comparisons between fields of varying “hardness.” The author of one such study, Daniele Fanelli, theorized that the farther from physics one gets, the more freedom creeps into one’s experimental methodology, and the fewer constraints there are on a scientist’s conscious and unconscious biases. If all scientists were constantly attempting to influence the results of their analyses, but had more opportunities to do so the “softer” the science, then we might expect the social sciences to have more papers that confirm a sought-after hypothesis than the physical sciences do, with medicine and biology somewhere in the middle. This is exactly what the study discovered: A paper in psychology or psychiatry is about five times as likely to report a positive result as one in astrophysics. This is not necessarily evidence that psychologists are all consciously or unconsciously manipulating their data—it could also be evidence of massive publication bias—but either way, the result is disturbing.

[big snip]
At its best, science is a human enterprise with a superhuman aim: the discovery of regularities in the order of nature, and the discerning of the consequences of those regularities. We’ve seen example after example of how the human element of this enterprise hampers its progress, through incompetence, fraud, selfishness, prejudice, or the simple combination of an honest oversight or slip with plain bad luck. These failings need not hobble the scientific enterprise broadly conceived, but only if scientists are hyper-aware of and endlessly vigilant about the errors of their colleagues . . . and of themselves. When cultural trends attempt to render science a sort of religion-less clericalism, scientists are apt to forget that they are made of the same crooked timber as the rest of humanity and will necessarily imperil the work that they do. The greatest friends of the Cult of Science are the worst enemies of science’s actual practice.
*******

"We cannot continue to allow ourselves to be influenced and molded by the political class and by the media. That is going to destroy us," he said, remarking that it's "kind of sad" that the press is the only business protected by the Constitution "because they were supposed to be the allies of the people." – Dr. Ben Carson
And in other news about Climate Change research...
I have to admit I find the 2016 election season disgusting; no one emerges unsullied on either side. I have no idea for whom I will vote, nor even whether I will.
True science has been co-opted by collectivists of all kinds. Instead of sticking to strict empirical procedures and logic, scientists bow down to political correctness and Mother Gaia.
My favorite: if you see the claim that xyz has not been proven to occur, odds are xyz was never actually tested.
Ask what happens when you don't go along with the program.
Illegitimi non carborundum (don't let the bastards grind you down).
During times of universal deceit, telling the truth becomes a revolutionary act.