Ad-Hoc Hypothesizing and Data Mining

An ad-hoc hypothesis is

a hypothesis added to a theory in order to save it from being falsified….

Scientists are often skeptical of theories that rely on frequent, unsupported adjustments to sustain them. This is because, if a theorist so chooses, there is no limit to the number of ad hoc hypotheses that they could add. Thus the theory becomes more and more complex, but is never falsified. This is often at a cost to the theory’s predictive power, however. Ad hoc hypotheses are often characteristic of pseudoscientific subjects.

An ad-hoc hypothesis can also be formed from an existing hypothesis (a proposition that hasn’t yet risen to the level of a theory) when the existing hypothesis has been falsified or is in danger of falsification. The (intellectually dishonest) proponents of the existing hypothesis seek to protect it from falsification by putting the burden of proof on the doubters rather than where it belongs, namely, on the proponents.

Data mining is “the process of discovering patterns in large data sets”. It isn’t hard to imagine the abuses that are endemic to data mining: for example, running one regression after another until the “correct” equation turns up, or excluding or adjusting portions of the data because their use leads to “counterintuitive” results.
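To make the regression-hunting abuse concrete, here is a minimal sketch in Python. Everything in it is my own illustration rather than anyone's actual analysis: the function names (pearson_r, dredge), the sample size of 30, the 200 candidate variables, and the cutoff of 0.361 (roughly the two-sided 5% critical value of the correlation coefficient for 30 observations) are all assumptions chosen for the demonstration. The "outcome" and every "predictor" are pure random noise, so any relationship the search turns up is spurious by construction.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def dredge(n_obs=30, n_candidates=200, threshold=0.361, seed=0):
    """Search unrelated noise variables for the best-looking 'predictor'.

    Returns the strongest correlation found and the number of candidates
    whose |r| exceeds the (assumed) nominal 5% cutoff.
    """
    rng = random.Random(seed)
    # The "outcome" is noise: nothing generated below has any real link to it.
    y = [rng.gauss(0, 1) for _ in range(n_obs)]
    correlations = []
    for _ in range(n_candidates):
        x = [rng.gauss(0, 1) for _ in range(n_obs)]  # an unrelated "predictor"
        correlations.append(pearson_r(x, y))
    best = max(correlations, key=abs)
    n_hits = sum(1 for r in correlations if abs(r) > threshold)
    return best, n_hits


best_r, hits = dredge()
print(f"strongest correlation found in pure noise: {best_r:+.3f}")
print(f"candidates passing the nominal 5% cutoff: {hits} of 200")
```

With 200 tries at a 5% cutoff, about ten "significant" predictors should be expected by chance alone. A researcher who reports only the winner, and not the 199 other attempts, is presenting noise as a finding.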

Ad-hoc hypothesizing and data mining are two sides of the same coin: intellectual dishonesty. The former is overt; the latter is covert. (At least, it is covert until someone gets hold of the data and the analysis, which is why many “scientists” and “scientific” journals have taken to hiding the data and obscuring the analysis.) Both methods are justified (wrongly) as being consistent with the scientific method. But the ad-hoc theorizer is just trying to rescue a falsified hypothesis, and the data miner is just trying to conceal information that would falsify his hypothesis.

From what I have seen, the proponents of the human activity > CO2 > “global warming” hypothesis have been guilty of both kinds of quackery: ad-hoc hypothesizing and data mining (with a lot of data manipulation thrown in for good measure).