# Modeling and Science Revisited

I have written a lot about modeling and science. (See the long list of posts at “Modeling, Science, and ‘Reason’”.) I have said, more than once, that modeling isn’t science. What I should have said, though it was always implied, is that a model isn’t scientific if it is merely synthetic.

What do I mean by that? Here is an example by way of contrast. The famous equation E = mc² is a synthetic model in that it is derived from Einstein’s special theory of relativity (and other physical equations). But it is also an empirical model in that the relationship between mass (m) and energy (E) can also be confirmed by observation (given suitable instruments).

On the other hand, a complex model of the U.S. economy, a model of Earth’s “average” temperature (called misleadingly a climate model), or a model of combat (to give a few examples) is only synthetic.

Why do I say that a complex model (of the kind mentioned above) is only synthetic? Such a model consists of a large number of modules, each of which is a mathematical formulation of some aspect of the larger phenomenon being modeled. Here’s a simple example: An encounter between a submarine and a surface ship, where the outcome is expressed as the probability that the submarine will sink the surface ship. The outcome could be expressed in this way:

S = D × F × H × K × C, where S is the probability that the submarine sinks the surface ship, which is the product of:

D = probability that submarine detects surface ship within torpedo range

F = probability that, given detection, submarine is able to “fix” the target and fire a torpedo (or salvo of them)

H = probability that, given the firing of a torpedo (or salvo), the surface ship is hit

K = probability that, given a hit (or hits), the surface ship is sunk

C = probability that the submarine survives efforts to find and nullify it before it can detect a surface ship

This is a simple model by comparison with a model of the U.S. economy, a global climate model, or a model of a battle involving large numbers of various kinds of weapons. In fact, it is a simplistic model of combat. Each of the modules could be decomposed into many sub-modules; for example, the module for D could consist of sub-modules for sonar accuracy, sonar operator acuity, acoustic conditions in the area of operation, countermeasures deployed by the target, etc. In any event, the module for D will consist of a mathematical relationship, based perhaps on some statistics collected from tests or exercises (i.e., not actual combat). The mathematical relationship will encompass many assumptions (mainly implicit ones) about sonar accuracy, sonar operator acuity, etc. The same goes for the other modules, and for C in particular, which encompasses, at a minimum, all of the effects of D, F, H, and K.

In sum, the number of unknowns completely swamps the number of knowns. There is nothing close to certainty about the model, or any model of its kind. (In the case of the model of S, for example, relatively small errors, say 25 percent from the actual value of each variable, can yield an estimate of S that is three times greater than or about one-fourth as much as the actual value of S.) The mathematical operations involved do nothing to resolve the uncertainty; they merely multiply it. But the mathematical operations nevertheless convey the appearance of certainty because they yield numbers. The numbers merely represent a lot of guesses, but they seem authoritative because numbers mesmerize most people, even scientists who should always be skeptical of them.
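The arithmetic behind that parenthetical can be checked with a short sketch. The five probabilities below are hypothetical values chosen only for illustration (they come from no test, exercise, or study), and the helper names are my own. The sketch takes the worst case, in which every input is off by 25 percent in the same direction.

```python
from math import prod

# Hypothetical "true" values for the five conditional probabilities.
# Illustrative only: these numbers are assumptions, not data.
TRUE_VALUES = {"D": 0.5, "F": 0.6, "H": 0.4, "K": 0.5, "C": 0.6}


def sink_probability(values):
    """Return S = D x F x H x K x C for a dict of probabilities."""
    return prod(values[name] for name in ("D", "F", "H", "K", "C"))


def with_relative_error(values, error):
    """Scale every probability by (1 + error), e.g. error=0.25 for +25 percent."""
    return {name: p * (1 + error) for name, p in values.items()}


true_s = sink_probability(TRUE_VALUES)
high_s = sink_probability(with_relative_error(TRUE_VALUES, 0.25))
low_s = sink_probability(with_relative_error(TRUE_VALUES, -0.25))

print(f"true S              = {true_s:.4f}")
print(f"all inputs 25% high = {high_s:.4f} ({high_s / true_s:.2f} times true S)")
print(f"all inputs 25% low  = {low_s:.4f} ({low_s / true_s:.2f} times true S)")
```

Because S is a pure product, the ratios do not depend on the particular values chosen: they are simply 1.25 and 0.75 raised to the fifth power, or roughly 3.05 and 0.24.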

Despite all of that, analysts have for many decades been producing — and decision-makers have been consuming — the results of such models as the basis for choosing defense systems. Models of similar complexity have been and are being used in making decisions about a broad range of policies affecting the economy, health care, transportation, education, the environment, the climate (i.e., “global warming”), and on into the night.

The unfounded confidence that modelers have in their models, because the models produce numbers, captivates most decision-makers, who simply want answers. And so, modelers will go to ridiculous extremes. One not untypical example that I recall from my days as an in-house critic of analysts’ work is the model that purported to compare competing weapons (one of which was still in development) based on their relative contribution to the outcome of a hypothetical battle. The specific measure was the movement of the forward edge of the battle area (FEBA) to within a yard.

Global climate models are like that warfare model: Their creators pretend that they can estimate the change in the average temperature of the globe to within less than a tenth of a degree. If you believe that, I have a bridge to sell you.

Related reading: Robert L. Bradley Jr., “Climate Models: Worse than Nothing?”, Watts Up With That?, June 23, 2021 (Yes. See below.)

Related pages:

Climate Change

Modeling, Science, and “Reason”

# “It’s Tough to Make Predictions, Especially about the Future”

A lot of people have said it, or something like it, though probably not Yogi Berra, to whom it’s often attributed.

Here’s another saying, which is also apt here: History does not repeat itself. The historians repeat one another.

I am accordingly amused by something called cliodynamics, which is discussed at length by Amanda Rees in “Are There Laws of History?” (Aeon, May 2020). The Wikipedia article about cliodynamics describes it as

a transdisciplinary area of research integrating cultural evolution, economic history/cliometrics, macrosociology, the mathematical modeling of historical processes during the longue durée [the long term], and the construction and analysis of historical databases. Cliodynamics treats history as science. Its practitioners develop theories that explain such dynamical processes as the rise and fall of empires, population booms and busts, spread and disappearance of religions. These theories are translated into mathematical models. Finally, model predictions are tested against data. Thus, building and analyzing massive databases of historical and archaeological information is one of the most important goals of cliodynamics.

I won’t dwell on the methods of cliodynamics, which involve making up numbers about various kinds of phenomena and then making up models which purport to describe, mathematically, the interactions among the phenomena. Underlying it all is the practitioner’s broad knowledge of historical events, which he converts (with the proper selection of numerical values and mathematical relationships) into such things as the Kondratiev wave, a post-hoc explanation of a series of arbitrarily denominated and subjectively measured economic eras.

In sum, if you seek patterns you will find them, but pattern-making (modeling) is not science. (There’s a lot more here.)

Here’s a simple demonstration of what’s going on with cliodynamics. Using the RANDBETWEEN function of Excel, I generated two columns of random numbers ranging in value from 0 to 1,000, with 1,000 numbers in each column. I designated the values in the left column as x variables and the numbers in the right column as y variables. I then arbitrarily chose the first 10 pairs of numbers and plotted them:

As it turns out, the relationship, even though it seems rather loose, yields a two-tailed p-value of 0.21. In the language of statistics, a correlation at least this strong would turn up about 21 percent of the time in data with no underlying relationship.

Of course, the relationship is due entirely to chance because it’s the relationship between two sets of random numbers. So much for statistical tests of “significance”.

Moreover, I could have found “more significant” relationships had I combed carefully through the 1,000 pairs of random numbers with my pattern-seeking brain.
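For readers who want to repeat the exercise outside Excel, here is a rough Python equivalent. It is a sketch, not a reproduction of my spreadsheet: the seed is arbitrary, so the particular numbers (and correlations) will differ from mine, and the "combing" is reduced to sliding a 10-pair window across the 1,000 pairs, which is only one of many ways to hunt for a pattern.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(2021)  # arbitrary seed, used only for repeatability

# Two columns of 1,000 random integers from 0 to 1,000, like RANDBETWEEN(0, 1000).
xs = [rng.randint(0, 1000) for _ in range(1000)]
ys = [rng.randint(0, 1000) for _ in range(1000)]

r_first_10 = pearson_r(xs[:10], ys[:10])
r_all = pearson_r(xs, ys)

# "Comb through" the data: slide a 10-pair window over all 1,000 pairs
# and keep the window with the strongest apparent relationship.
window_rs = [pearson_r(xs[i:i + 10], ys[i:i + 10]) for i in range(991)]
r_best = max(window_rs, key=abs)

print(f"first 10 pairs:         r = {r_first_10:+.3f}")
print(f"best of 991 windows:    r = {r_best:+.3f}")
print(f"all 1,000 pairs:        r = {r_all:+.3f}")
```

The best window can never look weaker than the first one, because the first window is among those searched. That is the point: the more places one looks, the more impressive the "relationship" one can report.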

But being an honest person with scientific integrity, I will show you the plot of all 1,000 pairs of random numbers:

I didn’t bother to find a correlation between the x and y values because there is none. And that’s the messy reality of human history. Yes, there have been many determined (i.e., sought-for) outcomes  — such as America’s independence from Great Britain and Hitler’s rise to power. But they are not predetermined outcomes. Their realization depended on the surrounding circumstances of the moment, which were myriad, non-quantifiable, and largely random in relation to the event under examination (the revolution, the putsch, etc.). The outcomes only seem inevitable and predictable in hindsight.

Cliodynamics is a variant of the anthropic principle, which is that the laws of physics appear to be fine-tuned to support human life because we humans happen to be here to observe the laws of physics. In the case of cliodynamics, the past seems to consist of inevitable events because we are here in the present looking back (rather hazily) at the events that occurred in the past.

Cliodynametricians, meet Nostradamus. He “foresaw” the future long before you did.

# More about Modeling and Science

This post is based on a paper that I wrote 38 years ago. The subject then was the bankruptcy of warfare models, which shows through in parts of this post. I am trying here to generalize the message to encompass all complex, synthetic models (defined below). For ease of future reference, I have created a page that includes links to this post and the many that are listed at the bottom.

### THE METAPHYSICS OF MODELING

Alfred North Whitehead said in Science and the Modern World (1925) that “the certainty of mathematics depends on its complete abstract generality” (p. 25). The attraction of mathematical models is their apparent certainty. But a model is only a representation of reality, and its fidelity to reality must be tested rather than assumed. And even if a model seems faithful to reality, its predictive power is another thing altogether. We are living in an era when models that purport to reflect reality are given credence despite their lack of predictive power. Ironically, those who dare point this out are called anti-scientific and science-deniers.

To begin at the beginning, I am concerned here with what I will call complex, synthetic models of abstract variables like GDP and “global” temperature. These are open-ended, mathematical models that estimate changes in the variable of interest by attempting to account for many contributing factors (parameters) and describing mathematically the interactions between those factors. I call such models complex because they have many “moving parts” — dozens or hundreds of sub-models — each of which is a model in itself. I call them synthetic because the estimated changes in the variables of interest depend greatly on the selection of sub-models, the depictions of their interactions, and the values assigned to the constituent parameters of the sub-models. That is to say, compared with a model of the human circulatory system or an internal combustion engine, a synthetic model of GDP or “global” temperature rests on incomplete knowledge of the components of the systems in question and the interactions among those components.

Modelers seem ignorant of or unwilling to acknowledge what should be a basic tenet of scientific inquiry: the complete dependence of logical systems (such as mathematical models) on the underlying axioms (assumptions) of those systems. Kurt Gödel addressed this dependence in his incompleteness theorems:

Gödel’s incompleteness theorems are two theorems of mathematical logic that demonstrate the inherent limitations of every formal axiomatic system capable of modelling basic arithmetic….

The first incompleteness theorem states that no consistent system of axioms whose theorems can be listed by an effective procedure (i.e., an algorithm) is capable of proving all truths about the arithmetic of natural numbers. For any such consistent formal system, there will always be statements about natural numbers that are true, but that are unprovable within the system. The second incompleteness theorem, an extension of the first, shows that the system cannot demonstrate its own consistency.

There is the view that Gödel’s theorems aren’t applicable in fields outside of mathematical logic. But any quest for certainty about the physical world necessarily uses mathematical logic (which includes statistics).

This doesn’t mean that the results of computational exercises are useless. It simply means that they are only as good as the assumptions that underlie them; for example, assumptions about relationships between parameters, assumptions about the values of the parameters, and assumptions as to whether the correct parameters have been chosen (and properly defined) in the first place.

There is nothing new in that, certainly nothing that requires Gödel’s theorems by way of proof. It has long been understood that a logical argument may be valid — the conclusion follows from the premises — but untrue if the premises (axioms) are untrue. But it bears repeating — and repeating.

### REAL MODELERS AT WORK

There have been mathematical models of one kind and another for centuries, but formal models weren’t used much outside the “hard sciences” until the development of microeconomic theory in the 19th century. Then came F.W. Lanchester, who during World War I devised what became known as Lanchester’s laws (or Lanchester’s equations), which are

mathematical formulae for calculating the relative strengths of military forces. The Lanchester equations are differential equations describing the time dependence of two [opponents’] strengths A and B as a function of time, with the function depending only on A and B.

Lanchester’s equations are nothing more than abstractions that must be given a semblance of reality by the user, who is required to make myriad assumptions (explicit and implicit) about the factors that determine the “strengths” of A and B, including but not limited to the relative killing power of various weapons, the effectiveness of opponents’ defenses, the importance of the speed and range of movement of various weapons, intelligence about the location of enemy forces, and commanders’ decisions about when, where, and how to engage the enemy. It should be evident that the predictive value of the equations, when thus fleshed out, is limited to small, discrete engagements, such as brief bouts of aerial combat between two (or a few) opposing aircraft. Alternatively — and in practice — the values are selected so as to yield results that mirror what actually happened (in the “replication” of a historical battle) or what “should” happen (given the preferences of the analyst’s client).
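To make the point about assumptions concrete, here is a minimal sketch of one commonly given form of Lanchester's equations, the aimed-fire or "square law" version, in which each side's losses are proportional to the other side's strength. The starting strengths and the attrition coefficients `alpha` and `beta` are hypothetical numbers chosen for illustration, and the simple Euler integration is my choice, not part of Lanchester's work.

```python
def lanchester_square(a0, b0, alpha, beta, dt=0.01, max_steps=10_000_000):
    """Euler-integrate dA/dt = -beta * B and dB/dt = -alpha * A.

    alpha is the rate at which one unit of A destroys B; beta is the reverse.
    Integration stops when either side reaches zero strength.
    Returns (winner, surviving_strength_of_winner).
    """
    a, b = float(a0), float(b0)
    for _ in range(max_steps):
        if a <= 0 or b <= 0:
            break
        # Both sides are updated from the same starting state.
        a, b = a - beta * b * dt, b - alpha * a * dt
    a, b = max(a, 0.0), max(b, 0.0)
    return ("A", a) if a > b else ("B", b)


# Hypothetical engagement: A starts with 100 units, B with 110.
# Case 1: the analyst assumes the two sides are equally effective.
winner_equal, left_equal = lanchester_square(100, 110, alpha=0.01, beta=0.01)

# Case 2: the analyst assumes A is 25 percent more effective than B.
winner_edge, left_edge = lanchester_square(100, 110, alpha=0.0125, beta=0.01)

print(f"equal effectiveness: {winner_equal} wins with {left_equal:.1f} units left")
print(f"A 25% more effective: {winner_edge} wins with {left_edge:.1f} units left")
```

In this sketch a 25 percent change in one assumed coefficient reverses the outcome of the battle, which is why the selection of values matters far more than the elegance of the equations.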

More complex (and realistic) mathematical modeling (also known as operations research) had seen limited use in industry and government before World War II. Faith in the explanatory power of mathematical models was burnished by their use during the war, where such models seemed to be of aid in the design of more effective tactics and weapons.

But the foundation of that success wasn’t the mathematical character of the models. Rather, it was the fact that the models were tested against reality. Philip M. Morse and George E. Kimball put it well in Methods of Operations Research (1946):

Operations research done separately from an administrator in charge of operations becomes an empty exercise. To be valuable it must be toughened by the repeated impact of hard operational facts and pressing day-by-day demands, and its scale of values must be repeatedly tested in the acid of use. Otherwise it may be philosophy, but it is hardly science. [Op cit., p. 10]

A mathematical model doesn’t represent scientific knowledge unless its predictions can be and have been tested. Even then, a valid model can represent only a narrow slice of reality. The expansion of a model beyond that narrow slice requires the addition of parameters whose interactions may not be well understood and whose values will be uncertain.

Morse and Kimball accordingly urged “hemibel thinking”:

Having obtained the constants of the operations under study … we compare the value of the constants obtained in actual operations with the optimum theoretical value, if this can be computed. If the actual value is within a hemibel ( … a factor of 3) of the theoretical value, then it is extremely unlikely that any improvement in the details of the operation will result in significant improvement. [When] there is a wide gap between the actual and theoretical results … a hint as to the possible means of improvement can usually be obtained by a crude sorting of the operational data to see whether changes in personnel, equipment, or tactics produce a significant change in the constants. [Op cit., p. 38]

Should we really attach little significance to differences of less than a hemibel? Consider a five-parameter model involving the conditional probabilities of detecting, shooting at, hitting, and killing an opponent, and of surviving, in the first place, to do any of these things. Such a model can easily yield a cumulative error of a hemibel (or greater), given a twenty-five percent error in the value of each parameter. (Mathematically, 1.25⁵ = 3.05; alternatively, 0.75⁵ = 0.24, or about one-fourth.)
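The hemibel rule of thumb is easy to state in code. The sketch below uses the factor of 3 given in the Morse and Kimball passage quoted above; the function names are mine, and the assumption that every parameter errs by the same percentage in the same direction is the worst case, not a claim about any particular model.

```python
def within_hemibel(actual, theoretical, factor=3.0):
    """True if actual and theoretical differ by no more than `factor` either way.

    The default factor of 3 follows the Morse and Kimball passage quoted above.
    """
    if actual <= 0 or theoretical <= 0:
        raise ValueError("both values must be positive")
    ratio = actual / theoretical
    return 1.0 / factor <= ratio <= factor


def compounded_ratio(relative_error, n_parameters):
    """Ratio of estimate to truth when each of n multiplied inputs
    is off by the same relative error (worst case, same direction)."""
    return (1.0 + relative_error) ** n_parameters


for n in range(1, 6):
    high = compounded_ratio(0.25, n)
    low = compounded_ratio(-0.25, n)
    print(
        f"{n} parameter(s): high ratio {high:.2f} "
        f"(within hemibel: {within_hemibel(high, 1.0)}), "
        f"low ratio {low:.2f} "
        f"(within hemibel: {within_hemibel(low, 1.0)})"
    )
```

On the high side the estimate stays within a factor of 3 through four parameters and crosses it at five. On the low side the estimate falls below one-third of the true value with only four parameters.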

### ANTI-SCIENTIFIC MODELING

What does this say about complex, synthetic models such as those of economic activity or “climate change”? Any such model rests on the modeler’s assumptions as to the parameters that should be included, their values (and the degree of uncertainty surrounding them), and the interactions among them. The interactions must be modeled based on further assumptions. And so assumptions and uncertainties — and errors — multiply apace.

But the prideful modeler (I have yet to meet a humble one) will claim validity if his model has been fine-tuned to replicate the past (e.g., changes in GDP, “global” temperature anomalies). But the model is useless unless it predicts the future consistently and with great accuracy, where “great” means accurately enough to validly represent the effects of public-policy choices (e.g., setting the federal funds rate, investing in CO2 abatement technology).

#### Macroeconomic Modeling: A Case Study

In macroeconomics, for example, there is Professor Ray Fair, who teaches macroeconomic theory, econometrics, and macroeconometric modeling at Yale University. He has been plying his trade at prestigious universities since 1968, first at Princeton, then at MIT, and since 1974 at Yale. Professor Fair has since 1983 been forecasting changes in real GDP: not decades ahead, just four quarters (one year) ahead. He has made 141 such forecasts, the earliest of which covers the four quarters ending with the second quarter of 1984, and the most recent of which covers the four quarters ending with the second quarter of 2019. The forecasts are based on a model that Professor Fair has revised many times over the years. (The current model is here. His forecasting track record is here.) How has he done? Here’s how:

1. The median absolute error of his forecasts is 31 percent.

2. The mean absolute error of his forecasts is 69 percent.

3. His forecasts are rather systematically biased: too high when real, four-quarter GDP growth is less than 3 percent; too low when real, four-quarter GDP growth is greater than 3 percent.

4. His forecasts have grown generally worse — not better — with time. Recent forecasts are better, but still far from the mark.

Thus:

This and the next two graphs were derived from The Forecasting Record of the U.S. Model, Table 4: Predicted and Actual Values for Four-Quarter Real Growth, at Prof. Fair’s website. The vertical axis of this graph is truncated for ease of viewing, as noted in the caption.
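For anyone who wants to run the same kind of check on a forecasting record, the statistics in the numbered list above can be computed along the following lines. This is a sketch under stated assumptions: the five forecast and actual values are invented for illustration and are not Prof. Fair's numbers, and I am assuming that "absolute error" means the absolute difference between forecast and actual growth, expressed as a fraction of actual growth.

```python
from statistics import mean, median

# Invented four-quarter growth rates, in percent. NOT Prof. Fair's data.
forecasts = [3.0, 2.8, 3.2, 2.9, 3.1]
actuals = [1.0, 2.0, 4.0, 5.0, 2.5]


def absolute_percentage_errors(predicted, observed):
    """Return |predicted - observed| / |observed| for each pair, as fractions.

    Assumes no observed value is zero.
    """
    return [abs(p - o) / abs(o) for p, o in zip(predicted, observed)]


def mean_signed_error(predicted, observed, keep):
    """Mean of (predicted - observed) over the pairs whose observed value passes `keep`."""
    diffs = [p - o for p, o in zip(predicted, observed) if keep(o)]
    return mean(diffs)


errors = absolute_percentage_errors(forecasts, actuals)
median_error = median(errors)
mean_error = mean(errors)

# Bias check: are forecasts too high when growth is slow and too low when it is fast?
bias_slow = mean_signed_error(forecasts, actuals, lambda g: g < 3.0)
bias_fast = mean_signed_error(forecasts, actuals, lambda g: g > 3.0)

print(f"median absolute error: {median_error:.0%}")
print(f"mean absolute error:   {mean_error:.0%}")
print(f"mean signed error when actual growth < 3%: {bias_slow:+.2f} points")
print(f"mean signed error when actual growth > 3%: {bias_fast:+.2f} points")
```

In the invented sample the mean error is well above the median because one large miss dominates the average, and forecasts that hug 3 percent come out too high in slow years and too low in fast ones.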

You might think that Fair’s record reflects the persistent use of a model that’s too simple to capture the dynamics of a multi-trillion-dollar economy. But you’d be wrong. The model changes quarterly. This page lists changes only since late 2009; there are links to archives of earlier versions, but those are password-protected.

As for simplicity, the model is anything but simple. For example, go to Appendix A: The U.S. Model: July 29, 2016, and you’ll find a six-sector model comprising 188 equations and hundreds of variables.

And what does that get you? A weak predictive model:

It fails a crucial test, in that it doesn’t reflect the downward trend in economic growth:

#### General Circulation Models (GCMs) and “Climate Change”

As for climate models, Dr. Tim Ball writes about a

fascinating 2006 paper by Essex, McKitrick, and Andresen asked, “Does a Global Temperature Exist.” Their introduction sets the scene,

It arises from projecting a sampling of the fluctuating temperature field of the Earth onto a single number (e.g. [3], [4]) at discrete monthly or annual intervals. Proponents claim that this statistic represents a measurement of the annual global temperature to an accuracy of ±0.05 °C (see [5]). Moreover, they presume that small changes in it, up or down, have direct and unequivocal physical meaning.

The word “sampling” is important because, statistically, a sample has to be representative of a population. There is no way that a sampling of the “fluctuating temperature field of the Earth,” is possible….

… The reality is we have fewer stations now than in 1960 as NASA GISS explain (Figure 1a, # of stations and 1b, Coverage)….

Not only that, but the accuracy is terrible. US stations are supposedly the best in the world but as Anthony Watt’s project showed, only 7.9% of them achieve better than a 1°C accuracy. Look at the quote above. It says the temperature statistic is accurate to ±0.05°C. In fact, for most of the 406 years when instrumental measures of temperature were available (1612), they were incapable of yielding measurements better than 0.5°C.

The coverage numbers (1b) are meaningless because there are only weather stations for about 15% of the Earth’s surface. There are virtually no stations for

• 70% of the world that is oceans,
• 20% of the land surface that are mountains,
• 20% of the land surface that is forest,
• 19% of the land surface that is desert and,
• 19% of the land surface that is grassland.

The result is we have inadequate measures in terms of the equipment and how it fits the historic record, combined with a wholly inadequate spatial sample. The inadequacies are acknowledged by the creation of the claim by NASA GISS and all promoters of anthropogenic global warming (AGW) that a station is representative of a 1200 km radius region.

I plotted an illustrative example on a map of North America (Figure 2).

Figure 2

Notice that the claim for the station in eastern North America includes the subarctic climate of southern James Bay and the subtropical climate of the Carolinas.

However, it doesn’t end there because this is only a meaningless temperature measured in a Stevenson Screen between 1.25 m and 2 m above the surface….

The Stevenson Screen data [are] inadequate for any meaningful analysis or as the basis of a mathematical computer model in this one sliver of the atmosphere, but there [are] even less [data] as you go down or up. The models create a surface grid that becomes cubes as you move up. The number of squares in the grid varies with the naïve belief that a smaller grid improves the models. It would if there [were] adequate data, but that doesn’t exist. The number of cubes is determined by the number of layers used. Again, theoretically, more layers would yield better results, but it doesn’t matter because there are virtually no spatial or temporal data….

So far, I have talked about the inadequacy of the temperature measurements in light of the two- and three-dimensional complexities of the atmosphere and oceans. However, one source identifies the most important variables for the models used as the basis for energy and environmental policies across the world.

Sophisticated models, like Coupled General Circulation Models, combine many processes to portray the entire climate system. The most important components of these models are the atmosphere (including air temperature, moisture and precipitation levels, and storms); the oceans (measurements such as ocean temperature, salinity levels, and circulation patterns); terrestrial processes (including carbon absorption, forests, and storage of soil moisture); and the cryosphere (both sea ice and glaciers on land). A successful climate model must not only accurately represent all of these individual components, but also show how they interact with each other.

The last line is critical and yet impossible. The temperature data [are] the best we have, and yet [they are] completely inadequate in every way. Pick any of the variables listed, and you find there [are] virtually no data. The answer to the question, “what are we really measuring,” is virtually nothing, and what we measure is not relevant to anything related to the dynamics of the atmosphere or oceans.

I am especially struck by Dr. Ball’s observation that the surface-temperature record applies to about 15 percent of Earth’s surface. Not only that, but as suggested by Dr. Ball’s figure 2, that 15 percent is poorly sampled.

And yet the proponents of CO2-forced “climate change” rely heavily on that flawed temperature record because it is the only one that goes back far enough to “prove” the modelers’ underlying assumption, namely, that it is anthropogenic CO2 emissions which have caused the rise in “global” temperatures. See, for example, Dr. Roy Spencer’s “The Faith Component of Global Warming Predictions”, wherein Dr. Spencer points out that the modelers

have only demonstrated what they assumed from the outset. It is circular reasoning. A tautology. Evidence that nature also causes global energy imbalances is abundant: e.g., the strong warming before the 1940s; the Little Ice Age; the Medieval Warm Period. This is why many climate scientists try to purge these events from the historical record, to make it look like only humans can cause climate change.

In fact the models deal in temperature anomalies, that is, departures from a 30-year average. The anomalies — which range from -1.41 to +1.68 degrees C — are so small relative to the errors and uncertainties inherent in the compilation, estimation, and model-driven adjustments of the temperature record, that they must fail Morse and Kimball’s hemibel test. (The model-driven adjustments are, as Dr. Spencer suggests, downward adjustments of historical temperature data for consistency with the models which “prove” that CO2 emissions induce a certain rate of warming. More circular reasoning.)

They also fail, and fail miserably, the acid test of predicting future temperatures with accuracy. This failure has been pointed out many times. Dr. John Christy, for example, has testified to that effect before Congress (e.g., this briefing). Defenders of the “climate change” faith have attacked Dr. Christy’s methods and findings, but the rebuttals to one such attack merely underscore the validity of Dr. Christy’s work.

This is from “Manufacturing Alarm: Dana Nuccitelli’s Critique of John Christy’s Climate Science Testimony”, by Mario Lewis Jr.:

Christy’s testimony argues that the state-of-the-art models informing agency analyses of climate change “have a strong tendency to over-warm the atmosphere relative to actual observations.” To illustrate the point, Christy provides a chart comparing 102 climate model simulations of temperature change in the global mid-troposphere to observations from two independent satellite datasets and four independent weather balloon data sets….

To sum up, Christy presents an honest, apples-to-apples comparison of modeled and observed temperatures in the bulk atmosphere (0-50,000 feet). Climate models significantly overshoot observations in the lower troposphere, not just in the layer above it. Christy is not “manufacturing doubt” about the accuracy of climate models. Rather, Nuccitelli is manufacturing alarm by denying the models’ growing inconsistency with the real world.

And this is from Christopher Monckton of Brenchley’s “The Guardian’s Dana Nuccitelli Uses Pseudo-Science to Libel Dr. John Christy”:

One Dana Nuccitelli, a co-author of the 2013 paper that found 0.5% consensus to the effect that recent global warming was mostly manmade and reported it as 97.1%, leading Queensland police to inform a Brisbane citizen who had complained to them that a “deception” had been perpetrated, has published an article in the British newspaper The Guardian making numerous inaccurate assertions calculated to libel Dr John Christy of the University of Alabama in connection with his now-famous chart showing the ever-growing discrepancy between models’ wild predictions and the slow, harmless, unexciting rise in global temperature since 1979….

… In fact, as Mr Nuccitelli knows full well (for his own data file of 11,944 climate science papers shows it), the “consensus” is only 0.5%. But that is by the bye: the main point here is that it is the trends on the predictions compared with those on the observational data that matter, and, on all 73 models, the trends are higher than those on the real-world data….

[T]he temperature profile [of the oceans] at different strata shows little or no warming at the surface and an increasing warming rate with depth, raising the possibility that, contrary to Mr Nuccitelli’s theory that the atmosphere is warming the ocean, the ocean is instead being warmed from below, perhaps by some increase in the largely unmonitored magmatic intrusions into the abyssal strata from the 3.5 million subsea volcanoes and vents most of which Man has never visited or studied, particularly at the mid-ocean tectonic divergence boundaries, notably the highly active boundary in the eastern equatorial Pacific. [That possibility is among many which aren’t considered by GCMs.]

How good a job are the models really doing in their attempts to predict global temperatures? Here are a few more examples:

Mr Nuccitelli’s scientifically illiterate attempts to challenge Dr Christy’s graph are accordingly misconceived, inaccurate and misleading.

I have omitted the bulk of both pieces because this post is already longer than needed to make my point. I urge you to follow the links and read the pieces for yourself.

Finally, I must quote a brief but telling passage from a post by Pat Frank, “Why Roy Spencer’s Criticism is Wrong”:

[H]ere’s NASA on clouds and resolution: “A doubling in atmospheric carbon dioxide (CO2), predicted to take place in the next 50 to 100 years, is expected to change the radiation balance at the surface by only about 2 percent. … If a 2 percent change is that important, then a climate model to be useful must be accurate to something like 0.25%. Thus today’s models must be improved by about a hundredfold in accuracy, a very challenging task.”

Frank’s very long post substantiates what I say here about the errors and uncertainties in GCMs — and the multiplicative effect of those errors and uncertainties. I urge you to read it. It is telling that “climate skeptics” like Spencer and Frank will argue openly, whereas “true believers” work clandestinely to present a united front to the public. It’s science vs. anti-science.

### CONCLUSION

In the end, complex, synthetic models can be defended only by resorting to the claim that they are “scientific”, which is a farcical claim when models consistently fail to yield accurate predictions. It is a claim based on a need to believe in the models — or, rather, what they purport to prove. It is, in other words, false certainty, which is the enemy of truth.

Newton said it best:

I do not know what I may appear to the world, but to myself I seem to have been only like a boy playing on the seashore, and diverting myself in now and then finding a smoother pebble or a prettier shell than ordinary, whilst the great ocean of truth lay all undiscovered before me.

Just as Newton’s self-doubt was not an attack on science, neither have I essayed an attack on science or modeling — only on the abuses of both that are too often found in the company of complex, synthetic models. It is too easily forgotten that the practice of science (of which modeling is a tool) is in fact an art, not a science. With this art we may portray vividly the few pebbles and shells of truth that we have grasped; we can but vaguely sketch the ocean of truth whose horizons are beyond our reach.


# Modeling Is Not Science: Another Demonstration

The title of this post is an allusion to an earlier one: “Modeling Is Not Science“. This post addresses a model that is the antithesis of science. It seems to have been extracted from the ether. It doesn’t prove what its authors claim for it. It proves nothing, in fact, but the ability of some people to dazzle other people with mathematics.

In this case, a writer for MIT Technology Review waxes enthusiastic about

the work of Alessandro Pluchino at the University of Catania in Italy and a couple of colleagues. These guys [sic] have created a computer model of human talent and the way people use it to exploit opportunities in life. The model allows the team to study the role of chance in this process.

The results are something of an eye-opener. Their simulations accurately reproduce the wealth distribution in the real world. But the wealthiest individuals are not the most talented (although they must have a certain level of talent). They are the luckiest. And this has significant implications for the way societies can optimize the returns they get for investments in everything from business to science.

Pluchino and co’s [sic] model is straightforward. It consists of N people, each with a certain level of talent (skill, intelligence, ability, and so on). This talent is distributed normally around some average level, with some standard deviation. So some people are more talented than average and some are less so, but nobody is orders of magnitude more talented than anybody else….

The computer model charts each individual through a working life of 40 years. During this time, the individuals experience lucky events that they can exploit to increase their wealth if they are talented enough.

However, they also experience unlucky events that reduce their wealth. These events occur at random.

At the end of the 40 years, Pluchino and co rank the individuals by wealth and study the characteristics of the most successful. They also calculate the wealth distribution. They then repeat the simulation many times to check the robustness of the outcome.

When the team rank individuals by wealth, the distribution is exactly like that seen in real-world societies. “The ‘80-20’ rule is respected, since 80 percent of the population owns only 20 percent of the total capital, while the remaining 20 percent owns 80 percent of the same capital,” report Pluchino and co.

That may not be surprising or unfair if the wealthiest 20 percent turn out to be the most talented. But that isn’t what happens. The wealthiest individuals are typically not the most talented or anywhere near it. “The maximum success never coincides with the maximum talent, and vice-versa,” say the researchers.

So if not talent, what other factor causes this skewed wealth distribution? “Our simulation clearly shows that such a factor is just pure luck,” say Pluchino and co.

The team shows this by ranking individuals according to the number of lucky and unlucky events they experience throughout their 40-year careers. “It is evident that the most successful individuals are also the luckiest ones,” they say. “And the less successful individuals are also the unluckiest ones.”

The writer, who is dazzled by pseudo-science, gives away his Obamanomic bias (“you didn’t build that“) by invoking fairness. Luck and fairness have nothing to do with each other. Luck is luck, and it doesn’t make the beneficiary any less deserving of the talent, or legally obtained income or wealth, that comes his way.

In any event, the model in question is junk. To call it junk science would be to imply that it’s just bad science. But it isn’t science; it’s a model pulled out of thin air. The modelers admit this in the article cited by the Technology Review writer, “Talent vs. Luck, the Role of Randomness in Success and Failure“:

In what follows we propose an agent-based model, called “Talent vs Luck” (TvL) model, which builds on a small set of very simple assumptions, aiming to describe the evolution of careers of a group of people influenced by lucky or unlucky random events.

We consider $N$ individuals, with talent $T_i$ (intelligence, skills, ability, etc.) normally distributed in the interval [0; 1] around a given mean $m_T$ with a standard deviation $\sigma_T$, randomly placed in fixed positions within a square world (see Figure 1) with periodic boundary conditions (i.e. with a toroidal topology) and surrounded by a certain number $N_E$ of “moving” events (indicated by dots), someone lucky, someone else unlucky (neutral events are not considered in the model, since they have not relevant effects on the individual life). In Figure 1 we report these events as colored points: lucky ones, in green and with relative percentage $p_L$, and unlucky ones, in red and with percentage $(100 - p_L)$. The total number of event-points $N_E$ are uniformly distributed, but of course such a distribution would be perfectly uniform only for $N_E \to \infty$. In our simulations, typically will be $N_E \sim N/2$: thus, at the beginning of each simulation, there will be a greater random concentration of lucky or unlucky event-points in different areas of the world, while other areas will be more neutral. The further random movement of the points inside the square lattice, the world, does not change this fundamental features of the model, which exposes different individuals to different amount of lucky or unlucky events during their life, regardless of their own talent.

In other words, this is a simplistic, completely abstract model set in a simplistic, completely abstract world, using only the authors’ assumptions about the values of a small number of abstract variables and the effects of their interactions. Those variables are “talent” and two kinds of event: “lucky” and “unlucky”.
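To see how little goes into a model of this kind, here is a minimal sketch of a talent-and-luck simulation. It is not the authors’ code. It drops the square world entirely and lets events arrive at random, and every parameter in it is my assumption for illustration: the talent mean and standard deviation, the number of time steps, the event probabilities, the starting capital, and the rule that a lucky event doubles capital with probability equal to talent while an unlucky event halves it.

```python
import random

def simulate_tvl(n_people=1000, steps=80, p_lucky=0.03, p_unlucky=0.03,
                 mean_talent=0.6, sd_talent=0.1, start_capital=10.0, seed=0):
    """Simplified, non-spatial talent-vs-luck simulation (all parameters assumed)."""
    rng = random.Random(seed)
    people = []
    for _ in range(n_people):
        # Talent is drawn from a normal distribution and clipped to [0, 1].
        talent = min(1.0, max(0.0, rng.gauss(mean_talent, sd_talent)))
        capital = start_capital
        lucky = 0
        unlucky = 0
        for _ in range(steps):
            draw = rng.random()
            if draw < p_lucky:
                lucky += 1
                # Assumed rule: a lucky event is exploited with probability = talent.
                if rng.random() < talent:
                    capital *= 2.0
            elif draw < p_lucky + p_unlucky:
                unlucky += 1
                # Assumed rule: an unlucky event always halves capital.
                capital /= 2.0
        people.append({"talent": talent, "capital": capital,
                       "lucky": lucky, "unlucky": unlucky})
    return people

people = simulate_tvl()
richest = max(people, key=lambda p: p["capital"])
most_talented = max(people, key=lambda p: p["talent"])
print("Richest:", richest)
print("Most talented:", most_talented)
```

Whatever the output, note what the code contains: a random number generator, one clipped normal variable, and two multiplication rules. Nothing in it refers to income, saving, investment, or any measured attribute of real people.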

What could be further from science — actual knowledge — than that? The authors effectively admit the model’s complete lack of realism when they describe “talent”:

[B]y the term “talent” we broadly mean intelligence, skill, smartness, stubbornness, determination, hard work, risk taking and so on.

Think of all of the ways that those various — and critical — attributes vary from person to person. “Talent”, in other words, subsumes an array of mostly unmeasured and unmeasurable attributes, without distinguishing among them or attempting to weight them. The authors might as well have called the variable “sex appeal” or “body odor”. For that matter, given the complete abstractness of the model, they might as well have called its three variables “body mass index”, “elevation”, and “race”.

It’s obvious that the model doesn’t account for the actual means by which wealth is acquired. In the model, wealth is just the mathematical result of simulated interactions among an arbitrarily named set of variables. It’s not even a multiple regression model based on statistics. (Although no set of statistics could capture the authors’ broad conception of “talent”.)

The modelers seem surprised that wealth isn’t normally distributed. But that wouldn’t be a surprise if they were to consider that wealth represents a compounding effect, which naturally favors those with higher incomes over those with lower incomes. But they don’t even try to model income.
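The point about compounding can be illustrated with a rough sketch. It assumes that everyone starts with the same stake and earns random annual returns that compound over 40 years; the return parameters are invented, and the sketch does not model income, talent, or the authors’ “events”.

```python
import math
import random
import statistics

def simulate_compounded_wealth(n_people=10_000, years=40,
                               mean_log_return=0.04, sd_log_return=0.15, seed=1):
    """Wealth after compounding random annual log returns on a stake of 1.0."""
    rng = random.Random(seed)
    wealth = []
    for _ in range(n_people):
        # Summing log returns is equivalent to multiplying annual growth factors.
        log_growth = sum(rng.gauss(mean_log_return, sd_log_return)
                         for _ in range(years))
        wealth.append(math.exp(log_growth))
    return wealth

def top_share(values, fraction=0.2):
    """Share of the total held by the top `fraction` of holders."""
    ranked = sorted(values, reverse=True)
    cutoff = max(1, int(len(ranked) * fraction))
    return sum(ranked[:cutoff]) / sum(ranked)

wealth = simulate_compounded_wealth()
mean_wealth = statistics.mean(wealth)
median_wealth = statistics.median(wealth)
share_top_20 = top_share(wealth)
print(f"mean={mean_wealth:.2f} median={median_wealth:.2f} "
      f"top-20% share={share_top_20:.0%}")
```

Because the returns multiply rather than add, the result is skewed to the right (roughly log-normal): the mean exceeds the median, and the top fifth holds far more than a fifth of the total. A skewed distribution of wealth is therefore what compounding alone would lead one to expect.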

So when wealth (as modeled) doesn’t align with “talent”, the discrepancy — according to the modelers — must be assigned to “luck”. But a model that lacks any nuance in its definition of variables, any empirical estimates of their values, and any explanation of the relationship between income and wealth cannot possibly tell us anything about the role of luck in the determination of wealth.

At any rate, it is meaningless to say that the model is valid because its results mimic the distribution of wealth in the real world. The model itself is meaningless, so any resemblance between its results and the real world is coincidental (“lucky”) or, more likely, contrived to resemble something like the distribution of wealth in the real world. On that score, the authors are suitably vague about the actual distribution, pointing instead to various estimates.

(See also “Modeling, Science, and Physics Envy” and “Modeling Revisited“.)

# Macroeconomic Modeling Revisited

Modeling is not science. Take Professor Ray Fair, for example. He teaches macroeconomic theory, econometrics, and macroeconometric models at Yale University. He has been plying his trade since 1968, first at Princeton, then at M.I.T., and (since 1974) at Yale. Those are big-name schools, so I assume that Prof. Fair is a big name in his field.

Well, since 1983 Professor Fair has been forecasting changes in real GDP four quarters ahead. He has made dozens of forecasts based on a model that he has tweaked many times over the years. The current model can be found here. His forecasting track record is here.

How has he done? Here’s how:

1. The mean absolute error of his forecasts is 70 percent; that is, on average his predictions vary by 70 percent from actual rates of growth.

2. The median absolute error of his forecasts is 33 percent.

3. His forecasts are systematically biased: too high when real, four-quarter GDP growth is less than 3 percent; too low when real, four-quarter GDP growth is greater than 3 percent. (See figure 1.)

4. His forecasts have grown generally worse — not better — with time. (See figure 2.)

5. In sum, the overall predictive value of the model is weak. (See figures 3 and 4.)
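For readers who want to check figures like the first two, here is a minimal sketch of the calculation. It assumes that “absolute error” means the absolute difference between predicted and actual growth, expressed as a percentage of actual growth. The data are hypothetical, not values from Fair’s Table 4.

```python
import statistics

def absolute_percentage_errors(predicted, actual):
    """Absolute forecast errors as a percentage of the actual values."""
    return [abs(p - a) / abs(a) * 100.0
            for p, a in zip(predicted, actual) if a != 0]

# Hypothetical four-quarter growth rates, in percent.
predicted = [3.0, 2.4, 4.5, 1.5, 2.2]
actual = [2.0, 3.0, 5.0, 0.5, 2.0]

errors = absolute_percentage_errors(predicted, actual)
mean_error = statistics.mean(errors)
median_error = statistics.median(errors)
print(f"mean = {mean_error:.0f} percent, median = {median_error:.0f} percent")
```

In this made-up sample the mean is well above the median because one forecast missed badly in a period of very low actual growth. A gap between mean and median of the kind reported above points to a similar pattern of occasional large misses.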

FIGURE 1

Figures 1-4 are derived from The Forecasting Record of the U.S. Model, Table 4: Predicted and Actual Values for Four-Quarter Real Growth, at Fair’s website.

FIGURE 2

FIGURE 3

FIGURE 4

Given the foregoing, you might think that Fair’s record reflects the persistent use of a model that’s too simple to capture the dynamics of a multi-trillion-dollar economy. But you’d be wrong. The model changes quarterly. This page lists changes only since late 2009; there are links to archives of earlier versions, but those are password-protected.

As for simplicity, the model is anything but simple. For example, go to Appendix A: The U.S. Model: July 29, 2016, and you’ll find a six-sector model comprising 188 equations and hundreds of variables.

Could I do better? Well, I’ve done better, with the simple model that I devised to estimate the Rahn Curve. It’s described in “The Rahn Curve in Action“, which is part III of “Economic Growth Since World War II“.

The theory behind the Rahn Curve is simple — but not simplistic. A relatively small government with powers limited mainly to the protection of citizens and their property is worth more than its cost to taxpayers because it fosters productive economic activity (not to mention liberty). But additional government spending hinders productive activity in many ways, which are discussed in Daniel Mitchell’s paper, “The Impact of Government Spending on Economic Growth.” (I would add to Mitchell’s list the burden of regulatory activity, which grows even when government does not.)

What does the Rahn Curve look like? Mitchell estimates this relationship between government spending and economic growth:

The curve is dashed rather than solid at low values of government spending because it has been decades since the governments of developed nations have spent as little as 20 percent of GDP. But as Mitchell and others note, the combined spending of governments in the U.S. was 10 percent (and less) until the eve of the Great Depression. And it was in the low-spending, laissez-faire era from the end of the Civil War to the early 1900s that the U.S. enjoyed its highest sustained rate of economic growth.

Elsewhere, I estimated the Rahn curve that spans most of the history of the United States. I came up with this relationship (terms modified for simplicity, with a slight cosmetic change in terminology):

Yg = 0.054 – 0.066F

To be precise, it’s the annualized rate of growth over the most recent 10-year span (Yg), as a function of F (fraction of GDP spent by governments at all levels) in the preceding 10 years. The relationship is lagged because it takes time for government spending (and related regulatory activities) to wreak their counterproductive effects on economic activity. Also, I include transfer payments (e.g., Social Security) in my measure of F because there’s no essential difference between transfer payments and many other kinds of government spending. They all take money from those who produce and give it to those who don’t (e.g., government employees engaged in paper-shuffling, unproductive social-engineering schemes, and counterproductive regulatory activities).

When F is greater than the amount needed for national defense and domestic justice — no more than 0.1 (10 percent of GDP) — it discourages productive, growth-producing, job-creating activity. And because government spending weighs most heavily on taxpayers with above-average incomes, higher rates of F also discourage saving, which finances growth-producing investments in new businesses, business expansion, and capital (i.e., new and more productive business assets, both physical and intellectual).

I’ve taken a closer look at the post-World War II numbers because of the marked decline in the rate of growth since the end of the war (Figure 2).

Here’s the revised result, which accounts for more variables:

Yg = 0.0275 – 0.340F + 0.0773A – 0.000336R – 0.131P

Where,

Yg = real rate of GDP growth in a 10-year span (annualized)

F = fraction of GDP spent by governments at all levels during the preceding 10 years

A = the constant-dollar value of private nonresidential assets (business assets) as a fraction of GDP, averaged over the preceding 10 years

R = average number of Federal Register pages, in thousands, for the preceding 10-year period

P = growth in the CPI-U during the preceding 10 years (annualized).

The r-squared of the equation is 0.74 and the significance of the F-statistic (its p-value) is 1.60E-13. The p-values of the intercept and coefficients are 0.093, 3.98E-08, 4.83E-09, 6.05E-07, and 0.0071. The standard error of the estimate is 0.0049, that is, about half a percentage point.
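The equation is easy to evaluate by hand or in a few lines of code. In the sketch below the coefficients are those given above, but the input values are hypothetical, chosen only to show the mechanics; they are not the actual values of F, A, R, and P for any period.

```python
def rahn_growth(F, A, R, P):
    """Annualized 10-year real GDP growth implied by the revised equation."""
    return 0.0275 - 0.340 * F + 0.0773 * A - 0.000336 * R - 0.131 * P

# Hypothetical inputs (not actual data):
# F = 0.30 (government spending at 30 percent of GDP)
# A = 1.5  (business assets at 1.5 times GDP)
# R = 60   (60 thousand Federal Register pages)
# P = 0.03 (3 percent annualized CPI-U growth)
growth = rahn_growth(F=0.30, A=1.5, R=60, P=0.03)
print(f"Implied growth: {growth:.4f}")

# Effect of raising F by one percentage point, other inputs held fixed.
delta = rahn_growth(0.31, 1.5, 60, 0.03) - rahn_growth(0.30, 1.5, 60, 0.03)
print(f"Change in growth per point of F: {delta:.4f}")
```

With these inputs the implied growth rate is 0.01736, or about 1.7 percent a year, and each additional percentage point of F lowers it by 0.0034, or about a third of a percentage point.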

Here’s how the equation stacks up against actual 10-year rates of real GDP growth:

What does the new equation portend for the next 10 years? Based on the values of F, A, R, and P for 2008-2017, the real rate of growth for the next 10 years will be about 2.0 percent.

There are signs of hope, however. The year-over-year rates of real growth in the four most recent quarters (2017Q4 – 2018Q3) were 2.4, 2.6, 2.9, and 3.0 percent, as against the dismal rates of 1.4, 1.2, 1.5, and 1.8 percent for the four quarters of 2016, Obama’s final year in office. A possible explanation is the election of Donald Trump and the well-founded belief that his tax and regulatory policies would be more business-friendly.

I took the data set that I used to estimate the new equation and made a series of out-of-sample estimates of growth over the next 10 years. I began with the data for 1946-1964 to estimate the growth for 1965-1974. I continued by taking the data for 1946-1965 to estimate the growth for 1966-1975, and so on, until I had estimated the growth for every 10-year period from 1965-1974 through 2008-2017. In other words, like Professor Fair, I updated my model to reflect new data, and I estimated the rate of economic growth in the future. How did I do? Here’s a first look:

FIGURE 5

For ease of comparison, I made the scale of the vertical axis of figure 5 the same as the scale of the vertical axis of figure 2. It’s obvious that my estimate of the Rahn Curve does a much better job of predicting the real rate of GDP growth than does Fair’s model.
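The out-of-sample procedure described above (re-estimate with all data available to date, then forecast the next period) is often called an expanding-window forecast. Here is a minimal sketch of the idea. It is a simplification, not my actual workbook: it uses a single explanatory variable, a hand-rolled least-squares fit, and synthetic data, and the function names are hypothetical.

```python
def fit_ols(xs, ys):
    """Ordinary least squares for one regressor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def expanding_window_forecasts(xs, ys, min_train=5):
    """Forecast ys[k] from xs[k], using only observations before k to fit."""
    forecasts = []
    for k in range(min_train, len(xs)):
        intercept, slope = fit_ols(xs[:k], ys[:k])
        forecasts.append(intercept + slope * xs[k])
    return forecasts

# Synthetic data: a linear relationship plus a small alternating disturbance.
xs = [0.20 + 0.01 * i for i in range(12)]
ys = [0.054 - 0.066 * x + 0.001 * (-1) ** i for i, x in enumerate(xs)]

forecasts = expanding_window_forecasts(xs, ys)
for predicted, observed in zip(forecasts, ys[5:]):
    print(f"forecast={predicted:.4f} actual={observed:.4f}")
```

The essential discipline is that each forecast uses only coefficients estimated from earlier observations, so the comparison with actual values is a fair test of predictive power rather than a measure of fit.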

Not only that, but my model is less biased:

FIGURE 6

The systematic bias reflected in figure 6 is far weaker than the systematic bias in Fair’s estimates (figure 1).

Finally, unlike Fair’s model (figure 4), my model captures the downward trend in the rate of real growth:

FIGURE 7

The moral of the story: It’s futile to build complex models of the economy. They can’t begin to capture the economy’s real complexity, and they’re likely to obscure the important variables — the ones that will determine the future course of economic growth.

A final note: Elsewhere (e.g., here) I’ve disparaged economic aggregates, of which GDP is the apotheosis. And yet I’ve built this post around estimates of GDP. Am I contradicting myself? Not really. There’s a rough consistency in measures of GDP across time, and I’m not pretending that GDP represents anything but an estimate of the monetary value of those products and services to which monetary values can be ascribed.

As a practical matter, then, if you want to know the likely future direction and value of GDP, stick with simple estimation techniques like the one I’ve demonstrated here. Don’t get bogged down in the inconclusive minutiae of a model like Professor Fair’s.


# The Pretence of Knowledge

Updated, with links to a related article and additional posts, and republished.

Friedrich Hayek, in his Nobel Prize lecture of 1974, “The Pretence of Knowledge,” observes that

the great and rapid advance of the physical sciences took place in fields where it proved that explanation and prediction could be based on laws which accounted for the observed phenomena as functions of comparatively few variables.

Hayek’s particular target was the scientism then (and still) rampant in economics. In particular, there was (and is) a quasi-religious belief in the power of central planning (e.g., regulation, “stimulus” spending, control of the money supply) to attain outcomes superior to those that free markets would yield.

But, as Hayek says in closing,

There is danger in the exuberant feeling of ever growing power which the advance of the physical sciences has engendered and which tempts man to try, “dizzy with success” … to subject not only our natural but also our human environment to the control of a human will. The recognition of the insuperable limits to his knowledge ought indeed to teach the student of society a lesson of humility which should guard him against becoming an accomplice in men’s fatal striving to control society – a striving which makes him not only a tyrant over his fellows, but which may well make him the destroyer of a civilization which no brain has designed but which has grown from the free efforts of millions of individuals.

I was reminded of Hayek’s observations by John Cochrane’s post, “Groundhog Day” (The Grumpy Economist, May 11, 2014), wherein Cochrane presents this graph:

Every serious forecast looked like this — Fed, yes, but also CBO, private forecasters, and the term structure of forward rates. Everyone has expected bounce-back growth and rise in interest rates to start next year, for the last 6 years. And every year it has not happened. Welcome to the slump. Every year, Sonny and Cher wake us up, and it’s still cold, and it’s still grey. But we keep expecting spring tomorrow.

Whether the corrosive effects of government microeconomic and regulatory policy, or a failure of those (unprintable adjectives) Republicans to just vote enough wasted-spending Keynesian stimulus, or a failure of the Fed to buy another \$3 trillion of bonds, the question of the day really should be why we have this slump — which, let us be honest, no serious forecaster expected.

(I add the “serious forecaster” qualification on purpose. I don’t want to hear randomly mined quotes from bloviating prognosticators who got lucky once, and don’t offer a methodology or a track record for their forecasts.)

The Fed’s forecasting models are nothing more than sophisticated charlatanism, a term that Hayek applied to pseudo-scientific endeavors like macroeconomic modeling. Nor is charlatanism confined to economics and the other social “sciences.” It’s rampant in climate “science,” as Roy Spencer has shown. Consider, for example, this graph from Spencer’s post, “95% of Climate Models Agree: The Observations Must Be Wrong” (Roy Spencer, Ph.D., February 7, 2014):

Spencer has a lot more to say about the pseudo-scientific aspects of climate “science.” This example is from “Top Ten Good Skeptical Arguments” (May 1, 2014):

1) No Recent Warming. If global warming science is so “settled”, why did global warming stop over 15 years ago (in most temperature datasets), contrary to all “consensus” predictions?

2) Natural or Manmade? If we don’t know how much of the warming in the longer term (say last 50 years) is natural, then how can we know how much is manmade?

3) IPCC Politics and Beliefs. Why does it take a political body (the IPCC) to tell us what scientists “believe”? And when did scientists’ “beliefs” translate into proof? And when was scientific truth determined by a vote…especially when those allowed to vote are from the Global Warming Believers Party?

4) Climate Models Can’t Even Hindcast. How did climate modelers, who already knew the answer, still fail to explain the lack of a significant temperature rise over the last 30+ years? In other words, how to you botch a hindcast?

5) …But We Should Believe Model Forecasts? Why should we believe model predictions of the future, when they can’t even explain the past?

6) Modelers Lie About Their “Physics”. Why do modelers insist their models are based upon established physics, but then hide the fact that the strong warming their models produce is actually based upon very uncertain “fudge factor” tuning?

7) Is Warming Even Bad? Who decided that a small amount of warming is necessarily a bad thing?

8) Is CO2 Bad? How did carbon dioxide, necessary for life on Earth and only 4 parts in 10,000 of our atmosphere, get rebranded as some sort of dangerous gas?

9) Do We Look that Stupid? How do scientists expect to be taken seriously when their “theory” is supported by both floods AND droughts? Too much snow AND too little snow?

10) Selective Pseudo-Explanations. How can scientists claim that the Medieval Warm Period (which lasted hundreds of years), was just a regional fluke…yet claim the single-summer (2003) heat wave in Europe had global significance?

11) (Spinal Tap bonus) Just How Warm is it, Really? Why is it that every subsequent modification/adjustment to the global thermometer data leads to even more warming? What are the chances of that? Either a warmer-still present, or cooling down the past, both of which produce a greater warming trend over time. And none of the adjustments take out a gradual urban heat island (UHI) warming around thermometer sites, which likely exists at virtually all of them — because no one yet knows a good way to do that.

It is no coincidence that leftists believe in the efficacy of central planning and cling tenaciously to a belief in catastrophic anthropogenic global warming. The latter justifies the former, of course. And both beliefs exemplify the left’s penchant for magical thinking, about which I’ve written several times (e.g., here, here, here, here, and here).

Magical thinking is the pretense of knowledge in the nth degree. It conjures “knowledge” from ignorance and hope. And no one better exemplifies magical thinking than our hopey-changey president.

Related reading: Walter E. Williams, “The Experts Have Been Wrong About a Lot of Things, Here’s a Sample“, The Daily Signal, July 25, 2018

# Modeling, Science, and Physics Envy

Climate Skeptic notes the similarity of climate models and macroeconometric models:

The climate modeling approach is so similar to that used by the CEA to score the stimulus that there is even a climate equivalent to the multiplier found in macro-economic models. In climate models, small amounts of warming from man-made CO2 are multiplied many-fold to catastrophic levels by hypothetical positive feedbacks, in the same way that the first-order effects of government spending are multiplied in Keynesian economic models. In both cases, while these multipliers are the single most important drivers of the models’ results, they also tend to be the most controversial assumptions. In an odd parallel, you can find both stimulus and climate debates arguing whether their multiplier is above or below one.

Here is my take, from “Modeling Is Not Science“:

The principal lesson to be drawn from the history of massive government programs is that those who were skeptical of those programs were entirely justified in their skepticism. Informed, articulate skepticism of the kind I counsel here is the best weapon — perhaps the only effective one — in the fight to defend what remains of liberty and property against the depredations of massive government programs.

Skepticism often is met with the claim that such-and-such a model is the “best available” on a subject. But the “best available” model — even if it is the best available one — may be terrible indeed. Relying on the “best available” model for the sake of government action is like sending an army into battle — and likely to defeat — on the basis of rumors about the enemy’s position and strength.

With respect to the economy and the climate, there are too many rumor-mongers (“scientists” with an agenda), too many gullible and compliant generals (politicians), and far too many soldiers available as cannon-fodder (the paying public).

Scientists and politicians who stand by models of unfathomably complex processes are guilty of physics envy, at best, and fraud, at worst.

# Physics Envy

Max Borders offers a critique of economic modeling, in which he observes that

a scientist’s model, while useful in limited circumstances, is little better than a crystal ball for predicting big phenomena like markets and climate. It is an offshoot of what F. A. Hayek called the “pretence of knowledge.” In other words, modeling is a form of scientism, which is “decidedly unscientific in the true sense of the word, since it involves a mechanical and uncritical application of habits of thought to fields different from those in which they have been formed.” (“The Myth of the Model,” The Freeman, June 10, 2010, volume 60, issue 5)

I’ve said a lot (e.g., here, here, here, here, here, here, here, and here) about modeling, economics, the social sciences in general, and the pseudo-science of climatology.

Models of complex, dynamic systems — especially social systems — are manifestations of physics envy, a term used by Stephen Jay Gould. He describes it in The Mismeasure of Man (1981) as

the allure of numbers, the faith that rigorous measurement could guarantee irrefutable precision, and might mark the transition between subjective speculation and a true science as worthy as Newtonian physics.

But there’s more to science than mere numbers. Quoting, again, from The Mismeasure of Man:

Science is rooted in creative interpretation. Numbers suggest, constrain, and refute; they do not, by themselves, specify the content of scientific theories. Theories are built upon the interpretation of numbers, and interpreters are often trapped by their own rhetoric. They believe in their own objectivity, and fail to discern the prejudice that leads them to one interpretation among many consistent with their numbers.

Ironically, The Mismeasure of Man offers a strongly biased and even dishonest interpretation of numbers (among other things). When a leading critic of physics envy falls prey to it, you know that he’s on to something.