Bigger, Stronger, and Faster — but Not Quicker?

SUBSTANTIALLY REVISED 12/22/16; FURTHER REVISED 01/22/16

There’s some controversial IQ research which suggests that reaction times have slowed and people are getting dumber, not smarter. Here’s Dr. James Thompson’s summary of the hypothesis:

We keep hearing that people are getting brighter, at least as measured by IQ tests. This improvement, called the Flynn Effect, suggests that each generation is brighter than the previous one. This might be due to improved living standards as reflected in better food, better health services, better schools and perhaps, according to some, because of the influence of the internet and computer games. In fact, these improvements in intelligence seem to have been going on for almost a century, and even extend to babies not in school. If this apparent improvement in intelligence is real we should all be much, much brighter than the Victorians.

Although IQ tests are good at picking out the brightest, they are not so good at providing a benchmark of performance. They can show you how you perform relative to people of your age, but because of cultural changes relating to the sorts of problems we have to solve, they are not designed to compare you across different decades with say, your grandparents.

Is there no way to measure changes in intelligence over time on some absolute scale using an instrument that does not change its properties? In the Special Issue on the Flynn Effect of the journal Intelligence Drs Michael Woodley (UK), Jan te Nijenhuis (the Netherlands) and Raegan Murphy (Ireland) have taken a novel approach in answering this question. It has long been known that simple reaction time is faster in brighter people. Reaction times are a reasonable predictor of general intelligence. These researchers have looked back at average reaction times since 1889 and their findings, based on a meta-analysis of 14 studies, are very sobering.

It seems that, far from speeding up, we are slowing down. We now take longer to solve this very simple reaction time “problem”.  This straightforward benchmark suggests that we are getting duller, not brighter. The loss is equivalent to about 14 IQ points since Victorian times.

So, we are duller than the Victorians on this unchanging measure of intelligence. Although our living standards have improved, our minds apparently have not. What has gone wrong? [“The Victorians Were Cleverer Than Us!” Psychological Comments, April 29, 2013]

Thompson discusses this and other relevant research in many posts, which you can find by searching his blog for Victorians and Woodley. I’m not going to venture my unqualified opinion of Woodley’s hypothesis, but I am going to offer some (perhaps) relevant analysis based on — you guessed it — baseball statistics.

It seems to me that if Woodley’s hypothesis has merit, it ought to be confirmed by the course of major-league batting averages over the decades. Other things being equal, quicker reaction times ought to produce higher batting averages. Of course, there’s a lot to hold equal, given the many changes in equipment, playing conditions, player conditioning, “style” of the game (e.g., greater emphasis on home runs), and other key variables over the course of more than a century.

Undaunted, I used the Play Index search tool at Baseball-Reference.com to obtain single-season batting statistics for “regular” American League (AL) players from 1901 through 2016. My definition of a regular player is one who had at least 3 plate appearances (PAs) per scheduled game in a season. That’s a minimum of 420 PAs in a season from 1901 through 1903, when the AL played a 140-game schedule; 462 PAs in the 154-game seasons from 1904 through 1960; and 486 PAs in the 162-game seasons from 1961 through 2016. I found 6,603 qualifying player-seasons, and a long string of batting statistics for each of them: the batter’s age, his batting average, his number of at-bats, his number of PAs, etc.

The raw record of batting averages looks like this, fitted with a 6th-order polynomial to trace the shifts over time:

FIGURE 1
batting-average-analysis-unadjusted-ba-1901-2016

Everything else being the same, the best fit would be a straight line that rises gradually, falls gradually, or has no slope. The undulation reflects the fact that everything hasn’t stayed the same. Major-league baseball wasn’t integrated until 1947, and integration was only token for a couple of decades after that. For example: night games weren’t played until 1935, and didn’t become common until after World War II; a lot of regular players went to war, and those who replaced them were (obviously) of inferior quality — and hitting suffered more than pitching; the “deadball” era ended after the 1919 season and averages soared in the 1920s and 1930s; fielders’ gloves became larger and larger.

The list goes on, but you get the drift. Playing conditions and the talent pool have changed so much over the decades that it’s hard to pin down just what caused batting averages to undulate rather than move in a relatively straight line. It’s unlikely that batters became a lot better, only to get worse, then better again, and then worse again, and so on.

Something else has been going on — a lot of somethings, in fact. And the 6th-order polynomial captures them in an undifferentiated way. What remains to be explained are the differences between official BA and the estimates yielded by the 6th-order polynomial. Those differences are the stage 1 residuals displayed in this graph:

FIGURE 2
batting-average-analysis-stage-1-residuals

There’s a lot of variability in the residuals, despite the straight, horizontal regression line through them. That line, by the way, represents a 6th-order polynomial fit, not a linear one. So the application of the equation shown in figure 1 does an excellent job of de-trending the data.

The variability of the stage 1 residuals has two causes: (1) general changes in the game and (2) the performance of individual players, given those changes. If the effects of the general changes can be estimated, the remaining, unexplained variability should represent the “true” performance of individual batters.

In stage 2, I considered 16 variables in an effort to isolate the general changes. I began by finding the correlations between each of the 16 candidate variables and the stage 1 residuals. I then estimated a regression equation with stage 1 residuals as the dependent variable and the most highly correlated variable as the explanatory variable. I next found the correlations between the residuals of that regression equation and the remaining 15 variables. I introduced the most highly correlated variable into a new regression equation, as a second explanatory variable. I continued this process until I had a regression equation with 16 explanatory variables. I chose to use the 13th equation, which was the last one to introduce a variable with a highly significant p-value (less than 0.01). Along the way, because of collinearity among the variables, the p-values of a few others became high, but I kept them in the equation because they contributed to its overall explanatory power.

Here’s the 13-variable equation (REV13), with each coefficient given to 3 significant figures:

R1 = 1.22 – 0.0185WW – 0.0270DB – 1.26FA + 0.00500DR + 0.00106PT + 0.00197Pa + 0.00191LT – 0.000122Ba – 0.00000765TR + 0.000816DH – 0.206IP + 0.0153BL – 0.000215CG

Where,

R1 = stage 1 residuals

WW = World War II years (1 for 1942-1945, 0 for all other years)

DB = “deadball” era (1 for 1901-1919, 0 thereafter)

FA = league fielding average for the season

DR = prevalence of performance-enhancing drugs (1 for 1994-2007, 0 for all other seasons)

PT = number of pitchers per team

Pa = average age of league’s pitchers for the season

LT = fraction of teams with stadiums equipped with lights for night baseball

Ba = batter’s age for the season (not a general condition but one that can explain the variability of a batter’s performance)

TR = maximum distance traveled between cities for the season

DH = designated-hitter rule in effect (0 for 1901-1972, 1 for 1973-2016)

IP = average number of innings pitched per pitcher per game (counting all pitchers in the league during a season)

BL = fraction of teams with at least 1 black player

CG = average number of complete games pitched by each team during the season

The r-squared for the equation is 0.035, which seems rather low, but isn’t surprising given the apparent randomness of the dependent variable. Moreover, with 6,603 observations, the equation has an extremely significant f-value of 1.99E-43.

A positive coefficient means that the variable increases the value of the stage 1 residuals. That is, it causes batting averages to rise, other things being equal. A negative coefficient means the opposite, of course. Do the signs of the coefficients seem intuitively right, and if not, why are they the opposite of what might be expected? I’ll take them one at a time:

World War II (WW)

A lot of the game’s best batters were in uniform in 1942-1945. That left regular positions open to older, weaker batters, some of whom wouldn’t otherwise have been regulars or even in the major leagues. The negative coefficient on this variable captures the war’s effect on hitting, which suffered despite the fact that a lot of the game’s best pitchers also went to war.

Deadball era (DB)

The so-called deadball era lasted from the beginning of major-league baseball in 1871 through 1919 (roughly). It was called the deadball era because the ball stayed in play for a long time (often for an entire game), so that it lost much of its resilience and became hard to see because it accumulated dirt and scuffs. Those difficulties (for batters) were compounded by the spitball, the use of which was officially curtailed beginning with the 1920 season. (See this and this.) As figure 1 indicates, batting averages rose markedly after 1919, so the negative coefficient on DB is unsurprising.

Performance-enhancing drugs (DR)

Their rampant use seems to have begun in the early 1990s and trailed off in the late 2000s. I assigned a dummy variable of 1 to all seasons from 1994 through 2007 in an effort to capture the effect of PEDs. The coefficient suggests that the effect was (on balance) positive.

Number of pitchers per AL team (PT)

This variable, surprisingly, has a positive coefficient. One would expect the use of more pitchers to cause BA to drop. PT may be a complementary variable, one that’s meaningless without the inclusion of related variable(s). (See IP.)

Average age of AL pitchers (Pa)

The stage 1 residuals rise with respect to Pa rise until Pa = 27.4 , then they begin to drop. This variable represents the difference between 27.4 and the average age of AL pitchers during a particular season. The coefficient is multiplied by 27.4 minus average age; that is, by a positive number for ages lower than 27.4, by zero for age 27.4, and by a negative number for ages above 27.4. The positive coefficient suggests that, other things being equal, pitchers younger than 27.4 give up hits at a lower rate than pitchers older than 27.4. I’m agnostic on the issue.

Night baseball, that is, baseball played under lights (LT)

It has long been thought that batting is more difficult under artificial lighting than in sunlight. This variable measures the fraction of AL teams equipped with lights, but it doesn’t measure the rise in night games as a fraction of all games. I know from observation that that fraction continued to rise even after all AL stadiums were equipped with lights. The positive coefficient on LT suggests that it’s a complementary variable. It’s very highly correlated with BL, for example.

Batter’s age (Ba)

The stage 1 residuals don’t vary with Ba until Ba = 37 , whereupon the residuals begin to drop. The coefficient is multiplied by 37 minus the batter’s age; that is, by a positive number for ages lower than 37, by zero for age 37, and by a negative number for ages above 37. The very small negative coefficient probably picks up the effects of batters who were good enough to have long careers and hit for high averages at relatively advanced ages (e.g., Ty Cobb and Ted Williams). Their longevity causes them to be “over represented” in the sample.

Maximum distance traveled by AL teams (TR)

Does travel affect play? Probably, but the mode and speed of travel (airplane vs. train) probably also affects it. The tiny negative coefficient on this variable — which is highly correlated with several others — is meaningless, except insofar as it combines with all the other variables to account for the stage 1 residuals. TR is highly correlated with the number of teams (expansion), which suggests that expansion has had little effect on hitting.

Designated-hitter rule (DH)

The small positive coefficient on this variable suggests that the DH is a slightly better hitter, on average, than other regular position players.

Innings pitched per AL pitcher per game (IP)

This variable reflects the long-term trend toward the use of more pitchers in a game, which means that batters more often face rested pitchers who come at them with a different delivery and repertoire of pitches than their predecessors. IP has dropped steadily over the decades, presumably exerting a negative effect on BA. This is reflected in the rather large, negative coefficient on the variable, which means that it’s prudent to treat this variable as a complement to PT (discussed above) and CG (discussed below), both of which have counterintuitive signs.

Integration (BL)

I chose this variable to approximate the effect of the influx of black players (including non-white Hispanics) since 1947. BL measures only the fraction of AL teams that had at least one black player for each full season. It begins at 0.25 in 1948 (the Indians and Browns signed Larry Doby and Hank Thompson during the 1947 season) and rises to 1 in 1960, following the signing of Pumpsie Green by the Red Sox during the 1959 season. The positive coefficient on this variable is consistent with the hypothesis that segregation had prevented the use of players superior to many of the whites who occupied roster slots because of their color.

Complete games per AL team (CG)

A higher rate of complete games should mean that starting pitchers stay in games longer, on average, and therefore give up more hits, on average. The negative coefficient seems to contradict that hypothesis. But there are other, related variables (PT and CG), so this one should be thought of as a complementary variable.

Despite all of that fancy footwork, the equation accounts for only a small portion of the stage 1 residuals:

FIGURE 3
batting-average-analysis-stage-2-estimates

What’s left over — the stage 2 residuals — is (or should be) a good measure of comparative hitting ability, everything else being more or less the same. One thing that’s not the same, and not yet accounted for is the long-term trend in home-park advantage, which has slightly (and gradually) weakened. Here’s a graph of the inverse of the trend, normalized to the overall mean value of home-park advantage:

FIGURE 4
batting-average-analysis-long-term-trend-in-ballpark-factors-adjustment-normed

To get a true picture of a player’s single-season batting average, it’s just a matter of adding the stage 2 residual for that player’s season to the baseline batting average for the entire sample of 6,603 single-season performances. The resulting value is then multiplied by the equation given in figure 4. The baseline is .280, which is the overall average  for 1901-2016, from which individual performances diverge. Thus, for example, the stage 2 residual for Jimmy Williams’s 1901 season, adjusted for the long-term trend shown in figure 4, is .022. Adding that residual to .280 results in an adjusted (true) BA of .302, which is 15 points (.015) lower than Williams’s official BA of .317 in 1901.

Here are the changes from official to adjusted averages, by year:

FIGURE 5
batting-average-analysis-ba-adjustments-by-year

Unsurprisingly, the pattern is roughly a mirror image of the 6th-order polynomial regression line in figure 1.

Here’s how the adjusted batting averages (vertical axis) correlate with the official batting averages (horizontal axis):

FIGURE 6
batting-average-analysis-adjusted-vs-official-ba

The red line represents the correlation between official and adjusted BA. The dotted gray line represents a perfect correlation. The actual correlation is very high (r = 0.93), and has a slightly lower slope than a perfect correlation. High averages tend to be adjusted downward and low averages tend to be adjusted upward. The gap separates the highly inflated averages of the 1920s and 1930s (lower right) from the less-inflated averages of most other decades (upper left).

Here’s a time-series view of the official and adjusted averages:

FIGURE 7
batting-average-analysis-official-and-adjusted-ba-time-series

The wavy, bold line is the 6th-order polynomial fit from figure 1. The lighter, almost-flat line is a 6th-order polynomial fit to the adjusted values. The flatness is a good sign that most of the general changes in game conditions have been accounted for, and that what’s left (the gray plot points) is a good approximation of “real” batting averages.

What about reaction times? Have they improved or deteriorated since 1901? The results are inconclusive. Year (YR) doesn’t enter the stage 2 analysis until step 15, and it’s statistically insignificant (p-value = 0.65). Moreover, with the introduction of another variable in step 16, the sign of the coefficient on YR flips from slightly positive to slightly negative.

In sum, this analysis says nothing definitive about reaction times, even though it sheds a lot of light on the relative hitting prowess of American League batters over the past 116 years. (I’ll have more to say about that in a future post.)

It’s been great fun but it was just one of those things.

Corresponding with a Collaborator

I correspond with a fellow whom I’ve known for more than forty years. He’s a pleasant person with a good sense of humor and an easy-going personality. He’s also a chameleon.

By which I mean that he takes on the ideological coloration of his surroundings. He agrees with his companions of the moment. It’s therefore unsurprising that he proudly calls himself a “centrist.” Though he wouldn’t put it this way, his centrism involves compromises between good and evil — the necessary result of which is more evil.

“Centrist,” in his case, is just another word for “collaborator.”

A recent exchange will tell you all that you need to know about him. It began with an e-mail from a third party, in which this was quoted:

IF YOU HAD A HUNCH THE NEWS SYSTEM WAS SOMEWHAT RIGGED AND YOU COULDN’T PUT YOUR FINGER ON IT, THIS MIGHT HELP YOU SOLVE THE PUZZLE.

ABC News executive producer Ian Cameron is married to Susan Rice, National Security Adviser.

CBS President David Rhodes is the brother of Ben Rhodes, Obama’s Deputy National Security Adviser for Strategic Communications.

ABC News correspondent Claire Shipman is married to former White House Press Secretary Jay Carney.

ABC News and Univision reporter Matthew Jaffe is married to Katie Hogan, Obama’s Deputy Press Secretary.

ABC President Ben Sherwood is the brother of Obama’s Special Adviser Elizabeth Sherwood.

CNN President Virginia Moseley is married to former Hillary Clinton’s Deputy Secretary Tom Nides.

Ya think there might be a little bias in the news?

The chameleon’s comment:

I share your concern about MSM bias, but am not as troubled by it. (I stopped watching the Big 3s’ evening news 50 years ago because I couldn’t get a straight view on the Vietnam War.)

My comment on his comment:

You may have stopped watching, and I did too, but millions haven’t. And too many of them are swallowing it whole, which is a big reason for the leftward drift of the country over the past 50 years. (JFK could pass for a conservative today.) So I’m very troubled by it.

His reply to me:

But at my absolute center is a belief in universal suffrage.
In a nation of 150m or so (potential) voters, tens of millions are going to be swayed by CBS or, egads, Fox. If it weren’t those sources, it would be something else like them.

I can’t fix that, and see trying as futile. That’s why I’m not troubled. (My lack of concern also stems from seeing the USA as fundamentally on the right track. The latest evidence for that is the rejection of Trump about to occur. And yes, we’ll get Hillary’s excesses in consequence — but Congress will put on the brakes. We survived the Carter presidency when I’d have preferred Ford.)

Let’s parse that.

But at my absolute center is a belief in universal suffrage. What’s sacred about universal suffrage? If suffrage should encompass everyone who’s looking for a free ride at the expense of others — which it does these days — it should certainly include children and barnyard animals. Why should suffrage of any kind be the vehicle for violating constitutional limits on the power of the central government? That’s what it has come to, inasmuch as voters since the days of TR (at least) have been enticed to elect presidents and members of Congress who have blatantly seized unconstitutional powers, with the aid of their appointed lackeys and the connivance of a supine Supreme Court.

In a nation of 150m or so (potential) voters, tens of millions are going to be swayed by CBS or, egads, Fox. If it weren’t those sources, it would be something else like them. True, and all the more reason to keep the power of the central government within constitutional limits.

I can’t fix that, and see trying as futile. That’s why I’m not troubled. You, and I, and every adult can strive to “fix it” in ways big and small. Voting is one way, though probably the least effective (as an individual act). Speaking and writing on the issues is another way. I blog in the hope that some of what I say will trickle into the public discourse.

My lack of concern also stems from seeing the USA as fundamentally on the right track. It’s on the right track only if you think that the decades-long, leftward movement toward a powerful, big-spending, paternalistic government is the right track. That may very well suit a lot of people, but it also doesn’t suit a lot of people. Even FDR never won more than 61 percent of the popular vote, and his numbers dwindled as time went on. But perhaps you’re a utilitarian who believes that the pleasure A obtains from poking B in the eye somehow offsets B’s pain. You may not believe that you believe it, but that’s the import of your worship of universal suffrage, which is nothing more than blind allegiance to the primitive kind of utilitarianism known as majority rule.

The latest evidence for that is the rejection of Trump about to occur. Trump hasn’t yet lost, and even if he does, that won’t be evidence of anything other than desperation on the part of the operatives of the regulatory-welfare state and their various constituencies. Rejection, in any case, would be far from unanimous, so rejection is the wrong word — unless you believe, as you seem to do, that there’s a master “social conscience” which encompasses all Americans.

And yes, we’ll get Hillary’s excesses in consequence — but Congress will put on the brakes. Not if the Dems gain control of the Senate (a tie will do it if HRC is elected), and the ensuing Supreme Court appointees continue to ratify unconstitutional governance.

We survived the Carter presidency when I’d have preferred Ford. There have been more disastrous presidencies than Carter’s, why not mention them? In any event “survival” only means that the nation hasn’t yet crashed and burned. It doesn’t mean that there hasn’t been irreparable damage. Mere survival is a low hurdle (witness the Soviet Union, which survived for 74 years). Nor is mere survival an appropriate standard for a nation with as much potential as this one — potential that has been suppressed by the growth of the central government. So much loss of liberty, so much waste. That’s why I’m troubled, even if I can do little or nothing about it.

In closing, your political philosophy is an amalgam of “all is for the best … in the best of all possible worlds,” “What, me worry?,” “I’m all right, Jack,” and “Befehl ist Befehl.”

I won’t send the reply because I’m too nice a guy. And because it would pointless to challenge anyone who’s so morally obtuse — but likeable.

Election 2016

UPDATED 11/07/16

Most aggregations of polls give Clinton a narrow lead, which (according to the polls) has increased in the past few days. Some reliable, independent polls tell a different story. Start here:

FIGURE 1
clintons-lead-deficit-in-5-polls-since-oct-1
In this graph and the next one, I plot all of the values against the dates on which polling was conducted or bets were made, not the dates on which results were released. In the case of multi-day polling, I use the central date of the polling period. Therefore, all of the polls are slightly out-of-date, a fact that one should consider when interpreting the numbers.

These are the three aggregations and two polls plotted in figure 1:

  • the aggregation of 4-way polls (Clinton, Trump, Johnson, Stein) at RealClearPolitics (RCP), averaged over 5 days
  • the aggregation of 2-way polls (Clinton vs. Trump) at RCP, averaged over 5 days
  • the IBD/TIPP poll, which has a good track record and a high rating from FiveThirtyEight
  • the USC/LA Times poll, which is another reputable Trump-leaning one.

This graph provides a close up of recent changes in the three aggregations and two polls:

FIGURE 2
day-to-day-change-in-5-poll-average

Trump’s momentum may have slowed, but it won’t take much to push him over the top.

If Trump ekes out 51 percent of the two-party vote, he’ll win upwards of 300 electoral votes. (That estimate is based on my model of the relationship between the popular-vote and electoral-vote outcomes in elections since World War II.)  How would he get there? Here’s a scenario that fits the demographics of the various States:

  • Obama beat Romney 332-206 in the electoral-vote tally four years ago.
  • Clinton could take two States won by Romney in 2012: Georgia (16 EVs) and Utah (6).
  • Trump could more than offset those 22 EVs by taking several States won by Obama in 2012: Florida (29), Iowa (6), Maine (4), Michigan (16), Nevada (6), New Mexico (5), Ohio (16), Pennsylvania (20), and Wisconsin (10).

That would leave Clinton with 240 EVs to Trump’s 298. There are many plausible variations on the scenario that would leave Trump with a majority of EVs, or result in a tie.

Stay tuned.