girl fraction – Politics and Prosperity

REVISED 01/01/11 and 01/02/11, with the addition of new material (clearly indicated).

Economist-mathematician Steven Landsburg recently offered a problem and later posted a purported solution to it. Landsburg is so confident that his solution is the correct one that he has offered to bet significant sums of money that it’s the correct one. And, as of this morning, Landsburg still insists that he has the right answer.

The problem is one (among many) that Google has posed to candidates for employment. Landsburg states it as follows:

There’s a certain country where everybody wants to have a son. Therefore each couple keeps having children until they have a boy; then they stop. What fraction of the population is female? [The actual wording of the question, according to this source, is slightly different, but Landsburg’s paraphrase is faithful to the meaning.]

He adds:

Well, of course, you can’t know for sure, because, by some extraordinary coincidence, the last 100,000 families in a row might have gotten boys on the first try. But in expectation, what fraction of the population is female? In other words, if there were many such countries, what fraction would you expect to observe on average?

I first heard this problem decades ago, and so, perhaps, did you. It comes up in job interviews at places like Google. The answer they expect is simple, definitive and wrong.

And no, it’s not wrong because of small discrepancies between the number of male and female births, or because of anything else that’s extraneous to the spirit of the problem. It’s just really wrong. The correct answer, unlike the expected one, is not simple.

According to Landsburg, the “obvious” — but wrong — answer is that one-half of the children are boys and one-half of the children are girls. Landsburg rejects the “obvious” answer, with this explanation:

I’ll start with the case where there’s just one couple. Here are some possible family configurations, with their probabilities:

From this we see that the expected number of boys is

which adds to 1. And the expected number of girls is

which also adds to 1. Sure enough, the expected number of girls is equal to the expected number of boys.

But the expected fraction of girls is

which adds to 1-log(2), or about 30.6%.

For a population of k families, a similar calculation gives an answer of approximately (but not exactly) (1/2) – (1/4k), which, when k is large, is approximately (but not exactly) 1/2.

Elsewhere, Landsburg offers to make bets with readers who disagree with his analysis, and to settle matters through the use of simulation. But (a) I’m not interested in betting and (b) I prefer to treat the problem as one of mathematical expectation, which Landsburg also (rightly) prefers. He suggests the use of simulation only as a way of convincing some skeptics of the correctness of his analysis.

Interestingly, Landsburg’s “solution” — an expected girl fraction of 0.3068 for one family and, presumably, not quite 0.5 for the entire country — is at odds another person’s solution (girl fraction ~0.61),to which Landsburg points favorably. This discrepancy suggests some confusion on Landsburg’s part, which is evident in his depiction of the possible configurations of a single family (the children in the family, actually). He takes a special case — which omits the possibility of a first-born girl. He then generalizes from that special case.

__________________________________________________________________________________________

This section added 01/01/11 and revised slightly for clarity on 01/02/11:

What I took for an omission on Landburg’s part (the possibility of a first-born girl), isn’t an omission — in Landsburg’s view. His estimate of the girl fraction for a family is for a completed family (his term, not mine). Thus the configurations B, GB, GGB, GGGB, etc.

But there’s no such thing (in the context of a single family) as a completed family. (In a large number of families, there may be a completed families, but every one of them will be matched by an equal number of uncompleted families. More about that below.) No particular couple ever has a boy with an a priori probability = 1, which is what Landsburg implies (inadvertently, I’m sure) when he focuses on B, GB, GGB, GGGB, etc., where the B in each case signifies the end of a possible sequence of children.

On the contrary, if PB = 1/2 at all times (and not 1 at arbitrary times) every boy must be accompanied by a girl, with equal probability. (Alternatively, shades of Schrödinger’s cat, the probability wave collapses to PB =1 when Landsburg decides that enough kids are enough.) Here’s a schematic depiction of what happens when Landsburg doesn’t play with the probabilities:

On the left side of the vertical line, the probable first-born boy, being only probable, is followed by a probable boy or girl, and so on. On the right side, the probable first-born girl is followed by a probable boy or girl, and so on. On both sides, the possible configurations take the form Child 1, Child 1 + Child 2, Child 1 + Child 2 + Child 3, etc.

Landburg, in effect, has restricted his view of the possible configurations to the left side of the diagram.With the whole diagram in view, it’s obvious that the fraction of girls in each stage, and through each stage, is always 1/2.

End of section. Original post continues below the line.

________________________________________________________________________________________

To analyze the problem correctly it’s helpful to spell out all of its conditions and (sometimes implicit) assumptions:

The basic rule — couples have children until a boy is born, but not after that — means that no couple in country X will have more than 1 boy, but the number of girls is limited only to the number of children that a couple generates before having a boy.

A couple can generate children endlessly, if necessary. That is to say, there’s no limit on the number of children a couple may produce and no limit on the time in which they may produce them.

The probability that any given birth will result in a boy (B) or girl (G) is 1/2 for each; that is PB = PG = 1/2. (The actual fractions, I gather, are about 0.52 boys and 0.48 girls per birth.) These probabilities never vary, and are always the same for every couple.

Given the open-ended nature of the problem, it’s possible that some couples will have an infinite number of children without producing a boy. But, given the preceding statement, the first-born of half the couples will be a boy; those couples will have no more children.

The situation begins at a finite time (t = 0) and only those children born to the couples in country X after t = 0 are counted.

Children are born at a uniform rate, so that all the first-borns are born at t = 1; all the second-borns at t = 2; and so on. (This assumption and the preceding one don’t affect the results, but they allow for a simple illustration of the problem.)

In computing the fraction of boys and girls in the population, it’s assumed that there are no abortions or miscarriages, and that no children die at birth or later.

Perhaps the solution to the problem will be easier to see if the problem is recast, so that B = blue ball, G = green ball, and “couple” becomes “player”:

There’s a large but finite number of urns. Each is full of colored balls. Half of them are blue (B); half of them are green (G).

Positioned at each urn is a player whose job it is to make a blind draw of one ball from his urn at a regular interval (say, 1 minute). Each ball is kept by the player who draws it.

As each ball is drawn, the keeper of the urn from which it is drawn replaces it with a new ball of the same color, and mixes the balls thoroughly to ensure the randomness of the next draw.

When a player draws a B, he keeps it but doesn’t draw any more balls.

When a player draws a G, he keeps it and draws another ball (from the replenished urn) a minute later. This continues until the player draws a B.

It’s possible that some players will never draw a B.

All of the other assumptions stated earlier apply in this case (e.g., PB = PG = 1/2, PB and PG never vary and are always the same for every player).

Consider the following illustration of the results of the first four rounds of play:

Illustration of the General Case ( with 10,000 urns)
	B	G	Total	G fraction
Minute 1	5,000	5,000	10,000	1/2
Total	5,000	5,000	10,000	1/2

Minute 2	2,500	2,500	5,000	1/2
Total	7,500	7,500	15,000	1/2

Minute 3	1,250	1,250	2,500	1/2
Total	8,750	8,750	17,500	1/2

Minute 4	625	625	1,250	1/2
Total	9,375	9,375	18,750	1/2

The 5,000 players who draw a B at minute 1 stop drawing, but they keep the Bs that they draw. The 5,000 players who draw a G at minute 1 keep their Gs and make another draw at minute 2. That draw results in the selection of 2,500 Bs and 2,500 Gs, which are kept by the players who draw them. The 2,500 players who draw a G at minute 2 make another draw at minute 3, which results in the selection of 1,250 Bs and 1,250 Gs, and so on.

At this point, it’s important to note that the stopping rule has no effect on the fractions of B and G drawn in any round of play. Given that PB and PG are always the same for each and every player, as stated in the list of assumptions, it doesn’t matter whether or when players drop out of the game or join the game, or under what conditions they drop out or join, as long as PB = PG = 1/2, always and for every player. Given those conditions — which are central (implicit) assumptions of the original Google problem — every round of play, from the first one onward, results in equal (expected) numbers of B and G.

_______________________________________________________________________________________

This section added and revised 01/02/11.

In the example of 10,000 players with 10,000 urns, the number of players at minute 5 would be an odd number, and the number of players at minute 6 and beyond wouldn’t be an integer. But the example is about expected values (as it should be), so the lack of an even, whole number of players after minute 4 wouldn’t affect the import of the example.

To show what happens in the “end game,” I turn to a slightly different example, which begins with a number of players such that there can be exactly two left in the penultimate round, after all others have drawn a B. What happens when one of the two players draws a B while the other draws a G? Good question. First of all, the draws to and including that round will have resulted in an equal number of B and G. What happens next is a matter of pure chance. The final player — the one whose penultimate draw is a G — has an equal chance of drawing a G or a B on his next and final draw. The chance of drawing a B doesn’t suddenly jump to 1, nor does the chance of drawing a G suddenly jump to 1. The game could end there, with equal numbers of B and G having been drawn and a final draw to be made with PB = PG = 1/2. But there’s no reason to expect that the final draw will be a B, to the exclusion of a G, or vice versa.

Schematically:

(The Gs drawn by players 8191 and 8192 in rounds 1-11 would have been matched, in each round, by other players who draw Bs in those rounds. Players 8191 and 8192, in this example, would be the only players left for round 12.)

End of section. Original post continues below the line.

_______________________________________________________________________________________

Mathematically, the expected numbers of B and G (EB and EG) drawn by a single player are as follows:

EB = (1)(1/2) + (1/2)(1/2) + (1/4)(1/2) + (1/8)(1/2) + … = 1

EG = (1)(1/2) + (1/2)(1/2) + (1/4)(1/2) + (1/8)(1/2) + … = 1

The expected fraction of G is:

EG/(EG + EB) = (1/2)(1/2) + (1/4)(1/2) + (1/8)(1/2) + … = 1/2,

which reduces to PG/(PG + PB) = 1/2

Given N players, the expected numbers of B and G are:

EB = (N)(1/2) + (N/2)(1/2) + (N/4)(1/2) + (N/8)(1/2) + … = N

EG = (N)(1/2) + (N/2)(1/2) + (N/4)(1/2) + (N/8)(1/2) + … = N

Again, given the “rules” of the game (i.e., of the original Google problem), the expected fraction of G is always 1/2, at every point in the game and in its expected (but never reached) outcome.

The stopping rule is a red herring, intended (I suspect) to draw attention from the essential fact that EB = EG, always and for everyone, no matter how many players (couples) there are or when and under what conditions they join or leave the game (start or stop having children).

* * *

To Prof. Landsburg, should he read this post:

If you don’t immediately spot a fatal flaw in my analysis, why not have some members of U of R’s stat department check it out? That seems to me to be the best way to settle the issue.

If you conclude that my analysis is correct (in the essentials, at least), you won’t owe me any money because we haven’t made a bet. (You couldn’t send me money, anyway, unless you are able to penetrate my anonymity.) Just acknowledge my contribution prominently in a post on your blog and add a link to my blog in your sidebar (perhaps under the heading “Unclassified Blogs”).

* * *

This isn’t the first time that Landsburg has attracted my attention:
Landsburg Is Half-Right
Rawls Meets Bentham
The Case of the Purblind Economist

Tag: girl fraction

Take Landsburg’s Money, Revised

Take Landsburg’s Money

Share this:

Share this: