Probability 2 - Boy or Girl paradox
I continue to be amazed that apparently simple probability problems are often so difficult to resolve. It is particularly surprising given that so much science seems to depend on probability these days: the particle physicists' results are small probabilistic anomalies in incredibly expensive experiments, quantum physicists say that nature is inherently probabilistic, the medicine we take is determined by 'random' trials, evolution apparently occurs by 'random' mutation, etc., etc. (I don't necessarily concur with all this, by the way.)
I have been thinking more about this fact since the blog post of Peter Cameron, which derived from another post by Alex Bellos. The particular problem they consider is this: if someone tells you that they have two children at least one of whom is a boy born on Tuesday, what is the probability that the second child is a boy?
Actually, the simpler version of this problem, in which there is no mention of the day of birth (possibly introduced by Martin Gardner, who died on 22nd May), has been discussed and disputed in short and in long, in circles high and low, under the name "The Boy or Girl Paradox".
In short, my view is that the answer to this question is 1/2 (under the simplifying assumptions clearly intended; that is, that the birth rates of boys and girls are equal, etc).
However, a closely related problem, which I explain below, yields the answer 1/3, the answer preferred by mathematicians.
I think this simple problem benefits from the distinction between states and actions, the states being the quality of being a girl or a boy, and the actions being the declarations made. In the real-world problem you need to consider the probability of the action as well as the probability of being in a state.
Let's now consider the problem in detail. The simplifying assumptions mentioned above mean that, given a family with two children, the following four cases are equally likely: BoyBoy, BoyGirl, GirlBoy, GirlGirl. Because they are equally likely, their common probability 1/4 cancels out, and we need only compare the likelihoods of the declaration in each case. In the case BoyBoy the likelihood of a declaration that there is a boy is 1; in the case GirlGirl the likelihood of a declaration that there is a girl is 1; but in the other two cases there is a half chance that the declaration is 'boy' and a half chance that it is 'girl'. Now let's look at the total weight of a declaration 'boy': it is 1+1/2+1/2 (GirlGirl contributes 0). The weight which corresponds to the case BoyBoy is 1. Hence the proportion of 'boy' declarations which correspond to the state two boys is
1/(1+1/2+1/2)=1/2.
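The weight calculation above can be checked mechanically. This is a minimal sketch in Python using exact fractions, assuming the declaration model just described: the parent picks one of the two children uniformly at random and announces that child's sex. The function names are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

# The four equally likely families, written as (first child, second child).
FAMILIES = ["BB", "BG", "GB", "GG"]

def declaration_weight(family: str, said: str) -> Fraction:
    """Likelihood that a parent with this family declares 'there is a <said>'.

    Assumption: the parent picks one of the two children uniformly at random
    and announces that child's sex.
    """
    return Fraction(family.count(said), 2)

def prob_two_boys_given_declared_boy() -> Fraction:
    # Every family has prior 1/4; that common factor cancels, so the
    # likelihoods can be used directly as weights.
    weights = {f: declaration_weight(f, "B") for f in FAMILIES}
    return weights["BB"] / sum(weights.values())

print(prob_two_boys_given_declared_boy())
```

The weights come out as 1, 1/2, 1/2 and 0, matching the sum 1+1/2+1/2 in the text.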
Now a slightly different problem: suppose I interview people with two children, and ask if they have a boy in the family; if they answer yes, what is the probability that the second child is a boy?
Again, the simplifying assumptions mentioned above mean that, given a family with two children, the four cases BoyBoy, BoyGirl, GirlBoy, GirlGirl are equally likely. In the case BoyBoy the likelihood that they respond 'yes' is 1; in the case GirlGirl it is 0; and in each of the other two cases it is 1, not 1/2, since here the parent does not choose which child to mention. Now let's look at the total weight of a response 'yes': it is 1+1+1. The weight which corresponds to the case BoyBoy is 1. Hence the proportion of 'yes' answers which correspond to the state two boys is
1/(1+1+1)=1/3.
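The interview version differs only in the likelihood assigned to each family. A minimal sketch, assuming truthful answers, so that the answer is 'yes' exactly when at least one child is a boy; the function names are hypothetical.

```python
from fractions import Fraction

# The four equally likely families, written as (first child, second child).
FAMILIES = ["BB", "BG", "GB", "GG"]

def yes_weight(family: str) -> Fraction:
    """Likelihood of answering 'yes' to 'Do you have a boy?'.

    Assumption: answers are truthful, and no random choice by the parent is
    involved, so the likelihood is 1 whenever the family contains a boy.
    """
    return Fraction(1) if "B" in family else Fraction(0)

def prob_two_boys_given_yes() -> Fraction:
    # Equal priors cancel, exactly as in the declaration model.
    weights = {f: yes_weight(f) for f in FAMILIES}
    return weights["BB"] / sum(weights.values())

print(prob_two_boys_given_yes())
```

Compared with the declaration model, the only change is that BoyGirl and GirlBoy now carry weight 1 instead of 1/2, which is what moves the answer from 1/2 to 1/3.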
If there is also mention of the day of birth there are at least three interpretations:
1) a person declares that they have two children, at least one of whom is a boy born on Tuesday;
2) when asked if they have a boy, the response is 'yes, and born on Tuesday';
3) when asked if they have a boy born on Tuesday, the response is 'yes'.
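The probability of the second child being a boy is 1/2 in the first case, 1/3 in the second and 13/27 in the third. (In the third case the day of birth matters: of the 49 equally likely day combinations for two boys, 13 include a boy born on Tuesday, against 7 each for BoyGirl and GirlBoy, giving 13/(13+7+7)=13/27.)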
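All three interpretations can be checked by exact enumeration. This is a sketch under stated assumptions: each child is an equally likely (sex, day) pair, giving 14 x 14 = 196 equally likely ordered families; in interpretation 1 the parent picks a child at random and states its sex and day; in interpretation 2 the parent, having said 'yes', reports the day of a randomly chosen boy; in interpretation 3 the answer is simply truthful. The day encoding (Tuesday as index 1) and all names are hypothetical.

```python
from fractions import Fraction
from itertools import product

DAYS = range(7)                               # days of the week as 0..6
TUESDAY = 1                                   # hypothetical encoding of Tuesday
CHILDREN = list(product("BG", DAYS))          # 14 equally likely (sex, day) pairs
FAMILIES = list(product(CHILDREN, CHILDREN))  # 196 equally likely ordered pairs

def w_declares(family):
    # 1) Parent picks a child uniformly at random and states its sex and day.
    return Fraction(sum(c == ("B", TUESDAY) for c in family), 2)

def w_yes_and_tuesday(family):
    # 2) Asked "a boy?"; if yes, reports the day of a randomly chosen boy.
    boys = [c for c in family if c[0] == "B"]
    if not boys:
        return Fraction(0)
    return Fraction(sum(day == TUESDAY for _, day in boys), len(boys))

def w_yes_to_tuesday_boy(family):
    # 3) Asked "a boy born on Tuesday?"; answers truthfully.
    return Fraction(int(("B", TUESDAY) in family))

def prob_two_boys(weight):
    # Weight of the state 'two boys' divided by the total weight of the action.
    total = sum(weight(f) for f in FAMILIES)
    both_boys = sum(weight(f) for f in FAMILIES if f[0][0] == f[1][0] == "B")
    return both_boys / total

for w in (w_declares, w_yes_and_tuesday, w_yes_to_tuesday_boy):
    print(w.__name__, prob_two_boys(w))
```

The three weight functions encode three different actions on the same set of states, which is why the same families yield three different answers.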
In this post I have deliberately (perhaps excessively) argued in the simplest possible language, because a simple problem should require only the appropriate distinctions, not heavy machinery. However, I have earlier proposed (with Sabadini and de Francesco Albasini) a mathematical context in which many of the perplexities are seen to arise from considering (normalized) probabilities rather than weights of actions. The precise mathematical point is that normalization does not behave well with respect to sequential operations (which include abstraction). Another simple, perplexing example where abstraction does not work well with normalization is Simpson's paradox, whose mathematical origin is the fact that a/b+c/d is not in general equal to (a+c)/(b+d); more to the point, the pooled rate (a+c)/(b+d) is not determined by the separate rates a/b and c/d.
An extreme example of Simpson's paradox is the following:
Consider treatments A and B, and diseases X and Y.
Treatment A with disease X on one person cures the person (100% cure rate - best possible)
Treatment B with disease X on 99 people cures 98 (worse than 100%).
So A has a better success rate than B with disease X.
Treatment A with disease Y on 99 people cures 1 person (better than a 0% cure rate)
Treatment B with disease Y on one person fails to cure (0% cure rate - worst possible).
So A is better than B with disease Y.
In both diseases X and Y the treatment A is better than B.
However, overall treatment A cures 2 people in 100, while B cures 98 in 100.
B is worse than A???
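The reversal can be reproduced directly from the counts in the example. A minimal sketch using exact fractions; the data layout and function names are hypothetical, and the only inputs are the (cured, treated) counts given above.

```python
from fractions import Fraction

# (cured, treated) counts from the example: treatment -> disease -> counts
TRIALS = {
    "A": {"X": (1, 1), "Y": (1, 99)},
    "B": {"X": (98, 99), "Y": (0, 1)},
}

def rate(cured: int, treated: int) -> Fraction:
    return Fraction(cured, treated)

def pooled_rate(by_disease: dict) -> Fraction:
    # Pool the raw counts, i.e. (a+c)/(b+d). This is not a function of the
    # per-disease rates a/b and c/d alone: the group sizes matter too.
    cured = sum(c for c, _ in by_disease.values())
    treated = sum(t for _, t in by_disease.values())
    return Fraction(cured, treated)

for disease in ("X", "Y"):
    a = rate(*TRIALS["A"][disease])
    b = rate(*TRIALS["B"][disease])
    print(disease, a, b, a > b)  # A has the higher rate on each disease

print("pooled", pooled_rate(TRIALS["A"]), pooled_rate(TRIALS["B"]))
```

The per-disease comparison favours A twice, while the pooled comparison favours B, because almost all of A's patients have disease Y and almost all of B's have disease X.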