## Tuesday, August 05, 2014

### The sleeping beauty problem: how some philosophers and physicists calculate probability


Recently there have been surprising discussions and disputes amongst philosophers and physicists about an elementary probability problem called the Sleeping Beauty problem. The following remarks are, as usual in this blog, the result of discussions with Nicoletta Sabadini.

The problem goes as follows: A beauty is told that the following procedure will be carried out. On Sunday a fair coin will be tossed without her knowing the result. She will go to sleep. Then on Monday one of two possibilities will occur.

In the case that the toss of the coin resulted in tails she will be wakened and asked her opinion of the probability that the result of the coin toss was heads. She will then have her memory of what happened on Monday erased and will be put to sleep. On Tuesday (again in the case of tails, without a further toss of the coin) she will be wakened and asked her estimate of the probability that the result of the coin toss was heads.

In the case of heads, on Monday she will be asked her estimate of the probability that the result of the coin toss was heads. In that case she will not be asked again.

It seems clear intuitively that, when this procedure is carried out, she has learnt nothing about the result of the coin toss at any of the three interviews, and that she should answer $1/2$ in each case.

Strangely a considerable number of philosophers and physicists make an elementary error in the calculation of probabilities and believe that she should answer $1/3$.

Update: I see also that a recipient of the COPSS Presidents' Award (the Nobel Prize of Statistics, according to Wikipedia), Jeffrey S. Rosenthal, believes the $1/3$ answer.

The philosophers who come to this conclusion include professors at Princeton (Adam Elga) and Oxford (Nick Bostrom). The physicists include a Senior Research Associate at Caltech, Sean Carroll, who manages to involve quantum mechanics and the multiverse. Another seems to be a Dirac and Milner prizewinner, Joe Polchinski. (Not all physicists agree: for example, the maverick (or mainstream?) physicist Lubos Motl is scathing about the answer $1/3$.)

Update: Lubos Motl has reblogged this post on his blog, with additional interesting comments.

Let's see how they manage to come to the $1/3$ conclusion and what their elementary error is.

We can describe the procedure by a simple Markov chain, pictured below, in which the initial state (Sunday) is $0$; state $1$ is Monday in the case of heads; state $2$ is Tuesday and subsequent dates in the case of heads; state $3$ is Monday in the case of tails; and state $4$ is Tuesday and subsequent dates in the case of tails.
Now the argument of the philosophers/physicists mentioned above is that when the beauty is awakened she knows she is in one of states 1, 3 or 4 but she doesn't know which.
They claim that being in each of these three states has the same probability, and hence this probability must be $1/3$ (the sum must be $1$). But in state $1$ the coin was heads, and in states $3$ and $4$ the coin was tails. So in a third of cases the coin is heads, so she must answer $1/3$.

The egregious error!
In a Markov process one cannot in general talk about the probability of being in a state. One must talk about the probability of passing from state $x$ to state $y$ in $n$ steps.
(If the process is ergodic then in a limiting distribution there is some sense of talking about the probability of being in a state, but that is not the case here.)

The three probabilities discussed above are (i) the probability $p_1$ of going from $0$ to $1$ in one step, (ii) the probability $p_3$ of going from $0$ to $3$ in one step, and (iii) the probability $p_4$ of going from $0$ to $4$ in two steps. Each of these three values is clearly $1/2$.
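The three values can be checked mechanically. The following is a minimal sketch, assuming a transition matrix reconstructed from the verbal description of the chain (in particular, states $2$ and $4$ are taken to be absorbing, since they stand for "Tuesday and subsequent dates"); the names `P`, `step`, `one_step` and `two_step` are illustrative only.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

# Transition matrix reconstructed from the text (an assumption, not the
# original figure). Row i gives the one-step probabilities out of state i.
P = [
    [0, HALF, 0, HALF, 0],  # 0: Sunday, the coin is tossed
    [0, 0, 1, 0, 0],        # 1: Monday after heads
    [0, 0, 1, 0, 0],        # 2: Tuesday onwards after heads (absorbing)
    [0, 0, 0, 0, 1],        # 3: Monday after tails
    [0, 0, 0, 0, 1],        # 4: Tuesday onwards after tails (absorbing)
]


def step(dist, matrix):
    """Advance a distribution over states by one step of the chain."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


start = [Fraction(1), 0, 0, 0, 0]  # the chain starts in state 0 with certainty
one_step = step(start, P)
two_step = step(one_step, P)

p_1 = one_step[1]  # from 0 to 1 in one step
p_3 = one_step[3]  # from 0 to 3 in one step
p_4 = two_step[4]  # from 0 to 4 in two steps
print(p_1, p_3, p_4)  # 1/2 1/2 1/2
```

Exact fractions are used rather than floats so that the comparison with $1/2$ is not blurred by rounding.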

What however should the beauty calculate for the probability $p_{3,4}$ of being either in $3$ in one step or $4$ in two steps? The philosophers/physicists are claiming that $p_{3,4}=2p_1$. I claim $p_{3,4}=p_1$. To see this we need to think in general how to calculate the probability of going from state $x$ to $y$ in $m$ or $n$ steps in a Markov chain.

Take $x$ to be a fixed initial state, which we won't mention again. The sum over all $y$ of $p_{one}(y)$ (the probability of reaching $y$ in one step) is $1$. The sum over all $y$ of $p_{two}(y)$ (the probability of reaching $y$ in two steps) is also $1$. Hence taking $p_{one,two}(y)=p_{one}(y)+p_{two}(y)$, as the physicists/philosophers do, would yield $\sum_y p_{one,two}(y)=2$, whereas the total probability should be one.
A reasonable definition for $p_{one,two}$ would be $p_{one,two}(y)=\frac{1}{2}( p_{one}(y)+p_{two}(y))$.

This definition yields in the sleeping beauty problem that $p_{3,4}=\frac{1}{2}(p_3+p_4)= \frac{1}{2}=p_1$.
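The normalisation argument can be spelled out numerically. This is only a sketch under stated assumptions: the one-step and two-step distributions from state $0$ are read off the chain as described in the text, and $p_{3,4}$ is interpreted as the averaged mass on states $3$ and $4$ together; the names `naive` and `averaged` are illustrative.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
STATES = range(5)

# Distributions after one and two steps from state 0 (Sunday), read off the
# chain described in the post; states not listed have probability 0.
p_one = {1: HALF, 3: HALF}
p_two = {2: HALF, 4: HALF}

# Adding the two distributions without normalising.
naive = {y: p_one.get(y, 0) + p_two.get(y, 0) for y in STATES}

# The averaged definition: p_{one,two}(y) = (p_one(y) + p_two(y)) / 2.
averaged = {y: HALF * (p_one.get(y, 0) + p_two.get(y, 0)) for y in STATES}

print(sum(naive.values()))     # 2
print(sum(averaged.values()))  # 1

# Averaged mass on "state 3 in one step or state 4 in two steps".
p_34 = averaged[3] + averaged[4]
print(p_34)  # 1/2
```

The unnormalised sum has total mass $2$, while the averaged version has total mass $1$ and gives $p_{3,4}=\frac{1}{2}(p_3+p_4)=\frac{1}{2}$, matching the value stated above.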

Luboš Motl said...

Dear Dr Walters, it's a nice text! I agree with the things you know to have been written by myself, but I also find your comment in the error in the Markov chain context enlightening. I was trying to formulate as clearly as I could what's wrong about treating the "states" of Monday and Tuesday in the heads case as mutually exclusive, even though they are not mutually exclusive as they follow from each other by the flow of time. I think that your description why it's an error in terms of the Markov chain probability disclaimers is crisper.

Wouldn't you agree that this text is posted as a guest blog on The Reference Frame, my blog? Please reply to firstname dot lastname at gmail dot com if you can.

Thanks,
Lubos Motl

11:59 PM

Luboš Motl said...

Otherwise concerning the maverick/mainstream question, an interesting one. I know that some people may view me as a maverick. But among the physicists who know what they're doing, e.g. about theoretical/particle physicists with at least 3,000 citations, I think that 90+ percent would agree that I am more right than those who disagree with me in most physics-related disagreements. It's just that the number of competent physicists like that isn't high and they don't spend time on the Internet and in the media so they're more underrepresented in the "ensemble" or "discourse" than the awakenings of the sleeping beauty while there is "tails".

12:13 AM

DN said...

A hint that your reasoning is wrong is that there is no use of p_1=0.5 in your eventual calculation (upon defining p_one,two). But if p_1=0 (she is not interviewed in case of heads), then clearly she can deduce p_3,4 = 1.

You could say that the generalized thirder claim is that the ratio of interviews after heads and tails matters, even if there is more than 1 interview after, say, tails (consider the SB with 0.5 interview after heads, 1 interview after tails for comparison).

4:02 AM

RSM said...

First of all: "The philosophers/physicists are claiming that p3,4 = 2p1." This is a straw man, as no philosophers or physicists are claiming that. In this case, with 3 and 4 being states along a non-branching path, such a claim is obviously wrong. But to imply that the thirder position rests on that claim is a non sequitur.

Consider a room full of 100 computers, all running the same program started at random times. The program has four
states. When the program starts, it draws a random 32-bit integer. Assume a uniform PDF. If the number drawn is even,
the program goes to state S1; if the number drawn is odd, it goes to state S3. Execution then proceeds as follows:

S1: Clear the screen, display "awake", wait one hour, then go to S2.
S2: Clear the screen, display "awake", wait one hour, then terminate the program.
S3: Clear the screen, display "awake", wait one hour, then go to S2.
S4: Clear the screen, display "asleep", wait one hour, then terminate the program.

The program is restarted automatically every time it terminates. The initialization and termination of the program,
and the transitions between states, all take a negligible amount of time, compared to the one-hour duration of each
state.

X walks into the room, picks a computer at random, and reads the display. The display says "awake". X does not
know how long the program has been running on this particular machine, but he knows all of the details given above.

What would X say is the probability that the random number drawn by this computer, for the present run of the
program, was odd?

If he were a philosopher or a physicist, he would condition his probability on the observation that the display says "awake". Unfortunately, because doing that would violate the precept that "In a Markov process one cannot in general talk about the probability of being in a state. One must talk about the probability of passing from state x to state y in n steps", he would be making an egregious error if he did so.

8:09 PM

RSM said...

Small but important correction to my post: For the S3 line, it should say "go to S4", not "go to S2".

9:08 AM

Robert Walters said...

A response to RSM (including the correction): You have changed the problem to one that might be described by an ergodic Markov chain. Of course for an ergodic chain there is a sense to the probability of a state, at least approximately and in the long run. You have carefully hidden the initial state. But the sleeping beauty is not ergodic, and further concerns probabilities in the first few steps from a precise initial state.

10:29 AM

RSM said...

Robert,

Ergodicity aside, what epistemic difference can you point to between X and Beauty? Both have equivalent information at their disposal.

Let X happen upon a single computer that just happens by coincidence to be running a copy of the program. In this case, the program runs once and terminates without restarting. He sees the output "awake". He knows the content of the program it is running. How is this case ergodic? What probability should he assign now?

"You have carefully hidden the initial state." Meaning, not that the initial state is unknown (it is known; we can inspect the code), but the time at which the initial state occurred relative to the present is not known. Isn't that also Beauty's problem?

"But the sleeping beauty is not ergodic, and further concerns probabilities in the first few steps from a precise initial state." What I said above. The program is never more than two steps from its initial state. Whether it is one copy running once or a million copies repeated endlessly, the problem for X remains the same.

1:43 PM

Robert Walters said...

Response to RSM: You confuse the probability of passing from one state to another in a number of steps with the long term fraction of time in passing a state. These two things are quite different. It is the first that is requested of the beauty.

12:09 AM