### Probability

I made the suggestion recently that the Kolmogorov presentation of probability is too abstract, and that this is the cause of elementary confusions like those surrounding the Monty Hall problem.

I must admit to being quite unsure of the foundations of probability theory. The fact that simply stated problems so readily lead to errors suggests to me that the gap between theory and practice is too wide. I make some tentative remarks below, which are made more formal in a small paper we have just written.

1. First, I think probability should be about explicitly described systems. (Peter Cameron in his blog post describes two possible systems ("protocols") which might be behind a particular problem, yielding different results.) This would imply that one needs a notion of system. In this context probabilities are weightings of actions in a particular state of a system. Such a weighting represents information about the cause of actions. It is a kind of primitive dynamical information.

2. The Kolmogorov view is that independence is defined in terms of the probabilities of events. This is the attitude that we are observing behaviours, rather than considering systems. For systems there is a clear operation of composition in parallel. Actions may appear to be independent behaviourally without actually being parallel.

3. The fact that the sum of all probabilities must be 1 is further evidence of a behavioural view. A system may have big reasons for making a choice, or small reasons; in both cases the probability may be, say, half and half. This doesn't affect observed behaviours. However, in composing systems, large reasons in one system may overwhelm small reasons in another.

To make this more concrete: suppose I am deciding whether to eat an apple or a pear (with equal interest), and suddenly I perceive a car driving straight towards me, so that, again with no preference either way, I must decide whether to jump to the left or the right to avoid an accident. This is a state in a composition of two systems. It is clear that the weight behind choosing an apple or a pear is overwhelmed by the weight behind jumping left or right.

At a system level weight is more fundamental than probability, which arises as a normalization of weight.
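The apple-versus-car scenario can be sketched in a few lines. The numbers and the pooling rule below are my own illustrative assumptions, not a formal proposal: each system assigns raw weights to its actions, probabilities arise only by normalization, and composing the two systems by pooling weights lets the large weights of one swamp the small weights of the other, even though each system alone looks like a fair coin.

```python
from fractions import Fraction

def normalize(weights):
    """Turn raw action weights into probabilities by normalization."""
    total = sum(weights.values())
    return {action: Fraction(w, total) for action, w in weights.items()}

# Two systems that are behaviourally identical: each is a 50/50 choice.
fruit = {"apple": 1, "pear": 1}                   # small reasons
escape = {"jump left": 1000, "jump right": 1000}  # big reasons

print(normalize(fruit))   # both 1/2
print(normalize(escape))  # both 1/2

# One (assumed) way to compose: pool all weights into a single state.
probs = normalize({**fruit, **escape})
print(probs["apple"])      # 1/2002: the fruit choice is overwhelmed
print(probs["jump left"])  # 500/1001, nearly 1/2
```

Normalizing each system separately hides the difference in weight; it only reappears under composition.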

Update: I see now that Peter Cameron has come to a conclusion (which I quote in full, but with my italics; it refers to a particular calculation which you can see at his blog):

"Mathematically the conditional probability that someone has two boys, given that one of their two children is a boy born on Tuesday, is 13/27. If we started defining the probability of some event in terms of the algorithm that led to the statement being made, all our textbooks would need to be rewritten from the ground up. I think the best way to proceed is to say, we do know how to calculate probability (and how to interpret it), but this requires careful thought, and sometimes our intuition lets us down."

Update: I made a too brief comment on Peter Cameron's blog, namely that

"I agree with John Faben [another commenter] that the calculation of probability requires information about the algorithm involved."

It is somewhat difficult in blogs to follow which comments are later referred to, but it seems that Peter replied to my comment with the following:

"Sorry to have to disagree…

Any calculation in probability can be done unambiguously by the rules (based on Kolmogorov’s axioms) provided we specify carefully what is the “sample space” and what is the probability measure on the sample space. If you start bringing in other factors like the algorithm used to generate a statement then all you are doing is changing the measure. That is why I said in my example that I am a covert Bayesian. I happen to think that the probability measure that applies in a given situation depends on everything I know about the situation (which may include information about the algorithm used to generate some statement)."

He wrote more which I do not reproduce here.

I have been trying to understand why it is difficult to apply the existing theory (Kolmogorov) to apparently simple real world problems, a fact admitted by Peter. (There is a rumour that even Erdős had difficulty with Monty Hall.)

My suggested answer to this is that there is too great a gap between real world problems and the theory, and perhaps there could be a more detailed theory in between, reducing the gap.

The more detailed theory would be prior to the construction of the probability space.

It would consist in making the real world problem more precise by describing a mathematically formal system (an algorithm, or protocol). This is not beyond the capacity of mathematics to do.

Peter seems to agree when he says "everything I know about the situation", which is another way of referring to the precise system under consideration.

There is a sleight of hand in the statement "If you start bringing in other factors like the algorithm used to generate a statement then all you are doing is changing the measure". The algorithm is needed to determine the measure; different algorithms determine different measures.
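The Monty Hall puzzle makes this point concrete. Below is a small enumeration (my own sketch) of two protocols that produce the same observation, an opened door showing a goat, yet determine different measures: in the standard protocol the host knowingly opens a goat door and switching wins with probability 2/3; if instead the host opens a random remaining door and merely happens to reveal a goat, switching wins with probability 1/2.

```python
from fractions import Fraction
from itertools import product

DOORS = (0, 1, 2)

def switch_win_prob(host_knows):
    """P(switching wins | a goat was revealed), under each protocol."""
    shown_goat = Fraction(0)   # P(the opened door shows a goat)
    switch_wins = Fraction(0)  # P(goat shown AND switching wins)
    for car, pick in product(DOORS, DOORS):  # each pair has prob 1/9
        if host_knows:
            # Host opens a door that is neither the pick nor the car.
            candidates = [d for d in DOORS if d != pick and d != car]
        else:
            # Host opens any door other than the pick, at random.
            candidates = [d for d in DOORS if d != pick]
        for opened in candidates:            # host chooses uniformly
            p = Fraction(1, 9) / len(candidates)
            if opened == car:                # car revealed: not our case
                continue
            shown_goat += p
            remaining = next(d for d in DOORS if d not in (pick, opened))
            if remaining == car:
                switch_wins += p
    return switch_wins / shown_goat          # condition on the goat

print(switch_win_prob(host_knows=True))   # 2/3
print(switch_win_prob(host_knows=False))  # 1/2
```

Nothing the contestant sees distinguishes the two runs; only the protocol, that is, the algorithm, fixes the measure.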

Labels: probability

## 1 Comment:

I've been giving this quite a bit of thought over the last week and I'm not convinced that there is such a big problem with the existing formulation of probability (although I do concede that it does seem to trip people up a fair bit).

If you know a particular algorithm is being used, then that will certainly affect probabilities you assign to certain events. For example, in the Monty Hall case if you knew Monty's algorithm was only to open a second door if you'd chosen correctly your course of action would be very different to what you'd do if you knew his algorithm was only to open the door if you'd chosen incorrectly.

However, I don't think that in the absence of knowledge about algorithms there is nothing you can do. Rather, your lack of knowledge should be encapsulated in your probabilities. To take another example, if you had a die you knew was loaded to land on 6, you wouldn't assign a probability of 1/6 to each possible throw outcome. But if you had a die which you knew could be loaded, yet had no reason to think one side was more likely to be loaded than another, you would still reasonably assign a probability of 1/6 to each side.

Back to the Boys and Girls problem, I also think you can proceed in a fairly reasonable way without knowing the algorithm used by your friend to decide to say "I have a boy born on Tuesday". I would actually argue that, in the face of this statement, the most sensible probability to assign to two boys is not 13/27 but 1/2. The argument follows a similar pattern to the one on your other post, but with an additional observation. If you are only interested in a ratio P(A|X)/P(A) and you are struggling to work out P(A)...after all, your friend may have said "Tennis anyone?" instead...it may not matter, because that ratio is actually equal to P(A|X and Z)/P(A|Z) if A is a subset of Z and X and Z are independent. What I have in mind here is that A is "your friend says 'a boy born on Tuesday'", Z is the event "your friend says 'a boy born on Monday' or 'a girl born on Monday' or 'a boy born on Tuesday', etc.", and X is "your friend has two boys"; to me, it seems like a very reasonable prior to assume X and Z are independent.
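The 1/2 answer can itself be checked by enumeration, under one concrete protocol consistent with the independence assumption above (my own assumed protocol: the friend picks one of the two children uniformly at random and reports that child's sex and birth day):

```python
from fractions import Fraction
from itertools import product

children = list(product("BG", range(7)))  # (sex, day); day 1 = Tuesday, say

said_boy_tue = Fraction(0)  # P(friend reports "a boy born on Tuesday")
and_two_boys = Fraction(0)  # P(that report AND both children are boys)
for c1, c2 in product(children, repeat=2):  # 196 families, prob 1/196 each
    for reported in (c1, c2):               # friend picks a child at random
        p = Fraction(1, 196) / 2
        if reported == ("B", 1):
            said_boy_tue += p
            if c1[0] == "B" and c2[0] == "B":
                and_two_boys += p

print(and_two_boys / said_boy_tue)  # 1/2
```

Under this protocol the Tuesday detail carries no information about the other child, and the conditional probability of two boys is exactly 1/2, in contrast with the 13/27 obtained from the "announce iff at least one boy born on Tuesday" protocol.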

I am planning a blog post of my own on this topic sometime soon.
