Repeated positive-expectation gambles can have very negative actual outcomes

You shouldn’t ‘expect’ an expected value.

In this post we explore how playing positive expected value games is not always a good idea, even when you can play them repeatedly. This runs counter to common beliefs about expected values. We lean on the excellent example from Ole Peters' article in Nature Physics.

Consider a simple positive expectation game

Say you have a current wealth of $x$ and you are offered a simple gamble: we flip a fair coin and if the result is heads you gain $0.5x$ and if it is tails you lose $0.4x$. We can write this as

$$ \Delta x = \begin{cases} \Delta x_\text{H} = +0.5x, & p_\text{H} = 0.5 \\ \Delta x_\text{T} = -0.4x, & p_\text{T} = 0.5 \end{cases} $$

The expected value (EV) of this gamble is $+0.05x$. Some questions for the reader:

  • Do you take this gamble?
  • What if you were offered the opportunity to play this game $N$ times?
  • How much would you pay to play this game $N$ times?

I think the most common answer to the first two questions would be yes, at least for risk-neutral types. This is a risky gamble, but surely we should expect to come out ahead given enough repetitions; it is a +EV gamble, after all. After a quick calculation involving the compounding of the expected returns, I believe many people would pay a fortune for the opportunity to play this game repeatedly. With an initial bankroll of $x = 1$, the expected wealth after playing 10 rounds is $1\times(1.05)^{10} \approx 1.63$. After 100 rounds it is about 131.5, and after 500 rounds the expected wealth is roughly 39.3 billion! So, how much would you pay to play this game 500 times?
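The naive compounding above is easy to verify with a few lines of Python (a sketch of the calculation only, not an endorsement of the reasoning):

```python
# Naive expected-value compounding: each round is "expected" to multiply
# wealth by 0.5 * 1.5 + 0.5 * 0.6 = 1.05.
initial_wealth = 1.0
per_round_ev = 0.5 * 1.5 + 0.5 * 0.6

for n_rounds in (10, 100, 500):
    expected_wealth = initial_wealth * per_round_ev ** n_rounds
    print(f"expected wealth after {n_rounds:>3} rounds: {expected_wealth:,.2f}")
```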

This gamble is actually a terrible idea

Let’s see what actually happens when individuals take this gamble repeatedly.

Figure 1. The blue line gives the expected wealth for the number of played rounds, while the thin red lines show the wealth trajectories of 100 players over 500 rounds. The solid red line is the typical wealth trajectory. Note the logarithmic scale.

In the plot above, the thin red lines show the wealth trajectories of 100 individuals who play the game for 500 rounds, all starting with $1. The blue line shows the trajectory of expected wealth, and the solid red line shows the typical wealth trajectory, whose slope is the actual long-run growth rate of this repeated gamble.
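The qualitative picture can be reproduced with a short Monte Carlo simulation. The sketch below (player and round counts, and the seed, are arbitrary choices) tracks only final wealth rather than full trajectories:

```python
import random

def simulate(n_players=100, n_rounds=500, seed=0):
    """Simulate final wealths: heads multiplies wealth by 1.5, tails by 0.6."""
    rng = random.Random(seed)
    finals = []
    for _ in range(n_players):
        wealth = 1.0
        for _ in range(n_rounds):
            wealth *= 1.5 if rng.random() < 0.5 else 0.6
        finals.append(wealth)
    return finals

finals = simulate()
mean_wealth = sum(finals) / len(finals)
median_wealth = sorted(finals)[len(finals) // 2]
print(f"mean final wealth:   {mean_wealth:.3g}")
print(f"median final wealth: {median_wealth:.3g}")
# The median is astronomically small: the typical player goes broke, even
# though the theoretical mean after 500 rounds is 1.05**500 (about 4e10).
# That mean is driven by vanishingly rare lucky paths that no individual
# should expect to experience.
```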

What’s going on here? Everyone is losing wealth, even though each gamble is +EV and the expected wealth path has a positive slope and grows rapidly.

This happens because the system is not ergodic

A naïve player of this game would expect to become fabulously wealthy over time based on a computation of the expected value, but we can see that anyone who repeatedly plays this game should actually expect to go broke.

To understand why this is the case, we need to understand the concept of ergodicity. Very loosely speaking, an ergodic observable is one where the mean value for a group at any point in time is the same as the mean value for an individual over all times. Estimating the proportion of heads obtained by 1,000 people flipping a coin is equivalent to a single person flipping a coin 1,000 times, so the proportion is an ergodic observable.

An example of a non-ergodic observable would be the number of deaths obtained when playing Russian roulette. Asking 1,000 individuals to play a single round has a very different dynamic to asking one person to play 1,000 rounds in a row with the same gun. In this setting, the ensemble average is not the same as the time average.
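The coin-flipping example above can be sketched in a few lines: the ensemble estimate and the time estimate of the proportion of heads agree, which is exactly what ergodicity means here (counts and the seed are arbitrary choices):

```python
import random

rng = random.Random(1)

# Ensemble average: 1,000 people each flip a fair coin once.
ensemble = sum(rng.random() < 0.5 for _ in range(1000)) / 1000

# Time average: one person flips the same fair coin 1,000 times.
time_avg = sum(rng.random() < 0.5 for _ in range(1000)) / 1000

print(f"ensemble proportion of heads:     {ensemble:.3f}")
print(f"time-average proportion of heads: {time_avg:.3f}")
# Both estimates are close to 0.5: for this observable, the ensemble
# average and the time average agree, so it is ergodic.
```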

The ordinary expected value we have computed for the gamble is an ensemble average. Due to the multiplicative dynamics of the gamble, path dependence plays an important role, and neither wealth nor the change in wealth is an ergodic observable. Therefore the ensemble average (or space average) and the time average are not the same, and it is not appropriate for an individual to use the ensemble-based expected value of wealth growth for decision making.

To be clear, the expected value of wealth is not wrong per se: with an infinite ensemble of players, the average wealth will indeed grow in time. But this is meaningless to an individual, who does not benefit from the lucky members of the infinite ensemble who become and stay wealthy.

How do we evaluate this gamble properly?

To understand whether this gamble is a good idea, we need to identify an ergodic observable whose expectation value reflects the system's behaviour over time and not just over an ensemble. Under multiplicative dynamics, Peters (2015, eq. 5) shows that the rate of change in the logarithm of wealth is an ergodic observable:

$$ \frac{1}{\Delta t} \text{E}[\Delta \ln W]. $$

The fact that this quantity is ergodic means that when we compute the (ensemble) expected value, it also reflects the behaviour of the observable in time, not just the mean value over infinite gamblers. Plugging our gamble into this formula we obtain $$ \frac{1}{\Delta t} \text{E}[\Delta \ln W] = \frac{1}{2}\ln{1.5} + \frac{1}{2}\ln{0.6} \approx -0.053 $$

Raising $e$ to this value gives the per-round growth factor, $e^{-0.053} \approx 0.949$: an individual should expect to lose ~5.1% per round on average over time (this factor sets the slope of the solid red line in figure 1). It is therefore unwise to repeatedly take this gamble. Contrast this with the ‘expected’ +5% increase in wealth per round using the ordinary expected value.
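The arithmetic is easy to check (a small sketch; raising $e$ to the log growth rate simply converts it into a per-round growth factor):

```python
import math

# Ensemble (arithmetic) expectation: wealth is multiplied by 1.05 per round on average.
arithmetic = 0.5 * 1.5 + 0.5 * 0.6

# Time-average growth: expected change in log-wealth per round.
log_growth = 0.5 * math.log(1.5) + 0.5 * math.log(0.6)

# Per-round growth factor experienced by a typical individual.
geometric = math.exp(log_growth)  # equivalently math.sqrt(1.5 * 0.6)

print(f"arithmetic growth factor: {arithmetic:.4f}")   # about 1.0500
print(f"log growth rate:          {log_growth:.4f}")   # about -0.0527
print(f"geometric growth factor:  {geometric:.4f}")    # about 0.9487
```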

Using the Kelly criterion

Assuming one has the prospect of making many bets over time, the Kelly criterion can be used to estimate what proportion of wealth should be staked on any one gamble to maximise long-term wealth. For simple gambles, the Kelly fraction, i.e. the proportion of your current wealth that should be bet, is $$ f^* = p - \frac{1-p}{b} $$ where $p$ is the probability of success and $b$ is the net fractional odds received on the gamble. For our gamble, the implied (decimal) odds are $$ \text{implied odds} = \frac{1.5-0.6}{1-0.6} = \frac{0.9}{0.4} = 2.25, $$ which is simply our return if we win (the $0.5x$ gain plus the $0.4x$ stake returned) divided by how much it costs to place the bet (the $0.4x$ stake); the net odds are then $b = 2.25 - 1 = 1.25$. Plugging this into the formula above we obtain $$ f^* = 0.5 - \frac{1-0.5}{2.25-1} = 0.1. $$ Therefore we should risk no more than 10% of our wealth on a gamble with these payoffs and probabilities. However, the gamble offered to us asks us to risk 40%, which rapidly increases the risk of ruin. The plot below gives the wealth paths of individuals who are allowed to take the same odds but bet only 10% of their wealth in each round.

Figure 2. Wealth trajectories for players taking the same odds but staking only 10% of their wealth each round.
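A minimal sketch of this Kelly calculation (the function names are my own, not from a library):

```python
import math

def kelly_fraction(p, decimal_odds):
    """Kelly fraction f* = p - (1 - p) / b, where b = decimal_odds - 1."""
    b = decimal_odds - 1
    return p - (1 - p) / b

# Our gamble: stake 0.4x; a win returns the stake plus a 0.5x gain,
# so the implied decimal odds are (0.4 + 0.5) / 0.4 = 2.25.
f_star = kelly_fraction(p=0.5, decimal_odds=2.25)

def round_log_growth(f, p=0.5, b=1.25):
    """Expected change in log-wealth per round when staking fraction f at net odds b."""
    return p * math.log(1 + f * b) + (1 - p) * math.log(1 - f)

print(f"Kelly fraction f*: {f_star:.2f}")
print(f"log growth at f = 0.4 (the offered gamble): {round_log_growth(0.4):+.4f}")
print(f"log growth at f = f* = 0.1 (Kelly):         {round_log_growth(f_star):+.4f}")
```

Note that at the offered 40% stake the log growth rate is negative, while staking the Kelly fraction at the same odds makes it positive: the odds were never the problem, the bet size was.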

We do not need expected utility theory

Expected utility theory is one way to approach decision making with probabilistic outcomes. However, utility theory ultimately leans on humans' differing psychological responses to positive and negative outcomes as a framework for decision making. One could absolutely choose (or craft) a utility function that produces equivalent results to maximising the geometric growth rate of wealth (the change in $\log(\text{Wealth})$ is a popular utility function, after all), but this skips the mathematical understanding of the system and relies on assumptions about how humans feel about the potential outcomes, particularly that we feel the negative impact of losses more strongly than the positive impact of winning the same amount. How would you actually come up with a good utility function that describes your preferences? Do you think you would independently arrive at a utility function that provides good outcomes in this scenario without understanding the dynamics of wealth growth over multiple rounds?

I believe it is important to note that we do not have to invoke expected utility theory, risk aversion, unknowable utility/loss functions, etc., to avoid losing money over time on this gamble; it suffices to properly understand the dynamics.

Update on 2021-07-06: I believe this tweet from Ole Peters highlights the point well:

Evolution is clearly not maximising the utility function of successful viruses (they cannot express preferences after all). Under the multiplicative dynamics of virus replication, the force of evolution that optimises time-average growth rates ensures a virus’s success.

Lessons

The ordinary expected value we have computed for the gamble is an ensemble average, so the blue ‘expected growth’ line in figure 1 gives the average wealth over an infinite number of players, not what an individual player should expect. In this multiplicative, non-ergodic setting, the ‘expected’ in expected value is a complete misnomer.

It is thus not always wise to make repeated +EV decisions if the system is not ergodic or if there is a risk of ruin. However, this consideration does not seem to be widely understood:

If a game is ‘favorable’ from the point of view of the expectation value and you have the choice of repeating it many times, then it is wise to do so.

H. Chernoff and L. E. Moses. Elementary Decision Theory.

Clearly, this statement is false.

Things to take away

  • Understand if your system is ergodic. Is the time average of the observable equal to the ensemble average? (It probably isn’t).
  • Don’t blindly optimise for expected value or ensemble averages; you might end up with terrible outcomes.
  • You do not benefit from the positive outcomes observed by the other versions of you in ‘parallel universes’.

More

If you are interested in the ideas discussed in this post (gambles, expected values, ergodicity, utility theory and decision theory), check out Ole Peters' papers and blog on ergodicity economics.
