Population peaks, random walks and Catalan numbers

Is humanity at peak population? In other words, is the total number of humans in the world currently the highest it’s ever been?

It seems like a simple question. And the answer seems obvious too: yes.

The last significant reduction in the human population occurred during the Black Death in the 1350s. Since then, it has been rising without interruption. It’s risen especially quickly in the last century, from about 2.5 billion in 1950, to 8 billion by 2022. And although the growth rate is now slowing, the population is still rising: it’s estimated at about 8.1-8.2 billion as of February 2025. The UN projects that it will continue to rise until it reaches over 10 billion in the late 21st century, at which point it will start to decline. But clearly, given that we’re still on the upward slope, the population right now is bigger than it’s ever been.

That’s certainly true on a large scale. If we were to take a look at the current population estimate every day this year, the number would always be higher than the previous day. But it’s not so simple on the small scale.

Imagine that we could know, precisely, every time a person was born or died anywhere in the world. In other words, imagine that the Worldometer tally wasn’t an estimate, but a realtime monitor with perfect accuracy. At any given moment, would that tally be at its all-time peak? Well, every time someone dies and the tally decreases by one, the population is not at its peak: it’s one below the previous value. During the brief interval before another birth occurs, the human population is not at its all-time peak. If the next event is another death, it’ll be at least two more events before the population is at peak again. So, considering the population at unit precision, even though births happen more frequently than deaths and the population is rising overall, for a significant proportion of the time it’s not true to say that the population is currently the highest it’s ever been.

What proportion is this? A friend recently put it to me that it was possible for the population to be below peak for more than half the time and yet still be rising overall. At first this struck me as absurd, but the more I thought about the idea, the more it seemed possible. In fact, it seemed to be an interesting paradox, of the “logically possible but counterintuitive result” type, which I can’t find any reference to elsewhere.

To explain how this is possible, I’ll first try to demonstrate the counterintuitive result with a couple of examples. I’ll then attempt to derive a mathematical formula and calculate some values.

Demonstration with examples

Consider a case where the birth and death rates are equal. Every change to the population has an equal chance of being a birth or a death. In this scenario, the population isn’t rising overall: it stays the same.

What proportion of the time is the population at its peak value in this scenario? Well, every interval immediately after a death is an interval when it’s not. And this is 50% of all intervals.

However, not every interval immediately after a birth is necessarily at peak. If the previous three events were death-death-birth, then the population is not at peak, since it’s one below where it was before those three events.

There are only three other possible sequences of three events ending in a birth: birth-birth-birth, birth-death-birth and death-birth-birth. Since all four sequences are equally probable, death-death-birth accounts for 25% of the intervals that follow a birth.

Adding these together, we know the population must be below its peak value for at least 50% of the time (following every death) plus 25% of the other 50% (following every death-death-birth), so 50% + 12.5% = at least 62.5% of the time.

It’s also below peak for some non-zero proportion of the other intervals. This is because none of the other three sequences guarantees peak population at the end of it, either. If any of them were to come immediately after a sequence of four deaths, for example, then they would not be enough to return the population to its value before those deaths.

Therefore, when the population is steady, peak population occurs for some proportion of time which we have not fully calculated, but which we know is somewhat less than 37.5%.

This is significantly below 50%. It’s easy to imagine changing the scenario so that the birth rate, instead of being equal to the death rate, is higher by only a tiny proportion. This would be enough to give us a situation where the population is rising overall. But the difference from the previous scenario would be so small that the proportion of time spent at peak population would still be below roughly 37.5%, and therefore significantly less than 50%.

In fact, let’s do that explicitly. Suppose births outweigh deaths by a ratio of 11 to 9. This means the chance of a population event being a birth is 55%, while the chance of it being a death is 45%. The four possible three-event sequences ending in a birth are no longer equally probable, but the probability of the last three events being death-death-birth is 0.45 × 0.45 × 0.55, or just over 11%. Add this to the 45% of intervals that follow a death, and we know that at least 56% of the time, i.e. the majority of the time, the population is below its peak value even though it is rising overall.
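A quick simulation gives a sanity check on this. The following is a minimal Python sketch, assuming each event is independently a birth with probability r, that the walk starts at an arbitrary value of 0 (which counts as the first peak), and that a fixed random seed is acceptable. The name simulate_peak_fraction is made up for this illustration.

```python
import random

def simulate_peak_fraction(r, n_events, seed=0):
    """Simulate a +1/-1 walk; return (fraction of events at peak, final value).

    r is the probability that an event is a birth (+1). "At peak" means the
    value after the event equals the highest value seen so far.
    """
    rng = random.Random(seed)
    x = 0        # population relative to its starting value
    peak = 0     # highest value seen so far (the start counts as a peak)
    at_peak = 0
    for _ in range(n_events):
        x += 1 if rng.random() < r else -1
        if x >= peak:
            peak = x
            at_peak += 1
    return at_peak / n_events, x

r = 0.55
fraction_at_peak, net_change = simulate_peak_fraction(r, 200_000)
# Lower bound from the text: every death, plus every death-death-birth.
lower_bound_below_peak = (1 - r) + r * (1 - r) ** 2

print(f"net change in population:      {net_change:+d}")
print(f"fraction of events below peak: {1 - fraction_at_peak:.3f}")
print(f"lower bound from the text:     {lower_bound_below_peak:.3f}")
```

If the reasoning holds, the net change should be positive while the simulated below-peak fraction sits above the lower bound of about 0.56.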

Derivation of a formula

The situation we have been considering is something mathematicians call a random walk. In fact, it is the simplest case of a random walk: a single variable (let’s call it x) randomly incrementing by ‘+1’ or ‘-1’. If we specify that the probability of ‘+1’ is r, the probability of ‘-1’ will therefore be (1-r). What we want to know is, for a random walk of ‘+1’s and ‘-1’s, with r as the probability of a ‘+1’, what proportion of events results in the value x reaching its maximum so far?

To work this out, we need to consider the last n events in the walk. Let’s start by considering the case n=1, i.e. the single last event.

If the last event is ‘-1’, then we know that x is not at its maximum. This occurs with a probability of (1-r).

The remaining sequences, those with the last event ‘+1’, occur with a probability of r. Some will result in a maximum, and some won’t, but we don’t yet know how many. So the proportion of all sequences which don’t result in a maximum is:

    \[ (1-r) + \text{something} \]

To work out the something, we have to go back further in the sequence. Looking at the case n=2 (i.e. the last two events) doesn’t help. The four possible sequences are ‘+1+1’, ‘-1+1’, ‘+1-1’ and ‘-1-1’. The latter two have already been identified at step n=1 as resulting in a non-maximal x, so we’re not interested in those. We’re only considering the two cases which end in ‘+1’. But neither of those proves that x has reached a maximum, or that it hasn’t. Either could have been preceded by ‘-1-1-1’, making the result non-maximal. But ‘+1+1’ could clearly reach a maximum, and ‘-1+1’ would return x to its value before those events, which could have been a maximum.

So, we must move to n=3. We’re interested in sequences which end with ‘+1’ (i.e. weren’t already ruled out at n=1) and in which ‘-1’ occurs more often than ‘+1’, since these are the ones which will leave the value of x still below what it would have been before that sequence occurred. Of the four possibilities (‘+1+1+1’, ‘-1+1+1’, ‘+1-1+1’, ‘-1-1+1’), only one (‘-1-1+1’) fits this description.

The probability of this happening is calculated by multiplying the probabilities of the individual events by each other. Since there is one event ‘+1’ with the probability r, and two events ‘-1’ with probability (1-r), the probability of ‘-1-1+1’ is:

    \[ r(1-r)^2 \]

So, we now know that in addition to the proportion (1-r) of sequences which end in ‘-1’, the proportion r(1-r)^2 of sequences which end in ‘-1-1+1’ also results in a non-maximal value of x.

The remaining sequences, ending in ‘+1+1+1’, ‘-1+1+1’ or ‘+1-1+1’, may or may not be maximal. So at n=3 we have established that the proportion of all sequences which don’t result in a maximum is:

    \[ (1-r) + r(1-r)^2 + \text{something} \]

The case n=4 doesn’t help to identify any other non-maximal results, for the same reason that n=2 doesn’t. In fact, all cases with even n will not help us. This is because they cannot contain more ‘-1’ events than ‘+1’ events without having been ruled out at an earlier stage.

So, at n=5, we have 12 sequences to consider (four possible two-event sequences, multiplied by the three sequences we still had to consider at n=3). These sequences are (dropping the ‘1’ for clarity):

+++++
-++++
+-+++
--+++
++-++
-+-++
+--++
---++
+++-+
-++-+
+-+-+
--+-+

Two of these, ‘---++’ and ‘--+-+’, contain more ‘-1’ events than ‘+1’ events. So these are the ones we know will result in a non-maximal x. Each occurs with a probability of:

    \[ r^{(\text{number of +1s})} (1-r)^{(\text{number of -1s})} \]

In this case both are r^2(1-r)^3. And because there are two of them, we add two lots of that probability to our proportion of known non-maximal sequences, which is now:

    \[ (1-r) + r(1-r)^2 + 2r^2(1-r)^3 + \text{something} \]
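
As a worked check, here is that expression evaluated for the two scenarios from the examples, with the “something” left out, so each figure is only a lower bound on the proportion of non-maximal results. For r = 0.5:

    \[ 0.5 + 0.5(0.5)^2 + 2(0.5)^2(0.5)^3 = 0.5 + 0.125 + 0.0625 = 0.6875 \]

and for r = 0.55:

    \[ 0.45 + 0.55(0.45)^2 + 2(0.55)^2(0.45)^3 \approx 0.45 + 0.1114 + 0.0551 = 0.6165 \]

So looking back five events raises the known below-peak proportion from at least 62.5% to at least 68.75% in the steady case, and from about 56% to about 62% when births outweigh deaths 11 to 9.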

In fact, at every odd n, the new sequences we can identify as non-maximal will always contain k ‘+1’ events and (k+1) ‘-1’ events, where n=(2k+1). This is because, if there are more ‘+1’s than ‘-1’s, we can’t identify it as a non-maximal sequence, and if the ‘-1’s outnumber the ‘+1’s by any more than 1, that sequence will already have been identified as non-maximal when considered at an earlier n.

At every odd n, there will always be a new set of sequences that we can identify as definitely non-maximal, and some remaining sequences that we can’t. Therefore, to find the probability that any new event in a simple random walk results in a non-maximal value, we need to add the probabilities up over an infinite number of steps, looking at increasingly long odd-numbered final sub-lengths of the sequence.

At each step k, for k=0 to infinity, we need to identify the number of possible sequences of length n=(2k+1) which satisfy the following properties:
a) contains k ‘+1’ events
b) contains (k+1) ‘-1’ events
c) there is no shorter right-most sub-sequence within the sequence (i.e. a shorter sub-sequence which includes the end of the sequence) which contains more ‘-1’ events than ‘+1’ events

We’ve already seen that the probability of these sequences occurring at step k will be:

    \[ mr^{k}(1-r)^{k+1} \]

where m is the number of such sequences. Can we calculate the number of them?
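
For small k, one way to find m is simply to count. The following Python sketch checks every possible sequence of length n=(2k+1) against properties a) to c). It is a brute-force illustration only, and the function names are invented for this example.

```python
from itertools import product

def is_newly_non_maximal(seq):
    """True if seq (a tuple of +1/-1, oldest event first) proves a non-peak
    value at exactly this length: its total is -1, and no shorter run of
    final events already has a negative total."""
    if sum(seq) != -1:
        return False
    running = 0
    # Tally from the right (most recent event first), stopping before the
    # oldest event so that only shorter right-most sub-sequences are tested.
    for step in reversed(seq[1:]):
        running += step
        if running < 0:
            return False
    return True

def count_sequences(k):
    """Count m for sub-sequences of length n = 2k + 1 by checking all 2**n."""
    n = 2 * k + 1
    return sum(is_newly_non_maximal(seq) for seq in product((+1, -1), repeat=n))

counts = [count_sequences(k) for k in range(6)]
print(counts)  # → [1, 1, 2, 5, 14, 42]
```

The first three counts match the 1, 1 and 2 sequences found by hand at n=1, n=3 and n=5.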

In fact, the number of these sequences at each step k is given by the Catalan numbers. On the On-Line Encyclopedia of Integer Sequences (OEIS) page for the Catalan numbers, several of the properties described in the comments seem very similar to the properties we’re looking for. In particular, Jon Perry’s comment from 16 Nov 2012:

“Number of sequences consisting of n ‘x’ letters and n ‘y’ letters such that (counting from the left) the ‘x’ count >= ‘y’ count. For example, for n=3 we have xxxyyy, xxyxyy, xxyyxy, xyxxyy and xyxyxy.”

Switch left for right, and ‘x’ and ‘y’ for ‘+1’ and ‘-1’, and we can see that adding another ‘-1’ to the front of each of these sequences gives us the sequences we’re looking for, where ‘-1’s outnumber ‘+1’s by exactly one, but they have never outnumbered them in any shorter sub-sequence, starting from the right.

So, the number of sequences we can identify at step k as definitely non-maximal is C(k), the kth Catalan number, and therefore the probability of these sequences occurring is:

    \[ C(k)r^{k}(1-r)^{k+1} \]

To find the probability that the random walk is not at its peak value at any given step, we therefore need to calculate the infinite sum:

    \[ \sum_{k=0}^{\infty} C(k)r^{k}(1-r)^{k+1} \]

The kth Catalan number C(k) can be calculated as:

    \[ C(k)=\frac{(2k)!}{(k+1)!k!} \]

Therefore we can also write our infinite sum as:

    \[ \sum_{k=0}^{\infty} \frac{(2k)!}{(k+1)!k!}r^{k}(1-r)^{k+1} \]

Note that this gives the probability that the total value x is not maximal. The original question we were interested in was the probability that x is maximal. Calling this p, it is simple to derive from the previous formula:

    \[ p=1-\sum_{k=0}^{\infty} \frac{(2k)!}{(k+1)!k!}r^{k}(1-r)^{k+1} \]

Now we can plug some numbers in and calculate values.

Since I have limited mathematical training, I don’t know any techniques for algebraically calculating these infinite sums. But using a spreadsheet to calculate the individual terms and sum them up to a fairly large value of k (about 80) allows me to estimate some interesting values.

r         p
0.1       Very small but non-zero
0.25      Very small but non-zero
0.333…    < 2e-7
0.5       ~0.061
0.55      ~0.189
0.6       0.333…
0.666…    0.5
0.75      0.666…
0.9       0.888…
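
The same partial-sum estimate can be sketched in Python instead of a spreadsheet. This is an illustration under a few assumptions: the cutoff k_max=80 matches the “about 80” terms mentioned above, the function name peak_probability is invented here, and each term is built from the previous one using the ratio C(k+1)/C(k) = 2(2k+1)/(k+2), which follows from the factorial formula for C(k).

```python
def peak_probability(r, k_max=80):
    """Estimate p = 1 - sum_{k=0}^{k_max} C(k) r^k (1-r)^(k+1).

    Terms are built iteratively, which avoids computing large factorials.
    """
    term = 1 - r      # k = 0 term: C(0) * r^0 * (1-r)^1
    total = 0.0
    for k in range(k_max + 1):
        total += term
        # Next term = this term * C(k+1)/C(k) * r(1-r)
        term *= 2 * (2 * k + 1) / (k + 2) * r * (1 - r)
    return 1 - total

for r in (0.5, 0.55, 0.6, 2 / 3, 0.75, 0.9):
    print(f"r = {r:.3f}  p ~ {peak_probability(r):.3f}")
```

Because the sum is cut off, each estimate of p is slightly too high. Raising the cutoff, for example peak_probability(0.5, k_max=10_000), shows how slowly the sum converges when r is close to 0.5.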

Conclusion

So, to answer the original question from my friend, is it possible for a population to be rising overall, but also be below its peak value for more than half the time? The answer is yes.

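According to my calculations, when r is 0.5, i.e. when population is steady, population is at its peak for much less than half the time: a mere 6.1% of the time with the sum cut off at about k=80. The sum converges slowly at this value of r, so the true figure is lower still: the infinite sum itself equals 1, which makes p equal to 0. And the time at peak population converges on 50% when r is 2/3, i.e. when births outnumber deaths 2:1.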

So, for values of r between 1/2 and 2/3, i.e. birth to death ratios between 1:1 and 2:1, the population will be at peak for less than half the time, while also growing overall.

Only when r is greater than 2/3, i.e. the birth to death ratio is greater than 2:1, will peak population be the most common situation.
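
As a cross-check on these thresholds, the infinite sum can be evaluated exactly. This assumes the standard generating function for the Catalan numbers, a textbook result which is not derived in this post:

    \[ \sum_{k=0}^{\infty} C(k)x^{k} = \frac{1-\sqrt{1-4x}}{2x} \qquad (0 < x \le 1/4) \]

Setting x = r(1-r) for 0 < r < 1, and noting that 1-4r(1-r) = (2r-1)^2, the sum of non-maximal probabilities becomes:

    \[ \sum_{k=0}^{\infty} C(k)r^{k}(1-r)^{k+1} = (1-r)\,\frac{1-\lvert 2r-1 \rvert}{2r(1-r)} = \frac{1-\lvert 2r-1 \rvert}{2r} \]

For r of at least 1/2 this is (1-r)/r, and for r of at most 1/2 it is exactly 1, so:

    \[ p = \frac{2r-1}{r} \quad (r \ge 1/2), \qquad p = 0 \quad (r \le 1/2) \]

This gives p = 1/3 at r = 0.6, p = 1/2 at r = 2/3 and p = 8/9 at r = 0.9, in agreement with the table. It also gives p = 0 at r = 0.5 and below, which suggests that the small non-zero estimates there come from cutting the sum off at a finite k.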

However, looking at Worldometer’s daily and yearly birth and death tallies, it looks like the birth-to-death ratio for the global human population is currently about 2.12. That means that r is currently about 0.68, which makes p about 0.53.

Therefore, the human population is currently at its peak value more often than not – though just barely.

Addendum 1

Modelling population change as a random walk is a huge over-simplification. For one thing, my analysis of the random walk doesn’t consider time at all: just the proportion of events which result in peak or non-peak values. In using that as an analogy for how much time real population spends at a peak value, I’ve assumed that the births and deaths which correspond to the random walk’s increments are spaced out evenly over time.

I’ve also assumed that, like the random walk’s events, births and deaths in the real world occur randomly. This may well not be the case, if for example either are biased towards particular times of day or seasons of the year.

This doesn’t necessarily even itself out, either. For example, imagine a population that has 9 births and 1 death in a day. But instead of being evenly spaced out, all of these happen in quick succession immediately after midnight, and the death happens last. No further births or deaths happen for the rest of the day. In this case, the population rises, and the majority of events result in a peak. But the proportion of time spent at peak population is tiny, since for most of the day the death is the last event to have occurred.
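
To put rough numbers on that example, here is a small Python sketch that measures both proportions for a single day. The timings are hypothetical (events one minute apart, starting at one minute past midnight), the function name peak_fractions is invented for this illustration, and the population at the start of the day is counted as a peak.

```python
def peak_fractions(events, day_length=24.0):
    """Return (fraction of events at peak, fraction of time at peak).

    events is a list of (time_in_hours, change) pairs sorted by time, where
    change is +1 for a birth and -1 for a death. The population counts as
    "at peak" when it equals the highest value seen so far that day.
    """
    x = peak = 0
    events_at_peak = 0
    time_at_peak = 0.0
    previous_time = 0.0
    was_at_peak = True            # the starting value counts as a peak
    for time, change in events:
        if was_at_peak:
            time_at_peak += time - previous_time
        x += change
        peak = max(peak, x)
        was_at_peak = (x == peak)
        events_at_peak += was_at_peak
        previous_time = time
    if was_at_peak:               # time from the last event to the end of day
        time_at_peak += day_length - previous_time
    return events_at_peak / len(events), time_at_peak / day_length

# Hypothetical timings: nine births one minute apart just after midnight,
# followed by a single death one minute later.
minute = 1 / 60
events = [(i * minute, +1) for i in range(1, 10)] + [(10 * minute, -1)]
event_fraction, time_fraction = peak_fractions(events)
print(f"events at peak: {event_fraction:.0%}, time at peak: {time_fraction:.1%}")
# → events at peak: 90%, time at peak: 0.7%
```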

For a more plausible example, suppose that an analysis of births and deaths by hour of the day, or month, shows that there is some clustering of births around a particular time, but that deaths are distributed more evenly. Then it seems likely that the deaths would have more effect on the time spent at peak population than the random walk analysis would suggest.

Addendum 2

I’ve found a better way of showing that the number of non-peak-determining sub-sequences (i.e. those determining a non-peak value) of each odd length n=2k+1 at the end of the random walk is the kth Catalan number. This is to map them to Dyck words.

A valid Dyck word in the language Dyck-1 is a properly nested sequence of brackets ‘(’ and ‘)’. To be properly nested, the numbers of opening and closing brackets need to be equal, and each bracket must be opened, reading from the left, before it is closed. Therefore, the valid Dyck words of length 4 are ‘()()’ and ‘(())’, but not ‘)(()’.

Another way of stating the requirement for a valid Dyck word is that if you tally the occurrences of ‘(’ and ‘)’ from left to right, the running tally of open brackets ‘(’ must always be equal to or greater than the tally of close brackets ‘)’. (This is similar to Jon Perry’s observation about strings of x’s and y’s mentioned above.)

So, a valid Dyck word has the following properties:
a) contains k open brackets ‘(’
b) contains k close brackets ‘)’
c) is therefore of even length 2k
d) tallying ‘(’ and ‘)’ from left to right, the number of ‘(’ is always equal to or greater than ‘)’

The sub-sequences we are interested in counting at the end of our random walk are those odd-length sequences which determine a non-peak value because they contain more ‘-1’ events than ‘+1’ events. However, they don’t include those which have already been identified as non-peak because they contain a shorter right-most sub-sequence of their own with this property.

In our case, we are tallying from the right and looking for sub-sequences of length n=2k+1 which always have an equal to or greater tally of ‘+1’ events than ‘-1’ events for the first 2k events tallied, and then a final (although actually the first in the sequence) ‘-1’ event which determines that the random walk’s value is non-peak.

So, a valid sub-sequence has the following properties:
a) contains k ‘+1’ events
b) contains k ‘-1’ events plus an extra ‘-1’ event at the end of the tally (i.e. at the start of the sequence)
c) is therefore of odd length 2k+1
d) tallying ‘+1’ and ‘-1’ from right to left, the number of ‘+1’s is always equal to or greater than the number of ‘-1’s until the final ‘-1’

These sequences, read from right to left, are therefore equivalent to Dyck words of length 2k with an extra ‘)’ added at the end. The number of Dyck words with k pairs of brackets is known to be the kth Catalan number. Therefore the number of sub-sequences which determine a non-peak value, for each n=2k+1, is the kth Catalan number.
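
The mapping can be made concrete with a short Python sketch. It generates the Dyck words for a given k and converts each one into the corresponding sub-sequence, written with ‘+’ and ‘-’ as in the n=5 list earlier. The function names are invented for this illustration.

```python
def dyck_words(k):
    """Generate every properly nested bracket string with k pairs."""
    def extend(word, opened, closed):
        if opened == k and closed == k:
            yield word
            return
        if opened < k:
            yield from extend(word + "(", opened + 1, closed)
        if closed < opened:
            yield from extend(word + ")", opened, closed + 1)
    yield from extend("", 0, 0)

def to_subsequence(word):
    """Map a Dyck word to a non-peak-determining sub-sequence.

    Append the extra ')', translate '(' to '+' and ')' to '-', then reverse,
    because the walk is tallied from right to left.
    """
    translated = (word + ")").replace("(", "+").replace(")", "-")
    return translated[::-1]

for k in range(4):
    print(k, [to_subsequence(word) for word in dyck_words(k)])
```

For k=1 this produces ‘--+’, and for k=2 it produces ‘---++’ and ‘--+-+’, the same sequences identified by hand at n=3 and n=5.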

Acknowledgements

Thanks to Ben Ross for the original claim that population peaks happen much less frequently than we think, to Chris Turtle (surprisingly, not the Sheffield academic who studies population!) for calculating the numbers of non-peak-determining sequences, and to both for the fruitful discussions which helped to develop this idea.
