Two non-rating-based handicap systems

for table tennis or other racket sports

Posted by Lense Swaenen on May 24, 2022 · 11 mins read

A Dutch summary of this post can be found here.

This blogpost contains a little bit of game design that I’ve used in my table tennis youth coaching sessions. This is nicely at the crossroads of some distinct interests of mine:

  • mathematics/game theory,
  • board games and game design,
  • sports, racket sports and in particular table tennis

What is a game?

Let’s start off with some unnecessary digression (inspired by a podcast of unnecessary detail?) into the fascinating art of defining ‘trivial’ concepts (like how would you define ‘time’?). Let’s look at what makes a game, and how games distinguish themselves from toys, puzzles, activities and sports. This is also a regular subject of (playful) debate among the Dice Tower board game reviewers, when discussing games like Telestrations.

Unable to dig up some good graphic I once encountered on the web, I think definitions should be something like:

  • All concepts share an element of leisure and entertainment
  • Sports commonly have an element of physical activity (I’d be one to argue chess is a game, not a sport)
  • Games and puzzles might distinguish themselves from activities by having a goal. I think having a goal often goes hand in hand with being skill-based too.
  • Games distinguish themselves from puzzles by being mostly competitive and/or having an element of randomness for replayability in the case of a solo or cooperative game. Once a puzzle is solved, it is solved and replaying is not very interesting.
  • The game element of sports depends on whether we consider competitive sports, or just physical activity like running
  • I don’t know yet how to incorporate the aspect of ‘rules’, which is the main separator between games and toys (remember the pie face challenge?).

Challenging the definitions are ‘things’ like Telestrations (game or activity?), solitaire and cooperative games like escape room games (games or puzzles?), darts (dexterity game or sport?). Perhaps I should make my own decision tree graphic at some point.

Anyways, after much ado, we establish that games are typically skill-based and competitive. And table tennis as a competitive sport is definitely a game.

What is a good game?

The next question: what makes a ‘good’ game? I’m convinced that part of the reason we are now in the golden era of board games is much improved game design. Two features of many modern board games are

  • Avoid player elimination
  • Have a predictable (and limited) play time

Games like Carcassonne and Catan fulfill these features, classics like Risk and Monopoly do not. These are ideas I actively use as a table tennis coach. When playing the table tennis game ‘around the table’, I generally prefer the pupils to keep track of penalty points, rather than have player elimination. The critical aspect of game design that I’ve been homing in on is the balancing act of making a game balanced yet rewarding. By ‘balanced’, I mean that all players should have a good chance of winning. Still, the game should reward skill and, more importantly, effort. We can make a game perfectly balanced by assigning a random winner, but then there would be no incentive for players to play their best.

The challenge for me as a table tennis coach is that we have quite a spread of skill levels in our youth group, skill levels are not accurately quantified, and quantifying skill level subjectively can be a sensitive matter. Most adult table tennis players play regular competitive matches, which gives them a pretty accurate rating. Those ratings can be used to balance games by using a handicap, like those that can be found in formal handicap tables (perhaps something to write about in the future too). A lot of beginner youth players have never played competitively and don’t have a formal rating.

This post presents two methods of ‘handicap’ I’ve come up with to make a game of table tennis between two players of very different skill levels more interesting than under standard rules.

So let this be the segue into the standard rules for a table tennis match, and what that means for skill level differences. Table tennis matches are typically played in sets: First to 11 wins a set. First to 3 sets wins the match. (We ignore the two point difference needed to win a set). This has two benefits versus playing a single point to determine the winner:

  • It makes for an enjoyable match length of 15-30 minutes
  • It reduces the randomness of whoever wins, by amplifying skill level differences (like in the law of large numbers), which is a good thing when players are closely matched.

Why not play a single big set (for example first to 50 points) then? I believe that amplifies skill differences too much / reduces randomness too much. In other words, upsets, which make sports exciting for the audience, would be too rare. And a big set might be more difficult to track without a score board.

A mathematical model

Let’s make a mathematical model of a single set, to show how multiple points amplify skill level differences. We assume the skill level difference between two players results in a probability $p_A$ that player $A$ wins a point and a probability $p_B = 1 - p_A$ that player $B$ wins a point. A single point is modeled by a Bernoulli random variable $X_i$, which is 1 if player $A$ wins point $i$ and 0 otherwise. We assume that every point is identical (no serve advantage) and independent. The probability of winning a set (or our simplified concept of it) can be calculated as a function of the probability of winning a point. If we imagine always playing out 21 points, one and only one of the two players will have 11 or more points, and that player is the one who reached 11 first and so won the set. The sum of 21 independent Bernoulli variables follows a binomial distribution:

\[p^{set}_A = P\left(\sum_{i=1}^{21} X_i \geq 11\right)\]

If we call

\[Y = \sum_{i=1}^{21} X_i\]

we can rewrite

\[p^{set}_A = P(Y \geq 11) = \sum_{k=11}^{21} P(Y = k)\]

$P(Y = k)$ can be written out as a formula with a binomial coefficient and powers of $p_A$ and $(1-p_A)$. However, we don’t bother, because we can use the scipy.stats toolbox, which has cdf (cumulative distribution function) routines for many probability distributions, including the binomial one.
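For readers who do want to see the explicit binomial sum, here is a minimal sketch using only the Python standard library. The function name `set_win_probability` and its `target` parameter are made up for this illustration and are not part of the notebook code that follows.

```python
from math import comb

def set_win_probability(p_a, target=11):
    """Probability that A wins at least `target` of 2*target - 1 independent points."""
    n = 2 * target - 1  # 21 points for a first-to-11 set
    # Sum the binomial probabilities of A winning exactly k points, for k = target..n
    return sum(comb(n, k) * p_a**k * (1 - p_a)**(n - k) for k in range(target, n + 1))

print(set_win_probability(0.6))
```

With $p_A = 0.6$ this should agree with the scipy-based value computed further down, up to floating point rounding.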

%matplotlib notebook

import numpy as np
import matplotlib.pyplot as plt

import scipy.stats

A quick Monte Carlo check that scipy.stats.binom.cdf behaves as expected:

r = np.random.rand(1000000, 21) 
np.sum(np.sum(r <= 0.6, axis=1) >= 11)/r.shape[0]
0.825433
1 - scipy.stats.binom.cdf(10.5, 21, 0.6)
0.8256221336382272
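The check above always plays out all 21 points. As a further sanity check, the sketch below simulates a set point by point and stops as soon as one player reaches 11, which is how a set is actually played (still ignoring the two-point difference rule). The helper names `play_set` and `estimate_set_win` are assumptions of this sketch, and it uses the standard library `random` module rather than numpy to stay self-contained.

```python
import random

def play_set(p_a, target=11, rng=random):
    """Play one set point by point; return True if player A reaches `target` first."""
    a = b = 0
    while a < target and b < target:
        if rng.random() < p_a:
            a += 1
        else:
            b += 1
    return a == target

def estimate_set_win(p_a, n_sets=20000, seed=0):
    """Monte Carlo estimate of A's set win probability."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    wins = sum(play_set(p_a, rng=rng) for _ in range(n_sets))
    return wins / n_sets

print(estimate_set_win(0.6))
```

The estimate should land close to the binomial value above, which supports the claim that ‘first to 11’ and ‘at least 11 out of 21’ give the same winner.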

We plot the probability of winning a set versus the probability of winning a single point:

ps_point = np.linspace(0, 1, 50)    

plt.figure()
plt.plot(ps_point, [1 - scipy.stats.binom.cdf(0, 1, p) for p in ps_point], '--', label='single point')
plt.plot(ps_point, [1 - scipy.stats.binom.cdf(1, 3, p) for p in ps_point], '--', label='2-point set')
plt.plot(ps_point, [1 - scipy.stats.binom.cdf(10, 21, p) for p in ps_point], '--', label='11-point set')
plt.xlabel('Single point probability p_A')
plt.legend()
plt.gca().set_aspect('equal')

The skill level difference amplification is very clear. We can apply the same calculation a second time, now at the level of sets (first to 3 sets, so at most 5 sets), to add the probability of winning a match too.

ps_point = np.linspace(0, 1, 50)    

plt.figure()
plt.plot(ps_point, [1 - scipy.stats.binom.cdf(0, 1, p) for p in ps_point], '--', label='single point')
ps_set = [1 - scipy.stats.binom.cdf(10, 21, p) for p in ps_point]
plt.plot(ps_point, ps_set, '--', label='11-point set', color='C2')

ps_match = [1 - scipy.stats.binom.cdf(2, 5, p) for p in ps_set]
plt.plot(ps_point, ps_match, '--', label='match', color='C3')

plt.xlabel('Single point probability p_A')
plt.legend()
plt.gca().set_aspect('equal')

ps_point[20], ps_match[20]
(0.4081632653061224, 0.054253275897441666)

The amplification is huge. If you have a 40% chance of winning a single point, you have only a 5% chance of winning a match. To me, this also sheds a different light on the existing ratings. Table tennis ratings, due to the many matches played over the course of a single season (when compared to tennis for example), are pretty accurate. Subjectively, I’d say you have about a 70% win probability against one rating lower, and an 85% win probability against two ratings lower. I could try to analyse this more quantitatively from the table tennis database which is publicly accessible (as already done in this blog post). Such a 70% match win probability corresponds to only a 53% point win probability. From that point of view the differences between players are in fact tiny, but they are real! I also see some relationship to the classical racketlon discussion of which sport is most beneficial to have as a best sport (such that you can create the biggest possible point gap in your sport).

ps_match[26], ps_point[26]
(0.7028585920087836, 0.5306122448979591)
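Instead of reading the point probability off the grid, the relationship can also be inverted numerically. The following is a minimal sketch under the same model (independent points, first to 11, first to 3 sets). It uses a plain bisection and the standard library only, and the function names are hypothetical helpers, not part of the notebook.

```python
from math import comb

def win_at_least(k_min, n, p):
    """Probability of at least k_min successes in n independent trials with success probability p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

def match_win_probability(p_point):
    """Point probability -> set probability (11 of 21) -> match probability (3 of 5)."""
    p_set = win_at_least(11, 21, p_point)
    return win_at_least(3, 5, p_set)

def point_probability_for(match_target, tol=1e-9):
    """Bisection on [0.5, 1]: find the point probability giving the target match probability."""
    lo, hi = 0.5, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if match_win_probability(mid) < match_target:
            lo = mid  # match probability increases with point probability
        else:
            hi = mid
    return (lo + hi) / 2

print(point_probability_for(0.70))
print(point_probability_for(0.85))
```

Bisection works here because the match win probability increases monotonically with the point win probability; a root finder from scipy.optimize would do the same job.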

Handicap methods

Thanks for bearing with me, as we make it to the central point: I present 2 scoring/match variants that do have extended multi-point play, but counter the skill level amplification. In their ‘vanilla’ form, they both reduce the match win probability to the point win probability, and therefore act as a handicap system. Not a perfect one that yields 50-50 probabilities though. I haven’t found any such (workable) method yet.

Handicap method 1

The first handicap method uses a deck of cards (or part of it), which gets shuffled before the match. The deck of cards is put upside down on the table, close to the net where it doesn’t affect play. Players play points and after every scored point, the winner of that point gets to flip the top card of the deck. The player who flips the ‘ace of spades’ is the player who wins the match.

The probability of winning the match in this system is exactly equal to the probability of winning individual points, as the outcome of the match is determined by a single point. The trick is that it is unknown to both players which point that will be. Therefore, every point still retains the incentive for both players to play their best. The main ‘downside’ to this system is that the match length becomes much more variable, and the match can be over after a single point. A classical table tennis set has at least 11 points, and at most (without the two-point difference rule) 21 points, so roughly 16 on average. We can match that average set length by playing with 31 playing cards (the ace then sits at position 16 on average), but the spread is from 1 point to 31 points. Usually, I use half a deck of cards: 26 cards. My poor man’s way of countering the spread is to be the game runner who inserts the ‘ace of spades’ randomly into the deck with a bias away from the top of the deck… If my pupils were to find out though, that might lower the effort incentive on the first few points.
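A small simulation can illustrate both claims: the match win rate stays at the point win probability, and the match length is spread evenly over the deck. This is a sketch assuming a fairly shuffled deck (no bias in where the ace sits) and independent points; `play_deck_match` and `simulate` are names invented for this example.

```python
import random

def play_deck_match(p_a, n_cards=26, rng=random):
    """Return (a_won_match, points_played) for one match of the card-flipping variant."""
    # In a shuffled deck the ace is equally likely to sit at any position
    ace_position = rng.randint(1, n_cards)
    for point in range(1, n_cards + 1):
        a_won_point = rng.random() < p_a
        if point == ace_position:
            # The winner of this point flips the ace and wins the match
            return a_won_point, point

def simulate(p_a, n_cards=26, n_matches=20000, seed=0):
    """Estimate A's match win rate and the mean match length in points."""
    rng = random.Random(seed)
    results = [play_deck_match(p_a, n_cards, rng) for _ in range(n_matches)]
    win_rate = sum(won for won, _ in results) / n_matches
    mean_length = sum(length for _, length in results) / n_matches
    return win_rate, mean_length

print(simulate(0.6, n_cards=31))
```

With 31 cards the mean length should come out near 16 points, and the win rate near the chosen point probability.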

Handicap method 2

Handicap method 2 tries to counter this variable play length, at the cost of some other downsides.

In handicap method 2, players play a fixed number of 20 points, for example for a score of 14-6 between Alice and Bob. Then, at the very end, the person with the highest number of points (Alice with 14 in this case), rolls a 20-sided die (preferably a big one, like pictured below). If the die roll is lower than or equal to 14, Alice wins, otherwise Bob wins.

The probability of winning equals the expected ratio of points scored, which again equals the win probability of individual points. In this system, exactly 20 points are always played per set. Where the length of the set can be easily tuned with the deck of cards, here we are bound to the commonly available dice, or we need to use a phone to draw our random number (which Google Search can do when you search for random number). The main downside however is that my pupils don’t really like this system in practice, because they can end a set having scored more points, and still lose by the die roll, which seems to feel unfair. Also, a die roll of 20 gives the feeling that the whole set could just as well not have been played.
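The claim that the win probability equals the point probability can be checked with a short simulation. This sketch assumes independent points and a fair die; letting the leader roll is equivalent to saying that Alice wins whenever the roll is at most her own score, which is how it is coded here. The function names are made up for the illustration.

```python
import random

def play_die_match(p_a, n_points=20, rng=random):
    """Play n_points points, then decide the winner with one roll of an n_points-sided die."""
    score_a = sum(rng.random() < p_a for _ in range(n_points))
    roll = rng.randint(1, n_points)
    # A wins with probability score_a / n_points, whoever physically rolls the die
    return roll <= score_a

def estimate_win_rate(p_a, n_points=20, n_matches=20000, seed=0):
    """Monte Carlo estimate of A's win rate under the die-roll variant."""
    rng = random.Random(seed)
    return sum(play_die_match(p_a, n_points, rng) for _ in range(n_matches)) / n_matches

print(estimate_win_rate(0.6))
```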

Overall, handicap method 1 seems to be preferred by my pupils. The 20-sided die has gained some infamy.

I haven’t found any nice way to modify either method to balance it even more (that is, to bring the stronger player’s win probability below their single point win probability). The die roll method is probably the easiest one to balance even more: Say the score after $M = 20$ points is 14-6. Add a constant $N$, like $N = 10$, to both, such that we get 24-16. Now let Alice draw a random number from 1 to 40 (the Google search way is probably easiest), and if a number $\leq 24$ pops up, Alice wins. This is equivalent to having the match start with a 10-10 score and playing till the sum is 40. The win probability $p_A$ will be modified to

\[p_A^{mod} = \frac{M p_A + N}{M + 2N} = \frac{M}{M+2N} p_A + \frac{2N}{M+2N} \frac{1}{2} = \lambda p_A + (1-\lambda)\frac{1}{2}\]

So the final win probability is a homotopy (a linear interpolation) between $p_A$ and 50%, with $\lambda \to 1$ for small $N$ and $\lambda \to 0$ for large $N$, which means $p_A^{mod} \to p_A$ and $p_A^{mod} \to \frac{1}{2}$ respectively. A downside of this system is that it leaves a non-zero win probability when no effort is made whatsoever. Even the strongest traditional handicap system (starting with up to 9-0 set score) does not have that. The modification of adding $N$ to both scores reduces the effort incentive a lot.
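To get a feel for the numbers, the formula can be evaluated directly. The sketch below is a plain transcription of the expression above, with $\lambda = M/(M+2N)$; the function names and the default values $M = 20$, $N = 10$ are only illustrative.

```python
def modified_win_probability(p_a, m=20, n=10):
    """Win probability after adding n bonus points to both scores of an m-point set."""
    return (m * p_a + n) / (m + 2 * n)

def blend_weight(m=20, n=10):
    """The weight lambda that the formula puts on the original point probability."""
    return m / (m + 2 * n)

for p_a in (0.0, 0.5, 0.7, 1.0):
    print(p_a, modified_win_probability(p_a))
```

With $M = 20$ and $N = 10$, a player who never wins a point still wins the match a quarter of the time, which is the non-zero win probability without effort mentioned above.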

Closing remarks

I have no idea if these concepts are novel or not. I have not encountered them anywhere, but have also not thoroughly researched them. I’d be very interested to hear about similar methods, in particular methods that work even better!


Want to leave a comment?

Very interested in your comments but still figuring out the most suited approach to this. For now, feel free to send me an email.