Like the 2021 post, we compute what would have been the optimal team for Megatombike 2022, a local fantasy cycling competition.

This year, I didn’t get round to making a post on how I assembled my team. For the better, in hindsight, as it performed terribly. I hoped to get a team of overachievers as a combination of

up-and-coming talents who are still undervalued (like De Lie, Pidcock and Evenepoel)
former big riders who had a poor season last year (Froome, Dumoulin, …)

Especially the latter turned out to be a bad idea. Also, out of the expensive, high scoring riders, I kicked out Van Aert to make room for Alaphilippe at the last moment…

Anyways, on to more succesful teams. The winner of this year, once again, was Hala Die Mannschaft with a score of 2435, this year by a margin of almost 100 points over the second team. Very impressive!

Rider	Points
LÓPEZ Miguel Ángel	74
JAKOBSEN Fabio	115
VAN AERT Wout	401
CAMPENAERTS Victor	74
VINE Jay	74
FOSS Tobias	45
POGAČAR Tadej	473
EVENEPOEL Remco	394
VALVERDE Alejandro	176
MOHORIČ Matej	125
LAMPAERT Yves	22
MARTÍNEZ Daniel Felipe	191
SHEFFIELD Magnus	80
VADER Milan	0
DE LIE Arnaud	191

Note that the overall scores this year are about 60% higher than last year, because of an extended calender incorporating many smaller races

The optimal team

I have copied some math and code from previous year to show the mathematical model and how to solve that using the combination of the Pyomo optimization modelling languange along with open-source MILP (mixed integer linear programming) solver CBC. However, we have a new kid on the block as well, stay tuned!

Mathematical model

Let us repeat the mathematical model once more:

\[\begin{align*} \text{maximize}_{x_i} \quad & \sum_{i=1}^n p_i x_i \\ \text{subject to} \quad & \sum_{i=1}^n c_i x_i \leq C \\ & \sum_{i=1}^n x_i = N \\ & x_i \in \{0,1\} \end{align*}\]

where $p_i$ is the number of points collected by a rider, $c_i$ is the cost of a rider, $C$ is the total team cost limit (= 1.750.000), and $N$ is the total number of riders (= 15). $x_i$ is a binary variable which either selects or does not select rider $i$.

So, we are maximizing the total number of points, given all the constraints.

Constraint $\sum_{i=1}^n x_i = N = 15$ is how this problem differs from the standard 0-1 knapsack problem.

Data

%matplotlib notebook

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv(r"C:\Users\swaenenl\OneDrive - Sioux Group B.V\Documents\Personal\website\notebooks\megatombike\2022_results\2022_results\riders.csv", delimiter=';')

df.head(15)

	id	name	price	points	id.1	name.1	selected
0	527	POGAČAR Tadej	437000	473	18	UAE Team Emirates	1.0
1	322	VAN AERT Wout	320000	401	11	Jumbo-Visma	1.0
2	395	EVENEPOEL Remco	152500	394	14	Quick-Step Alpha Vinyl Team	1.0
3	328	VINGEGAARD Jonas	144000	247	11	Jumbo-Visma	1.0
4	113	VLASOV Aleksandr	174000	213	4	BORA - hansgrohe	0.0
5	315	LAPORTE Christophe	96500	213	11	Jumbo-Visma	1.0
6	96	HIGUITA Sergio	87500	195	4	BORA - hansgrohe	1.0
7	561	VAN DER POEL Mathieu	260000	195	19	Alpecin-Fenix	0.0
8	219	MARTÍNEZ Daniel Felipe	75000	191	8	INEOS Grenadiers	1.0
9	335	DE LIE Arnaud	15000	191	12	Lotto Soudal	1.0
10	189	KÜNG Stefan	121500	187	7	Groupama - FDJ	1.0
11	61	BILBAO Pello	102500	187	3	Bahrain - Victorious	1.0
12	548	MERLIER Tim	144000	181	19	Alpecin-Fenix	0.0
13	383	VALVERDE Alejandro	159500	176	13	Movistar Team	0.0
14	434	MATTHEWS Michael	150000	153	15	Team BikeExchange - Jayco	0.0

The csv file, once again delivered to me by organizer Bob, has the riders already sorted according to their points. The table does have some duplicates which we need to fix. We see 2 Belgian riders in the top 3! I believe one of the reasons of some of my Megatombike successes around 2012 were due to me not having a bias towards Belgian riders unlike most MTB participants. Back then, that was an advantage. Nowadays, not so much!

df = df.drop_duplicates(['name'])

df.sort_values('points', ascending=False).head(15).sum()

id                                                       4826
name        POGAČAR TadejVAN AERT WoutEVENEPOEL RemcoVINGE...
price                                                 2439000
points                                                   3597
id.1                                                      169
name.1      UAE Team EmiratesJumbo-VismaQuick-Step Alpha V...
selected                                                  7.0
dtype: object

Summing the values and points of the best 15 riders gives a team that scores 3597 points, but is way too expensive (2.4 million). Still, this doesn’t exceed the 1.75 million budget as much as last year (then it was 2.85 million). And in fact, starting from the top, we would be able to buy the 9 highest scoring riders, for a point total of 2522. Enough to beat out Hala Die Mannschaft!

df.sort_values('points', ascending=False).head(9).sum()

id                                                       2876
name        POGAČAR TadejVAN AERT WoutEVENEPOEL RemcoVINGE...
price                                                 1746500
points                                                   2522
id.1                                                      100
name.1      UAE Team EmiratesJumbo-VismaQuick-Step Alpha V...
selected                                                  5.0
dtype: object

Pyomo + CBC solve

import pyomo.environ as pyo

points = {}
cost = {}
count = {}
dups = []
for i, el in df.iterrows():
    name = el['name']
    
    points[name] = el['points']
    cost[name] = el['price']
    count[name] = 1.

cost_limit = 1750000
count_required = 15

M = pyo.ConcreteModel()
M.ITEMS = pyo.Set(initialize=list(points.keys()))
M.x = pyo.Var(M.ITEMS, within=pyo.Binary)

M.points = pyo.Objective(expr=sum(points[i]*M.x[i] for i in M.ITEMS), sense=pyo.maximize)
M.weight = pyo.Constraint(expr=sum(cost[i]*M.x[i] for i in M.ITEMS) <= cost_limit)
M.count = pyo.Constraint(expr=sum(count[i]*M.x[i] for i in M.ITEMS) == count_required)

Let’s run this thing!

opt = pyo.SolverFactory("cbc.exe")
results = opt.solve(M)

print(results)

Problem: 
- Lower bound: -inf
  Upper bound: inf
  Number of objectives: 1
  Number of constraints: 2
  Number of variables: 963
  Sense: unknown
Solver: 
- Status: ok
  Message: CBC 2.10.3 optimal, objective -3263; 0 nodes, 19 iterations, 0.025 seconds
  Termination condition: optimal
  Id: 0
  Error rc: 0
  Time: 0.07400059700012207
Solution: 
- number of solutions: 0
  number of solutions displayed: 0

It never ceases to amaze how fast MILP solvers can solve (with guaranteed optimality) these NP-hard problems. If we take the point of view that the number of candidate solutions is dictated by choosing a combination of 15 elements out of 963, we can use the scipy implementation of $N$-choose-$k$ to get an impression of the size of the solution space and the NP-hardness of it:

import scipy.special

scipy.special.comb(963, 15)

3.893159932009228e+32

The results structure output reports a computation time of ~65 milliseconds to find the optimal solution among $3.89 \times 10^{32}$ candidate solutions. Of course, it does so much faster than brute forcing its way through all those candidates.

%%timeit
opt = pyo.SolverFactory("cbc.exe")
results = opt.solve(M)

111 ms ± 412 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

If we do a custom timing of the solve, that computation time doubles to 120-130 milliseconds. CBC is included in Pyomo as an executable and the interfacing is file-system-based: Pyomo writes to files which get loaded by cbc.exe (which you have to put somewhere in the path, e.g. in the same directory as the Jupyter notebook I was using.

But you were interested in the optimal team, right?! Here goes:

df['selected'] = [v.value for v in M.component_data_objects(pyo.Var)]
df[df['selected'] > 0.5].sort_values('points', ascending=False)

	id	name	price	points	id.1	name.1	selected
0	527	POGAČAR Tadej	437000	473	18	UAE Team Emirates	1.0
1	322	VAN AERT Wout	320000	401	11	Jumbo-Visma	1.0
2	395	EVENEPOEL Remco	152500	394	14	Quick-Step Alpha Vinyl Team	1.0
3	328	VINGEGAARD Jonas	144000	247	11	Jumbo-Visma	1.0
5	315	LAPORTE Christophe	96500	213	11	Jumbo-Visma	1.0
6	96	HIGUITA Sergio	87500	195	4	BORA - hansgrohe	1.0
8	219	MARTÍNEZ Daniel Felipe	75000	191	8	INEOS Grenadiers	1.0
9	335	DE LIE Arnaud	15000	191	12	Lotto Soudal	1.0
10	189	KÜNG Stefan	121500	187	7	Groupama - FDJ	1.0
11	61	BILBAO Pello	102500	187	3	Bahrain - Victorious	1.0
21	445	ARENSMAN Thymen	22000	134	16	Team DSM	1.0
29	226	RODRIGUEZ Carlos	57500	120	8	INEOS Grenadiers	1.0
31	252	MEINTJES Louis	41500	117	9	Intermarché - Wanty - Gobert Matériaux	1.0
35	507	AYUSO Juan	40000	110	18	UAE Team Emirates	1.0
41	247	HERMANS Quinten	30000	103	9	Intermarché - Wanty - Gobert Matériaux	1.0

df[df['selected'] > 0.5].sum()

id                                                       4464
name        POGAČAR TadejVAN AERT WoutEVENEPOEL RemcoVINGE...
price                                                 1742500
points                                                   3263
id.1                                                      159
name.1      UAE Team EmiratesJumbo-VismaQuick-Step Alpha V...
selected                                                 15.0
dtype: object

This team would have scored a whopping 3263 points: more than 800 points more than the winner. Easier said than done, of course!

A little bit more analysis: We saw before how the total price of the top 15 scorers doesn’t exceed the budget as much as last year. Now we see that the optimal team has 9 out of the 11 highest scoring riders in it. Only numbers 4 and 7 are not selected: Vlasov and van der Poel.

These observations may indicate that riders are relatively cheap this year, and one can buy almost all of the highest scoring riders. Once you get to the point that you can buy all 15 highest scoring riders with the given budget, the knapsack aspect of the fantasy cycling competition becomes a bit moot, and it becomes only about predicting who will be the highest scoring riders.

However, if one looks at the 15 most expensive riders this season, only 2 of them make it to the optimal team, and there is quite a discrepancy between rider prices and scores. In fact, the 15 most expensive riders together would have scored in total 2226 points, which would not even have won the 2022 MTB competition!

Anyways, I’m not sure whether this all means I should recommend the organisers to make the budgets tighter again, or not…

df.sort_values(['price'], ascending=False).head(15)

	id	name	price	points	id.1	name.1
0	527	POGAČAR Tadej	437000	473	18	UAE Team Emirates
16	318	ROGLIČ Primož	377000	148	11	Jumbo-Visma
1	322	VAN AERT Wout	320000	401	11	Jumbo-Visma
147	385	ALAPHILIPPE Julian	280000	32	14	Quick-Step Alpha Vinyl Team
7	561	VAN DER POEL Mathieu	260000	195	19	Alpecin-Fenix
30	505	ALMEIDA João	219000	118	18	UAE Team Emirates
541	208	BERNAL Egan	193000	0	8	INEOS Grenadiers
22	209	CARAPAZ Richard	182500	133	8	INEOS Grenadiers
92	504	ACKERMANN Pascal	175500	47	18	UAE Team Emirates
4	113	VLASOV Aleksandr	174000	213	4	BORA - hansgrohe
141	64	COLBRELLI Sonny	161000	33	3	Bahrain - Victorious
20	237	YATES Adam	160500	135	8	INEOS Grenadiers
13	383	VALVERDE Alejandro	159500	176	13	Movistar Team
55	185	GAUDU David	154000	76	7	Groupama - FDJ
98	297	WOODS Michael	153000	46	10	Israel - Premier Tech

df.sort_values(['price'], ascending=False).head(15).sum()

id                                                       4818
name        POGAČAR TadejROGLIČ PrimožVAN AERT WoutALAPHIL...
price                                                 3406000
points                                                   2226
id.1                                                      170
name.1      UAE Team EmiratesJumbo-VismaJumbo-VismaQuick-S...
selected                                                  0.0
dtype: object

A final bit of analysis: In 2021, the highest team scored 1564 points, while the maximum achievable was 2223. The ratio of these two numbers is 70%. In 2022, we have a ratio of $2435/3263 \approx 75\%$. So there seems to be some consistency on how optimal you have to be to win MTB.

Auxiliary

Cheapest team that could have won

We can also turn the knapsack problem on its head: What is the cheapest team, that could have won (= total score of 2436 or more). In the code, an objective and constraint need to be swapped.

cost_limit = 1750000
count_required = 15

M = pyo.ConcreteModel()
M.ITEMS = pyo.Set(initialize=points.keys())
M.x = pyo.Var(M.ITEMS, within=pyo.Binary)

M.points = pyo.Constraint(expr=sum(points[i]*M.x[i] for i in M.ITEMS) >= 2436+1)
M.weight = pyo.Objective(expr=sum(cost[i]*M.x[i] for i in M.ITEMS), sense=pyo.minimize)
M.count = pyo.Constraint(expr=sum(count[i]*M.x[i] for i in M.ITEMS) == count_required)

opt = pyo.SolverFactory("cbc.exe")
results = opt.solve(M)
print(results)

Problem: 
- Lower bound: -inf
  Upper bound: inf
  Number of objectives: 1
  Number of constraints: 2
  Number of variables: 963
  Sense: unknown
Solver: 
- Status: ok
  Message: CBC 2.10.3 optimal, objective 908000; 0 nodes, 50 iterations, 0.072 seconds
  Termination condition: optimal
  Id: 0
  Error rc: 0
  Time: 0.11903929710388184
Solution: 
- number of solutions: 0
  number of solutions displayed: 0

df['selected'] = [v.value for v in M.component_data_objects(pyo.Var)]
df[df['selected'] > 0.5].sort_values('points', ascending=False)

	id	name	price	points	id.1	name.1	selected
2	395	EVENEPOEL Remco	152500	394	14	Quick-Step Alpha Vinyl Team	1.0
3	328	VINGEGAARD Jonas	144000	247	11	Jumbo-Visma	1.0
5	315	LAPORTE Christophe	96500	213	11	Jumbo-Visma	1.0
6	96	HIGUITA Sergio	87500	195	4	BORA - hansgrohe	1.0
8	219	MARTÍNEZ Daniel Felipe	75000	191	8	INEOS Grenadiers	1.0
9	335	DE LIE Arnaud	15000	191	12	Lotto Soudal	1.0
11	61	BILBAO Pello	102500	187	3	Bahrain - Victorious	1.0
21	445	ARENSMAN Thymen	22000	134	16	Team DSM	1.0
29	226	RODRIGUEZ Carlos	57500	120	8	INEOS Grenadiers	1.0
31	252	MEINTJES Louis	41500	117	9	Intermarché - Wanty - Gobert Matériaux	1.0
35	507	AYUSO Juan	40000	110	18	UAE Team Emirates	1.0
36	312	KOOIJ Olav	41500	109	11	Jumbo-Visma	1.0
41	247	HERMANS Quinten	30000	103	9	Intermarché - Wanty - Gobert Matériaux	1.0
53	228	SHEFFIELD Magnus	2500	80	8	INEOS Grenadiers	1.0
78	980	POZZOVIVO Domenico	0	55	9	Intermarché - Wanty - Gobert Matériaux	1.0

df[df['selected'] > 0.5].sum()

id                                                       4946
name        EVENEPOEL RemcoVINGEGAARD JonasLAPORTE Christo...
price                                                  908000
points                                                   2446
id.1                                                      151
name.1      Quick-Step Alpha Vinyl TeamJumbo-VismaJumbo-Vi...
selected                                                 15.0
dtype: object

The cheapest team that could have won, costs 908.000 euro, which means that only about 50 percent of the budget is used! This team shares 12 riders with the optimal team, and swaps out

Pogacar
Van Aert
Kung

for

Kooij
Sheffield
Pozzovivo

This corroborates the discrepancy between prices and scores this year: Many riders that scored high were quite cheap. And vice versa many expensive riders scored below par. Only Pogacar and Van Aert lived up to the expectations.

Worst team

Finally, the worst team is a trivial problem: just select 15 (cheap?) riders with no points. One way to have a more interesting worst team, could be to set a lower limit on the team cost, e.g. 1.745.000 euro.

count_required = 15

M = pyo.ConcreteModel()
M.ITEMS = pyo.Set(initialize=points.keys())
M.x = pyo.Var(M.ITEMS, within=pyo.Binary)

M.points = pyo.Objective(expr=sum(points[i]*M.x[i] for i in M.ITEMS), sense=pyo.minimize)
M.weight = pyo.Constraint(expr=sum(cost[i]*M.x[i] for i in M.ITEMS) >= cost_limit - 5000)
M.count = pyo.Constraint(expr=sum(count[i]*M.x[i] for i in M.ITEMS) == count_required)

opt = pyo.SolverFactory("cbc.exe")
results = opt.solve(M)
print(results)

Problem: 
- Lower bound: -inf
  Upper bound: inf
  Number of objectives: 1
  Number of constraints: 2
  Number of variables: 963
  Sense: unknown
Solver: 
- Status: ok
  Message: CBC 2.10.3 optimal, objective 70; 0 nodes, 19 iterations, 0.031 seconds
  Termination condition: optimal
  Id: 0
  Error rc: 0
  Time: 0.0669410228729248
Solution: 
- number of solutions: 0
  number of solutions displayed: 0

df['selected'] = [v.value for v in M.component_data_objects(pyo.Var)]
df[df['selected'] > 0.5].sort_values('points', ascending=False)

	id	name	price	points	id.1	name.1	selected
147	385	ALAPHILIPPE Julian	280000	32	14	Quick-Step Alpha Vinyl Team	1.0
232	235	VIVIANI Elia	146500	14	8	INEOS Grenadiers	1.0
245	396	HONORÉ Mikkel Frølich	115000	11	14	Quick-Step Alpha Vinyl Team	1.0
308	99	KELDERMAN Wilco	109500	6	4	BORA - hansgrohe	1.0
348	109	SCHACHMANN Maximilian	137500	4	4	BORA - hansgrohe	1.0
391	401	MASNADA Fausto	78500	1	14	Quick-Step Alpha Vinyl Team	1.0
413	305	DUMOULIN Tom	127500	1	11	Jumbo-Visma	1.0
419	17	PARET-PEINTRE Aurélien	88000	1	1	AG2R Citroën Team	1.0
541	208	BERNAL Egan	193000	0	8	INEOS Grenadiers	1.0
542	508	BENNETT George	86500	0	18	UAE Team Emirates	1.0
544	784	ZAKARIN Ilnur	71500	0	29	Gazprom - RusVelo	1.0
545	78	POELS Wout	80000	0	3	Bahrain - Victorious	1.0
546	389	CATTANEO Mattia	69500	0	14	Quick-Step Alpha Vinyl Team	1.0
548	47	MOSCON Gianni	85000	0	2	Astana Qazaqstan Team	1.0
605	167	PADUN Mark	80000	0	6	EF Education-EasyPost	1.0

df[df['selected'] > 0.5].sum()

id                                                       3902
name        ALAPHILIPPE JulianVIVIANI EliaHONORÉ Mikkel Fr...
price                                                 1759500
points                                                     71
id.1                                                      142
name.1      Quick-Step Alpha Vinyl TeamINEOS GrenadiersQui...
selected                                                 15.0
dtype: object

Last year, we could find a team that used all the money and scored no points. This year, if we spent nearly all the money, we’re still getting about 70 points. Julien Alaphilippe and Tom Dumoulin are two of the worst riders one could have chosen… which I did…

Scipy solution

This final section is for those interested in the mathematical optimization aspect.

I feel like there is quite a lot of movement in the MILP solver landscape. In the land of commercial solvers, some new, mostly Chinese, solvers are speeding their ways up to the tops of the benchmarks. And some other contenders are pulling out of the benchmarks due to some Gurobi scandal. The developers of the Chinese COPT seem to be confident that they will overtake Gurobi in the next years: https://www.youtube.com/watch?v=iqiBXoJQVD8&t=258s

In open source land, HiGHs is the big newcomer. A major development with respect to last years post, is that in the summer of 2022, with Scipy version 1.9.0, the scipy.optimize submodule got extended with a milp routine, which runs HiGHs in the backend. I think this is quite a big deal. Scipy is a much more common Python library/dependency than Pyomo. For example, the Anaconda conda-forge channel has (at the time of writing) 600k installs for Pyomo, while Scipy has 31M, which is 50 times more! Moreover you don’t need to install an open source MILP solver like CBC separately. This makes MILP solvers much more broadly accessible. The main current downside is that the interface is much more barebones, and you don’t get all the nice modeling tools (named variables etc.) that you get with Pyomo.

Below, we work out the MTB knapsack problem with scipy.optimize.milp. This is really easy to do, as the Scipy documentation already has a knapsack example: https://scipy.github.io/devdocs/tutorial/optimize.html#knapsack-problem-example All we need to add is the constraint that the number of riders needs to equal 15.

import scipy.optimize as sciopt

scipy.__version__

'1.9.3'

points_arr = np.array(list(points.values()))
cost_arr = np.array(list(cost.values()))
count_arr = np.array(list(count.values()))

bounds = sciopt.Bounds(0, 1)  # 0 <= x_i <= 1
integrality = np.full_like(points_arr, True)  # x_i are integers

cost_constraints = sciopt.LinearConstraint(A=cost_arr, lb=0, ub=cost_limit)
count_constraints = sciopt.LinearConstraint(A=count_arr, lb=count_required, ub=count_required)

res = sciopt.milp(c=-points_arr, constraints=[cost_constraints, count_constraints],
           integrality=integrality, bounds=bounds)

res.fun

-3263.0

sel = np.array(df['selected'])
np.max(np.abs(res.x - sel))

The solver finds the exact same solution with the same maximum point total of 3263.

%%timeit
res = sciopt.milp(c=-points_arr, constraints=[cost_constraints, count_constraints],
           integrality=integrality, bounds=bounds)

20.9 ms ± 716 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

If we benchmark it, we see that on this problem Scipy+HiGHs is 3 - 6 times faster than the Pyomo+CBC combination (depending on which timing number you take)! Part of it is probably due to the solver being interfaced as a normal library rather than a seperate executable running in a different process.

SCIP

Finally, I’d like to report some related news on the SCIP MILP solver, which has always dwelled a bit in between open source and commercial solvers (both in license and in benchmark results). In November 2022, SCIP switched to the fully open source Apache 2.0 license. SCIP can be used from Python through the PySCIPOpt package. This has 130k downloads through conda-forge so seems the least popular option at the moment, though that may change due to the recent license changes. I haven’t gotten round to giving it a go. I wouldn’t be surprised if it turns out to be the fastest open source solver on this knapsack problem.

Want to leave a comment?

Very interested in your comments but still figuring out the most suited approach to this. For now, feel free to send me an email.

← Previous Post Next Post →

MegaTomBike 2022 Aftermath

and the new Scipy MILP solver

The optimal team

Mathematical model

Data

Pyomo + CBC solve

Auxiliary

Cheapest team that could have won

Worst team

Scipy solution

SCIP

Want to leave a comment?

The optimal team

Mathematical model

Data

Pyomo + CBC solve

Auxiliary

Cheapest team that could have won

Worst team

Scipy solution

SCIP

Related posts

Want to leave a comment?