Well, that was a disappointing season of MegaTomBike…
This post is the successor to this one, posted end of March 2021, right when the deadline for team submission was for the 2021 edition of MegaTomBike. MegaTomBike is a local fantasy cycling competition. The team I had submitted was generated through a mathematical optimization approach, namely as a type of knapsack problem. A mathematical approach had proven succesful for me in previous competition, but this year alas…
The last race on the calendar was on October 9th. This post contains some aftermath. My team, Poeske Scherens Fanclub finished 62nd out of 87 teams, with a score of 1089 points. The final winning team was Hala Die Mannschaft with a score of 1564, winning out by a mere 4 points over the second placed team Bekke - Algemeen. The third team lagged quite a bit further behind. The composition of the winning team is:
| Rider | Points |
|---|---|
| Geraint Thomas | 121 |
| Egan Bernal | 132 |
| Tadej Pogačar | 315 |
| Mathieu van der Poel | 160 |
| Wout van Aert | 316 |
| Julian Alaphilippe | 226 |
| Quinn Simmons | 1 |
| Thomas Pidcock | 86 |
| Mark Cavendish | 54 |
| Remco Evenepoel | 83 |
| Tim Merlier | 47 |
| Gianni Vermeersch | 19 |
| Arjen Livyns | 0 |
| Mauri Vansevenant | 3 |
| Heinrich Haussler | 1 |
The setup of MegaTomBike is to create a team of exactly 15 riders, with a cost not more than 1.750.000 euro. The team collects points during the season; the team with the most points wins.
The team I created was optimal in some mathematical sense, namely as the solution of a knapsack-type problem, where I searched for a team of 15 riders which satisfies the cost constraint, and maximizes the total number of points. The challenge of course is that the number of points collected by the end of the season is unknown, so one needs to make a prediction / use a proxy. The proxy I used were the UCI points from the season before. Now, we can conclude this was quite a poor proxy.
As was already discussed in the final section of the preceding blog post, the costs that the organizers of MegaTomBike have assigned to riders is not random, but actually also some kind rider level determined from cycling stats from the past few seasons. This in itself is a reasonable predictor, much like the weather of today is a pretty good predictor for the weather of tomorrow. Setting the costs like this, is exactly how a competition like MegaTomBike is made interesting. In the end, my way of team creation was kind of maximizing the disparity between both predictors.
Now the season is over, we can revisit the knapsack approach and use the actual points to determine what the optimal team for this season would have been, and see how close the MegaTomBike winner managed to get. Many thanks to organizer Bob for sharing a csv file with final points and costs to facilitate this analysis (and not having to do all kinds of string matching like in the previous blog post).
Let us repeat the mathematical model once more:
\[\begin{align*} \text{maximize}_{x_i} \quad & \sum_{i=1}^n p_i x_i \\ \text{subject to} \quad & \sum_{i=1}^n c_i x_i \leq C \\ & \sum_{i=1}^n x_i = N \\ & x_i \in \{0,1\} \end{align*}\]where $p_i$ is the number of points collected by a rider, $c_i$ is the cost of a rider, $C$ is the total team cost limit (= 1.750.000), and $N$ is the total number of riders (= 15). $x_i$ is a binary variable which either selects or does not select rider $i$.
So, we are maximizing the total number of points, given all the constraints.
Constraint \(\sum_{i=1}^n x_i = N = 15\) is how this problem differs from the standard 0-1 knapsack problem.
%matplotlib notebook
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import re
df = pd.read_csv(r"lense_riders.txt", delimiter=';')
df['name_'] = df['search_name'].map(lambda name: re.sub('\W+',' ', name)) #Fix Ben O'Connor error...
df = df[['name_', 'value', 'points']]
Let us look at the riders who scored the most points. Fellow countryman Wout van Aert did best! On the UCI ranking, he finished second after Tadej Pogacar, but obviously the MegaTomBike ranking is much more prestigious.
df.sort_values('points', ascending=False).head(15)
| name_ | value | points | |
|---|---|---|---|
| 465 | Wout van Aert | 242000 | 316 |
| 549 | Tadej Pogacar | 324000 | 315 |
| 461 | Primoz Roglic | 400000 | 256 |
| 147 | Julian Alaphilippe | 291000 | 226 |
| 240 | Richard Carapaz | 156000 | 195 |
| 67 | Sonny Colbrelli | 140500 | 160 |
| 582 | Mathieu van der Poel | 249000 | 160 |
| 267 | Adam Yates | 143000 | 149 |
| 326 | Michael Woods | 145500 | 142 |
| 382 | Alejandro Valverde | 198500 | 136 |
| 239 | Egan Bernal | 174500 | 132 |
| 523 | Jasper Stuyven | 140000 | 130 |
| 468 | Jonas Vingegaard | 40000 | 126 |
| 78 | Matej Mohoric | 80500 | 126 |
| 256 | Richie Porte | 135000 | 124 |
Summing the values and points of the best 15 riders gives that scores 2693 points, but is way too expensive (2.85 million). But now we do know an upperbound of the maximum knapsack score: 2693 points.
df.sort_values('points', ascending=False).head(15).sum()
name_ Wout van AertTadej PogacarPrimoz RoglicJulian ...
value 2859500
points 2693
selected 8.0
dtype: object
Like before, we use Pyomo as the optimization modeling environment.
import pyomo.environ as pyo
points = {}
cost = {}
count = {}
for i, el in df.iterrows():
points[el['name_']] = el['points']
cost[el['name_']] = el['value']
count[el['name_']] = 1.
cost_limit = 1750000
count_required = 15
M = pyo.ConcreteModel()
M.ITEMS = pyo.Set(initialize=list(points.keys()))
M.x = pyo.Var(M.ITEMS, within=pyo.Binary)
M.points = pyo.Objective(expr=sum(points[i]*M.x[i] for i in M.ITEMS), sense=pyo.maximize)
M.weight = pyo.Constraint(expr=sum(cost[i]*M.x[i] for i in M.ITEMS) <= cost_limit)
M.count = pyo.Constraint(expr=sum(count[i]*M.x[i] for i in M.ITEMS) == count_required)
Let’s run this thing!
opt = pyo.SolverFactory("cbc.exe")
results = opt.solve(M)
print(results)
Problem:
- Lower bound: -inf
Upper bound: inf
Number of objectives: 1
Number of constraints: 2
Number of variables: 979
Sense: unknown
Solver:
- Status: ok
Message: CBC 2.10.3 optimal, objective -2223; 4 nodes, 90 iterations, 0.054 seconds
Termination condition: optimal
Id: 0
Error rc: 0
Time: 0.1037588119506836
Solution:
- number of solutions: 0
number of solutions displayed: 0
We got our solution with 979 variables in the blink of an eye: 0.054 seconds. Even though the knapsack problem is NP-hard, it is a common misconception that reasonably sized problems can not be solved to optimality pretty fast. More famous is the Traveling Salesman Problem of which the Traveling Sam Problem formulated by Nerdland is a fun instance with 24 variables. Even though there are $24! \approx 6.2 \times 10^{23}$ solutions, which is way too big to be brute forced, good algorithms exists which can solve a 24 city problem to (provable) optimality in less than a second. In fact, the current world record for a TSP having been solved to proven optimality stands at 57,912 stops and is a Dutch cycling route along national monuments.
So what does NP-hardness then mean? NP-hardness is a statement about asymptotic complexity behavior for the number of cities $N \to \infty$, and that in this limit, the computation times grow exponentially fast (kind of). But it might be that fast growing behavior only appears beyond $N > 10^{100}$. Not saying that this is the case for TSPs or knapsack problems, but the statement about reasonably sized problems being solvable in reasonable time stands.
Apologies for this intermezzo, as you are obviously more interested in the optimal team, so here it is:
df['selected'] = [v.value for v in M.component_data_objects(pyo.Var)]
df[df['selected'] > 0.5].sort_values('points', ascending=False)
| name_ | value | points | selected | |
|---|---|---|---|---|
| 465 | Wout van Aert | 242000 | 316 | 1.0 |
| 549 | Tadej Pogacar | 324000 | 315 | 1.0 |
| 147 | Julian Alaphilippe | 291000 | 226 | 1.0 |
| 240 | Richard Carapaz | 156000 | 195 | 1.0 |
| 67 | Sonny Colbrelli | 140500 | 160 | 1.0 |
| 267 | Adam Yates | 143000 | 149 | 1.0 |
| 78 | Matej Mohoric | 80500 | 126 | 1.0 |
| 468 | Jonas Vingegaard | 40000 | 126 | 1.0 |
| 150 | Kasper Asgreen | 114500 | 118 | 1.0 |
| 265 | Dylan van Baarle | 59000 | 103 | 1.0 |
| 18 | Ben O Connor | 34000 | 98 | 1.0 |
| 255 | Thomas Pidcock | 10500 | 86 | 1.0 |
| 66 | Damiano Caruso | 70000 | 81 | 1.0 |
| 199 | Neilson Powless | 29000 | 70 | 1.0 |
| 156 | Mark Cavendish | 2500 | 54 | 1.0 |
df[df['selected'] > 0.5].sort_values('points', ascending=False).sum()
name_ Wout van AertTadej PogacarJulian AlaphilippeRi...
value 1736500
points 2223
selected 15.0
dtype: object
This team would have made 2223 points, which is 659 points more than the current winner!
Many of the big names and big scorers are selected. The highest scorers which are not selected are interestingly Primoz Roglic and Mathieu van der Poel.
df.sort_values('points', ascending=False).head(10)
| name_ | value | points | selected | |
|---|---|---|---|---|
| 465 | Wout van Aert | 242000 | 316 | 1.0 |
| 549 | Tadej Pogacar | 324000 | 315 | 1.0 |
| 461 | Primoz Roglic | 400000 | 256 | 0.0 |
| 147 | Julian Alaphilippe | 291000 | 226 | 1.0 |
| 240 | Richard Carapaz | 156000 | 195 | 1.0 |
| 67 | Sonny Colbrelli | 140500 | 160 | 1.0 |
| 582 | Mathieu van der Poel | 249000 | 160 | 0.0 |
| 267 | Adam Yates | 143000 | 149 | 1.0 |
| 326 | Michael Woods | 145500 | 142 | 0.0 |
| 382 | Alejandro Valverde | 198500 | 136 | 0.0 |
df['efficiency'] = 1000*df['points']/df['value']
df[df['selected']>0].sort_values('efficiency', ascending=False)
| name_ | value | points | selected | efficiency | |
|---|---|---|---|---|---|
| 156 | Mark Cavendish | 2500 | 54 | 1.0 | 21.600000 |
| 255 | Thomas Pidcock | 10500 | 86 | 1.0 | 8.190476 |
| 468 | Jonas Vingegaard | 40000 | 126 | 1.0 | 3.150000 |
| 18 | Ben O Connor | 34000 | 98 | 1.0 | 2.882353 |
| 199 | Neilson Powless | 29000 | 70 | 1.0 | 2.413793 |
| 265 | Dylan van Baarle | 59000 | 103 | 1.0 | 1.745763 |
| 78 | Matej Mohoric | 80500 | 126 | 1.0 | 1.565217 |
| 465 | Wout van Aert | 242000 | 316 | 1.0 | 1.305785 |
| 240 | Richard Carapaz | 156000 | 195 | 1.0 | 1.250000 |
| 66 | Damiano Caruso | 70000 | 81 | 1.0 | 1.157143 |
| 67 | Sonny Colbrelli | 140500 | 160 | 1.0 | 1.138790 |
| 267 | Adam Yates | 143000 | 149 | 1.0 | 1.041958 |
| 150 | Kasper Asgreen | 114500 | 118 | 1.0 | 1.030568 |
| 549 | Tadej Pogacar | 324000 | 315 | 1.0 | 0.972222 |
| 147 | Julian Alaphilippe | 291000 | 226 | 1.0 | 0.776632 |
The following seven riders below proved to be among the most popular riders throughout the competition, and all followed through with excellent performances too. Five of them actually are in the optimal team too. Note by the way, how I closed my previous blog post with the observation that Mark Cavendish is a steal, but was not selected by my optimization due to him not having ridden and scored UCI points the year before. I decided not to swap him out for Dutch youngster Olav Kooij. Unfortunaly, Olav only scored 2 points, versus Cavendish’ 54 points.
popular = ['Wout van Aert', 'Mathieu van der Poel', 'Julian Alaphilippe',
'Tadej Pogacar', 'Primoz Roglic',
'Mark Cavendish', 'Thomas Pidcock']
sum(points[p] for p in popular)
1413
sum(cost[p] for p in popular)
1519000
A team consisting of all those riders would have already scored 1413 points, much better than I did, enough for a 15th spot in the ranking, and still leaving over 200.000 euro to spend. Let’s see how far we could get with a team having all these riders. In the code, we add equality constraints in which we set the binary selection variable for those riders to 1.
cost_limit = 1750000
count_required = 15
M = pyo.ConcreteModel()
M.ITEMS = pyo.Set(initialize=points.keys())
M.x = pyo.Var(M.ITEMS, within=pyo.Binary)
M.points = pyo.Objective(expr=sum(points[i]*M.x[i] for i in M.ITEMS), sense=pyo.maximize)
M.weight = pyo.Constraint(expr=sum(cost[i]*M.x[i] for i in M.ITEMS) <= cost_limit)
M.count = pyo.Constraint(expr=sum(count[i]*M.x[i] for i in M.ITEMS) == count_required)
M.popular = pyo.ConstraintList()
for p in popular:
M.popular.add(M.x[p] == 1)
opt = pyo.SolverFactory("cbc.exe")
results = opt.solve(M)
print(results)
Problem:
- Lower bound: -inf
Upper bound: inf
Number of objectives: 1
Number of constraints: 9
Number of variables: 979
Sense: unknown
Solver:
- Status: ok
Message: CBC 2.10.3 optimal, objective -1957; 0 nodes, 22 iterations, 0.035 seconds
Termination condition: optimal
Id: 0
Error rc: 0
Time: 0.09378385543823242
Solution:
- number of solutions: 0
number of solutions displayed: 0
df['selected'] = [v.value for v in M.component_data_objects(pyo.Var)]
df[df['selected'] > 0.5].sort_values('points', ascending=False)
| name_ | value | points | selected | efficiency | |
|---|---|---|---|---|---|
| 465 | Wout van Aert | 242000 | 316 | 1.0 | 1.305785 |
| 549 | Tadej Pogacar | 324000 | 315 | 1.0 | 0.972222 |
| 461 | Primoz Roglic | 400000 | 256 | 1.0 | 0.640000 |
| 147 | Julian Alaphilippe | 291000 | 226 | 1.0 | 0.776632 |
| 582 | Mathieu van der Poel | 249000 | 160 | 1.0 | 0.642570 |
| 468 | Jonas Vingegaard | 40000 | 126 | 1.0 | 3.150000 |
| 265 | Dylan van Baarle | 59000 | 103 | 1.0 | 1.745763 |
| 18 | Ben O Connor | 34000 | 98 | 1.0 | 2.882353 |
| 255 | Thomas Pidcock | 10500 | 86 | 1.0 | 8.190476 |
| 199 | Neilson Powless | 29000 | 70 | 1.0 | 2.413793 |
| 156 | Mark Cavendish | 2500 | 54 | 1.0 | 21.600000 |
| 163 | Mikkel Frolich Honore | 25500 | 53 | 1.0 | 2.078431 |
| 76 | Gino Mader | 18500 | 46 | 1.0 | 2.486486 |
| 976 | Gianni Marchand | 2500 | 26 | 1.0 | 10.400000 |
| 111 | Ide Schelling | 14000 | 22 | 1.0 | 1.571429 |
With these 7 popular riders, the maximum achievable score is 1957 which is very good, but also quite a drop from 2223 points.
We can also ask the opposite question: What is the best team we could have created without any of these 7 popular riders. In the code, now the binary variables of these riders are put to 0.
cost_limit = 1750000
count_required = 15
M = pyo.ConcreteModel()
M.ITEMS = pyo.Set(initialize=points.keys())
M.x = pyo.Var(M.ITEMS, within=pyo.Binary)
M.points = pyo.Objective(expr=sum(points[i]*M.x[i] for i in M.ITEMS), sense=pyo.maximize)
M.weight = pyo.Constraint(expr=sum(cost[i]*M.x[i] for i in M.ITEMS) <= cost_limit)
M.count = pyo.Constraint(expr=sum(count[i]*M.x[i] for i in M.ITEMS) == count_required)
M.popular = pyo.ConstraintList()
for p in popular:
M.popular.add(M.x[p] == 0)
opt = pyo.SolverFactory("cbc.exe")
results = opt.solve(M)
print(results)
Problem:
- Lower bound: -inf
Upper bound: inf
Number of objectives: 1
Number of constraints: 9
Number of variables: 979
Sense: unknown
Solver:
- Status: ok
Message: CBC 2.10.3 optimal, objective -1940; 0 nodes, 6 iterations, 0.031 seconds
Termination condition: optimal
Id: 0
Error rc: 0
Time: 0.08577203750610352
Solution:
- number of solutions: 0
number of solutions displayed: 0
df['selected'] = [v.value for v in M.component_data_objects(pyo.Var)]
df[df['selected'] > 0.5].sort_values('points', ascending=False)
| name_ | value | points | selected | efficiency | |
|---|---|---|---|---|---|
| 240 | Richard Carapaz | 156000 | 195 | 1.0 | 1.250000 |
| 67 | Sonny Colbrelli | 140500 | 160 | 1.0 | 1.138790 |
| 267 | Adam Yates | 143000 | 149 | 1.0 | 1.041958 |
| 326 | Michael Woods | 145500 | 142 | 1.0 | 0.975945 |
| 239 | Egan Bernal | 174500 | 132 | 1.0 | 0.756447 |
| 523 | Jasper Stuyven | 140000 | 130 | 1.0 | 0.928571 |
| 78 | Matej Mohoric | 80500 | 126 | 1.0 | 1.565217 |
| 468 | Jonas Vingegaard | 40000 | 126 | 1.0 | 3.150000 |
| 256 | Richie Porte | 135000 | 124 | 1.0 | 0.918519 |
| 264 | Geraint Thomas | 133500 | 121 | 1.0 | 0.906367 |
| 215 | David Gaudu | 128500 | 120 | 1.0 | 0.933852 |
| 150 | Kasper Asgreen | 114500 | 118 | 1.0 | 1.030568 |
| 265 | Dylan van Baarle | 59000 | 103 | 1.0 | 1.745763 |
| 18 | Ben O Connor | 34000 | 98 | 1.0 | 2.882353 |
| 100 | Wilco Kelderman | 121000 | 96 | 1.0 | 0.793388 |
The maximum achievable score then is 1940, which is slightly less than the best team having all those riders.
We can also turn the knapsack problem on its head: What is the cheapest team, that could have won (= total score of 1564 or more). In the code, the objective and the cost constraint need to be interchanged.
cost_limit = 1750000
count_required = 15
M = pyo.ConcreteModel()
M.ITEMS = pyo.Set(initialize=points.keys())
M.x = pyo.Var(M.ITEMS, within=pyo.Binary)
M.points = pyo.Constraint(expr=sum(points[i]*M.x[i] for i in M.ITEMS) >= 1564+1)
M.weight = pyo.Objective(expr=sum(cost[i]*M.x[i] for i in M.ITEMS), sense=pyo.minimize)
M.count = pyo.Constraint(expr=sum(count[i]*M.x[i] for i in M.ITEMS) == count_required)
opt = pyo.SolverFactory("cbc.exe")
results = opt.solve(M)
print(results)
Problem:
- Lower bound: -inf
Upper bound: inf
Number of objectives: 1
Number of constraints: 2
Number of variables: 979
Sense: unknown
Solver:
- Status: ok
Message: CBC 2.10.3 optimal, objective 936500; 2 nodes, 47 iterations, 0.125 seconds
Termination condition: optimal
Id: 0
Error rc: 0
Time: 0.17966532707214355
Solution:
- number of solutions: 0
number of solutions displayed: 0
df['selected'] = [v.value for v in M.component_data_objects(pyo.Var)]
df[df['selected'] > 0.5].sort_values('points', ascending=False)
| name_ | value | points | selected | efficiency | |
|---|---|---|---|---|---|
| 465 | Wout van Aert | 242000 | 316 | 1.0 | 1.305785 |
| 240 | Richard Carapaz | 156000 | 195 | 1.0 | 1.250000 |
| 67 | Sonny Colbrelli | 140500 | 160 | 1.0 | 1.138790 |
| 78 | Matej Mohoric | 80500 | 126 | 1.0 | 1.565217 |
| 468 | Jonas Vingegaard | 40000 | 126 | 1.0 | 3.150000 |
| 265 | Dylan van Baarle | 59000 | 103 | 1.0 | 1.745763 |
| 18 | Ben O Connor | 34000 | 98 | 1.0 | 2.882353 |
| 202 | Rigoberto Uran | 93500 | 90 | 1.0 | 0.962567 |
| 255 | Thomas Pidcock | 10500 | 86 | 1.0 | 8.190476 |
| 199 | Neilson Powless | 29000 | 70 | 1.0 | 2.413793 |
| 156 | Mark Cavendish | 2500 | 54 | 1.0 | 21.600000 |
| 163 | Mikkel Frolich Honore | 25500 | 53 | 1.0 | 2.078431 |
| 76 | Gino Mader | 18500 | 46 | 1.0 | 2.486486 |
| 976 | Gianni Marchand | 2500 | 26 | 1.0 | 10.400000 |
| 977 | Finn Fisher Black | 2500 | 21 | 1.0 | 8.400000 |
df[df['selected'] > 0.5].sum()
name_ Ben O ConnorSonny ColbrelliGino MaderMatej Moh...
value 936500
points 1570
selected 15.0
efficiency 69.569662
dtype: object
The cheapest team that could have won, costs 936.500 euro, which means that only 53 percent of the budget is used. There are a lot of riders shared with the optimal team, among who again Wout van Aert. What a guy!
Finally, the worst team is a trivial problem: just select 15 (cheap?) riders with no points. One way to have a more interesting worst team, could be to set a lower limit on the team cost, e.g. 1.700.000 euro.
count_required = 15
M = pyo.ConcreteModel()
M.ITEMS = pyo.Set(initialize=points.keys())
M.x = pyo.Var(M.ITEMS, within=pyo.Binary)
M.points = pyo.Objective(expr=sum(points[i]*M.x[i] for i in M.ITEMS), sense=pyo.minimize)
M.weight = pyo.Constraint(expr=sum(cost[i]*M.x[i] for i in M.ITEMS) >= 1700000)
M.count = pyo.Constraint(expr=sum(count[i]*M.x[i] for i in M.ITEMS) == count_required)
M.popular = pyo.ConstraintList()
for p in popular:
M.popular.add(M.x[p] == 0)
opt = pyo.SolverFactory("cbc.exe")
results = opt.solve(M)
print(results)
Problem:
- Lower bound: -inf
Upper bound: inf
Number of objectives: 1
Number of constraints: 9
Number of variables: 979
Sense: unknown
Solver:
- Status: ok
Message: CBC 2.10.3 optimal, objective 0; 0 nodes, 0 iterations, 0.013 seconds
Termination condition: optimal
Id: 0
Error rc: 0
Time: 0.058875083923339844
Solution:
- number of solutions: 0
number of solutions displayed: 0
df['selected'] = [v.value for v in M.component_data_objects(pyo.Var)]
df[df['selected'] > 0.5].sort_values('points', ascending=False)
| name_ | value | points | selected | efficiency | |
|---|---|---|---|---|---|
| 15 | Bob Jungels | 91500 | 0 | 1.0 | 0.0 |
| 94 | Emanuel Buchmann | 116000 | 0 | 1.0 | 0.0 |
| 226 | Rudy Molard | 67500 | 0 | 1.0 | 0.0 |
| 227 | Thibaut Pinot | 172500 | 0 | 1.0 | 0.0 |
| 284 | Andrea Pasqualon | 77000 | 0 | 1.0 | 0.0 |
| 308 | Chris Froome | 131500 | 0 | 1.0 | 0.0 |
| 333 | John Degenkolb | 99000 | 0 | 1.0 | 0.0 |
| 450 | Dylan Groenewegen | 135000 | 0 | 1.0 | 0.0 |
| 471 | Fabio Aru | 88000 | 0 | 1.0 | 0.0 |
| 515 | Vincenzo Nibali | 158000 | 0 | 1.0 | 0.0 |
| 550 | Jan Polanc | 68500 | 0 | 1.0 | 0.0 |
| 735 | Eduard Prades | 114500 | 0 | 1.0 | 0.0 |
| 819 | Ilnur Zakarin | 99500 | 0 | 1.0 | 0.0 |
| 879 | Nairo Quintana | 207500 | 0 | 1.0 | 0.0 |
| 925 | Niki Terpstra | 79000 | 0 | 1.0 | 0.0 |
df[df['selected'] > 0.5].sum()
name_ Bob JungelsEmanuel BuchmannRudy MolardThibaut ...
value 1705000
points 0
selected 15.0
efficiency 0.0
dtype: object
We managed to find a team that uses nearly all the money and still scores no points. This result could also have been achieved without a MILP solver, and by selection all riders with no points, and then sorting them from expensive to cheap and take the 15 most expensive.
As the bottleneck is in the quality of predicting scores (when combined with knapsack), perhaps I should consider building a machine-learning model to predict scores? Or perhaps apply a cool Bayesian rating system like Trueskill to the cycling data? Maybe this can be inspiration for extending Trueskill, which can handle both individual competition as well as team competition, but not a hybrid. The hybrid one is very typical in modern cycling where riders have both individual and team goals, which can sometimes be conflicting.
Very interested in your comments but still figuring out the most suited approach to this. For now, feel free to send me an email.