Like the 2021 post, we compute what would have been the optimal team for Megatombike 2022, a local fantasy cycling competition.
This year, I didn’t get round to making a post on how I assembled my team. For the better, in hindsight, as it performed terribly. I hoped to get a team of overachievers as a combination of
Especially the latter turned out to be a bad idea. Also, out of the expensive, high scoring riders, I kicked out Van Aert to make room for Alaphilippe at the last moment…
Anyways, on to more succesful teams. The winner of this year, once again, was Hala Die Mannschaft with a score of 2435, this year by a margin of almost 100 points over the second team. Very impressive!
Rider | Points |
---|---|
LÓPEZ Miguel Ángel | 74 |
JAKOBSEN Fabio | 115 |
VAN AERT Wout | 401 |
CAMPENAERTS Victor | 74 |
VINE Jay | 74 |
FOSS Tobias | 45 |
POGAČAR Tadej | 473 |
EVENEPOEL Remco | 394 |
VALVERDE Alejandro | 176 |
MOHORIČ Matej | 125 |
LAMPAERT Yves | 22 |
MARTÍNEZ Daniel Felipe | 191 |
SHEFFIELD Magnus | 80 |
VADER Milan | 0 |
DE LIE Arnaud | 191 |
Note that the overall scores this year are about 60% higher than last year, because of an extended calender incorporating many smaller races
I have copied some math and code from previous year to show the mathematical model and how to solve that using the combination of the Pyomo optimization modelling languange along with open-source MILP (mixed integer linear programming) solver CBC. However, we have a new kid on the block as well, stay tuned!
Let us repeat the mathematical model once more:
\[\begin{align*} \text{maximize}_{x_i} \quad & \sum_{i=1}^n p_i x_i \\ \text{subject to} \quad & \sum_{i=1}^n c_i x_i \leq C \\ & \sum_{i=1}^n x_i = N \\ & x_i \in \{0,1\} \end{align*}\]where $p_i$ is the number of points collected by a rider, $c_i$ is the cost of a rider, $C$ is the total team cost limit (= 1.750.000), and $N$ is the total number of riders (= 15). $x_i$ is a binary variable which either selects or does not select rider $i$.
So, we are maximizing the total number of points, given all the constraints.
Constraint \(\sum_{i=1}^n x_i = N = 15\) is how this problem differs from the standard 0-1 knapsack problem.
%matplotlib notebook
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv(r"C:\Users\swaenenl\OneDrive - Sioux Group B.V\Documents\Personal\website\notebooks\megatombike\2022_results\2022_results\riders.csv", delimiter=';')
df.head(15)
id | name | price | points | id.1 | name.1 | selected | |
---|---|---|---|---|---|---|---|
0 | 527 | POGAČAR Tadej | 437000 | 473 | 18 | UAE Team Emirates | 1.0 |
1 | 322 | VAN AERT Wout | 320000 | 401 | 11 | Jumbo-Visma | 1.0 |
2 | 395 | EVENEPOEL Remco | 152500 | 394 | 14 | Quick-Step Alpha Vinyl Team | 1.0 |
3 | 328 | VINGEGAARD Jonas | 144000 | 247 | 11 | Jumbo-Visma | 1.0 |
4 | 113 | VLASOV Aleksandr | 174000 | 213 | 4 | BORA - hansgrohe | 0.0 |
5 | 315 | LAPORTE Christophe | 96500 | 213 | 11 | Jumbo-Visma | 1.0 |
6 | 96 | HIGUITA Sergio | 87500 | 195 | 4 | BORA - hansgrohe | 1.0 |
7 | 561 | VAN DER POEL Mathieu | 260000 | 195 | 19 | Alpecin-Fenix | 0.0 |
8 | 219 | MARTÍNEZ Daniel Felipe | 75000 | 191 | 8 | INEOS Grenadiers | 1.0 |
9 | 335 | DE LIE Arnaud | 15000 | 191 | 12 | Lotto Soudal | 1.0 |
10 | 189 | KÜNG Stefan | 121500 | 187 | 7 | Groupama - FDJ | 1.0 |
11 | 61 | BILBAO Pello | 102500 | 187 | 3 | Bahrain - Victorious | 1.0 |
12 | 548 | MERLIER Tim | 144000 | 181 | 19 | Alpecin-Fenix | 0.0 |
13 | 383 | VALVERDE Alejandro | 159500 | 176 | 13 | Movistar Team | 0.0 |
14 | 434 | MATTHEWS Michael | 150000 | 153 | 15 | Team BikeExchange - Jayco | 0.0 |
The csv file, once again delivered to me by organizer Bob, has the riders already sorted according to their points. The table does have some duplicates which we need to fix. We see 2 Belgian riders in the top 3! I believe one of the reasons of some of my Megatombike successes around 2012 were due to me not having a bias towards Belgian riders unlike most MTB participants. Back then, that was an advantage. Nowadays, not so much!
df = df.drop_duplicates(['name'])
df.sort_values('points', ascending=False).head(15).sum()
id 4826
name POGAČAR TadejVAN AERT WoutEVENEPOEL RemcoVINGE...
price 2439000
points 3597
id.1 169
name.1 UAE Team EmiratesJumbo-VismaQuick-Step Alpha V...
selected 7.0
dtype: object
Summing the values and points of the best 15 riders gives a team that scores 3597 points, but is way too expensive (2.4 million). Still, this doesn’t exceed the 1.75 million budget as much as last year (then it was 2.85 million). And in fact, starting from the top, we would be able to buy the 9 highest scoring riders, for a point total of 2522. Enough to beat out Hala Die Mannschaft!
df.sort_values('points', ascending=False).head(9).sum()
id 2876
name POGAČAR TadejVAN AERT WoutEVENEPOEL RemcoVINGE...
price 1746500
points 2522
id.1 100
name.1 UAE Team EmiratesJumbo-VismaQuick-Step Alpha V...
selected 5.0
dtype: object
import pyomo.environ as pyo
points = {}
cost = {}
count = {}
dups = []
for i, el in df.iterrows():
name = el['name']
points[name] = el['points']
cost[name] = el['price']
count[name] = 1.
cost_limit = 1750000
count_required = 15
M = pyo.ConcreteModel()
M.ITEMS = pyo.Set(initialize=list(points.keys()))
M.x = pyo.Var(M.ITEMS, within=pyo.Binary)
M.points = pyo.Objective(expr=sum(points[i]*M.x[i] for i in M.ITEMS), sense=pyo.maximize)
M.weight = pyo.Constraint(expr=sum(cost[i]*M.x[i] for i in M.ITEMS) <= cost_limit)
M.count = pyo.Constraint(expr=sum(count[i]*M.x[i] for i in M.ITEMS) == count_required)
Let’s run this thing!
opt = pyo.SolverFactory("cbc.exe")
results = opt.solve(M)
print(results)
Problem:
- Lower bound: -inf
Upper bound: inf
Number of objectives: 1
Number of constraints: 2
Number of variables: 963
Sense: unknown
Solver:
- Status: ok
Message: CBC 2.10.3 optimal, objective -3263; 0 nodes, 19 iterations, 0.025 seconds
Termination condition: optimal
Id: 0
Error rc: 0
Time: 0.07400059700012207
Solution:
- number of solutions: 0
number of solutions displayed: 0
It never ceases to amaze how fast MILP solvers can solve (with guaranteed optimality) these NP-hard problems. If we take the point of view that the number of candidate solutions is dictated by choosing a combination of 15 elements out of 963, we can use the scipy
implementation of $N$-choose-$k$ to get an impression of the size of the solution space and the NP-hardness of it:
import scipy.special
scipy.special.comb(963, 15)
3.893159932009228e+32
The results
structure output reports a computation time of ~65 milliseconds to find the optimal solution among $3.89 \times 10^{32}$ candidate solutions. Of course, it does so much faster than brute forcing its way through all those candidates.
%%timeit
opt = pyo.SolverFactory("cbc.exe")
results = opt.solve(M)
111 ms ± 412 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
If we do a custom timing of the solve, that computation time doubles to 120-130 milliseconds. CBC is included in Pyomo as an executable and the interfacing is file-system-based: Pyomo writes to files which get loaded by cbc.exe
(which you have to put somewhere in the path, e.g. in the same directory as the Jupyter notebook I was using.
But you were interested in the optimal team, right?! Here goes:
df['selected'] = [v.value for v in M.component_data_objects(pyo.Var)]
df[df['selected'] > 0.5].sort_values('points', ascending=False)
id | name | price | points | id.1 | name.1 | selected | |
---|---|---|---|---|---|---|---|
0 | 527 | POGAČAR Tadej | 437000 | 473 | 18 | UAE Team Emirates | 1.0 |
1 | 322 | VAN AERT Wout | 320000 | 401 | 11 | Jumbo-Visma | 1.0 |
2 | 395 | EVENEPOEL Remco | 152500 | 394 | 14 | Quick-Step Alpha Vinyl Team | 1.0 |
3 | 328 | VINGEGAARD Jonas | 144000 | 247 | 11 | Jumbo-Visma | 1.0 |
5 | 315 | LAPORTE Christophe | 96500 | 213 | 11 | Jumbo-Visma | 1.0 |
6 | 96 | HIGUITA Sergio | 87500 | 195 | 4 | BORA - hansgrohe | 1.0 |
8 | 219 | MARTÍNEZ Daniel Felipe | 75000 | 191 | 8 | INEOS Grenadiers | 1.0 |
9 | 335 | DE LIE Arnaud | 15000 | 191 | 12 | Lotto Soudal | 1.0 |
10 | 189 | KÜNG Stefan | 121500 | 187 | 7 | Groupama - FDJ | 1.0 |
11 | 61 | BILBAO Pello | 102500 | 187 | 3 | Bahrain - Victorious | 1.0 |
21 | 445 | ARENSMAN Thymen | 22000 | 134 | 16 | Team DSM | 1.0 |
29 | 226 | RODRIGUEZ Carlos | 57500 | 120 | 8 | INEOS Grenadiers | 1.0 |
31 | 252 | MEINTJES Louis | 41500 | 117 | 9 | Intermarché - Wanty - Gobert Matériaux | 1.0 |
35 | 507 | AYUSO Juan | 40000 | 110 | 18 | UAE Team Emirates | 1.0 |
41 | 247 | HERMANS Quinten | 30000 | 103 | 9 | Intermarché - Wanty - Gobert Matériaux | 1.0 |
df[df['selected'] > 0.5].sum()
id 4464
name POGAČAR TadejVAN AERT WoutEVENEPOEL RemcoVINGE...
price 1742500
points 3263
id.1 159
name.1 UAE Team EmiratesJumbo-VismaQuick-Step Alpha V...
selected 15.0
dtype: object
This team would have scored a whopping 3263 points: more than 800 points more than the winner. Easier said than done, of course!
A little bit more analysis: We saw before how the total price of the top 15 scorers doesn’t exceed the budget as much as last year. Now we see that the optimal team has 9 out of the 11 highest scoring riders in it. Only numbers 4 and 7 are not selected: Vlasov and van der Poel.
These observations may indicate that riders are relatively cheap this year, and one can buy almost all of the highest scoring riders. Once you get to the point that you can buy all 15 highest scoring riders with the given budget, the knapsack aspect of the fantasy cycling competition becomes a bit moot, and it becomes only about predicting who will be the highest scoring riders.
However, if one looks at the 15 most expensive riders this season, only 2 of them make it to the optimal team, and there is quite a discrepancy between rider prices and scores. In fact, the 15 most expensive riders together would have scored in total 2226 points, which would not even have won the 2022 MTB competition!
Anyways, I’m not sure whether this all means I should recommend the organisers to make the budgets tighter again, or not…
df.sort_values(['price'], ascending=False).head(15)
id | name | price | points | id.1 | name.1 | selected | |
---|---|---|---|---|---|---|---|
0 | 527 | POGAČAR Tadej | 437000 | 473 | 18 | UAE Team Emirates | 0.0 |
16 | 318 | ROGLIČ Primož | 377000 | 148 | 11 | Jumbo-Visma | 0.0 |
1 | 322 | VAN AERT Wout | 320000 | 401 | 11 | Jumbo-Visma | 0.0 |
147 | 385 | ALAPHILIPPE Julian | 280000 | 32 | 14 | Quick-Step Alpha Vinyl Team | 0.0 |
7 | 561 | VAN DER POEL Mathieu | 260000 | 195 | 19 | Alpecin-Fenix | 0.0 |
30 | 505 | ALMEIDA João | 219000 | 118 | 18 | UAE Team Emirates | 0.0 |
541 | 208 | BERNAL Egan | 193000 | 0 | 8 | INEOS Grenadiers | 0.0 |
22 | 209 | CARAPAZ Richard | 182500 | 133 | 8 | INEOS Grenadiers | 0.0 |
92 | 504 | ACKERMANN Pascal | 175500 | 47 | 18 | UAE Team Emirates | 0.0 |
4 | 113 | VLASOV Aleksandr | 174000 | 213 | 4 | BORA - hansgrohe | 0.0 |
141 | 64 | COLBRELLI Sonny | 161000 | 33 | 3 | Bahrain - Victorious | 0.0 |
20 | 237 | YATES Adam | 160500 | 135 | 8 | INEOS Grenadiers | 0.0 |
13 | 383 | VALVERDE Alejandro | 159500 | 176 | 13 | Movistar Team | 0.0 |
55 | 185 | GAUDU David | 154000 | 76 | 7 | Groupama - FDJ | 0.0 |
98 | 297 | WOODS Michael | 153000 | 46 | 10 | Israel - Premier Tech | 0.0 |
df.sort_values(['price'], ascending=False).head(15).sum()
id 4818
name POGAČAR TadejROGLIČ PrimožVAN AERT WoutALAPHIL...
price 3406000
points 2226
id.1 170
name.1 UAE Team EmiratesJumbo-VismaJumbo-VismaQuick-S...
selected 0.0
dtype: object
A final bit of analysis: In 2021, the highest team scored 1564 points, while the maximum achievable was 2223. The ratio of these two numbers is 70%. In 2022, we have a ratio of $2435/3263 \approx 75\%$. So there seems to be some consistency on how optimal you have to be to win MTB.
We can also turn the knapsack problem on its head: What is the cheapest team, that could have won (= total score of 2436 or more). In the code, an objective and constraint need to be swapped.
cost_limit = 1750000
count_required = 15
M = pyo.ConcreteModel()
M.ITEMS = pyo.Set(initialize=points.keys())
M.x = pyo.Var(M.ITEMS, within=pyo.Binary)
M.points = pyo.Constraint(expr=sum(points[i]*M.x[i] for i in M.ITEMS) >= 2436+1)
M.weight = pyo.Objective(expr=sum(cost[i]*M.x[i] for i in M.ITEMS), sense=pyo.minimize)
M.count = pyo.Constraint(expr=sum(count[i]*M.x[i] for i in M.ITEMS) == count_required)
opt = pyo.SolverFactory("cbc.exe")
results = opt.solve(M)
print(results)
Problem:
- Lower bound: -inf
Upper bound: inf
Number of objectives: 1
Number of constraints: 2
Number of variables: 963
Sense: unknown
Solver:
- Status: ok
Message: CBC 2.10.3 optimal, objective 908000; 0 nodes, 50 iterations, 0.072 seconds
Termination condition: optimal
Id: 0
Error rc: 0
Time: 0.11903929710388184
Solution:
- number of solutions: 0
number of solutions displayed: 0
df['selected'] = [v.value for v in M.component_data_objects(pyo.Var)]
df[df['selected'] > 0.5].sort_values('points', ascending=False)
id | name | price | points | id.1 | name.1 | selected | |
---|---|---|---|---|---|---|---|
2 | 395 | EVENEPOEL Remco | 152500 | 394 | 14 | Quick-Step Alpha Vinyl Team | 1.0 |
3 | 328 | VINGEGAARD Jonas | 144000 | 247 | 11 | Jumbo-Visma | 1.0 |
5 | 315 | LAPORTE Christophe | 96500 | 213 | 11 | Jumbo-Visma | 1.0 |
6 | 96 | HIGUITA Sergio | 87500 | 195 | 4 | BORA - hansgrohe | 1.0 |
8 | 219 | MARTÍNEZ Daniel Felipe | 75000 | 191 | 8 | INEOS Grenadiers | 1.0 |
9 | 335 | DE LIE Arnaud | 15000 | 191 | 12 | Lotto Soudal | 1.0 |
11 | 61 | BILBAO Pello | 102500 | 187 | 3 | Bahrain - Victorious | 1.0 |
21 | 445 | ARENSMAN Thymen | 22000 | 134 | 16 | Team DSM | 1.0 |
29 | 226 | RODRIGUEZ Carlos | 57500 | 120 | 8 | INEOS Grenadiers | 1.0 |
31 | 252 | MEINTJES Louis | 41500 | 117 | 9 | Intermarché - Wanty - Gobert Matériaux | 1.0 |
35 | 507 | AYUSO Juan | 40000 | 110 | 18 | UAE Team Emirates | 1.0 |
36 | 312 | KOOIJ Olav | 41500 | 109 | 11 | Jumbo-Visma | 1.0 |
41 | 247 | HERMANS Quinten | 30000 | 103 | 9 | Intermarché - Wanty - Gobert Matériaux | 1.0 |
53 | 228 | SHEFFIELD Magnus | 2500 | 80 | 8 | INEOS Grenadiers | 1.0 |
78 | 980 | POZZOVIVO Domenico | 0 | 55 | 9 | Intermarché - Wanty - Gobert Matériaux | 1.0 |
df[df['selected'] > 0.5].sum()
id 4946
name EVENEPOEL RemcoVINGEGAARD JonasLAPORTE Christo...
price 908000
points 2446
id.1 151
name.1 Quick-Step Alpha Vinyl TeamJumbo-VismaJumbo-Vi...
selected 15.0
dtype: object
The cheapest team that could have won, costs 908.000 euro, which means that only about 50 percent of the budget is used! This team shares 12 riders with the optimal team, and swaps out
for
This corroborates the discrepancy between prices and scores this year: Many riders that scored high were quite cheap. And vice versa many expensive riders scored below par. Only Pogacar and Van Aert lived up to the expectations.
Finally, the worst team is a trivial problem: just select 15 (cheap?) riders with no points. One way to have a more interesting worst team, could be to set a lower limit on the team cost, e.g. 1.745.000 euro.
count_required = 15
M = pyo.ConcreteModel()
M.ITEMS = pyo.Set(initialize=points.keys())
M.x = pyo.Var(M.ITEMS, within=pyo.Binary)
M.points = pyo.Objective(expr=sum(points[i]*M.x[i] for i in M.ITEMS), sense=pyo.minimize)
M.weight = pyo.Constraint(expr=sum(cost[i]*M.x[i] for i in M.ITEMS) >= cost_limit - 5000)
M.count = pyo.Constraint(expr=sum(count[i]*M.x[i] for i in M.ITEMS) == count_required)
opt = pyo.SolverFactory("cbc.exe")
results = opt.solve(M)
print(results)
Problem:
- Lower bound: -inf
Upper bound: inf
Number of objectives: 1
Number of constraints: 2
Number of variables: 963
Sense: unknown
Solver:
- Status: ok
Message: CBC 2.10.3 optimal, objective 70; 0 nodes, 19 iterations, 0.031 seconds
Termination condition: optimal
Id: 0
Error rc: 0
Time: 0.0669410228729248
Solution:
- number of solutions: 0
number of solutions displayed: 0
df['selected'] = [v.value for v in M.component_data_objects(pyo.Var)]
df[df['selected'] > 0.5].sort_values('points', ascending=False)
id | name | price | points | id.1 | name.1 | selected | |
---|---|---|---|---|---|---|---|
147 | 385 | ALAPHILIPPE Julian | 280000 | 32 | 14 | Quick-Step Alpha Vinyl Team | 1.0 |
232 | 235 | VIVIANI Elia | 146500 | 14 | 8 | INEOS Grenadiers | 1.0 |
245 | 396 | HONORÉ Mikkel Frølich | 115000 | 11 | 14 | Quick-Step Alpha Vinyl Team | 1.0 |
308 | 99 | KELDERMAN Wilco | 109500 | 6 | 4 | BORA - hansgrohe | 1.0 |
348 | 109 | SCHACHMANN Maximilian | 137500 | 4 | 4 | BORA - hansgrohe | 1.0 |
391 | 401 | MASNADA Fausto | 78500 | 1 | 14 | Quick-Step Alpha Vinyl Team | 1.0 |
413 | 305 | DUMOULIN Tom | 127500 | 1 | 11 | Jumbo-Visma | 1.0 |
419 | 17 | PARET-PEINTRE Aurélien | 88000 | 1 | 1 | AG2R Citroën Team | 1.0 |
541 | 208 | BERNAL Egan | 193000 | 0 | 8 | INEOS Grenadiers | 1.0 |
542 | 508 | BENNETT George | 86500 | 0 | 18 | UAE Team Emirates | 1.0 |
544 | 784 | ZAKARIN Ilnur | 71500 | 0 | 29 | Gazprom - RusVelo | 1.0 |
545 | 78 | POELS Wout | 80000 | 0 | 3 | Bahrain - Victorious | 1.0 |
546 | 389 | CATTANEO Mattia | 69500 | 0 | 14 | Quick-Step Alpha Vinyl Team | 1.0 |
548 | 47 | MOSCON Gianni | 85000 | 0 | 2 | Astana Qazaqstan Team | 1.0 |
605 | 167 | PADUN Mark | 80000 | 0 | 6 | EF Education-EasyPost | 1.0 |
df[df['selected'] > 0.5].sum()
id 3902
name ALAPHILIPPE JulianVIVIANI EliaHONORÉ Mikkel Fr...
price 1759500
points 71
id.1 142
name.1 Quick-Step Alpha Vinyl TeamINEOS GrenadiersQui...
selected 15.0
dtype: object
Last year, we could find a team that used all the money and scored no points. This year, if we spent nearly all the money, we’re still getting about 70 points. Julien Alaphilippe and Tom Dumoulin are two of the worst riders one could have chosen… which I did…
This final section is for those interested in the mathematical optimization aspect.
I feel like there is quite a lot of movement in the MILP solver landscape. In the land of commercial solvers, some new, mostly Chinese, solvers are speeding their ways up to the tops of the benchmarks. And some other contenders are pulling out of the benchmarks due to some Gurobi scandal. The developers of the Chinese COPT seem to be confident that they will overtake Gurobi in the next years: https://www.youtube.com/watch?v=iqiBXoJQVD8&t=258s
In open source land, HiGHs is the big newcomer. A major development with respect to last years post, is that in the summer of 2022, with Scipy version 1.9.0, the scipy.optimize
submodule got extended with a milp
routine, which runs HiGHs in the backend. I think this is quite a big deal. Scipy is a much more common Python library/dependency than Pyomo. For example, the Anaconda conda-forge channel has (at the time of writing) 600k installs for Pyomo, while Scipy has 31M, which is 50 times more! Moreover you don’t need to install an open source MILP solver like CBC separately. This makes MILP solvers much more broadly accessible. The main current downside is that the interface is much more barebones, and you don’t get all the nice modeling tools (named variables etc.) that you get with Pyomo.
Below, we work out the MTB knapsack problem with scipy.optimize.milp
. This is really easy to do, as the Scipy documentation already has a knapsack example: https://scipy.github.io/devdocs/tutorial/optimize.html#knapsack-problem-example
All we need to add is the constraint that the number of riders needs to equal 15.
import scipy.optimize as sciopt
scipy.__version__
'1.9.3'
points_arr = np.array(list(points.values()))
cost_arr = np.array(list(cost.values()))
count_arr = np.array(list(count.values()))
bounds = sciopt.Bounds(0, 1) # 0 <= x_i <= 1
integrality = np.full_like(points_arr, True) # x_i are integers
cost_constraints = sciopt.LinearConstraint(A=cost_arr, lb=0, ub=cost_limit)
count_constraints = sciopt.LinearConstraint(A=count_arr, lb=count_required, ub=count_required)
res = sciopt.milp(c=-points_arr, constraints=[cost_constraints, count_constraints],
integrality=integrality, bounds=bounds)
res.fun
-3263.0
sel = np.array(df['selected'])
np.max(np.abs(res.x - sel))
The solver finds the exact same solution with the same maximum point total of 3263.
%%timeit
res = sciopt.milp(c=-points_arr, constraints=[cost_constraints, count_constraints],
integrality=integrality, bounds=bounds)
20.9 ms ± 716 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
If we benchmark it, we see that on this problem Scipy+HiGHs is 3 - 6 times faster than the Pyomo+CBC combination (depending on which timing number you take)! Part of it is probably due to the solver being interfaced as a normal library rather than a seperate executable running in a different process.
Finally, I’d like to report some related news on the SCIP MILP solver, which has always dwelled a bit in between open source and commercial solvers (both in license and in benchmark results). In November 2022, SCIP switched to the fully open source Apache 2.0 license. SCIP can be used from Python through the PySCIPOpt package. This has 130k downloads through conda-forge so seems the least popular option at the moment, though that may change due to the recent license changes. I haven’t gotten round to giving it a go. I wouldn’t be surprised if it turns out to be the fastest open source solver on this knapsack problem.
Very interested in your comments but still figuring out the most suited approach to this. For now, feel free to send me an email.