Lab 7 Presentation - Second Competition Round-Up

Author

Erez Buchweitz

1 Today

  • How you did
  • How I would solve the competition
  • How the winner solved the competition
  • Other methods for hyperparameter optimization

2 How you did

             group     loss 
              YATV 0.313563 **
              yyyy 0.313726 **
         posquared 0.313726 
              YATV 0.315156 **
    rubenSandwitch 0.316338 *
             Jiayi 0.316989 *
       funky_fresh 0.317581 
param_grid_turkana 0.320333 **
   optimized_model 0.320962 
               gfm 0.321301 **
              jcjc 0.321403 *
             Monte 0.323772 **
   params_columbia 0.323782 
              lab6 0.323783 **
               www 0.324882 **
             quack 0.324969 
            JNNN-2 0.326188 *
          logan301 0.326238 **
            orange 0.328230 **
            GGBond 0.328674 **
       funky_fresh 0.329883 
              gpt5 0.330024 !
        lab6_param 0.330965 **
       ljdashuaibi 0.332732 **
              yyds 0.340218 **
             green 0.341543 !
           CVQueen 0.342143 
             bfern 0.346011 
            Sodium 0.347496 !
           sparkle 0.349388 ** (too restrictive optimization)
        watermelon 0.355344 
          5090SIMP 0.357172 !
               MLA 0.360402 
               789 0.369162 
                   0.390975 
        WhatsurGPT 0.419548 !
baseline - default 0.419548
          RAFisher 0.446421 !
               Jay 0.532594 !
                 y 0.696065 ** (major bug)
          redpanda 0.724009 
               aaa 0.820612 

2.1 What do the **, *, ! mean?

  • ** - block + past/future CV
  • * - block CV
  • ! - plain CV
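
As a rough sketch of what these setups could look like in code (my reading of the labels, not any team's actual code): plain CV shuffles rows and ignores time; block CV validates on one contiguous time block and trains on all the others; block + past/future CV additionally trains only on blocks that come before the validation block, like the setup in section 4 below.

import numpy as np
from sklearn.model_selection import KFold

# Toy example: 12 rows falling into 4 consecutive time blocks of 3 rows each
n = 12
block = np.repeat([0, 1, 2, 3], 3)  # time block of each row

# Plain CV (!): random folds, time ordering is ignored
plain = list(KFold(n_splits=4, shuffle=True, random_state=0).split(np.arange(n)))

# Block CV (*): validate on one time block, train on all the other blocks
block_cv = [(np.flatnonzero(block != b), np.flatnonzero(block == b)) for b in range(4)]

# Block + past/future CV (**): validate on a block, train only on earlier blocks
past_future = [(np.flatnonzero(block < b), np.flatnonzero(block == b)) for b in range(1, 4)]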

3 How I would solve the competition

3.1 Load data

import numpy as np
import pandas as pd

path = "../datasets/west_nile_virus"
X = pd.read_csv(f"{path}/X_cv_competition.csv")
X.Species = X.Species.astype("category")
y = pd.read_csv(f"{path}/y_cv_competition.csv").squeeze()

print(f"{X.shape=}")
X.shape=(8114, 10)

4 Create CV folds

Let’s look at how the dates distribute in our data:

X.Date = pd.to_datetime(X.Date)
pd.crosstab(X.Date.dt.year, X.Date.dt.month)
Date (month)    5    6    7     8    9   10
Date (year)
2007           25  176  575  2050  774  211
2009           59  578  755   374  418   65
2011            0  381  640   493  540    0

4.1 How would you split into folds?

Looks like it’s reasonable to split into 6 folds:

  • Fold 1: May-Jul 2007
  • Fold 2: Aug-Oct 2007
  • Fold 3: May-Jul 2009
  • Fold 4: Aug-Oct 2009
  • Fold 5: May-Jul 2011
  • Fold 6: Aug-Oct 2011

Let’s do this, without assuming the data are sorted by date:

cutoff_dates = pd.to_datetime(["2007-08-01", "2009-01-01", "2009-08-01", "2011-01-01", "2011-08-01"])
fold_mask = np.searchsorted(cutoff_dates, X.Date)
print(f"{pd.Series(fold_mask).value_counts()=}")
cv_pairs = [{
    "X_train": X.loc[fold_mask < i].drop("Date", axis=1),
    "y_train": y.loc[fold_mask < i],
    "X_valid": X.loc[fold_mask == i].drop("Date", axis=1),
    "y_valid": y.loc[fold_mask == i],
} for i in range(1, max(fold_mask) + 1)]
print(f"{len(cv_pairs)=}")
1    2484
2    1392
0    1327
5    1033
4    1021
3     857
Name: count, dtype: int64
len(cv_pairs)=5

4.2 Final CV configuration

So we have the following 5 train-valid pairs:

  • Valid: 2, train: 1
  • Valid: 3, train: 1,2
  • Valid: 4, train: 1,2,3
  • Valid: 5, train: 1,2,3,4
  • Valid: 6, train: 1,2,3,4,5
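
As a quick sanity check (not part of the original run), we can print the date range and size of each block defined by fold_mask:

# Date range of each time block (block 1 is only ever used for training)
for i in range(int(fold_mask.max()) + 1):
    dates = X.Date[fold_mask == i]
    print(f"fold {i + 1}: {dates.min().date()} .. {dates.max().date()} ({len(dates)} rows)")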

5 Write workflow

Let’s write a function that runs our workflow:

  1. Gets hyperparameters **params we want to test
  2. For each of the 5 train-valid pairs:
    • Trains model on the train set
    • Computes loss on the validation set
  3. Averages the loss across the 5 train-valid pairs
import lightgbm as lgb

def logloss(y, pred):
    return np.mean(-(y * np.log(pred) + (1-y) * np.log(1-pred)))

def compute_loss_cv(average=True, cv_pairs=cv_pairs, **params):    
    loss = []
    for entry in cv_pairs:
        X_train, y_train, X_valid, y_valid = entry["X_train"], entry["y_train"], entry["X_valid"], entry["y_valid"]
        model = lgb.LGBMClassifier(**params, verbosity=-1)
        model.fit(X_train, y_train, categorical_feature="Species")
        pred = model.predict_proba(X_valid)[:,1]
        loss.append(logloss(y_valid, pred))
    return np.mean(loss) if average else loss

5.1 Dry run

Let’s make a dry run with LightGBM’s default hyperparameters:

compute_loss_cv()
0.21052427169573265

Wow, that’s much lower than even the best test loss any team got (0.3136).

5.2 How can this be?

Possible explanations:

  • Bad CV setup
  • Estimation error in CV
  • Estimation error in average test loss
  • Distribution shift

5.3 Which one is the right explanation?

  • We can rule out such a big difference coming from either kind of estimation error by looking at standard errors
  • The CV setup looks rather benign, but we cannot rule out some mistake that we don’t understand

To check whether distribution shift is a reasonable explanation, we can look at the loss on the different folds:

compute_loss_cv(average=False)
[0.6140327071331733,
 0.030937677956269873,
 0.1298259240912775,
 0.07473877875902928,
 0.2030862705389134]
  • Quite a lot of variability in the loss across folds. There certainly seems to be distribution shift
  • Whether distribution shift is the cause of the mismatch between CV and test errors is uncertain, but it looks like the best bet
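
For reference, here is one way (not shown above) to attach a rough standard error to the CV estimate from the per-fold losses; the folds differ in size and share training data, so treat it as a ballpark figure only:

# Rough standard error of the CV estimate from the per-fold losses
fold_losses = np.array(compute_loss_cv(average=False))
se = fold_losses.std(ddof=1) / np.sqrt(len(fold_losses))
print(f"CV loss = {fold_losses.mean():.3f} +- {se:.3f}")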

5.4 If there’s distribution shift, who guarantees that minimizing CV error will result in the lowest test error?

  • There is no guarantee!
  • Since the CV validation folds span multiple different periods, we can hope that the resulting hyperparameters will be robust to the different ways 2013 might look
  • Of course, 2013 can be something we’ve never seen before, and then there really is no guarantee.
  • Of course, we can also overfit the hyperparameters, but that’s a different problem that exists even with IID data

6 Write grid workflow

Let’s write a function that gets multiple values for each hyperparameter, and runs over all possible combinations.

Example:

  • Input - a=[1,2,3], b=[4,5], c=[6,7]
  • All possible combinations:
a=1, b=4, c=6
a=1, b=4, c=7
a=1, b=5, c=6
a=1, b=5, c=7
a=2, b=4, c=6
a=2, b=4, c=7
a=2, b=5, c=6
a=2, b=5, c=7
a=3, b=4, c=6
a=3, b=4, c=7
a=3, b=5, c=6
a=3, b=5, c=7
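
This is exactly the Cartesian product that itertools.product computes; a quick standalone check on the example values:

# All 3 * 2 * 2 = 12 combinations of the example values above
import itertools
for a, b, c in itertools.product([1, 2, 3], [4, 5], [6, 7]):
    print(f"{a=}, {b=}, {c=}")

The grid workflow then just wraps this product:
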
import itertools

def compute_loss_cv_grid(cv_pairs=cv_pairs, **params):
    keys = params.keys()
    out = []
    for vals in itertools.product(*params.values()):
        cur_params = dict(zip(keys, vals))
        cur_params["loss"] = compute_loss_cv(cv_pairs=cv_pairs, **cur_params)
        out.append(cur_params)
    out = pd.DataFrame(out)
    return out

6.1 How many hyperparameters/values to try at once?

The number of models to train and evaluate is exponential in the number of hyperparameters (for example, 5 values for each of 6 hyperparameters is 5^6 = 15,625 combinations, each requiring 5 fits):

  • High run time
  • Need to be wary of overfitting

Let’s try two at a time.

6.2 Start optimizing

Start with possibly the two most important parameters, learning_rate and n_estimators, and find sane values. We don’t know what scale to try, so we’ll try a wide range.

learning_rate = [0.001, 0.01, 0.1, 1]
n_estimators = [10, 50, 100, 500, 1000]

res = compute_loss_cv_grid(learning_rate=learning_rate, n_estimators=n_estimators)
res.pivot(index="learning_rate", columns="n_estimators", values="loss")
C:\Users\erezm\AppData\Local\Temp\ipykernel_18692\59131662.py:4: RuntimeWarning: divide by zero encountered in log
  return np.mean(-(y * np.log(pred) + (1-y) * np.log(1-pred)))
(warning repeated many times)
n_estimators         10        50       100       500      1000
learning_rate
0.001          0.157110  0.154743  0.152676  0.145208  0.144221
0.010          0.152703  0.145403  0.144335  0.173802  0.211454
0.100          0.144287  0.173587  0.210524  0.327977  0.403627
1.000               inf       inf       inf       inf       inf

The best learning_rate decreases as n_estimators grows, as expected.
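
As an aside, the divide-by-zero warnings above come from predicted probabilities that are exactly 0 or 1 (hence the inf row for learning_rate=1). One common remedy, not used here, is to clip predictions away from the boundaries before taking logs:

def logloss_clipped(y, pred, eps=1e-15):
    # Clip predictions away from 0 and 1 so the logs stay finite
    pred = np.clip(pred, eps, 1 - eps)
    return np.mean(-(y * np.log(pred) + (1-y) * np.log(1-pred)))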

6.3 Why look at the full results and not just use the minimizer?

To use our judgement - we’re afraid of overfitting (robustness vs optimality).

6.4 Which values to choose?

learning_rate=0.001 with n_estimators=1000 seems to be best, but it takes ~10 times longer to train than learning_rate=0.01 with n_estimators=100, which is nearly as good. In the interest of closing loops faster, let’s go with 0.01 and 100. We can always come back to this later.

6.5 Why not look at more fine grained values of parameters?

Robustness vs optimality - not sure I’d believe differences at a finer grain. Let’s be content with what we have, at least for now - there’ll be time to redo this later if we want to.

7 Second iteration

Fix learning_rate=0.01 and n_estimators=100, and optimize two other hyperparameters:

res = compute_loss_cv_grid(
    learning_rate=[0.01], 
    n_estimators=[100], 
    min_child_samples=[1, 10, 50, 100, 500], 
    num_leaves=[2, 4, 8, 16, 32],
)
res.pivot(index="min_child_samples", columns="num_leaves", values="loss")
num_leaves                2         4         8        16        32
min_child_samples
1                  0.147848  0.139904  0.141129  0.146136  0.154018
10                 0.147789  0.139866  0.140464  0.145469  0.147719
50                 0.148590  0.140970  0.141581  0.140995  0.141122
100                0.149249  0.143717  0.143350  0.141989  0.141585
500                0.149668  0.145030  0.143113  0.143077  0.143077

7.1 Which values to choose?

Best is num_leaves=4 and min_child_samples=10. But so few leaves is very restrictive; we might get stuck in a local minimum later. I’d prefer something more expressive with harsher regularization. Let’s go with num_leaves=16 and min_child_samples=50.

8 Third iteration

res = compute_loss_cv_grid(
    learning_rate=[0.01], 
    n_estimators=[100], 
    min_child_samples=[50], 
    num_leaves=[16],
    subsample_freq=[1],
    subsample=[0.1, 0.3, 0.5, 0.7, 1],
    colsample_bytree=[0.1, 0.3, 0.5, 0.7, 1],
)
res.pivot(index="subsample", columns="colsample_bytree", values="loss")
colsample_bytree       0.1       0.3       0.5       0.7       1.0
subsample
0.1               0.154482  0.148266  0.146551  0.145772  0.145382
0.3               0.152098  0.145236  0.142838  0.141991  0.141882
0.5               0.150475  0.143535  0.139786  0.139119  0.139109
0.7               0.149771  0.143335  0.139904  0.139730  0.139480
1.0               0.149149  0.142943  0.141105  0.140899  0.140995

8.1 Which values to choose?

Let’s go with subsample=0.5 and colsample_bytree=1.

9 Fourth iteration

Let’s revisit hyperparameters we already tuned

res = compute_loss_cv_grid(
    learning_rate=[0.01], 
    n_estimators=[50, 100, 200, 500, 1000], 
    min_child_samples=[50], 
    num_leaves=[2, 4, 8, 16, 32, 64],
    subsample_freq=[1],
    subsample=[0.5],
)
res.pivot(index="n_estimators", columns="num_leaves", values="loss")
num_leaves           2         4         8        16        32        64
n_estimators
50            0.150326  0.146671  0.145793  0.144577  0.144013  0.144021
100           0.146804  0.141641  0.140430  0.139109  0.138633  0.138632
200           0.141689  0.137051  0.137512  0.136871  0.136542  0.136617
500           0.135850  0.137136  0.141636  0.141912  0.142949  0.143002
1000          0.135035  0.142490  0.154726  0.157556  0.160945  0.161720

9.1 Which values to choose?

Looks like it’s doubling down on num_leaves=2 and n_estimators=1000. Let’s give it what it wants

10 Fifth iteration

res = compute_loss_cv_grid(
    learning_rate=[0.01], 
    n_estimators=[200, 500, 1000, 2000, 3000], 
    min_child_samples=[50], 
    num_leaves=[2],
    subsample_freq=[1],
    subsample=[0.2, 0.5, 0.8],
)
res.pivot(index="n_estimators", columns="subsample", values="loss")
subsample          0.2       0.5       0.8
n_estimators
200           0.143651  0.141689  0.142183
500           0.138775  0.135850  0.135241
1000          0.139007  0.135035  0.134492
2000          0.140755  0.140373  0.138835
3000          0.144419  0.143348  0.142360

10.1 Which values to choose?

Stick with subsample=0.5 and n_estimators=1000 (subsample=0.8 is only marginally better). The fact that the optimum barely moves is a sign we’re near a local minimum!

11 Sixth iteration

Back to learning_rate and n_estimators:

res = compute_loss_cv_grid(
    learning_rate=[0.001, 0.005, 0.01, 0.05, 0.1], 
    n_estimators=[500, 1000, 2000], 
    min_child_samples=[50], 
    num_leaves=[2],
    subsample_freq=[1],
    subsample=[0.5],
)
res.pivot(index="n_estimators", columns="learning_rate", values="loss")
learning_rate     0.001     0.005     0.010     0.050     0.100
n_estimators
500            0.150937  0.140339  0.135850  0.142034  0.150572
1000           0.146906  0.135182  0.135035  0.149731  0.162379
2000           0.142059  0.135519  0.140373  0.161440  0.194760

11.1 Which values to choose?

So we stick with learning_rate=0.01 and n_estimators=1000

12 Final CV loss

params = dict(
    learning_rate=0.01, 
    n_estimators=1000, 
    min_child_samples=50, 
    num_leaves=2,
    subsample_freq=1,
    subsample=0.5,
)
compute_loss_cv(**params)
0.13503496168391

12.1 Test set loss

X_holdout = pd.read_csv(f"{path}/X_cv_holdout.csv").drop("Date", axis=1)
X_holdout.Species = X_holdout.Species.astype("category")
y_holdout = pd.read_csv(f"{path}/y_cv_holdout.csv").squeeze()

def compute_loss_test(**params):
    model = lgb.LGBMClassifier(**params, verbosity=-1)
    model.fit(X.drop("Date", axis=1), y, categorical_feature="Species")
    pred = model.predict_proba(X_holdout)[:,1]
    return logloss(y_holdout, pred)

compute_loss_test(**params)
0.31618257488091905

13 Compare with winner

winner_params = {
    "num_leaves": 10, 
    "max_depth": 3, 
    "learning_rate": 0.01, 
    "n_estimators": 200, 
    "min_child_samples": 50, 
    "subsample": 0.6, 
    "subsample_freq": 1, 
    "colsample_bytree": 0.6}

compute_loss_test(**winner_params)
0.3135625321661356

13.1 Why did the winner do better than me?

Possible reasons:

  • Their CV setup was better than mine
  • Their optimization was able to get a lower CV loss
  • My optimization overfit
  • Estimation error on test set
  • Distribution shift happened to work in their favor
compute_loss_cv(**winner_params), compute_loss_cv(**params)
(0.13544392105605335, 0.13503496168391)

14 Winner’s CV setup

4 folds, split by row order (TimeSeriesSplit), giving 3 train-valid pairs:

from sklearn.model_selection import TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=3)
cv_pairs_winner = []
for train_idx, val_idx in tscv.split(X):
    cv_pairs_winner.append({
        "X_train": X.loc[train_idx].drop("Date", axis=1),
        "y_train": y.loc[train_idx],
        "X_valid": X.loc[val_idx].drop("Date", axis=1),
        "y_valid": y.loc[val_idx],
    })
print(f"{len(cv_pairs_winner)=}")
len(cv_pairs_winner)=3
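
Since these folds are defined by row index rather than by date, a quick check (not in the original) of how the validation folds line up with the calendar:

# Date range of each validation fold in the winner's row-based split
for i, (train_idx, val_idx) in enumerate(TimeSeriesSplit(n_splits=3).split(X)):
    d = X.Date.iloc[val_idx]
    print(f"valid fold {i + 1}: {d.min().date()} .. {d.max().date()} ({len(val_idx)} rows)")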

14.1 Winner’s workflow

Same workflow, for a single set of parameters:

compute_loss_cv(cv_pairs=cv_pairs_winner, **winner_params), compute_loss_cv(cv_pairs=cv_pairs_winner, **params)
(0.14701005872905806, 0.14673730462706855)

14.2 Winner’s optimization

param_grid = {
    "num_leaves": [10, 20, 50],
    "max_depth": [3, 5, 10],
    "learning_rate": [0.01, 0.05, 0.1],
    "n_estimators": [100, 200, 500],
    "min_child_samples": [10, 20, 50],
    "subsample": [0.6, 0.8, 1.0],
    "subsample_freq": [1, 5],
    "colsample_bytree": [0.6, 0.8, 1.0]
}

param_combinations = list(itertools.product(*param_grid.values()))
param_list = [dict(zip(param_grid.keys(), values)) for values in param_combinations]
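
A minimal sketch of how this list could then be scored with the compute_loss_cv function from section 5, assuming the winner's folds (their actual loop may have differed):

# Hypothetical evaluation loop: 4374 combinations x 3 folds is thousands of fits,
# so running it in full is expensive
results = []
for p in param_list:
    results.append({**p, "loss": compute_loss_cv(cv_pairs=cv_pairs_winner, **p)})
results = pd.DataFrame(results).sort_values("loss")
print(results.head())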

15 Bayesian optimization

from bayes_opt import BayesianOptimization

def objective(**params):
    params["subsample_freq"] = 1
    if "n_estimators" in params:
        params["n_estimators"] = int(params["n_estimators"])
    if "min_child_samples" in params:
        params["min_child_samples"] = int(params["min_child_samples"])
    if "num_leaves" in params:
        params["num_leaves"] = int(params["num_leaves"])
    loss = compute_loss_cv(**params)
    loss = min(loss, 10) # avoid infinity
    return -loss

optimizer = BayesianOptimization(
    f = objective,
    pbounds = {
        "learning_rate": (0.001, 1),
        "n_estimators": [10, 5000],
        "min_child_samples": [5, 1000],
        "num_leaves": (2, 100),
        "subsample": (0.05, 1),
        "colsample_bytree": (0.05, 1),
    },
)
optimizer.set_gp_params(alpha=0.1)
optimizer.maximize(init_points=4, n_iter=50)
|   iter    |  target   | colsam... | learni... | min_ch... | n_esti... | num_le... | subsample |
-------------------------------------------------------------------------------------------------
| 1         | -0.1973   | 0.7893    | 0.4606    | 616.8     | 2.111e+03 | 45.63     | 0.8145    |
| 2         | -0.1551   | 0.5219    | 0.6148    | 380.4     | 80.77     | 70.66     | 0.715     |
| 3         | -10.0     | 0.4019    | 0.9809    | 99.83     | 3.38e+03  | 92.28     | 0.07318   |
| 4         | -10.0     | 0.6961    | 0.4591    | 63.74     | 1.773e+03 | 26.12     | 0.494     |
| 5         | -0.2055   | 0.4697    | 0.6108    | 894.9     | 4.482e+03 | 35.63     | 0.9648    |
| 6         | -0.1502   | 0.7813    | 0.4157    | 385.8     | 73.11     | 61.44     | 0.4853    |
| 7         | -0.1496   | 0.6383    | 0.02783   | 630.9     | 182.1     | 82.39     | 0.3794    |
| 8         | -0.1567   | 0.2856    | 0.3339    | 991.6     | 2.438e+03 | 5.724     | 0.3342    |
| 9         | -0.1467   | 0.2491    | 0.03234   | 433.1     | 4.997e+03 | 25.48     | 0.4295    |
| 10        | -0.1598   | 0.8173    | 0.9689    | 998.3     | 1.751e+03 | 2.539     | 0.7187    |
| 11        | -0.1529   | 0.1131    | 0.4447    | 992.5     | 900.3     | 21.11     | 0.6196    |
| 12        | -0.1612   | 0.7957    | 0.1412    | 986.0     | 4.987e+03 | 14.22     | 0.6544    |
| 13        | -10.0     | 0.9512    | 0.5602    | 14.25     | 562.8     | 9.954     | 0.8646    |
| 14        | -0.1463   | 0.3466    | 0.6612    | 998.8     | 31.14     | 10.96     | 0.8683    |
| 15        | -0.7633   | 0.6386    | 0.9207    | 429.0     | 4.982e+03 | 23.15     | 0.8576    |
| 16        | -0.1581   | 0.1712    | 0.3301    | 379.1     | 76.92     | 64.21     | 0.1192    |
| 17        | -1.02     | 0.5549    | 0.2717    | 18.82     | 4.538e+03 | 8.496     | 0.9222    |
| 18        | -0.1578   | 0.4339    | 0.5917    | 985.0     | 4.978e+03 | 16.93     | 0.06027   |
| 19        | -0.1578   | 0.7498    | 0.8093    | 997.3     | 3.862e+03 | 52.94     | 0.1085    |
| 20        | -0.1829   | 0.9235    | 0.4792    | 997.4     | 3.163e+03 | 10.27     | 0.9466    |
| 21        | -0.8248   | 0.4967    | 0.04328   | 5.551     | 4.97e+03  | 30.46     | 0.2458    |
| 22        | -0.1544   | 0.9108    | 0.4457    | 995.8     | 473.5     | 91.47     | 0.8975    |
| 23        | -0.1578   | 0.4196    | 0.4055    | 992.6     | 2.14e+03  | 80.69     | 0.05784   |
| 24        | -0.1589   | 0.7533    | 0.5757    | 997.7     | 1.292e+03 | 88.01     | 0.4198    |
| 25        | -0.1563   | 0.2118    | 0.4558    | 998.8     | 2.731e+03 | 99.91     | 0.3462    |
| 26        | -0.332    | 0.1382    | 0.02911   | 6.049     | 4.975e+03 | 33.67     | 0.05934   |
| 27        | -0.1827   | 0.189     | 0.4946    | 486.3     | 4.475e+03 | 97.61     | 0.4346    |
| 28        | -0.1492   | 0.7353    | 0.7633    | 747.8     | 17.39     | 88.15     | 0.6249    |
| 29        | -0.2159   | 0.5938    | 0.9056    | 999.7     | 4.237e+03 | 91.25     | 0.3161    |
| 30        | -0.1796   | 0.6109    | 0.5716    | 648.8     | 2.539e+03 | 97.21     | 0.2219    |
| 31        | -0.1768   | 0.9478    | 0.4334    | 990.5     | 3.528e+03 | 80.28     | 0.6645    |
| 32        | -0.1588   | 0.374     | 0.1951    | 708.5     | 4.759e+03 | 98.98     | 0.3771    |
| 33        | -0.1578   | 0.2759    | 0.4149    | 622.5     | 4.132e+03 | 3.942     | 0.8533    |
| 34        | -0.159    | 0.6668    | 0.1822    | 818.9     | 2.29e+03  | 92.35     | 0.9888    |
| 35        | -0.1438   | 0.6972    | 0.3458    | 611.9     | 21.19     | 3.192     | 0.8953    |
| 36        | -0.176    | 0.7005    | 0.5155    | 994.8     | 4.634e+03 | 84.26     | 0.389     |
| 37        | -0.1554   | 0.135     | 0.5606    | 998.0     | 1.671e+03 | 99.83     | 0.6122    |
| 38        | -0.1498   | 0.6632    | 0.0107    | 996.5     | 249.5     | 93.96     | 0.739     |
| 39        | -0.157    | 0.2161    | 0.3103    | 986.6     | 4.156e+03 | 5.067     | 0.5235    |
| 40        | -0.1608   | 0.5637    | 0.4224    | 996.8     | 2.897e+03 | 3.947     | 0.311     |
| 41        | -0.1529   | 0.1528    | 0.4015    | 622.2     | 4.455e+03 | 2.996     | 0.6875    |
| 42        | -0.1561   | 0.1674    | 0.6623    | 997.2     | 657.8     | 3.344     | 0.4738    |
| 43        | -0.1549   | 0.09075   | 0.4903    | 998.1     | 2.048e+03 | 99.2      | 0.5309    |
| 44        | -0.1578   | 0.9326    | 0.01754   | 785.0     | 2.448e+03 | 4.117     | 0.08708   |
| 45        | -0.1555   | 0.8372    | 0.9305    | 987.8     | 21.89     | 97.81     | 0.4793    |
| 46        | -0.1609   | 0.08902   | 0.3788    | 761.1     | 4.989e+03 | 89.04     | 0.4031    |
| 47        | -0.1563   | 0.3008    | 0.7247    | 998.4     | 1.123e+03 | 94.95     | 0.6082    |
| 48        | -0.2565   | 0.4793    | 0.5335    | 380.6     | 4.664e+03 | 4.722     | 0.5889    |
| 49        | -0.1578   | 0.3113    | 0.1147    | 999.7     | 1.449e+03 | 4.301     | 0.1844    |
| 50        | -0.1559   | 0.2611    | 0.03458   | 575.0     | 12.44     | 80.55     | 0.6905    |
| 51        | -0.1508   | 0.1458    | 0.07325   | 993.1     | 3.797e+03 | 6.168     | 0.8667    |
| 52        | -0.1546   | 0.1562    | 0.1286    | 999.4     | 3.399e+03 | 7.654     | 0.4483    |
| 53        | -0.1926   | 0.7804    | 0.6878    | 984.1     | 2.532e+03 | 99.75     | 0.7062    |
| 54        | -0.1505   | 0.786     | 0.1277    | 379.1     | 78.57     | 69.73     | 0.2305    |
=================================================================================================
optimizer.max
{'target': -0.14380339815736648,
 'params': {'colsample_bytree': 0.6971898827444669,
  'learning_rate': 0.3458299551100665,
  'min_child_samples': 611.877576735657,
  'n_estimators': 21.192616960472154,
  'num_leaves': 3.191846039469531,
  'subsample': 0.895304713759703}}
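
To actually use the best point found, the integer-valued hyperparameters have to be rounded back, mirroring the conversions inside objective; a sketch:

# Convert the best point back into valid LightGBM hyperparameters and score it on the test set
best = dict(optimizer.max["params"])
for k in ("n_estimators", "min_child_samples", "num_leaves"):
    best[k] = int(best[k])
best["subsample_freq"] = 1
compute_loss_test(**best)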