There is quite a lot of variability in the loss across folds, so there certainly seems to be distribution shift.
Whether distribution shift is the cause of the mismatch between CV and test errors is uncertain, but it looks like the best bet.
5.4 If there’s distribution shift, who guarantees that minimizing CV error will result in the lowest test error?
There is no guarantee!
Since the CV validation folds span multiple different periods, we can hope that the resulting hyperparameters will be robust to the different possibilities of what 2013 will look like.
Of course, 2013 can be something we’ve never seen before, and then there really is no guarantee.
Of course, we can also overfit the hyperparameters, but that’s a different problem that exists even with IID data.
6 Write grid workflow
Let’s write a function that gets multiple values for each hyperparameter, and runs over all possible combinations.
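import itertools

import pandas as pd


def compute_loss_cv_grid(cv_pairs=cv_pairs, **params):
    # Evaluate the CV loss for every combination of the given hyperparameter values
    keys = params.keys()
    out = []
    for vals in itertools.product(*params.values()):
        cur_params = dict(zip(keys, vals))
        cur_params["loss"] = compute_loss_cv(cv_pairs=cv_pairs, **cur_params)
        out.append(cur_params)
    # One row per combination: the hyperparameter values plus their loss
    out = pd.DataFrame(out)
    return out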
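As a rough usage sketch, the same looping pattern can be run end to end with a stand-in loss. Here `toy_loss_cv` and `compute_loss_grid` are hypothetical names introduced only for illustration: the toy function is an arbitrary formula, not a trained model, and it replaces `compute_loss_cv` so that the example runs on its own.

```python
import itertools

import pandas as pd


def toy_loss_cv(learning_rate, n_estimators):
    # Stand-in for compute_loss_cv: NOT a real model, just an arbitrary
    # formula so the grid has something to evaluate.
    return abs(learning_rate * n_estimators - 1.0)


def compute_loss_grid(loss_fn, **params):
    # Same idea as the grid function above: one row per combination.
    keys = list(params.keys())
    rows = []
    for vals in itertools.product(*params.values()):
        row = dict(zip(keys, vals))
        row["loss"] = loss_fn(**row)
        rows.append(row)
    return pd.DataFrame(rows)


results = compute_loss_grid(
    toy_loss_cv,
    learning_rate=[0.001, 0.01, 0.1, 1.0],
    n_estimators=[10, 50, 100, 500, 1000],
)

# Pivot the long-form results into a learning_rate x n_estimators table,
# which is easier to read than a flat list of combinations.
table = results.pivot(index="learning_rate", columns="n_estimators", values="loss")
print(table)
```

Returning a long-form DataFrame keeps the function independent of how many hyperparameters are passed; the pivot is only convenient when exactly two are varied.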
6.1 How many hyperparameters/values to try at once?
The number of models to train and evaluate is exponential in the number of hyperparameters:
High run time
Need to be wary of overfitting
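To make the growth concrete, here is a small sketch that counts the combinations in a full grid. The `num_leaves` and `min_child_samples` candidate lists are hypothetical; only their lengths matter for the count.

```python
import itertools
import math

# Two hyperparameters: 4 x 5 candidate values.
grid = {
    "learning_rate": [0.001, 0.01, 0.1, 1.0],
    "n_estimators": [10, 50, 100, 500, 1000],
}


def grid_size(grid):
    """Number of combinations a full grid search has to evaluate."""
    return math.prod(len(values) for values in grid.values())


n_two = grid_size(grid)

# Adding two more hyperparameters with 4 (hypothetical) values each
# multiplies the number of models by 4 * 4 = 16.
bigger = dict(
    grid,
    num_leaves=[4, 16, 64, 256],
    min_child_samples=[10, 50, 100, 500],
)
n_four = grid_size(bigger)

# Sanity check against explicit enumeration with itertools.product.
assert n_two == len(list(itertools.product(*grid.values())))
print(n_two, n_four)  # 20 320
```

Each of these combinations is also trained once per CV fold, so the run time grows accordingly.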
Let’s try two at a time.
6.2 Start optimizing
Start with possibly the two most important parameters, learning_rate and n_estimators, and find sane values. We don’t know what scale to try, so we’ll try a wide range.
C:\Users\erezm\AppData\Local\Temp\ipykernel_18692\59131662.py:4: RuntimeWarning: divide by zero encountered in log
return np.mean(-(y * np.log(pred) + (1-y) * np.log(1-pred)))
n_estimators       10        50        100       500       1000
learning_rate
0.001          0.157110  0.154743  0.152676  0.145208  0.144221
0.010          0.152703  0.145403  0.144335  0.173802  0.211454
0.100          0.144287  0.173587  0.210524  0.327977  0.403627
1.000               inf       inf       inf       inf       inf
Best learning_rate decreases when n_estimators grows, as expected.
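The inf losses at learning_rate=1.0 presumably come from the same place as the divide-by-zero warnings: predictions of exactly 0 or 1 fed into the log. One common way to keep the loss finite is to clip the predictions first. This is a minimal sketch, not a change the original workflow makes; `log_loss_clipped` and `eps` are names chosen here for illustration.

```python
import numpy as np


def log_loss_clipped(y, pred, eps=1e-15):
    # Keep predictions strictly inside (0, 1) so np.log never sees 0.
    pred = np.clip(pred, eps, 1 - eps)
    return np.mean(-(y * np.log(pred) + (1 - y) * np.log(1 - pred)))


y = np.array([1.0, 0.0, 1.0])
pred = np.array([1.0, 0.0, 0.0])  # the last prediction is confident and wrong

loss = log_loss_clipped(y, pred)
print(loss)  # large but finite
```

Clipping does not make such a model good: a confident wrong prediction still gets a very large penalty, it just stays finite and comparable across grid cells.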
6.3 Why look at the full results and not just use the minimizer?
To use our judgement - we’re afraid of overfitting (robustness vs optimality).
6.4 Which values to choose?
learning_rate=0.001 with n_estimators=1000 seems to be best (0.144221), but it takes ~10 times longer to train than learning_rate=0.01 with n_estimators=100, which is nearly as good (0.144335). In the interest of closing loops faster, let’s go with 0.01 and 100. We can always come back to this later.
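One way to support this kind of judgement call is to list every configuration whose loss is within a small tolerance of the best one, and then choose among them by training cost or other considerations. The sketch below uses the finite losses from the table above; the tolerance of 0.0002 is an arbitrary value picked for illustration.

```python
import pandas as pd

learning_rates = [0.001, 0.01, 0.1]  # the learning_rate=1.0 row is all inf, so it is omitted
n_estimators = [10, 50, 100, 500, 1000]
loss_table = [
    [0.157110, 0.154743, 0.152676, 0.145208, 0.144221],
    [0.152703, 0.145403, 0.144335, 0.173802, 0.211454],
    [0.144287, 0.173587, 0.210524, 0.327977, 0.403627],
]

# Long-form results: one row per (learning_rate, n_estimators) combination.
rows = [
    {"learning_rate": lr, "n_estimators": n, "loss": loss}
    for lr, row in zip(learning_rates, loss_table)
    for n, loss in zip(n_estimators, row)
]
results = pd.DataFrame(rows)

tolerance = 0.0002  # hypothetical threshold for "close enough to the best"
best = results["loss"].min()
near_best = results[results["loss"] <= best + tolerance].sort_values("n_estimators")
print(near_best)
```

With this tolerance, three configurations survive: (0.1, 10), (0.01, 100) and (0.001, 1000). Picking among them is the robustness-versus-cost trade-off discussed above, not something the minimizer alone decides.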
6.5 Why not look at more fine grained values of parameters?
Robustness vs optimality - not sure I’d believe finer-grained results. Let’s be content with what we have, at least for now - there’ll be time to redo this later if we want to.
7 Second iteration
Fix learning_rate=0.01 and n_estimators=100, and optimize two other hyperparameters:
Best is num_leaves=4 and min_child_samples=10. But so few leaves is very restrictive, and we might get stuck in a local minimum later. I’d prefer something more expressive with harsher regularization. Let’s go with num_leaves=16 and min_child_samples=50.