Lab 9 Exercise - Test Set Overfitting Reading

Author

Erez Buchweitz

The assignment

Please read the following paper carefully, in groups of up to four, and answer the following questions elaborately. Your answers are to be submitted to Gradescope in PDF format. Mention Recht’s theorem in passing

The questions

  1. Explain adaptive overfitting in terms of the noise in \(Y_test\)
  2. Can adaptive overfitting occur even if there is no noise in the \(Y\)?
  3. Explain why adaptive overfitting is bad
  4. Can a cross validation estimate of test error likewise become overfit if you run cross validatation too many times on the same data set?
  5. Why does Kaggle maintain a private test set?
  6. Why does Kaggle maintain a public test set?
  7. Suggest a way in which Kaggle could have prevented the possibility of overfitting \(S_\text{private}\) using unsupervised learning (see footnote 3 in section 2.2)
  8. The authors claim in section 3.1 that “substantial deviations from the diagonal would indicate […] overfitting”. Do you agree?
  9. How would the scatterplots in Figure 1 look like if there was substantial adaptive overfitting?
  10. If the linear regression lines in Figure 1 lied below the \(y=x\) diagonal but were parallel to it would that be concerning?

The authors claim in section 6 that “our findings call into question whether common practices such as limiting test set re-use increase the reliability of machine learning”. To discuss this claim, suppose you wish to submit predictions for Kaggle’s public test set, and consider the following scenarios:

  • You need to export your predictions into a file and manually upload that file to Kaggle’s website
  • Kaggle provides an API which you may call from your Jupyter Notebook that sends your predictions to Kaggle’s servers and returns your score on the public test set
  • You have an oracle that scores your predictions on the public test set with zero latency
  • You have full access to the public test set (both \(X\) and \(y\))
  1. Explain how you could overfit the public test set in each scenario
  2. Explain to what degree each scenario limits test set requires

Kaggle requires manually uploading the prediction file to their website to get scored on the public test set. The common setting in machine learning research and practice is that the entire test is publicly availble.

  1. Do you agree with the authors’ claim about test set reuse?
  2. Suggest different ways in which Kaggle can reduce adaptive overfitting