Lab 2 Exercise - Linear Regression Competition

Author

Erez Buchweitz

You are given three files:

You may download and read them like this:

import numpy as np

X_train = np.load("{path}/X_train.npy")
y_train = np.load("{path}/y_train.npy")

There are two data sets, (X_train, y_train) and (X_test, y_test), which are independent and identically distributed. However, I have hidden y_test from you. Your task is to make the best predictions of y_test possible based on the training data and on X_test.
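As a starting point, a baseline could look like the minimal sketch below: ordinary least squares with an intercept, fit on the training data and applied to X_test. The helper names fit_ols and predict_ols are hypothetical, and the arrays are synthetic stand-ins so that the sketch runs on its own; in practice you would use the arrays loaded from the provided files instead.

```python
import numpy as np


def fit_ols(X, y):
    """Fit ordinary least squares with an intercept; return the coefficients."""
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta


def predict_ols(beta, X):
    """Apply fitted coefficients (intercept first) to new rows."""
    X1 = np.column_stack([np.ones(len(X)), X])
    return X1 @ beta


# Synthetic stand-in data (an assumption, not the competition data).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 3))
y_train = 1.0 + X_train @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=200)
X_test = rng.normal(size=(50, 3))

beta = fit_ols(X_train, y_train)
pred = predict_ols(beta, X_test)  # one prediction per row of X_test
```

Using np.linalg.lstsq rather than explicitly inverting a matrix is one reasonable design choice here, since it still returns a solution when columns are nearly collinear.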

1 How to submit predictions

You will submit a binary .npy file containing a one-dimensional numpy array of predictions of y_test, whose length equals the number of rows in X_test. You can save an array to a file like this:

import numpy as np

np.save("pred_{codename}.npy", pred)

The name of the file should be "pred_{codename}.npy", where {codename} is a secret codename of your group’s choosing, which you can use later to identify your submission on the leaderboard. The file, alongside your code, should be submitted to GradeScope.
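Before submitting, it may help to check that the array has the required form. The sketch below uses a hypothetical helper, save_predictions, that verifies the array is one-dimensional, has the expected length and contains only finite values, and then saves it under the required file name. The codename "example" and the temporary directory are placeholders for illustration only.

```python
import os
import tempfile

import numpy as np


def save_predictions(pred, n_test_rows, codename, out_dir="."):
    """Check the prediction array, then save it as pred_{codename}.npy."""
    pred = np.asarray(pred, dtype=float)
    if pred.ndim != 1:
        raise ValueError(f"expected a one-dimensional array, got shape {pred.shape}")
    if pred.shape[0] != n_test_rows:
        raise ValueError(f"expected {n_test_rows} predictions, got {pred.shape[0]}")
    if not np.all(np.isfinite(pred)):
        raise ValueError("predictions contain NaN or infinite values")
    path = os.path.join(out_dir, f"pred_{codename}.npy")
    np.save(path, pred)
    return path


# Usage with placeholder values: five dummy predictions, saved to a temp folder.
with tempfile.TemporaryDirectory() as tmp:
    path = save_predictions(np.zeros(5), n_test_rows=5, codename="example", out_dir=tmp)
    loaded = np.load(path)  # read the file back to confirm it round-trips
```

In your own code, n_test_rows would typically be X_test.shape[0].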

2 How your predictions will be scored

Your predictions will be compared against the real y_test using average squared error:

\[ \text{your loss} = \frac{1}{N}\sum_{n=1}^N (y_n - \text{pred}_n)^2. \]
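The formula above can be computed in a few lines of numpy. This is a minimal sketch, not the actual scoring script; the function name average_squared_error is hypothetical.

```python
import numpy as np


def average_squared_error(y, pred):
    """Mean of squared differences between targets and predictions."""
    y = np.asarray(y, dtype=float)
    pred = np.asarray(pred, dtype=float)
    if y.shape != pred.shape:
        raise ValueError(f"shape mismatch: {y.shape} vs {pred.shape}")
    return float(np.mean((y - pred) ** 2))


# Tiny worked example: the squared errors are 0, 0 and 4, so the loss is 4/3.
loss = average_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Since y_test is hidden, you can only evaluate this loss yourself on data for which you know the targets, such as a part of the training data that you set aside.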

A leaderboard will be published with every group’s codename alongside its loss on the test set, ranked from low to high. No other identifying information will be shared, so no other student will be able to know your rank on the leaderboard. The leaderboard is for your own benefit! The group with the lowest test loss is the winner.

3 Allowed methods to use

You are permitted to use only methods that you have learned in the lectures or labs, or that are adjacent to them. This includes, for example, ordinary least squares and feature engineering. Any use of significantly more advanced learning algorithms might result in disqualification and a grade of zero.
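To illustrate what feature engineering with ordinary least squares might look like, the sketch below compares two feature sets on a held-out part of the training data. Everything in it is an assumption made for illustration: the data are synthetic, the helper names are hypothetical, and squared features are just one possible transformation that may or may not help on the real data.

```python
import numpy as np


def add_intercept(X):
    """Prepend a column of ones."""
    return np.column_stack([np.ones(len(X)), X])


def ols_fit(X, y):
    """Ordinary least squares coefficients (intercept first)."""
    beta, *_ = np.linalg.lstsq(add_intercept(X), y, rcond=None)
    return beta


def add_squares(X):
    """Engineered features: the original columns plus their squares."""
    return np.column_stack([X, X ** 2])


# Synthetic data (hypothetical) in which y depends on a squared feature.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
y = 1.0 + X[:, 0] + 2.0 * X[:, 1] ** 2 + 0.1 * rng.normal(size=300)

# Random split: fit on 200 rows, estimate the loss on the remaining 100.
idx = rng.permutation(300)
fit_idx, val_idx = idx[:200], idx[200:]


def holdout_loss(make_features):
    """Average squared error on the held-out rows for a given feature map."""
    F = make_features(X)
    beta = ols_fit(F[fit_idx], y[fit_idx])
    pred = add_intercept(F[val_idx]) @ beta
    return float(np.mean((y[val_idx] - pred) ** 2))


loss_raw = holdout_loss(lambda X: X)  # original features only
loss_sq = holdout_loss(add_squares)   # original features plus squares
```

The model is still linear in its coefficients, so it remains ordinary least squares; only the columns of the design matrix change. Comparing held-out losses is one way to decide whether an engineered feature is worth keeping.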

4 How you will be graded

A submission reflecting honest effort will receive a full grade. Your rank on the leaderboard will not factor into your grade in any way, except that the students in the two top-performing groups after the competition has ended (that is, those with the lowest test loss) will receive a bonus point added to their final grade (out of 100).

5 Timeline

The competition will last two consecutive lab sessions, starting from the lab session on Jan 31st and ending at the end of the lab session on Feb 7th. You will submit predictions at two times:

  • At the end of the lab session on Jan 31st.
  • At the end of the lab session on Feb 7th.

A leaderboard will be published after each submission deadline.

You are not required to work on the competition outside lab sessions, but it is encouraged. This is for your benefit! The more time you spend working with data, the more you will get out of this course.

6 GOOD LUCK AND HAVE FUN!