A Neural Network Example

Author

Ryan Giordano

One of the goals of this class is that you be able to read and understand, at a high level, what a novel machine learning method is doing. Let’s practice!

Consider this JAX tutorial, with its corresponding notebook (downloaded Dec 2025). To run the notebook, you'll need to install JAX and TensorFlow. Make sure you can run the notebook.

Here are some questions to answer. You can also think of your own.

What is the task? Is it classification or regression? What's the domain of \(x_n\), and what's the domain of \(y_n\)? Why is it reasonable to expect \(x_n\) to predict \(y_n\)?
Visually display some of the training points. Do their labels look right?
What is the loss function by which the method will ultimately be evaluated? Where in the code is this loss specified? How is it used?
Where in the code is the empirical risk defined? Does the empirical risk use a proxy loss? If so, what is the proxy loss, and how would you justify it? Can you think of alternative proxy losses?
What function \(f(x; \theta)\) is being learned? What are the parameters, \(\theta\)? What is the dimension of \(\theta\)?
What algorithm is being used to minimize the empirical risk? What are its hyperparameters? How is the algorithm initialized? Is there any regularization?
Intuitively, how expressive is \(f(x; \theta)\)? How would you make it more expressive? Less expressive? What aspects of \(f(x; \theta)\) seem arbitrary, and how could you imagine changing them? How would you choose among different architectures?
Find some examples on which the learned \(f(\cdot; \hat{\theta})\) fails. Look at them. Do the failures make sense?
How well does the algorithm work? How do you know? Imagine a real-world use case for this algorithm. Do you have evidence based on this notebook that your estimator is good enough? Why or why not? If not, how would you get better information?
Imagine a real-world task for which the loss function used in the test-set evaluation may not be appropriate. (Hint: consider measuring uncertainty.) How would you change the test-set evaluation to account for this?
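To make the distinction between the evaluation loss and a proxy loss concrete, here is a minimal sketch in JAX. It assumes a multiclass classifier that outputs unnormalized scores (logits) and integer class labels; the helper names `zero_one_loss` and `cross_entropy_loss` and the toy numbers are hypothetical and not taken from the tutorial. The 0-1 loss is piecewise constant in the logits, so its gradient is zero wherever it is defined, while the cross-entropy is smooth and gives a useful gradient.

```python
import jax
import jax.numpy as jnp

def zero_one_loss(logits, labels):
    """Average 0-1 loss: fraction of examples whose argmax class is wrong.
    Hypothetical helper for illustration, not from the tutorial."""
    predicted = jnp.argmax(logits, axis=-1)
    return jnp.mean(predicted != labels)

def cross_entropy_loss(logits, labels):
    """Average negative log-likelihood under a softmax model (a smooth proxy)."""
    log_probs = jax.nn.log_softmax(logits, axis=-1)
    # Pick out the log-probability assigned to each example's true class.
    true_log_probs = jnp.take_along_axis(log_probs, labels[:, None], axis=-1)
    return -jnp.mean(true_log_probs)

# Toy batch: 3 examples, 4 classes (made-up numbers).
logits = jnp.array([[2.0, 0.5, 0.1, -1.0],
                    [0.2, 0.3, 2.5, 0.0],
                    [1.0, 1.1, 0.9, 1.0]])
labels = jnp.array([0, 2, 3])

print(zero_one_loss(logits, labels))
print(cross_entropy_loss(logits, labels))

# Gradient of the proxy loss with respect to the logits.
print(jax.grad(cross_entropy_loss)(logits, labels))
```

On this toy batch only the third example is misclassified, so the 0-1 loss is 1/3. Each row of the cross-entropy gradient sums to zero, since it equals the softmax probabilities minus the one-hot label, divided by the batch size.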
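For the questions about \(\theta\) and its dimension: if \(f(x; \theta)\) is a fully connected network, each layer mapping \(n_{\text{in}}\) inputs to \(n_{\text{out}}\) outputs has a weight matrix and a bias vector, so it contributes \(n_{\text{in}} n_{\text{out}} + n_{\text{out}}\) entries to \(\theta\). The sketch below counts them two ways. The layer sizes, the helper `init_mlp_params`, and the small-Gaussian initialization are hypothetical choices for illustration; check the notebook for its actual architecture and initializer.

```python
import jax

# Hypothetical layer sizes: input dimension, two hidden layers, output classes.
# Replace these with the sizes actually used in the notebook.
layer_sizes = [784, 512, 512, 10]

def init_mlp_params(sizes, key, scale=1e-2):
    """Hypothetical initializer: a list of (W, b) pairs drawn from N(0, scale^2)."""
    keys = jax.random.split(key, len(sizes) - 1)
    params = []
    for k, n_in, n_out in zip(keys, sizes[:-1], sizes[1:]):
        w_key, b_key = jax.random.split(k)
        W = scale * jax.random.normal(w_key, (n_out, n_in))
        b = scale * jax.random.normal(b_key, (n_out,))
        params.append((W, b))
    return params

params = init_mlp_params(layer_sizes, jax.random.PRNGKey(0))

# dim(theta) = total number of scalar entries across all weights and biases.
num_params = sum(leaf.size for leaf in jax.tree_util.tree_leaves(params))

# Closed form: sum over layers of n_in * n_out + n_out.
closed_form = sum(n_in * n_out + n_out
                  for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))
print(num_params, closed_form)
```

With these hypothetical sizes, both counts give 669,706 parameters. The first layer dominates, because the input dimension multiplies the width of the first hidden layer.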
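For the question about the optimization algorithm, the following sketch shows gradient-based minimization of an empirical risk in JAX. It is a toy built on assumptions: the two-dimensional synthetic data, the one-hidden-layer network, the step size, and the number of steps are all made up, and it uses full-batch gradient descent. Stochastic gradient descent would instead evaluate the gradient on a random minibatch at each step; compare this with whatever the notebook does.

```python
import jax
import jax.numpy as jnp

# Synthetic two-class data (an assumption for illustration, not the notebook's data).
key = jax.random.PRNGKey(0)
x_key, w1_key, w2_key = jax.random.split(key, 3)
X = jax.random.normal(x_key, (64, 2))
y = (X[:, 0] > 0).astype(jnp.int32)

# Hypothetical f(x; theta): one ReLU hidden layer of width 8, two output logits.
params = {
    "W1": 0.5 * jax.random.normal(w1_key, (2, 8)),
    "b1": jnp.zeros(8),
    "W2": 0.5 * jax.random.normal(w2_key, (8, 2)),
    "b2": jnp.zeros(2),
}
STEP_SIZE = 0.1  # hypothetical hyperparameter

def predict(params, X):
    hidden = jax.nn.relu(X @ params["W1"] + params["b1"])
    return hidden @ params["W2"] + params["b2"]  # logits, shape (n, 2)

def empirical_risk(params, X, y):
    # Average cross-entropy over the training set (the proxy loss).
    log_probs = jax.nn.log_softmax(predict(params, X), axis=-1)
    return -jnp.mean(jnp.take_along_axis(log_probs, y[:, None], axis=-1))

@jax.jit
def gd_step(params, X, y):
    grads = jax.grad(empirical_risk)(params, X, y)
    # theta <- theta - step_size * gradient, applied to every array in params.
    return jax.tree_util.tree_map(lambda p, g: p - STEP_SIZE * g, params, grads)

initial_risk = empirical_risk(params, X, y)
for _ in range(200):
    params = gd_step(params, X, y)
final_risk = empirical_risk(params, X, y)
print(initial_risk, final_risk)
```

Here `jax.grad` differentiates the empirical risk with respect to every array in the `params` dictionary at once, and `jax.tree_util.tree_map` applies the update array by array. The step size, the number of steps, and the initialization scale are exactly the kind of hyperparameters the question asks you to find.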
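For the last question, a small example shows why accuracy alone may be a poor test-set criterion when uncertainty matters. The two sets of predicted probabilities below are hypothetical: they make the same class predictions, and so have the same accuracy, but one is overconfident about its single mistake. The negative log-likelihood (log loss), a proper scoring rule, distinguishes them.

```python
import jax.numpy as jnp

# Hypothetical predicted class probabilities from two models on four test points.
labels = jnp.array([0, 1, 1, 0])
probs_confident = jnp.array([[0.99, 0.01], [0.01, 0.99], [0.99, 0.01], [0.99, 0.01]])
probs_hedged = jnp.array([[0.80, 0.20], [0.20, 0.80], [0.60, 0.40], [0.80, 0.20]])

def accuracy(probs, labels):
    # Fraction of points where the most probable class is the true class.
    return jnp.mean(jnp.argmax(probs, axis=-1) == labels)

def negative_log_likelihood(probs, labels):
    # Log loss: penalizes confident mistakes much more than hedged ones.
    true_probs = jnp.take_along_axis(probs, labels[:, None], axis=-1)
    return -jnp.mean(jnp.log(true_probs))

for name, probs in [("confident", probs_confident), ("hedged", probs_hedged)]:
    print(name, accuracy(probs, labels), negative_log_likelihood(probs, labels))
```

Both models have accuracy 0.75, but the overconfident model's log loss (about 1.16) is roughly three times that of the hedged model (about 0.40).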