Machine Learning Theory

I have always found it easier to understand a topic when I can see its theory and innovations from a historical perspective. That view shows how many of the more advanced concepts arose naturally, and makes them seem less magical.

Say we have a set of data points $\{(x_i,y_i)\}_{i=1}^{N}$, where $x_i$ is age and $y_i$ is height. We want to find a function $f$ that predicts $y_i$ from $x_i$. A simple choice is a linear function $f(x) = wx + b$, where the slope $w$ and the intercept $b$ are chosen to fit the data points.
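
To make this concrete, here is a minimal sketch of one way to pick $w$ and $b$. It assumes we measure "fit" by ordinary least squares (minimizing the sum of squared errors $\sum_i (f(x_i) - y_i)^2$), which the text above has not committed to. The helper name `fit_line` and the age/height numbers are made up purely for illustration.

```python
def fit_line(xs, ys):
    """Fit f(x) = w*x + b by ordinary least squares (an assumed criterion)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    # Intercept: forces the line through the point (mean_x, mean_y).
    b = mean_y - w * mean_x
    return w, b


# Hypothetical data, not real measurements: ages in years, heights in cm.
ages = [2, 4, 6, 8, 10]
heights = [86, 102, 115, 128, 138]

w, b = fit_line(ages, heights)


def f(x):
    """Predict height from age using the fitted line."""
    return w * x + b


print(f"w = {w:.2f}, b = {b:.2f}, predicted height at age 7: {f(7):.1f}")
```

The closed-form solution is used here because a single input feature makes it short enough to read in full. Other fitting criteria or an iterative method would give the same kind of function $f$, possibly with different values of $w$ and $b$.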