For a linear model, if we assume the noise is Gaussian, we can replace the noise term with the Gaussian density function. After this substitution, the likelihood of the slope (theta) becomes a function expressed in terms of x and y; taking the derivative (usually of log(likelihood)) and setting it to zero maximizes the likelihood, which yields the best theta as estimated by MLE.
If the noise is Gaussian, the MLE result is always identical to the MSE (least-squares) result.
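To see why, here is a sketch of the derivation for the no-intercept model assumed throughout, y_i = θx_i + ε_i with ε_i ~ N(0, σ²):

$$ p(y_i \mid x_i, \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(y_i - \theta x_i)^2}{2\sigma^2}\right) $$

$$ \log L(\theta) = \sum_i \log p(y_i \mid x_i, \theta) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_i (y_i - \theta x_i)^2 $$

Maximizing log L(θ) is therefore the same as minimizing Σᵢ(yᵢ − θxᵢ)², which is exactly the MSE objective. Setting the derivative to zero:

$$ \frac{d}{d\theta}\log L(\theta) = \frac{1}{\sigma^2}\sum_i x_i (y_i - \theta x_i) = 0 \;\Longrightarrow\; \hat{\theta}_{\mathrm{MLE}} = \frac{\sum_i x_i y_i}{\sum_i x_i^2} $$

In the code below this closed form is computed as (x @ y) / (x @ x).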
Gaussian model
given a specific x, if θ is known, then y follows a Gaussian distribution centered at θx: p(y|x, θ) = N(θx, σ²)
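A minimal sketch of this generative model, assuming the no-intercept form y = θx + ε with ε ~ N(0, 1). The slope theta_true, the sample size, and the seed are illustrative choices, not values from the original; this also produces the x and y arrays used in the code below:

import numpy as np

np.random.seed(121)  # illustrative seed, for reproducibility

theta_true = 1.2   # hypothetical true slope
n_samples = 30     # hypothetical sample size
sigma = 1.0        # zero-mean, unit-variance Gaussian noise

x = 10 * np.random.rand(n_samples)            # inputs
epsilon = sigma * np.random.randn(n_samples)  # noise ~ N(0, sigma^2)
y = theta_true * x + epsilon                  # so y | x ~ N(theta*x, sigma^2)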
likelihood
what we mostly do is find the best θ
likelihood: which θ leads to the Gaussian distribution that most likely generated the observed data points?
calculate
This equality holds only when the noise is Gaussian.
def likelihood(theta_hat, x, y):
    """The likelihood function for a linear model with noise sampled from a
    Gaussian distribution with zero mean and unit variance.

    Args:
      theta_hat (float): An estimate of the slope parameter.
      x (ndarray): An array of shape (samples,) that contains the input values.
      y (ndarray): An array of shape (samples,) that contains the corresponding
        measurement values to the inputs.

    Returns:
      ndarray: the likelihood values for the theta_hat estimate
    """
    sigma = 1

    # Compute Gaussian likelihood: density of y under mean theta_hat * x
    pdf = 1 / np.sqrt(2 * np.pi * sigma**2) * np.exp(-(y - theta_hat * x)**2 / (2 * sigma**2))

    return pdf

print(likelihood(1.0, x[1], y[1]))
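As a usage sketch, we can scan a grid of candidate slopes and check which θ makes the observed data jointly most probable (the grid range and resolution are illustrative choices):

# Sum of log likelihoods over all data points, for each candidate slope
theta_grid = np.linspace(-2.0, 4.0, 601)
log_liks = np.array([np.sum(np.log(likelihood(th, x, y))) for th in theta_grid])

theta_best = theta_grid[np.argmax(log_liks)]
print(f"best theta on the grid: {theta_best:.2f}")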
To avoid arithmetic underflow (multiplying many probabilities smaller than 1 quickly produces numbers too small to represent in floating point), we usually compute log(likelihood) instead. In addition, we usually minimize rather than maximize, i.e. we work with -log(likelihood), which makes it easier to apply standard optimization methods.
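A minimal sketch of that workflow using scipy.optimize.minimize; the neg_log_likelihood wrapper and the initial guess of 0.0 are illustrative additions, not part of the original:

from scipy.optimize import minimize

def neg_log_likelihood(theta_hat, x, y):
    """Negative log likelihood; minimizing it maximizes the likelihood."""
    return -np.sum(np.log(likelihood(theta_hat, x, y)))

# Standard minimizers can now be applied directly
res = minimize(neg_log_likelihood, x0=0.0, args=(x, y))
print(f"theta from the optimizer: {res.x[0]:.2f}")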
Conclusion
import matplotlib.pyplot as plt

# Compute theta_hat_MLE (closed form derived above)
theta_hat_mle = (x @ y) / (x @ x)
ll = np.sum(np.log(likelihood(theta_hat_mle, x, y)))  # log likelihood

# Plot the resulting distribution density
# (plot_density_image is a helper defined elsewhere in the notebook)
fig, ax = plt.subplots()
im = plot_density_image(x, y, theta_hat_mle, ax=ax)
plt.colorbar(im, ax=ax)
ax.scatter(x, y)
ax.set(title=fr'$\hat{{\theta}}$ = {theta_hat_mle:.2f}, log likelihood: {ll:.2f}')
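If the MLE-equals-MSE claim above holds, this closed-form estimate should match an ordinary least-squares fit. A quick sanity check with np.linalg.lstsq (an addition, not part of the original notebook):

# Least-squares slope for the no-intercept model y ≈ theta * x
theta_lstsq, *_ = np.linalg.lstsq(x[:, None], y, rcond=None)
print(f"MLE (closed form): {theta_hat_mle:.6f}")
print(f"least squares:     {theta_lstsq[0]:.6f}")  # identical under Gaussian noise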
Bonus
We can also see p(y|x,θ) as a function of x. This is the stimulus likelihood function, and it is useful when we want to decode the input x from observed responses y. This is exactly what matters from the point of view of a neuron that has no access to the outside world and tries to infer what's out there from the responses of other neurons!
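A minimal sketch of this decoding idea, reusing the likelihood function above with the roles of x and θ swapped. Treating the fitted slope as known and decoding from the single response y[1] are illustrative choices:

# Decode the input x from one observed response, with theta fixed
theta = theta_hat_mle
y_obs = y[1]
x_grid = np.linspace(0, 10, 1001)

# p(y_obs | x, theta) evaluated as a function of x
stim_lik = likelihood(theta, x_grid, y_obs)
x_decoded = x_grid[np.argmax(stim_lik)]
print(f"decoded x: {x_decoded:.2f} (true x: {x[1]:.2f})")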