For a linear model, if we assume the noise is Gaussian, we can replace the noise term with the Gaussian density function. After this substitution, the likelihood of the slope (theta) becomes a function expressed in terms of x and y; taking the derivative (usually of log(likelihood)) and setting it to zero maximizes the likelihood, which yields the best theta as estimated by MLE.
If the noise is Gaussian, the MLE result is always identical to the MSE (least-squares) result.
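To see why, here is a sketch of the derivation for the no-intercept model assumed throughout, y_i = θx_i + ε_i with ε_i ~ N(0, σ²):

$$ p(y_i \mid x_i, \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(y_i - \theta x_i)^2}{2\sigma^2}\right) $$

$$ \log L(\theta) = \sum_i \log p(y_i \mid x_i, \theta) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_i (y_i - \theta x_i)^2 $$

Maximizing log L(θ) is therefore the same as minimizing Σᵢ(yᵢ − θxᵢ)², which is exactly the MSE objective. Setting the derivative to zero:

$$ \frac{d}{d\theta}\log L(\theta) = \frac{1}{\sigma^2}\sum_i x_i (y_i - \theta x_i) = 0 \;\Longrightarrow\; \hat{\theta}_{\mathrm{MLE}} = \frac{\sum_i x_i y_i}{\sum_i x_i^2} $$

In the code below this closed form is computed as (x @ y) / (x @ x).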
Gaussian model
given a specific x, if θ is known, then y follows a Gaussian distribution centered at θx: p(y|x, θ) = N(θx, σ²)
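A minimal sketch of this generative model, assuming the no-intercept form y = θx + ε with ε ~ N(0, 1). The slope theta_true, the sample size, and the seed are illustrative choices, not values from the original; this also produces the x and y arrays used in the code below:

import numpy as np

np.random.seed(121)  # illustrative seed, for reproducibility

theta_true = 1.2   # hypothetical true slope
n_samples = 30     # hypothetical sample size
sigma = 1.0        # zero-mean, unit-variance Gaussian noise

x = 10 * np.random.rand(n_samples)            # inputs
epsilon = sigma * np.random.randn(n_samples)  # noise ~ N(0, sigma^2)
y = theta_true * x + epsilon                  # so y | x ~ N(theta*x, sigma^2)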
likelihood
what we mostly do is find the best θ
likelihood: which θ leads to the Gaussian distribution that most likely generated the observed data points?
calculate
This equality holds only when the noise is Gaussian.
def likelihood(theta_hat, x, y):
    """The likelihood function for a linear model with noise sampled from a
    Gaussian distribution with zero mean and unit variance.

    Args:
      theta_hat (float): An estimate of the slope parameter.
      x (ndarray): An array of shape (samples,) that contains the input values.
      y (ndarray): An array of shape (samples,) that contains the corresponding
        measurement values to the inputs.

    Returns:
      ndarray: the likelihood values for the theta_hat estimate
    """
    sigma = 1

    # Compute Gaussian likelihood: density of y under mean theta_hat * x
    pdf = 1 / np.sqrt(2 * np.pi * sigma**2) * np.exp(-(y - theta_hat * x)**2 / (2 * sigma**2))

    return pdf

print(likelihood(1.0, x[1], y[1]))
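As a usage sketch, we can scan a grid of candidate slopes and check which θ makes the observed data jointly most probable (the grid range and resolution are illustrative choices):

# Sum of log likelihoods over all data points, for each candidate slope
theta_grid = np.linspace(-2.0, 4.0, 601)
log_liks = np.array([np.sum(np.log(likelihood(th, x, y))) for th in theta_grid])

theta_best = theta_grid[np.argmax(log_liks)]
print(f"best theta on the grid: {theta_best:.2f}")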
To avoid arithmetic underflow (multiplying many probabilities smaller than 1 quickly produces numbers too small to represent in floating point), we usually compute log(likelihood) instead. In addition, we usually minimize rather than maximize, i.e. we work with -log(likelihood), which makes it easier to apply standard optimization methods.
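A minimal sketch of that workflow using scipy.optimize.minimize; the neg_log_likelihood wrapper and the initial guess of 0.0 are illustrative additions, not part of the original:

from scipy.optimize import minimize

def neg_log_likelihood(theta_hat, x, y):
    """Negative log likelihood; minimizing it maximizes the likelihood."""
    return -np.sum(np.log(likelihood(theta_hat, x, y)))

# Standard minimizers can now be applied directly
res = minimize(neg_log_likelihood, x0=0.0, args=(x, y))
print(f"theta from the optimizer: {res.x[0]:.2f}")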
Conclusion
import matplotlib.pyplot as plt

# Compute theta_hat_MLE (closed form derived above)
theta_hat_mle = (x @ y) / (x @ x)
ll = np.sum(np.log(likelihood(theta_hat_mle, x, y)))  # log likelihood

# Plot the resulting distribution density
# (plot_density_image is a helper defined elsewhere in the notebook)
fig, ax = plt.subplots()
im = plot_density_image(x, y, theta_hat_mle, ax=ax)
plt.colorbar(im, ax=ax)
ax.scatter(x, y)
ax.set(title=fr'$\hat{{\theta}}$ = {theta_hat_mle:.2f}, log likelihood: {ll:.2f}')
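If the MLE-equals-MSE claim above holds, this closed-form estimate should match an ordinary least-squares fit. A quick sanity check with np.linalg.lstsq (an addition, not part of the original notebook):

# Least-squares slope for the no-intercept model y ≈ theta * x
theta_lstsq, *_ = np.linalg.lstsq(x[:, None], y, rcond=None)
print(f"MLE (closed form): {theta_hat_mle:.6f}")
print(f"least squares:     {theta_lstsq[0]:.6f}")  # identical under Gaussian noise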
Bonus
We can also see p(y|x,θ) as a function of x. This is the stimulus likelihood function, and it is useful when we want to decode the input x from observed responses y. This is exactly what matters from the point of view of a neuron that has no access to the outside world and tries to infer what's out there from the responses of other neurons!
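A minimal sketch of this decoding idea, reusing the likelihood function above with the roles of x and θ swapped. Treating the fitted slope as known and decoding from the single response y[1] are illustrative choices:

# Decode the input x from one observed response, with theta fixed
theta = theta_hat_mle
y_obs = y[1]
x_grid = np.linspace(0, 10, 1001)

# p(y_obs | x, theta) evaluated as a function of x
stim_lik = likelihood(theta, x_grid, y_obs)
x_decoded = x_grid[np.argmax(stim_lik)]
print(f"decoded x: {x_decoded:.2f} (true x: {x[1]:.2f})")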