A rigorous introduction for linear models

This survey provides an introduction to linear models and the theory behind them. Our goal is to give a rigorous introduction for readers with prior exposure to ordinary least squares. In machine learning, the output is usually a nonlinear function of the input. Deep learning even aims to find a nonlinear dependence through many layers, which requires a large amount of computation. However, most of these algorithms build upon simple linear models. We therefore describe linear models from different views and examine the properties and theory behind them. The linear model is the main technique in regression problems, and its primary tool is the least squares approximation, which minimizes a sum of squared errors. This is a natural choice when we are interested in finding the regression function that minimizes the corresponding expected squared error. This survey is primarily a summary of the purpose and significance of the important theories behind linear models, e.g., distribution theory and the minimum variance estimator. We first describe ordinary least squares from three different points of view, and then disturb the model with random noise and with Gaussian noise. Under Gaussian noise, the model gives rise to a likelihood, which lets us introduce a maximum likelihood estimator. The Gaussian disturbance also leads to some distribution theory for least squares, which helps us answer various questions and introduce related applications. We then prove that least squares is the best linear unbiased estimator in the sense of mean squared error and, most importantly, that it actually approaches the theoretical limit. We conclude with linear models from the Bayesian approach and beyond.
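As a concrete illustration of the least squares idea summarized above, the following is a minimal Python sketch, not taken from the survey itself. It assumes NumPy is available; the function name `ols_fit`, the simulated data, the "true" coefficients, and the noise level are all hypothetical choices made for the example. It fits the coefficients that minimize the sum of squared errors and estimates the noise variance from the residuals. Under the Gaussian noise assumption, the same coefficients are also the maximum likelihood estimate.

```python
import numpy as np


def ols_fit(X, y):
    """Return the coefficients b that minimize the squared error ||y - X b||^2."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# Simulated data (hypothetical): an intercept column plus one feature.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0])                    # assumed "true" coefficients
y = X @ beta_true + rng.normal(scale=0.5, size=n)   # Gaussian disturbance

beta_hat = ols_fit(X, y)

# Residuals are orthogonal to the columns of X (the normal equations).
residuals = y - X @ beta_hat

# Unbiased estimate of the noise variance: RSS / (n - p).
sigma2_hat = residuals @ residuals / (n - X.shape[1])

print("estimated coefficients:", beta_hat)
print("estimated noise variance:", sigma2_hat)
```

The sketch uses `np.linalg.lstsq` rather than explicitly inverting `X.T @ X`, which is generally the more numerically stable choice when the columns of `X` are nearly collinear.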
