Theoretically, the conditional expectation of a square-integrable random variable $Y$ given a $d$-dimensional random vector $X$ can be obtained by minimizing the mean squared distance between $Y$ and $f(X)$ over all Borel measurable functions $f \colon \mathbb{R}^d \to \mathbb{R}$. However, in many applications this minimization problem cannot be solved exactly, and instead, a numerical method which computes an approximate minimum over a suitable subfamily of Borel functions has to be used. The quality of the result depends on the adequacy of the subfamily and the performance of the numerical method. In this paper, we derive an expected value representation of the minimal mean squared distance which in many applications can efficiently be approximated with a standard Monte Carlo average. This enables us to provide guarantees for the accuracy of any numerical approximation of a given conditional expectation. We illustrate the method by assessing the quality of approximate conditional expectations obtained by linear, polynomial and neural network regression in different concrete examples.
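The idea can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: suppose $Y = h(X, V)$ for a known function $h$ and a random variable $V$ independent of $X$, and let $Z = h(X, V')$ for an independent copy $V'$ of $V$. Then $E[YZ \mid X] = E[Y \mid X]^2$, so the minimal mean squared distance equals $E[Y(Y - Z)]$, which a plain Monte Carlo average can estimate. Since $E[(Y - f(X))^2] = E[(Y - E[Y \mid X])^2] + E[(f(X) - E[Y \mid X])^2]$ for any candidate $f$, subtracting the estimated minimum from the mean squared error of $f$ gives an estimate of its squared $L^2$ distance to the true conditional expectation. The toy model $h(x, v) = x^2 + v$, the sample sizes, and all function names below are hypothetical choices for illustration, and this construction is not necessarily the representation derived in the paper.

```python
import numpy as np


def h(x, v):
    # Hypothetical toy model (assumption): Y = X^2 + V, so E[Y | X] = X^2.
    return x**2 + v


def sample(n, rng):
    # Draw X, V and an independent copy V' of V; return X, Y = h(X, V), Z = h(X, V').
    x = rng.standard_normal(n)
    v = rng.standard_normal(n)
    v_copy = rng.standard_normal(n)
    return x, h(x, v), h(x, v_copy)


def estimate_errors(n_train=50_000, n_test=200_000, seed=0):
    rng = np.random.default_rng(seed)
    x_tr, y_tr, _ = sample(n_train, rng)
    x_te, y_te, z_te = sample(n_test, rng)

    # Monte Carlo estimate of the minimal MSE, using E[(Y - E[Y|X])^2] = E[Y (Y - Z)].
    min_mse = np.mean(y_te * (y_te - z_te))

    results = {"min_mse": min_mse}
    for name, degree in [("linear", 1), ("quadratic", 2)]:
        # Least-squares polynomial regression fitted on the training sample.
        coeffs = np.polyfit(x_tr, y_tr, degree)
        # Out-of-sample MSE of the fitted regression function.
        mse = np.mean((y_te - np.polyval(coeffs, x_te)) ** 2)
        # Estimated squared L^2 distance between the fit and E[Y | X].
        results[name] = mse - min_mse
    return results


if __name__ == "__main__":
    for key, value in estimate_errors().items():
        print(f"{key}: {value:.4f}")
```

In this toy model the exact values are 1 for the minimal mean squared distance, 2 for the squared $L^2$ error of the best linear fit, and 0 for the quadratic fit, so the estimates should land near those values up to Monte Carlo noise. The estimated error of a very good fit can come out slightly negative for the same reason.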