A Multiple Regression-Enhanced Convolution Estimator for the Density of a Response Variable in the Presence of Auxiliary Information

In this paper we propose a convolution estimator for the density of a response variable. The estimator employs an underlying multiple regression framework to improve the accuracy of the estimates by incorporating auxiliary information. Suppose we have a sample of $N$ observations of a response variable and an associated set of covariates, along with an additional auxiliary sample of $M$ observations of the covariates only. We prove that the mean square error of the multiple regression-enhanced estimator converges as $O(N^{-1})$ and that, for a large fixed $N$, the mean square error converges as $O(M^{-4/5})$ before eventually tailing off as a saturation point is reached. Thus, while incorporating auxiliary covariate information is not quite as effective as incorporating more complete-case information, it nevertheless allows for significant improvements in accuracy. In contrast to convolution estimators based on the Nadaraya-Watson estimator for a nonlinear regression model, the convolution estimator proposed herein uses the ordinary least squares estimator for a multiple linear regression model. While this type of underlying estimator is not suited to strongly nonlinear data, it allows the multiple regression-enhanced convolution estimator to perform better on data that are generally linear or well fit by low-order polynomials, since the ordinary least squares estimator does not suffer from the curse of dimensionality and does not require one to choose hyperparameters. The estimator proposed in this paper is particularly useful for estimating the density of a response variable that is challenging to measure when a large amount of auxiliary information is available. In fact, an application of this type from the field of ophthalmology motivated our work in this paper.
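To make the idea concrete, the following is a minimal sketch of one way a regression-based convolution density estimate could be computed. It is not the paper's exact estimator, which the abstract does not spell out. The sketch assumes a linear model $Y = \beta_0 + X^\top\beta + \varepsilon$ with errors independent of the covariates, so that the density of $Y$ is the convolution of the density of the linear predictor and the density of the error. The function name `convolution_density`, the Gaussian kernel, and the `bandwidth` parameter are all illustrative assumptions.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)


def convolution_density(y_grid, X, y, X_aux, bandwidth):
    """Illustrative convolution-type density estimate (an assumed form).

    X      : (N, p) covariates of the complete cases
    y      : (N,)   responses of the complete cases
    X_aux  : (M, p) auxiliary covariates observed without a response
    """
    # Ordinary least squares fit (with intercept) on the N complete cases.
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ beta  # N estimated errors

    # Fitted values on all N + M covariate observations; this is where
    # the auxiliary sample enters the estimate.
    X_all = np.vstack([X, X_aux])
    A_all = np.column_stack([np.ones(len(X_all)), X_all])
    fitted = A_all @ beta

    # Every pairing of a fitted value with a residual is a pseudo-draw
    # from the distribution of Y under the independence assumption.
    sums = (fitted[:, None] + residuals[None, :]).ravel()

    # Kernel-smooth the pseudo-draws (the smoothing step is an assumption).
    u = (y_grid[:, None] - sums[None, :]) / bandwidth
    return gaussian_kernel(u).mean(axis=1) / bandwidth
```

Under these assumptions, the auxiliary covariates contribute only through the fitted values, while the regression coefficients and residuals depend on the $N$ complete cases alone. That is consistent with the abstract's remark that the gain from increasing $M$ eventually saturates for fixed $N$.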
