We present a robust framework for linear regression when entries of the features are missing. Assuming an elliptical data distribution, and specifically a multivariate normal model, we derive the conditional distribution of the missing entries and formulate a robust regression problem that minimizes the worst-case error caused by uncertainty about the missing data. We show that the proposed formulation, which naturally accounts for the dependency between different variables, ultimately reduces to a convex program, for which a customized and scalable solver can be developed. In addition to the detailed analysis leading to this solver, we analyze the asymptotic behavior of the proposed framework and discuss how to estimate the required input parameters. We complement our analysis with experiments on synthetic, semi-synthetic, and real data, and show that the proposed formulation improves prediction accuracy and robustness and outperforms competing techniques.
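The conditioning step mentioned in the abstract can be illustrated with the standard formula for a multivariate normal vector: if the features follow N(mu, Sigma) and are split into missing (m) and observed (o) parts, the missing part given the observed part is normal with mean mu_m + Sigma_mo Sigma_oo^{-1} (x_o - mu_o) and covariance Sigma_mm - Sigma_mo Sigma_oo^{-1} Sigma_om. The sketch below is only this textbook identity, not the robust formulation or the solver proposed in the paper; the function name `conditional_gaussian`, the use of NaN to mark missing entries, and the numeric values are assumptions made for illustration.

```python
import numpy as np


def conditional_gaussian(mu, sigma, x):
    """Conditional mean and covariance of the missing (NaN) entries of x.

    Assumes x ~ N(mu, sigma) and that missing entries are encoded as NaN
    (an illustrative convention, not taken from the paper).
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    x = np.asarray(x, dtype=float)

    miss = np.isnan(x)   # boolean mask of missing features
    obs = ~miss          # boolean mask of observed features

    s_mo = sigma[np.ix_(miss, obs)]   # cross-covariance block Sigma_mo
    s_oo = sigma[np.ix_(obs, obs)]    # observed block Sigma_oo
    s_mm = sigma[np.ix_(miss, miss)]  # missing block Sigma_mm

    # gain = Sigma_mo Sigma_oo^{-1}, computed with a linear solve
    # (Sigma_oo is symmetric) instead of an explicit inverse.
    gain = np.linalg.solve(s_oo, s_mo.T).T

    cond_mean = mu[miss] + gain @ (x[obs] - mu[obs])
    cond_cov = s_mm - gain @ s_mo.T
    return cond_mean, cond_cov


# Hypothetical two-feature example: the first feature is missing.
mu = np.array([0.0, 0.0])
sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])
x = np.array([np.nan, 2.0])

cond_mean, cond_cov = conditional_gaussian(mu, sigma, x)
print(cond_mean)  # conditional mean of the missing feature
print(cond_cov)   # conditional covariance of the missing feature
```

The conditional covariance is what quantifies the uncertainty about the missing entries; a robust formulation of the kind described above would build on this distribution rather than on a single imputed value.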