有部分不匹配数据部分不匹配数据的线性回归线性回归:带有理论保证的本地搜索 (Linear regression with partially mismatched data: local search with theoretical guarantees)

Linear regression is a fundamental modeling tool in statistics and related fields. In this paper, we study an important variant of linear regression in which the predictor-response pairs are partially mismatched. We use an optimization formulation to simultaneously learn the underlying regression coefficients and the permutation corresponding to the mismatches. The combinatorial structure of the problem leads to computational challenges. We propose and study a simple greedy local search algorithm for this optimization problem that enjoys strong theoretical guarantees and appealing computational performance. We prove that under a suitable scaling of the number of mismatched pairs compared to the number of samples and features, and certain assumptions on problem data; our local search algorithm converges to a nearly-optimal solution at a linear rate. In particular, in the noiseless case, our algorithm converges to the global optimal solution with a linear convergence rate. Based on this result, we prove an upper bound for the estimation error of the parameter. We also propose an approximate local search step that allows us to scale our approach to much larger instances. We conduct numerical experiments to gather further insights into our theoretical results, and show promising performance gains compared to existing approaches.

翻译：线性回归是统计及相关领域的基本模型工具。在本文中, 我们研究一个重要的线性回归变量, 预测者- 反应对对的对应部分不匹配。我们使用优化配方同时学习与不匹配相对的基回归系数和变异。问题的组合结构导致计算挑战。我们提议并研究一个简单贪婪的本地搜索算法, 以弥补这一优化问题, 这一问题享有强大的理论保障和对计算性能的吸引力。我们证明, 在与样本和特征数量相比的不匹配对数以及某些关于问题数据的假设的适当规模下, 我们的本地搜索算法会以线性速度归结为接近最佳的解决方案。特别是, 在无噪音的情况下, 我们的算法会以线性趋同率汇集到全球最佳解决方案。基于这个结果, 我们还证明一个估算参数错误的上限。我们还提议了一个近似本地搜索步骤, 使我们能够将我们的方法缩到大得多的例子。我们进行数字实验, 以进一步了解我们的理论结果, 并显示与现有方法相比有希望的业绩收益。