Linear regression is a fundamental modeling tool in statistics and related fields. In this paper, we study an important variant of linear regression in which the predictor-response pairs are partially mismatched. We use an optimization formulation to simultaneously learn the underlying regression coefficients and the permutation corresponding to the mismatches. The combinatorial structure of the problem leads to computational challenges. We propose and study a simple greedy local search algorithm for this optimization problem that enjoys strong theoretical guarantees and appealing computational performance. We prove that, under a suitable scaling of the number of mismatched pairs compared to the number of samples and features, and certain assumptions on the problem data, our local search algorithm converges to a nearly optimal solution at a linear rate. In particular, in the noiseless case, our algorithm converges to the globally optimal solution at a linear rate. We also propose an approximate local search step that allows us to scale our approach to much larger instances. We conduct numerical experiments to gather further insights into our theoretical results and show promising performance gains compared to existing approaches.
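To make the abstract's setup concrete, the following is a minimal illustrative sketch (not the paper's exact procedure) of a greedy local search of this flavor: alternate between fitting the regression coefficients by least squares for the current permutation guess and applying the best single pairwise swap of responses that reduces the squared residual. All function and variable names here are our own illustrative choices.

```python
import numpy as np

def greedy_local_search(X, y, n_iters=50):
    """Illustrative greedy local search for regression with mismatched
    responses: alternate least-squares fits with improving pairwise swaps
    of the response permutation. A sketch, not the paper's algorithm."""
    n = len(y)
    perm = np.arange(n)  # current guess for the mismatch permutation
    for _ in range(n_iters):
        # Fit coefficients for the current permutation guess.
        beta, *_ = np.linalg.lstsq(X, y[perm], rcond=None)
        pred = X @ beta
        resid = y[perm] - pred
        # Scan for the single swap (i, j) that most reduces the
        # squared residual; apply it if any improving swap exists.
        best_gain, best_pair = 0.0, None
        for i in range(n):
            for j in range(i + 1, n):
                old = resid[i] ** 2 + resid[j] ** 2
                new = (y[perm[j]] - pred[i]) ** 2 + (y[perm[i]] - pred[j]) ** 2
                if old - new > best_gain + 1e-12:
                    best_gain, best_pair = old - new, (i, j)
        if best_pair is None:
            break  # local optimum: no single swap improves the fit
        i, j = best_pair
        perm[i], perm[j] = perm[j], perm[i]
    # Refit once more so the returned coefficients match the final permutation.
    beta, *_ = np.linalg.lstsq(X, y[perm], rcond=None)
    return beta, perm
```

The exhaustive swap scan costs O(n^2) per iteration; an approximate local search step, as mentioned in the abstract, would replace this scan with a cheaper candidate-selection rule to scale to larger instances.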