Linear regression is effective at identifying interpretable trends in a data set, but it averages out effects that may differ across subgroups within the data. We propose an iterative algorithm based on the randomized Kaczmarz (RK) method that automatically identifies subgroups in the data and simultaneously performs linear regression on each of them. We prove almost sure convergence of this method, as well as linear convergence in expectation under certain conditions. The result is an interpretable collection of weight vectors for the regressor variables, each capturing a different trend within the data. Furthermore, we experimentally validate our convergence results by demonstrating that the method can successfully identify two trends in simulated data.
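To make the idea concrete, here is a minimal sketch of how an RK-style iteration might fit several weight vectors at once. The abstract does not spell out the update rule, so everything beyond the standard Kaczmarz projection is an assumption made for illustration: the names `kaczmarz_project` and `multi_trend_rk`, the squared-row-norm sampling, and the rule of updating only the weight vector with the smallest residual on the sampled row are all hypothetical and may differ from the authors' method.

```python
import numpy as np


def kaczmarz_project(x, a, b):
    """Project x onto the hyperplane {z : a @ z = b} (one Kaczmarz step)."""
    return x + (b - a @ x) / (a @ a) * a


def multi_trend_rk(A, b, num_trends=2, num_iters=5000, seed=0):
    """Hypothetical RK-style sketch that fits `num_trends` weight vectors.

    Assumed update rule (not taken from the paper): sample a row with
    probability proportional to its squared norm, then project only the
    weight vector that currently has the smallest residual on that row.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_norms_sq = np.einsum("ij,ij->i", A, A)
    probs = row_norms_sq / row_norms_sq.sum()
    X = rng.standard_normal((num_trends, n))  # one weight vector per row of X
    for _ in range(num_iters):
        i = rng.choice(m, p=probs)
        residuals = np.abs(X @ A[i] - b[i])
        k = int(np.argmin(residuals))  # best-fitting weight vector for row i
        X[k] = kaczmarz_project(X[k], A[i], b[i])
    # Assign each data row to the weight vector that fits it best.
    labels = np.argmin(np.abs(A @ X.T - b[:, None]), axis=1)
    return X, labels
```

With `num_trends=1` this reduces to the standard randomized Kaczmarz method for a consistent linear system. With two or more trends, the sketch makes no convergence promise from an arbitrary starting point; the abstract states convergence only under certain conditions, which this illustration does not attempt to encode.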