We consider the problem of mixed linear regression (MLR), where each observed sample belongs to one of $K$ unknown linear models. In practical applications, the proportions of the $K$ components are often imbalanced. Unfortunately, most MLR methods do not perform well in such settings. Motivated by this practical challenge, we propose Mix-IRLS, a novel, simple and fast algorithm for MLR that performs well on both balanced and imbalanced mixtures. In contrast to popular approaches that recover the $K$ models simultaneously, Mix-IRLS recovers them sequentially using tools from robust regression. Empirically, Mix-IRLS succeeds in a broad range of settings where other methods fail. These settings include imbalanced mixtures, small sample sizes, the presence of outliers, and an unknown number of models $K$. In addition, Mix-IRLS outperforms competing methods on several real-world datasets, in some cases by a large margin. We complement our empirical results by deriving a recovery guarantee for Mix-IRLS, which highlights its advantage on imbalanced mixtures.
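To make the idea of sequential recovery concrete, the following is a minimal sketch and not the Mix-IRLS algorithm itself. It assumes a generic iteratively reweighted least squares (IRLS) fit with Cauchy-type weights, a median-based residual cutoff, and a least-squares initialization. All function names, parameters (`irls_fit`, `sequential_mlr`, `thresh_mult`) and the synthetic data are hypothetical choices made for illustration. The robust fit treats samples from the non-dominant models as outliers, so it locks onto one model. Samples that this model explains well are then removed, and the procedure repeats on the rest.

```python
import numpy as np


def irls_fit(X, y, n_iter=50, eps=1e-8):
    """Robust linear fit via IRLS with Cauchy-type weights (an illustrative choice)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # plain least-squares start (assumption)
    for _ in range(n_iter):
        r = y - X @ beta
        scale = max(np.median(np.abs(r)), eps)  # robust residual scale, floored at eps
        w = 1.0 / (1.0 + (r / scale) ** 2)      # large residuals get small weights
        sw = np.sqrt(w)
        # Weighted least squares, solved as ordinary least squares on rescaled rows
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta


def sequential_mlr(X, y, K, thresh_mult=5.0):
    """Recover up to K linear models one at a time (hypothetical helper)."""
    active = np.ones(len(y), dtype=bool)  # samples not yet assigned to a model
    betas = []
    for _ in range(K):
        if active.sum() < X.shape[1]:
            break  # too few samples left to fit another model
        beta = irls_fit(X[active], y[active])
        betas.append(beta)
        r = np.abs(y - X @ beta)
        cutoff = thresh_mult * np.median(r[active])
        active &= r > cutoff  # keep only samples this model explains poorly
    return np.array(betas)


# Synthetic imbalanced mixture: roughly 75% of samples from model 0, 25% from model 1
rng = np.random.default_rng(0)
n, d = 600, 3
true_betas = np.array([[2.0, -1.0, 0.5], [-1.5, 1.0, 2.0]])
labels = (rng.random(n) < 0.25).astype(int)
X = rng.standard_normal((n, d))
y = np.einsum("ij,ij->i", X, true_betas[labels]) + 0.01 * rng.standard_normal(n)

est = sequential_mlr(X, y, K=2)
print(np.round(est, 2))  # rows are the recovered coefficient vectors, in order of recovery
```

In this sketch the dominant model is expected to be recovered first, because the robust fit downweights the minority samples. The weight function, the cutoff multiplier and the stopping rule are simplifications, and a practical method would need more careful choices for each.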