In this paper, we study the problem of sparse mixed linear regression on an unlabeled dataset that is generated from linear measurements from two different regression parameter vectors. Since the data is unlabeled, our task is not only to figure out a good approximation of the regression parameter vectors but also to label the dataset correctly. In its original form, this problem is NP-hard. The most popular algorithms to solve this problem (such as Expectation-Maximization) have a tendency to stuck at local minima. We provide a novel invex relaxation for this intractable problem which leads to a solution with provable theoretical guarantees. This relaxation enables exact recovery of data labels. Furthermore, we recover a close approximation of the regression parameter vectors which match the true parameter vectors in support and sign. Our formulation uses a carefully constructed primal dual witnesses framework for the invex problem. Furthermore, we show that the sample complexity of our method is only logarithmic in terms of the dimension of the regression parameter vectors.
翻译:在本文中, 我们研究由两个不同的回归参数矢量的线性测量产生的未贴标签数据集上微薄的混合线性回归的问题。 由于数据没有标签, 我们的任务不仅在于找出回归参数矢量的良好近似值, 而且还要正确标记数据集。 最初的形式是 NP- hard 。 解决这一问题的最受欢迎的算法( 如期望- 最大化) 倾向于停留在本地迷你状态 。 我们为这个棘手问题提供了新颖的Invex 放松, 从而导致以可辨识的理论保证解决问题。 这种放松使得能够精确恢复数据标签。 此外, 我们的配方在支持和签名中恢复了与真实参数矢量匹配的回归参数参数矢量的近近近近近近。 我们的配方为 Invex 问题使用一个精心构建的原始双证人框架。 此外, 我们的样本复杂性在回归参数矢量矢量的维度方面只是对数。