While mixture of linear regressions (MLR) is a well-studied topic, prior works usually do not analyze such models for prediction error. In fact, {\em prediction} and {\em loss} are not well-defined in the context of mixtures. In this paper, we first show that MLR can be used for prediction where, instead of predicting a single label, the model predicts a list of values (also known as {\em list-decoding}). The list size equals the number of components in the mixture, and the loss is defined to be the minimum among the losses incurred by the component models. We show that with this definition, a solution of the empirical risk minimization (ERM) achieves a small probability of prediction error. This calls for an algorithm that minimizes the empirical risk for MLR, a problem known to be computationally hard. Prior algorithmic works on MLR focus on the {\em realizable} setting, i.e., recovery of parameters when the data are probabilistically generated by a noisy mixed linear model. In this paper we show that, under some regularity conditions on the dataset and the initial points, a version of the popular alternating minimization (AM) algorithm finds the best-fit lines in a dataset even when a realizable model is not assumed, thereby providing a solution to the ERM problem. We further provide an algorithm that runs in time polynomial in the number of datapoints and recovers a good approximation of the best-fit lines. The two algorithms are compared experimentally.
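To make the two central objects concrete, the following is a minimal Python/NumPy sketch of the alternating-minimization scheme and of the min-over-components (list-decoding) loss described above. The function names are illustrative, and the random initialization is an assumption made for brevity: the paper's guarantee requires suitably chosen initial points, which this sketch does not attempt to provide.

\begin{verbatim}
import numpy as np

def am_mixture_of_lines(X, y, k, n_iters=50, seed=0):
    # Alternating minimization for fitting k lines to (X, y):
    # alternate between (1) assigning each point to the component
    # with the smallest squared residual and (2) refitting each
    # component by ordinary least squares on its assigned points.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Random initialization (assumption of this sketch only; the
    # paper's analysis requires appropriately good initial points).
    thetas = rng.standard_normal((k, d))
    for _ in range(n_iters):
        resid = (y[:, None] - X @ thetas.T) ** 2  # (n, k) squared residuals
        assign = resid.argmin(axis=1)             # best component per point
        for j in range(k):
            mask = assign == j
            if mask.any():  # keep old estimate if a component is empty
                thetas[j] = np.linalg.lstsq(X[mask], y[mask], rcond=None)[0]
    return thetas

def list_decoding_risk(X, y, thetas):
    # Empirical risk under the min-over-components loss: each point
    # pays only the loss of the component that fits it best.
    resid = (y[:, None] - X @ thetas.T) ** 2
    return resid.min(axis=1).mean()
\end{verbatim}

Each AM iteration weakly decreases the empirical risk computed by \verb|list_decoding_risk|, since both the reassignment step and the least-squares refit step can only lower the min-over-components objective.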