Machine learning and statistical models usually attempt to describe the majority of the data. In some situations, however, only a fraction of the data can be fit well by a linear regression model. Here, we are interested in the case where such inliers can be identified by a Disjunctive Normal Form (DNF) formula. We give a polynomial-time algorithm for the conditional linear regression task, which identifies a DNF condition together with a linear predictor for the corresponding portion of the data. In this work, we improve on previous algorithms by removing the requirement that the covariance of the data satisfying each term of the condition be very close in spectral norm to the covariance of the data satisfying the overall condition.
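To make the task concrete, the following is a minimal sketch of the objective only, not of the algorithm this work proposes. It assumes each example carries Boolean attributes (over which the DNF condition is defined) alongside real-valued features and a target; this split, the synthetic data, and the names `eval_dnf` and `fit_on_condition` are illustrative assumptions rather than anything taken from the paper. Given a candidate DNF, the sketch selects the examples satisfying it and fits ordinary least squares on that subset. The hard part, searching for a good condition in polynomial time, is deliberately not shown.

```python
import numpy as np

def eval_dnf(terms, B):
    """Evaluate a DNF on Boolean attributes B (n x m bool array).

    terms: list of terms; each term is a list of (attribute_index, required_value)
    literals. A row satisfies the DNF if it satisfies every literal of some term.
    """
    selected = np.zeros(B.shape[0], dtype=bool)
    for term in terms:
        term_holds = np.ones(B.shape[0], dtype=bool)
        for j, required in term:
            term_holds &= (B[:, j] == required)
        selected |= term_holds
    return selected

def fit_on_condition(terms, B, X, y):
    """Least-squares fit restricted to the rows selected by the DNF (hypothetical helper)."""
    mask = eval_dnf(terms, B)
    w, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
    mse = float(np.mean((X[mask] @ w - y[mask]) ** 2))
    return w, mask, mse

# Synthetic data (assumed for illustration): only rows satisfying the DNF are linear.
rng = np.random.default_rng(0)
n = 2000
B = rng.integers(0, 2, size=(n, 4)).astype(bool)
X = rng.normal(size=(n, 3))
true_w = np.array([1.0, -2.0, 0.5])

# Condition: (b0 AND b1) OR (NOT b2 AND b3)
terms = [[(0, True), (1, True)], [(2, False), (3, True)]]
inlier = eval_dnf(terms, B)
y = np.where(inlier, X @ true_w, rng.normal(scale=5.0, size=n))

w, mask, mse = fit_on_condition(terms, B, X, y)
w_all, *_ = np.linalg.lstsq(X, y, rcond=None)  # fit on all data, for contrast
mse_all = float(np.mean((X @ w_all - y) ** 2))
```

In this toy setup the fit restricted to the condition recovers `true_w` with near-zero error, while the fit on all data is dominated by the rows that follow no linear rule, which is the situation the abstract describes.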