Active regression considers a linear regression problem in which the learner receives a large number of data points but can observe only a small number of labels. Because online algorithms handle incrementally arriving training data at low computational cost, we consider an online extension of the active regression problem: the learner receives data points one by one and must decide immediately whether to collect each corresponding label. The goal is to efficiently maintain a regression solution for the received data points within a small budget of label queries. We propose novel algorithms for this problem under the $\ell_p$ loss, where $p\in[1,2]$. To achieve a $(1+\epsilon)$-approximate solution, our algorithms require only $\tilde{\mathcal{O}}(\epsilon^{-1} d \log(n\kappa))$ label queries, where $n$ is the number of data points and $\kappa$ is a quantity of the data points called the condition number. Numerical experiments verify our theoretical results and show that our methods perform comparably to offline active regression algorithms.
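To make the online setting concrete, the following is a minimal sketch for the special case $p=2$ only. It is not the algorithm proposed here: it swaps in generic online leverage-score sampling, and the function name, the `query_label` callback, and the `oversample` and `ridge` parameters are all hypothetical choices made for illustration. Each arriving row is scored against the rows seen so far, its label is queried with probability proportional to that score, and a reweighted least-squares problem is solved on the queried rows.

```python
import numpy as np

def online_active_l2_regression(rows, query_label, oversample=4.0, ridge=1e-6, seed=0):
    """Illustrative sketch (p = 2 only): stream rows, query few labels.

    rows        : (n, d) array, processed one row at a time
    query_label : hypothetical callback i -> label of row i (the costly step)
    oversample  : assumed constant multiplying the sampling probability
    ridge       : small regulariser keeping the Gram matrix invertible
    """
    rng = np.random.default_rng(seed)
    rows = np.asarray(rows, dtype=float)
    d = rows.shape[1]
    gram = ridge * np.eye(d)  # ridge*I + sum of a a^T over rows seen so far
    sampled_rows, sampled_labels = [], []

    for i, a in enumerate(rows):
        gram += np.outer(a, a)
        # Online leverage score of the new row w.r.t. rows seen so far (<= 1).
        score = float(a @ np.linalg.solve(gram, a))
        prob = min(1.0, oversample * score)
        # Irrevocable decision: query the label now or never.
        if rng.random() < prob:
            weight = 1.0 / np.sqrt(prob)  # reweight to keep the estimate unbiased
            sampled_rows.append(weight * a)
            sampled_labels.append(weight * query_label(i))

    S = np.array(sampled_rows)
    y = np.array(sampled_labels)
    # Least squares on the reweighted, queried rows only.
    x_hat, *_ = np.linalg.lstsq(S, y, rcond=None)
    return x_hat, len(sampled_labels)
```

In this sketch the Gram matrix is re-solved from scratch for every row, which costs $O(d^3)$ per step; a rank-one update of its inverse would be cheaper but is omitted for brevity. Handling general $p\in[1,2]$ would require a different sampling score and solver, which this sketch does not attempt.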