Consider a regression problem where the learner is given a large collection of $d$-dimensional data points, but can only query a small subset of the real-valued labels. How many queries are needed to obtain a $1+\epsilon$ relative error approximation of the optimum? While this problem has been extensively studied for least squares regression, little is known for other losses. An important example is least absolute deviation regression ($\ell_1$ regression) which enjoys superior robustness to outliers compared to least squares. We develop a new framework for analyzing importance sampling methods in regression problems, which enables us to show that the query complexity of least absolute deviation regression is $\Theta(d/\epsilon^2)$ up to logarithmic factors. We further extend our techniques to show the first bounds on the query complexity for any $\ell_p$ loss with $p\in(1,2)$. As a key novelty in our analysis, we introduce the notion of robust uniform convergence, which is a new approximation guarantee for the empirical loss. While it is inspired by uniform convergence in statistical learning, our approach additionally incorporates a correction term to avoid unnecessary variance due to outliers. This can be viewed as a new connection between statistical learning theory and variance reduction techniques in stochastic optimization, which should be of independent interest.
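To make the setting concrete, below is a minimal, hedged sketch of label-efficient $\ell_1$ regression by importance sampling, assuming only numpy and scipy. It uses the standard $\ell_1$ Lewis-weight sampling distribution as an illustrative stand-in for a generic importance-sampling scheme; it is not the paper's estimator, and in particular it omits the variance-reducing correction term that the robust uniform convergence analysis introduces. All function names (`l1_lewis_weights`, `weighted_l1_regression`, `sampled_l1_regression`) are hypothetical.

```python
# Illustrative sketch (NOT the paper's algorithm): query a small subset of labels,
# chosen by importance sampling, and solve the reweighted subsampled l1 problem.
import numpy as np
from scipy.optimize import linprog


def l1_lewis_weights(A, iters=30):
    """Approximate l1 Lewis weights via the standard fixed-point iteration
    w_i <- sqrt(a_i^T (A^T W^{-1} A)^{-1} a_i)."""
    n, d = A.shape
    w = np.ones(n)
    for _ in range(iters):
        M = A.T @ (A / w[:, None])                   # A^T diag(1/w) A
        Minv = np.linalg.inv(M)
        tau = np.einsum("ij,jk,ik->i", A, Minv, A)   # a_i^T M^{-1} a_i
        w = np.sqrt(np.maximum(tau, 1e-12))
    return w


def weighted_l1_regression(A, b, s):
    """Solve min_x sum_i s_i |a_i^T x - b_i| as a linear program."""
    m, d = A.shape
    # Variables z = [x (d), t (m)]; minimize s^T t subject to |A x - b| <= t.
    c = np.concatenate([np.zeros(d), s])
    A_ub = np.block([[A, -np.eye(m)], [-A, -np.eye(m)]])
    b_ub = np.concatenate([b, -b])
    bounds = [(None, None)] * d + [(0, None)] * m
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:d]


def sampled_l1_regression(A, query_label, m, rng=None):
    """Query only m labels, drawn i.i.d. proportionally to the Lewis weights,
    then solve the importance-reweighted subsampled problem."""
    rng = np.random.default_rng(0) if rng is None else rng
    n, d = A.shape
    w = l1_lewis_weights(A)
    p = w / w.sum()                                  # sampling distribution
    idx = rng.choice(n, size=m, p=p)                 # rows whose labels we query
    b_S = np.array([query_label(i) for i in idx])
    s = 1.0 / (m * p[idx])                           # importance-sampling reweighting
    return weighted_l1_regression(A[idx], b_S, s)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, d = 1000, 10
    A = rng.standard_normal((n, d))
    x_true = rng.standard_normal(d)
    b = A @ x_true + 0.1 * rng.standard_cauchy(n)    # heavy-tailed noise / outliers
    x_hat = sampled_l1_regression(A, lambda i: b[i], m=300, rng=rng)
    x_full = weighted_l1_regression(A, b, np.ones(n))
    loss = lambda x: np.abs(A @ x - b).sum()
    print("approximation ratio:", loss(x_hat) / loss(x_full))
```

In this sketch the approximation ratio printed at the end plays the role of the $1+\epsilon$ guarantee: with enough queried labels the subsampled solution's $\ell_1$ loss should be close to that of the full-data solution, although without the correction term the plain reweighted estimator can suffer unnecessary variance from outliers.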