We give relative error coresets for training linear classifiers with a broad class of loss functions, including the logistic loss and hinge loss. Our construction achieves $(1\pm \epsilon)$ relative error with $\tilde O(d \cdot \mu_y(X)^2/\epsilon^2)$ points, where $\mu_y(X)$ is a natural complexity measure of the data matrix $X \in \mathbb{R}^{n \times d}$ and label vector $y \in \{-1,1\}^n$, introduced by Munteanu et al. (2018). Our result is based on subsampling data points with probabilities proportional to their $\ell_1$ Lewis weights. It significantly improves on existing theoretical bounds and performs well in practice, outperforming uniform subsampling and other importance sampling methods. Our sampling distribution does not depend on the labels, so it can be used for active learning. It also does not depend on the specific loss function, so a single coreset can be used across multiple training scenarios.
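A minimal sketch of the sampling scheme described above, under the following assumptions: the $\ell_1$ Lewis weights are computed by the standard Cohen–Peng fixed-point iteration $w_i \leftarrow \big(x_i^\top (X^\top W^{-1} X)^{-1} x_i\big)^{1/2}$ (which converges for $p < 4$), and the coreset is drawn by importance sampling with standard inverse-probability reweighting. The function names and iteration count are illustrative, not the paper's exact algorithm.

```python
import numpy as np

def l1_lewis_weights(X, num_iters=30):
    """Approximate l1 Lewis weights of the rows of X via the
    Cohen-Peng fixed-point iteration:
        w_i <- sqrt( x_i^T (X^T diag(1/w) X)^{-1} x_i ).
    At the fixed point the weights sum to d."""
    n, d = X.shape
    w = np.ones(n)
    for _ in range(num_iters):
        # M = X^T diag(1/w) X  (rows of X scaled by 1/w)
        M = X.T @ (X / w[:, None])
        Minv = np.linalg.pinv(M)
        # quadratic forms x_i^T Minv x_i for every row i
        q = np.einsum('ij,jk,ik->i', X, Minv, X)
        w = np.sqrt(np.maximum(q, 0.0))
    return w

def lewis_weight_coreset(X, m, seed=None):
    """Sample m rows of X with probability proportional to their
    l1 Lewis weights; return indices and the inverse-probability
    weights that make the coreset an unbiased estimator."""
    rng = np.random.default_rng(seed)
    w = l1_lewis_weights(X)
    p = w / w.sum()                    # label-independent distribution
    idx = rng.choice(len(X), size=m, replace=True, p=p)
    coreset_weights = 1.0 / (m * p[idx])
    return idx, coreset_weights
```

Note that, as the abstract states, the sampling probabilities depend only on $X$, not on $y$ or the loss: the same `idx, coreset_weights` pair can be reused to train with logistic loss, hinge loss, or any other loss in the supported class.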