In this paper, we study several important geometric optimization problems arising in machine learning. First, we revisit the Minimum Enclosing Ball (MEB) problem in Euclidean space $\mathbb{R}^d$. The problem has been extensively studied, but real-world machine learning tasks often involve datasets so large that even linear-time algorithms are unaffordable. Motivated by recent studies on {\em beyond worst-case analysis}, we introduce a notion of stability for MEB that is natural and easy to understand. Roughly speaking, an instance of MEB is stable if the radius of the resulting ball cannot be significantly reduced by removing a small fraction of the input points. Under the stability assumption, we present two sampling algorithms for computing a radius-approximate MEB whose sample complexities are independent of the number of input points $n$; the sample complexity of the second algorithm is independent of the dimensionality $d$ as well. We also consider the general case without the stability assumption, for which we present a hybrid algorithm that outputs either a radius-approximate MEB or a covering-approximate MEB. Our algorithm improves the running time and the number of passes of previous sublinear MEB algorithms. Our method relies on two novel techniques, the Uniform-Adaptive Sampling method and the Sandwich Lemma. Furthermore, we observe that these two techniques can be generalized to design sublinear-time algorithms for a broader range of high-dimensional geometric optimization problems with outliers, including MEB with outliers, one-class and two-class linear SVMs with outliers, $k$-center clustering with outliers, and flat fitting with outliers. Our proposed algorithms also work well with kernels.
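For intuition about the MEB problem itself (not the sampling algorithms of this paper), the following is a minimal sketch of the standard core-set style $(1+\epsilon)$-approximation in the spirit of B\u{a}doiu--Clarkson, which iteratively shifts the center toward the current farthest point; the function name and parameters are illustrative.

```python
import numpy as np

def approx_meb(points, eps=0.1):
    """Core-set style (1+eps)-approximate Minimum Enclosing Ball.

    points: (n, d) array of input points in R^d.
    Runs O(1/eps^2) iterations; each iteration scans all points,
    so the total cost is linear in n (unlike the sublinear
    sampling algorithms described in the abstract).
    """
    c = points[0].astype(float)                # start at an arbitrary input point
    num_iters = int(np.ceil(1.0 / eps**2))
    for i in range(1, num_iters + 1):
        dists = np.linalg.norm(points - c, axis=1)
        far = points[np.argmax(dists)]         # current farthest point
        c = c + (far - c) / (i + 1)            # move the center toward it
    radius = np.linalg.norm(points - c, axis=1).max()
    return c, radius

# toy usage
pts = np.random.randn(1000, 50)
center, r = approx_meb(pts, eps=0.05)
```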