We study the problem of finding confidence ellipsoids for an arbitrary distribution in high dimensions. Given samples from a distribution $D$ and a confidence parameter $α$, the goal is to find the smallest volume ellipsoid $E$ which has probability mass $\Pr_{D}[E] \ge 1-α$. Ellipsoids are a highly expressive class of confidence sets as they can capture correlations in the distribution, and can approximate any convex set. This problem has been studied in many different communities. In statistics, this is the classic minimum volume estimator introduced by Rousseeuw as a robust non-parametric estimator of location and scatter. However in high dimensions, it becomes NP-hard to obtain any non-trivial approximation factor in volume when the condition number $β$ of the ellipsoid (ratio of the largest to the smallest axis length) goes to $\infty$. This motivates the focus of our paper: can we efficiently find confidence ellipsoids with volume approximation guarantees when compared to ellipsoids of bounded condition number $β$? Our main result is a polynomial time algorithm that finds an ellipsoid $E$ whose volume is within a $O(β)^{γd}$ multiplicative factor of the volume of best $β$-conditioned ellipsoid while covering at least $1-O(α/γ)$ probability mass for any $γ< α$. We complement this with a computational hardness result that shows that such a dependence seems necessary up to constants in the exponent. The algorithm and analysis uses the rich primal-dual structure of the minimum volume enclosing ellipsoid and the geometric Brascamp-Lieb inequality. As a consequence, we obtain the first polynomial time algorithm with approximation guarantees on worst-case instances of the robust subspace recovery problem.
翻译:本文研究在高维空间中为任意分布寻找置信椭球的问题。给定从分布$D$中抽取的样本及置信参数$α$,目标是找到满足概率质量$\Pr_{D}[E] \ge 1-α$的最小体积椭球$E$。椭球作为一类表达能力极强的置信集,能够捕捉分布中的相关性,并可逼近任意凸集。该问题已在多个研究领域得到广泛探讨。在统计学中,这是由Rousseeuw提出的经典最小体积估计量,作为位置与散度的鲁棒非参数估计量。然而在高维情形下,当椭球条件数$β$(最长轴与最短轴长度之比)趋于$\infty$时,获得任何非平凡的体积近似因子均成为NP难问题。这引出了本文的核心关注点:能否在与有界条件数$β$的椭球比较时,高效地找到具有体积近似保证的置信椭球?我们的主要成果是一个多项式时间算法,该算法能找到一个椭球$E$,其体积在最佳$β$条件椭球体积的$O(β)^{γd}$乘性因子范围内,同时对任意$γ< α$至少覆盖$1-O(α/γ)$的概率质量。我们通过计算复杂性结果补充证明,这种指数依赖关系在常数因子内似乎是必要的。该算法与分析利用了最小包围椭球丰富的原始-对偶结构及几何Brascamp-Lieb不等式。基于此,我们首次获得了在鲁棒子空间恢复问题最坏情况实例上具有近似保证的多项式时间算法。