Closed-Form, Provable, and Robust PCA via Leverage Statistics and Innovation Search

The idea of Innovation Search, which was initially proposed for data clustering, was recently used for outlier detection. In that application, the directions of innovation were used to measure the innovation of the data points. We study the Innovation Values computed by the Innovation Search algorithm under a quadratic cost function and prove that, with this new cost function, Innovation Values are equivalent to Leverage Scores. This connection is used to establish several theoretical guarantees for a Leverage Score based robust PCA method and to design a new robust PCA method. The theoretical results include performance guarantees under different models for the distribution of outliers and the distribution of inliers. In addition, we demonstrate the robustness of the algorithms against the presence of noise. The numerical and theoretical studies indicate that while the presented approach is fast and closed-form, it can outperform most of the existing algorithms.
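To make the notion of a Leverage Score concrete, the following is a minimal sketch that uses one common definition: for a data matrix D whose columns are the data points d_i, the score of point i is d_i^T (D D^T)^{-1} d_i. The function name `leverage_scores`, the 2-D toy data, and the rule of flagging the highest-scoring point are illustrative assumptions, not the algorithm analyzed in the paper.

```python
def leverage_scores(points):
    """Leverage score of each 2-D point d_i, computed as d_i^T (D D^T)^{-1} d_i.

    `points` is a list of (x, y) pairs, i.e. the columns of a 2 x n matrix D.
    Assumption: D has full row rank, so the 2 x 2 matrix D D^T is invertible.
    """
    # Entries of the symmetric 2 x 2 matrix G = D D^T.
    gxx = sum(x * x for x, _ in points)
    gxy = sum(x * y for x, y in points)
    gyy = sum(y * y for _, y in points)
    det = gxx * gyy - gxy * gxy
    if abs(det) < 1e-12:
        raise ValueError("D D^T is singular: the points span fewer than 2 dimensions")
    # Closed-form inverse of a symmetric 2 x 2 matrix.
    ixx, ixy, iyy = gyy / det, -gxy / det, gxx / det
    return [x * (ixx * x + ixy * y) + y * (ixy * x + iyy * y) for x, y in points]


# Toy data (hypothetical): inliers on the line y = 0.5 x, plus one outlier off it.
inliers = [(float(t), 0.5 * t) for t in range(-5, 6) if t != 0]
outlier = (1.0, -4.0)
scores = leverage_scores(inliers + [outlier])

# Illustrative rule only: treat the point with the largest score as the suspect.
suspect = max(range(len(scores)), key=scores.__getitem__)
print("most suspicious index:", suspect)
print("scores:", [round(s, 3) for s in scores])
```

In this toy setting the outlier is the only point with a component off the inlier line, so it receives the largest score. The scores sum to the rank of D, which is 2 here.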

Related content

PCA

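In statistics, principal component analysis (PCA) is a method for projecting data from a higher-dimensional space into a lower-dimensional one while retaining as much of the data's variance as possible. Given a collection of points in two, three, or higher-dimensional space, a "best-fitting" line can be defined as the line that minimizes the average squared distance from the points to the line. The next best-fitting line can be chosen in the same way from the directions perpendicular to the first. Repeating this process yields an orthogonal basis in which the individual dimensions of the data are uncorrelated. These basis vectors are called principal components.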
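The description above can be sketched for the 2-D case, where the best-fitting direction of the covariance matrix has a closed form. The names `pca_2d` and `project` and the sample points are hypothetical; this is a minimal illustration under those assumptions, not a general-purpose implementation.

```python
import math


def pca_2d(points):
    """Return (mean, pc1, pc2) for a list of 2-D points.

    pc1 is the unit direction of largest variance (the best-fitting line
    through the mean); pc2 is the unit direction perpendicular to it.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    # Entries of the 2 x 2 covariance matrix.
    sxx = sum(x * x for x, _ in centered) / n
    sxy = sum(x * y for x, y in centered) / n
    syy = sum(y * y for _, y in centered) / n
    # The variance along (cos t, sin t) is maximized at this angle.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    pc1 = (math.cos(theta), math.sin(theta))
    pc2 = (-math.sin(theta), math.cos(theta))
    return (mx, my), pc1, pc2


def project(points, mean, direction):
    """Coordinates of the centered points along a unit direction."""
    return [(x - mean[0]) * direction[0] + (y - mean[1]) * direction[1]
            for x, y in points]


# Hypothetical sample: points scattered near the line y = x.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1)]
mean, pc1, pc2 = pca_2d(points)
z1 = project(points, mean, pc1)
z2 = project(points, mean, pc2)
print("first principal component:", pc1)
print("second principal component:", pc2)
```

The coordinates `z1` and `z2` are uncorrelated, and `z1` carries most of the variance, so keeping only `z1` gives a 1-D representation of this sample.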