Concentration bounds for the empirical angular measure with statistical learning applications

The angular measure on the unit sphere characterizes the first-order dependence structure of the components of a random vector in extreme regions and is defined in terms of standardized margins. Its statistical recovery is an important step in learning problems involving observations far away from the center. In the common situation when the components of the vector have different distributions, the rank transformation offers a convenient and robust way of standardizing data in order to build an empirical version of the angular measure based on the most extreme observations. However, the study of the sampling distribution of the resulting empirical angular measure is challenging. It is the purpose of the paper to establish finite-sample bounds for the maximal deviations between the empirical and true angular measures, uniformly over classes of Borel sets of controlled combinatorial complexity. The bounds are valid with high probability and scale essentially as the inverse of the square root of the effective sample size, up to a logarithmic factor. Discarding the most extreme observations yields a truncated version of the empirical angular measure for which the logarithmic factor in the concentration bound is replaced by a factor depending on the truncation level. The bounds are applied to provide performance guarantees for two statistical learning procedures tailored to extreme regions of the input space and built upon the empirical angular measure: binary classification in extreme regions through empirical risk minimization and unsupervised anomaly detection through minimum-volume sets of the sphere.
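
To make the construction concrete, the rank transformation and the resulting empirical angular measure can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names `rank_standardize` and `empirical_angular_measure` are hypothetical, the margins are assumed continuous (no ties), the sup-norm is an arbitrary default, and keeping the `k` points with the largest radii is a simplification of thresholding the standardized radius at a level of order `n / k`.

```python
import numpy as np


def rank_standardize(X):
    """Map each column to approximately unit-Pareto margins using ranks.

    Assumption: no ties within a column (continuous margins).
    """
    n = X.shape[0]
    # Column-wise ranks in 1..n, obtained by applying argsort twice.
    ranks = np.argsort(np.argsort(X, axis=0), axis=0) + 1
    # 1 / (1 - F_hat(x)) with F_hat = rank / (n + 1), so values stay finite.
    return (n + 1.0) / (n + 1.0 - ranks)


def empirical_angular_measure(X, k, in_set, norm_ord=np.inf):
    """Fraction of the k most extreme points whose angle falls in a set.

    `in_set` is a user-supplied function mapping an array of angles
    (one row per point on the unit sphere) to a boolean array.
    """
    V = rank_standardize(np.asarray(X, dtype=float))
    radii = np.linalg.norm(V, ord=norm_ord, axis=1)
    # Indices of the k largest radii; ties are broken arbitrarily.
    top = np.argsort(radii)[-k:]
    angles = V[top] / radii[top, None]
    return float(np.mean(in_set(angles)))
```

Here `k` plays the role of the effective sample size in the bounds above. A set `A` of the sphere is encoded by its indicator function; for instance, `lambda a: a[:, 0] > 0.9` selects angles whose first coordinate is close to its maximum under the sup-norm.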
