The angular measure on the unit sphere characterizes the first-order dependence structure of the components of a random vector in extreme regions and is defined in terms of standardized margins. Its statistical recovery is an important step in learning problems involving observations far from the center of the distribution. When the components of the vector have different distributions, as is common, the rank transformation offers a convenient and robust way of standardizing the data, so that an empirical version of the angular measure can be built from the most extreme observations. However, the sampling distribution of the resulting empirical angular measure is challenging to study. This paper establishes finite-sample bounds for the maximal deviations between the empirical and true angular measures, uniformly over classes of Borel sets of controlled combinatorial complexity. The bounds hold with high probability and, up to logarithmic factors, scale as the inverse square root of the effective sample size. They are applied to provide performance guarantees for two statistical learning procedures tailored to extreme regions of the input space and built upon the empirical angular measure: binary classification in extreme regions through empirical risk minimization, and unsupervised anomaly detection through minimum-volume sets of the sphere.
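As an illustration of the rank-based construction described above, the following is a minimal sketch rather than the paper's implementation. It assumes the sup norm on the sphere and the standardization n / (n + 1 - rank), which is one common convention. The function names `rank_standardize` and `empirical_angular_measure`, and the parameter `in_set` (an indicator function of a Borel set of angles), are hypothetical.

```python
import numpy as np


def rank_standardize(x):
    """Rank-transform each column to approximately unit-Pareto margins.

    Assumption: V_ij = n / (n + 1 - R_ij), with R_ij the ascending rank
    of X_ij within column j (ties, if any, are broken arbitrarily).
    """
    n = x.shape[0]
    # argsort of argsort yields 0-based ascending ranks per column.
    ranks = np.argsort(np.argsort(x, axis=0), axis=0) + 1
    return n / (n + 1.0 - ranks)


def empirical_angular_measure(x, k, in_set):
    """Estimate the angular measure of a set from the most extreme points.

    x      : (n, d) array of observations with arbitrary margins
    k      : effective sample size (number of extremes per margin)
    in_set : callable mapping an (m, d) array of angles to m booleans
    """
    n = x.shape[0]
    v = rank_standardize(x)
    radius = np.max(v, axis=1)          # sup norm (an assumption)
    extreme = radius >= n / k           # keep points far from the origin
    angles = v[extreme] / radius[extreme, None]
    return np.sum(in_set(angles)) / k


# Usage: mass assigned to angles near the diagonal for independent margins.
rng = np.random.default_rng(0)
sample = rng.standard_normal((2000, 2))
mass_near_diagonal = empirical_angular_measure(
    sample, k=100, in_set=lambda w: w.min(axis=1) > 0.5
)
```

Because only ranks enter the computation, the estimate is unchanged under increasing transformations of each component, which is the reason the rank transformation suits margins with different distributions.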