We study the estimation of the probability of observing data farther than a specified distance from a given i.i.d. sample in a metric space. This extends the classical problem of estimating the missing mass in discrete spaces. We show that estimation is difficult in general and identify conditions on the distribution under which the Good-Turing estimator and the conditional missing mass concentrate around their expectations. Applications to supervised learning are sketched.
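To make the two quantities concrete, here is a minimal sketch under our own assumptions, not necessarily the paper's exact definitions. In a discrete space, the Good-Turing estimate of the missing mass is the fraction of sample points that occur exactly once. A natural metric-space analogue, assumed here, is the fraction of sample points that lie farther than a radius `eps` from every *other* sample point. The conditional missing mass is the probability, given the sample, that a fresh draw lies farther than `eps` from all sample points. When we can sample from the distribution, this probability can be approximated by Monte Carlo. The function names and the `draw` callback are hypothetical and are used for illustration only.

```python
import random
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")
Metric = Callable[[T, T], float]


def good_turing_metric(sample: Sequence[T], dist: Metric, eps: float) -> float:
    """Assumed metric analogue of Good-Turing: the fraction of sample points
    with no *other* sample point within distance eps (leave-one-out)."""
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    isolated = 0
    for i, x in enumerate(sample):
        # x counts as "unseen" if every other sample point is farther than eps.
        if all(dist(x, y) > eps for j, y in enumerate(sample) if j != i):
            isolated += 1
    return isolated / n


def conditional_missing_mass_mc(
    sample: Sequence[T],
    dist: Metric,
    eps: float,
    draw: Callable[[random.Random], T],
    n_draws: int = 10_000,
    rng: Optional[random.Random] = None,
) -> float:
    """Monte Carlo estimate of P(dist(X, sample) > eps | sample), where X is a
    fresh draw from the (assumed known) distribution via the hypothetical `draw`."""
    if n_draws <= 0:
        raise ValueError("n_draws must be positive")
    rng = rng if rng is not None else random.Random(0)
    hits = 0
    for _ in range(n_draws):
        x = draw(rng)
        # The new point is "missing" if it is farther than eps from all sample points.
        if all(dist(x, y) > eps for y in sample):
            hits += 1
    return hits / n_draws


if __name__ == "__main__":
    # Illustration: uniform distribution on [0, 1] with the absolute-value metric.
    rng = random.Random(0)
    sample = [rng.random() for _ in range(200)]
    absdiff = lambda a, b: abs(a - b)
    eps = 0.01
    print("Good-Turing (metric):", good_turing_metric(sample, absdiff, eps))
    print("Conditional missing mass (MC):",
          conditional_missing_mass_mc(sample, absdiff, eps,
                                      lambda r: r.random(), n_draws=5_000, rng=rng))
```

With the discrete metric (distance 0 for equal points, 1 otherwise) and `eps = 0`, `good_turing_metric` reduces to the classical fraction of singletons. Each term in the leave-one-out count is the indicator that one sample point is "missing" from the other n − 1 points. The estimator's expectation therefore equals the expected missing mass of a sample of size n − 1. This explains why the estimator is a natural target for the concentration questions raised above.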