Given a positive function $g$ from $[0,1]$ to the reals, the missing mass of $g$ in a sequence of iid samples, defined as the sum of $g(\Pr(x))$ over all letters $x$ that do not appear in the sequence, is introduced and studied. The missing mass of a function generalizes the classical missing mass ($g(p)=p$) and has several interesting connections to other related estimation problems. Minimax estimation is studied for the order-$\alpha$ missing mass ($g(p)=p^{\alpha}$) for both integer and non-integer values of $\alpha$, and exact minimax convergence rates are obtained for the integer case. Concentration is studied for a class of functions, and specific results are derived for the order-$\alpha$ missing mass and the missing Shannon entropy ($g(p)=-p\log p$). Sub-Gaussian tail bounds with near-optimal worst-case variance factors are derived. Two new notions of concentration, named strongly sub-Gamma and filtered sub-Gaussian concentration, are introduced and shown to yield right-tail bounds that are better than those obtained from sub-Gaussian concentration.
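As an illustration of the definition only, the sketch below computes the missing mass of a function when the underlying distribution is known. This is an oracle computation, not an estimator: in the estimation problem the distribution is unknown and the missing mass is a random quantity to be estimated from the sample. The names `missing_mass`, `classical`, `order_alpha`, and `shannon` are hypothetical, and the convention $0\log 0 = 0$ (natural logarithm) is an assumption.

```python
import math
from typing import Callable, Hashable, Iterable, Mapping


def missing_mass(
    dist: Mapping[Hashable, float],
    sample: Iterable[Hashable],
    g: Callable[[float], float],
) -> float:
    """Sum g(dist[x]) over letters x in the support that never appear in sample."""
    seen = set(sample)
    return sum(g(p) for x, p in dist.items() if x not in seen)


def classical(p: float) -> float:
    # g(p) = p recovers the classical missing mass
    return p


def order_alpha(alpha: float) -> Callable[[float], float]:
    # g(p) = p**alpha gives the order-alpha missing mass
    return lambda p: p ** alpha


def shannon(p: float) -> float:
    # g(p) = -p log p gives the missing Shannon entropy (0 log 0 taken as 0)
    return -p * math.log(p) if p > 0 else 0.0


if __name__ == "__main__":
    dist = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
    sample = ["a", "a", "b", "a"]  # letters "c" and "d" are missing
    print(missing_mass(dist, sample, classical))       # 0.25
    print(missing_mass(dist, sample, order_alpha(2)))  # 0.03125
    print(missing_mass(dist, sample, shannon))
```

Passing `g` as a callable mirrors the abstract's framing: the classical missing mass, the order-$\alpha$ missing mass, and the missing Shannon entropy differ only in the choice of $g$.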