Dualizing Le Cam's method for functional estimation, with applications to estimating the unseens

Le Cam's method (or the two-point method) is a commonly used tool for obtaining statistical lower bounds and is especially popular for functional estimation problems. This work aims to explain, and give conditions for, the tightness of Le Cam's lower bound in functional estimation from the perspective of convex duality. Under a variety of settings it is shown that the maximization problem that searches for the best two-point lower bound, upon dualizing, becomes a minimization problem that optimizes the bias-variance tradeoff among a family of estimators. For estimating linear functionals of a distribution, our work strengthens prior results of Donoho-Liu \cite{DL91} (for quadratic loss) by dropping the H\"olderian assumption on the modulus of continuity. For exponential families, our results extend those of Juditsky-Nemirovski \cite{JN09} by characterizing the minimax risk for the quadratic loss under weaker assumptions on the exponential family. We also provide an extension to the high-dimensional setting for estimating separable functionals. Notably, coupled with tools from complex analysis, this method is particularly effective for characterizing the ``elbow effect'', that is, the phase transition from parametric to nonparametric rates. As the main application we derive sharp minimax rates in two problems. In the Distinct Elements problem (given a fraction $p$ of colored balls from an urn containing $d$ balls), the optimal error of estimating the number of distinct colors is $\tilde \Theta(d^{-\frac{1}{2}\min\{\frac{p}{1-p},1\}})$. In Fisher's species problem (given $n$ iid observations from an unknown distribution), the optimal prediction error of the number of unseen symbols in the next (unobserved) $r \cdot n$ observations is $\tilde \Theta(n^{-\min\{\frac{1}{r+1},\frac{1}{2}\}})$.
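
To make the ``elbow effect'' in the two stated rates concrete, the following minimal Python sketch evaluates only the exponents appearing in the abstract. The function names are hypothetical and chosen for illustration; the logarithmic factors hidden in $\tilde \Theta$ are ignored. Under this reading, the exponent for the Distinct Elements problem saturates at the parametric value $\frac{1}{2}$ once $p \ge \frac{1}{2}$, and the exponent for Fisher's species problem stays at $\frac{1}{2}$ as long as $r \le 1$.

```python
def distinct_elements_exponent(p: float) -> float:
    """Exponent a in the stated rate d^(-a) for the Distinct Elements problem.

    Assumes the rate d^{-(1/2) min{p/(1-p), 1}} from the abstract;
    logarithmic factors are ignored.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("sampling fraction p must lie strictly between 0 and 1")
    return 0.5 * min(p / (1.0 - p), 1.0)


def species_exponent(r: float) -> float:
    """Exponent a in the stated rate n^(-a) for Fisher's species problem.

    Assumes the rate n^{-min{1/(r+1), 1/2}} from the abstract;
    logarithmic factors are ignored.
    """
    if r <= 0.0:
        raise ValueError("extrapolation ratio r must be positive")
    return min(1.0 / (r + 1.0), 0.5)


if __name__ == "__main__":
    # Sweep across the elbow at p = 1/2 (Distinct Elements).
    for p in (0.1, 0.25, 0.5, 0.75):
        print(f"p = {p}: exponent = {distinct_elements_exponent(p):.4f}")
    # Sweep across the elbow at r = 1 (Fisher's species).
    for r in (0.5, 1.0, 3.0, 9.0):
        print(f"r = {r}: exponent = {species_exponent(r):.4f}")
```

Below the elbow the exponents fall short of $\frac{1}{2}$ and depend on $p$ or $r$ (the nonparametric regime); at and beyond it they equal $\frac{1}{2}$ (the parametric regime).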
