The brilliant method due to Good and Turing allows for estimating objects not occurring in a sample. The problem, known under names "sample coverage" or "missing mass" goes back to their cryptographic work during WWII, but over years has found has many applications, including language modeling, inference in ecology and estimation of distribution properties. This work characterizes the maximal mean-squared error of the Good-Turing estimator, for any sample \emph{and} alphabet size.
翻译:由 Good and Turing 带来的绝妙方法可以估算在样本中不会发生的对象。 以“ 抽样覆盖” 或“ 失色质量” 命名的问题可追溯到二战期间的加密工作, 但多年来发现有许多应用, 包括语言模型、 生态学推论 和分布属性估计。 这项工作是任何样本 \ emph{ 和} 字母大小的“ 良好试验测算器” 的最大平均差错的特征 。