We seek an entropy estimator for discrete distributions with fully empirical accuracy bounds. As stated, this goal is infeasible without some prior assumptions on the distribution. We discover that a certain information moment assumption renders the problem feasible. We argue that the moment assumption is natural and, in some sense, {\em minimalistic} -- weaker than finite support or tail decay conditions. Under the moment assumption, we provide the first finite-sample entropy estimates for infinite alphabets, nearly recovering the known minimax rates. Moreover, we demonstrate that our empirical bounds are significantly sharper than the state-of-the-art bounds, for various natural distributions and non-trivial sample regimes. Along the way, we give a dimension-free analogue of the Cover--Thomas result on entropy continuity (with respect to total variation distance) for finite alphabets, which may be of independent interest.
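For intuition, one representative shape such an information moment condition can take (an illustrative form we sketch here; the exponent $1+\delta$ and the exact parameterization are placeholders, not necessarily those used in the body of the paper) is finiteness of a slightly-higher-than-first moment of the self-information:
% Illustrative information moment condition (assumed form; the exponent
% $1+\delta$ with $\delta > 0$ is our placeholder parameterization):
\[
  \mathbb{E}_{X \sim p}\!\left[\log^{1+\delta}\frac{1}{p(X)}\right]
  \;=\; \sum_{x \in \mathcal{X}} p(x)\,\log^{1+\delta}\frac{1}{p(x)}
  \;<\; \infty .
\]
% At $\delta = 0$ the left-hand side is the entropy $H(p)$ itself, so a
% condition of this shape demands only slightly more than finiteness of
% the very quantity being estimated.
To see why such a condition is weaker than tail decay, note for example that a power-law distribution $p(k) \propto k^{-2}$ has infinite support and only polynomial tails, yet $\log(1/p(k)) = \Theta(\log k)$, so all of its information moments of the above form are finite.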
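For context, the finite-alphabet result being generalized states that entropy is continuous in total variation with a modulus that depends explicitly on the alphabet size. In the form we recall it from Cover and Thomas, for distributions $p, q$ on a finite alphabet $\mathcal{X}$:
% Finite-alphabet continuity of entropy in total variation, as we
% recall it from Cover and Thomas: with $\theta := \|p - q\|_1 \le 1/2$,
\[
  |H(p) - H(q)| \;\le\; \theta\,\log\frac{|\mathcal{X}|}{\theta}.
\]
% The right-hand side grows with $|\mathcal{X}|$ and is vacuous on
% infinite alphabets; a dimension-free analogue removes the explicit
% $|\mathcal{X}|$ dependence, which is what an infinite-alphabet
% estimate requires.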