The statistical analysis of data stemming from dynamical systems, including, but not limited to, time series, routinely relies on the estimation of information-theoretic quantities, most notably Shannon entropy. For this purpose, possibly the most widespread tool is the so-called plug-in estimator, whose statistical properties in terms of bias and variance have been investigated since the first decade after the publication of Shannon's seminal works. In the case of an underlying multinomial distribution, while the bias can be evaluated from the support and the dataset size, the variance is far more elusive. The aim of the present work is to investigate, in the multinomial case, the statistical properties of an estimator of a parameter that describes the variance of the plug-in estimator of Shannon entropy. We then exactly determine the probability distributions that maximize that parameter. The results presented here make it possible to set upper limits on the uncertainty of entropy assessments under the hypothesis of memoryless underlying stochastic processes.
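To make the quantities mentioned above concrete, the following is a minimal sketch, not the authors' implementation. It assumes entropy in nats, the standard first-order bias term -(M-1)/(2N) for a support of size M and N samples, and the standard leading-order variance coefficient sum_i p_i ln^2 p_i - H^2 (so that the variance is approximately this coefficient divided by N). Whether this coefficient coincides with the parameter studied in the paper is an assumption; the function names are hypothetical.

```python
import math
from collections import Counter


def plugin_entropy(samples):
    """Plug-in Shannon entropy (nats): entropy of the empirical frequencies."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def leading_order_bias(support_size, n):
    """First-order bias of the plug-in estimator, -(M - 1) / (2 N).

    Depends only on the support size M and the dataset size N.
    """
    return -(support_size - 1) / (2 * n)


def variance_parameter(probs):
    """Leading-order variance coefficient: sum p ln^2 p - H^2.

    Assumed here to play the role of the 'parameter' in the abstract;
    the plug-in variance is approximately this value divided by N.
    """
    h = -sum(p * math.log(p) for p in probs if p > 0)
    second_moment = sum(p * math.log(p) ** 2 for p in probs if p > 0)
    return second_moment - h * h


def estimated_variance_parameter(samples):
    """Naive estimate of the coefficient from empirical frequencies."""
    n = len(samples)
    return variance_parameter([c / n for c in Counter(samples).values()])
```

Under these assumptions, the coefficient vanishes for a uniform distribution, so the leading-order variance is zero there and higher-order terms dominate; a non-uniform distribution gives a strictly positive value.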