Estimating the entropy rate of discrete time series is a challenging problem with important applications in numerous areas, including neuroscience, genomics, image processing and natural language processing. A number of approaches have been developed for this task, typically based either on universal data compression algorithms or on statistical estimators of the underlying process distribution. In this work, we propose a fully Bayesian approach for entropy estimation. Building on the recently introduced Bayesian Context Trees (BCT) framework for modelling discrete time series as variable-memory Markov chains, we show that it is possible to sample directly from the induced posterior on the entropy rate. These samples can be used to estimate the entire posterior distribution, providing much richer information than point estimates. We develop theoretical results for the posterior distribution of the entropy rate, including proofs of consistency and asymptotic normality. The practical utility of the method is illustrated on both simulated and real-world data, where it is found to outperform state-of-the-art alternatives.
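To make the idea of sampling from the posterior on the entropy rate concrete, the following is a minimal sketch under simplifying assumptions, not the BCT procedure itself. It assumes a fixed first-order Markov chain on an alphabet of size `m` with an independent symmetric Dirichlet prior on each row of the transition matrix, whereas the BCT framework works with variable-memory models. The function names, the prior parameter `alpha`, and the use of bits as the unit are illustrative choices.

```python
import numpy as np

def entropy_rate(P):
    """Entropy rate in bits per symbol of an ergodic first-order chain
    with row-stochastic transition matrix P."""
    P = np.asarray(P, dtype=float)
    # Stationary distribution: left eigenvector of P for the eigenvalue closest to 1.
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    pi = pi / pi.sum()
    # H = -sum_i pi_i sum_j P_ij log2 P_ij, using the convention 0 log 0 = 0.
    logP = np.log2(P, where=P > 0, out=np.zeros_like(P))
    return float(-np.sum(pi[:, None] * P * logP))

def sample_entropy_posterior(x, m, n_samples=1000, alpha=0.5, seed=0):
    """Draw samples from the posterior of the entropy rate given data x
    (integers in 0..m-1), assuming a first-order chain and a symmetric
    Dirichlet(alpha) prior on each transition row (illustrative assumptions)."""
    x = np.asarray(x, dtype=int)
    rng = np.random.default_rng(seed)
    # Transition counts: counts[i, j] = number of times symbol j follows symbol i.
    counts = np.zeros((m, m))
    np.add.at(counts, (x[:-1], x[1:]), 1)
    samples = np.empty(n_samples)
    for k in range(n_samples):
        # Conjugacy: each row's posterior is Dirichlet(alpha + counts[i]).
        P = np.vstack([rng.dirichlet(alpha + counts[i]) for i in range(m)])
        samples[k] = entropy_rate(P)
    return samples
```

Given an integer-coded series `x`, a call such as `sample_entropy_posterior(x, m=2)` returns an array of draws whose histogram approximates the posterior of the entropy rate under these assumptions; summaries such as the mean or a credible interval can then be read off with `samples.mean()` or `np.quantile(samples, [0.025, 0.975])`.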