In this work we implement the so-called matching-time estimators to estimate the entropy rate and the entropy production rate of symbolic sequences. These estimators are based on recurrence properties of the system, which have been shown to be appropriate for testing irreversibility, especially when the sequences have large correlations or memory. Based on limit theorems for matching times, we derive a maximum likelihood estimator for the entropy rate, assuming that we have a set of moderately short symbolic time series of finite random duration. We show that the proposed estimator has several properties that make it adequate for estimating the entropy rate and the entropy production rate (or for testing irreversibility) when the sample sequences have different lengths, such as the coding sequences of DNA. We test our approach on some controlled examples of Markov chains. We also apply our estimators to genomic sequences to show that the degree of irreversibility of coding sequences of human DNA is significantly larger than that of the corresponding non-coding sequences.