On the Salient Limitations of the Methods of Assembly Theory and their Classification of Molecular Biosignatures

A recently introduced approach termed "Assembly Theory", featuring a computable index based on basic principles of statistical compression, has been claimed to be a novel and superior way of distinguishing living from non-living systems and of classifying the complexity of molecular biosignatures. Here, we demonstrate that the assembly pathway method underlying this index is a suboptimal, restricted version of Huffman's (Fano-type) encoding, widely adopted in computer science in the 1950s, and that it is comparable (or inferior) to other popular statistical compression schemes. We show how simple modular instructions can mislead the assembly index, so that it fails to capture subtleties, pervasive in biological systems, that go beyond trivial statistical properties. We present cases whose low complexity diverges arbitrarily from their random-like appearance, to which the assembly pathway method would assign arbitrarily high statistical significance, and we show that the method fails in simple cases (synthetic or natural). Our theoretical and empirical results imply that the assembly index, whose computable nature we show is not an advantage, offers no substantial improvement over existing concepts and methods. Alternatives are discussed.

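To make the notion of a statistical compression score concrete, the following minimal Python sketch computes the Huffman-coded length of a string split into fixed-size blocks and compares it with the output size of zlib (DEFLATE). It is an illustration only, not the assembly index and not the procedure used in this paper: the function names `huffman_code_lengths` and `huffman_bits`, the block size of 2, the four-letter alphabet, and the test strings are all assumptions chosen for the example.

```python
import heapq
import random
import zlib
from collections import Counter


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code over freqs."""
    if len(freqs) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {sym: 1 for sym in freqs}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def huffman_bits(text, block=2):
    """Payload size in bits when text is Huffman-coded in fixed-size blocks.

    The cost of storing the codebook is ignored (an assumption of this sketch).
    """
    blocks = [text[i:i + block] for i in range(0, len(text), block)]
    freqs = Counter(blocks)
    lengths = huffman_code_lengths(freqs)
    return sum(lengths[b] * n for b, n in freqs.items())


if __name__ == "__main__":
    modular = "ABCD" * 64  # produced by a trivially short repeating rule
    rng = random.Random(0)  # fixed seed so the run is repeatable
    noisy = "".join(rng.choice("ABCD") for _ in range(256))
    for name, s in [("modular", modular), ("noisy", noisy)]:
        print(name, huffman_bits(s), len(zlib.compress(s.encode())))
```

Both measures react only to how often blocks repeat, which is the kind of purely statistical regularity the abstract refers to; the sketch makes no claim about how the assembly index itself would score these strings.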