We demonstrate that the assembly pathway method underlying ``Assembly Theory'' (AT) is a suboptimal, restricted version of Huffman's encoding (Shannon-Fano type) for `counting copies', the stated objective of the authors of AT. This type of encoding was introduced in computer science in the 1960s and is widely used by popular statistical and computable compression algorithms that have previously been applied to all sorts of biosignatures. We show how simple modular instructions can mislead AT, causing it to fail at what its authors originally intended (counting the `number of copies') and to miss any subtleties of biological systems beyond very trivial statistical properties. We present cases whose low complexity can diverge arbitrarily from the random-like appearance to which AT would assign arbitrarily high statistical significance, and show that AT fails in simple cases (synthetic or natural) that it was supposed to shed light on. Our theoretical and empirical results imply that the assembly index, whose computable nature is not an advantage, does not offer any substantial improvement over existing concepts and methods, computable or (semi-)uncomputable. No strong compression or algorithmic complexity results were required to prove that AT and its molecular assembly index (MA) are ill-defined and underperform compared to simple coding schemes. We show that, contrary to claims based on experimental data, the assembly measure is driven mostly or only by InChI codes, which had already been reported to discriminate organic from inorganic compounds when analyzed with other indices.