On the Salient Limitations of the Methods of Assembly Theory and their Classification of Molecular Biosignatures


Biosignatures · Biology · Compounds · Huffman coding · Molecules

April 3, 2023


Abicumaran Uthamacumaran, Felipe S. Abrahão, Narsis A. Kiani, Hector Zenil

From arXiv: 25 pages plus 13 in the appendix, 5 figures

We demonstrate that the assembly pathway method underlying "Assembly Theory" (AT) is a suboptimal, restricted version of Huffman's encoding (Shannon-Fano type) for "counting copies," the stated objective of the authors of AT. This kind of encoding was introduced in computer science in the 1960s and is widely used by popular statistical and computable compression algorithms that have previously been applied to all sorts of biosignatures. We show how simple modular instructions can mislead AT, causing it to fail at what its authors originally intended (counting the "number of copies") and to miss subtleties beyond very trivial statistical properties of biological systems. We present cases whose low complexity can diverge arbitrarily from the random-like appearance to which AT would assign arbitrarily high statistical significance, and show that it fails in simple cases (synthetic or natural) on which assembly theory was supposed to shed some light. Our theoretical and empirical results imply that the assembly index, whose computable nature is not an advantage, does not offer any substantial improvement over existing concepts and methods, computable or (semi) uncomputable. No strong compression or algorithmic complexity results were required to prove that AT and MA are ill-defined and underperform compared with simple coding schemes. We show that, despite the claims of experimental data, the assembly measure is driven mostly or only by InChI codes, which had already been reported to discriminate organic from inorganic compounds using other indexes.
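For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of textbook Huffman coding in Python. It is not the authors' code and not the assembly index algorithm. The function names `huffman_code_lengths` and `huffman_encoded_bits` are hypothetical, chosen for illustration only. The sketch shows how symbol frequencies alone determine code lengths, so that frequently repeated symbols receive shorter codes.

```python
import heapq
from collections import Counter


def huffman_code_lengths(text: str) -> dict[str, int]:
    """Return the Huffman code length in bits for each distinct symbol.

    Illustrative helper (hypothetical name), standard Huffman construction.
    """
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; every symbol in them gets one level deeper.
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def huffman_encoded_bits(text: str) -> int:
    """Total number of bits needed to encode text with its own Huffman code."""
    lengths = huffman_code_lengths(text)
    freq = Counter(text)
    return sum(freq[s] * lengths[s] for s in freq)


if __name__ == "__main__":
    sample = "aaaabbc"
    print(huffman_code_lengths(sample))
    print(huffman_encoded_bits(sample))
```

For the sample string `aaaabbc`, the most frequent symbol `a` gets a 1-bit code and `b` and `c` get 2-bit codes, for 10 bits in total. The total ignores the cost of storing the code table, which a real compressor would also have to pay.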

