Therapeutics Data Commons: Machine Learning Datasets and Tasks for Drug Discovery and Development

Therapeutics machine learning is an emerging field with incredible opportunities for innovation and impact. However, advancement in this field requires formulation of meaningful learning tasks and careful curation of datasets. Here, we introduce Therapeutics Data Commons (TDC), the first unifying platform to systematically access and evaluate machine learning across the entire range of therapeutics. To date, TDC includes 66 AI-ready datasets spread across 22 learning tasks and spanning the discovery and development of safe and effective medicines. TDC also provides an ecosystem of tools and community resources, including 33 data functions and types of meaningful data splits, 23 strategies for systematic model evaluation, 17 molecule generation oracles, and 29 public leaderboards. All resources are integrated and accessible via an open Python library. We carry out extensive experiments on selected datasets, demonstrating that even the strongest algorithms fall short of solving key therapeutics challenges, including real dataset distributional shifts, multi-scale modeling of heterogeneous data, and robust generalization to novel data points. We envision that TDC can facilitate algorithmic and scientific advances and considerably accelerate machine-learning model development, validation and transition into biomedical and clinical implementation. TDC is an open-science initiative available at https://tdcommons.ai.
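To make the idea of a "meaningful data split" concrete, here is a minimal sketch of a group-held-out split, in which every record sharing a group key (for example, a molecular scaffold) lands in exactly one partition, so the test set contains groups never seen in training. This is an illustration only and is not TDC's API: the function `group_split`, the `scaffold` field, and the toy records are all hypothetical, and the 70/10/20 fractions are an assumed default.

```python
import random
from collections import defaultdict


def group_split(records, group_key, frac=(0.7, 0.1, 0.2), seed=0):
    """Hypothetical helper: split records so no group spans two partitions."""
    # Bucket the records by their group key (e.g. a scaffold identifier).
    groups = defaultdict(list)
    for rec in records:
        groups[group_key(rec)].append(rec)

    # Shuffle the group order deterministically for a reproducible split.
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)

    # Greedily fill train, then valid; whatever does not fit goes to test.
    # frac[2] is implied by the remainder and is not enforced exactly.
    n = len(records)
    train_cap, valid_cap = frac[0] * n, frac[1] * n
    split = {"train": [], "valid": [], "test": []}
    for key in keys:
        members = groups[key]
        if len(split["train"]) + len(members) <= train_cap:
            split["train"].extend(members)
        elif len(split["valid"]) + len(members) <= valid_cap:
            split["valid"].extend(members)
        else:
            split["test"].extend(members)
    return split


# Toy data (made up): 40 records spread evenly over 10 hypothetical scaffolds.
toy = [{"id": i, "scaffold": f"S{i % 10}", "y": i % 2} for i in range(40)]
parts = group_split(toy, group_key=lambda r: r["scaffold"])
scaffolds = {name: {r["scaffold"] for r in recs} for name, recs in parts.items()}
```

Compared with a purely random split, holding out whole groups tends to give a more conservative estimate of how a model generalizes to novel data points, which is the kind of distributional shift the abstract highlights.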

