Machine learning for therapeutics is an emerging field with substantial opportunities for innovation and expansion. Despite initial successes, many key challenges remain open. Here, we introduce Therapeutics Data Commons (TDC), the first unifying framework to systematically access and evaluate machine learning across the entire range of therapeutics. At its core, TDC is a collection of curated datasets and learning tasks that can translate algorithmic innovation into biomedical and clinical implementation. To date, TDC includes 66 machine learning-ready datasets from 22 learning tasks, spanning the discovery and development of safe and effective medicines. TDC also provides an ecosystem of tools, libraries, leaderboards, and community resources, including data functions, strategies for systematic model evaluation, meaningful data splits, data processors, and molecule generation oracles. All datasets and learning tasks are integrated and accessible via an open-source library. We envision that TDC can facilitate algorithmic and scientific advances and accelerate development, validation, and transition into production and clinical implementation. TDC is a continuous, open-source initiative, and we invite contributions from the research community. TDC is publicly available at https://tdcommons.ai.
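To illustrate what a "meaningful data split" can look like in practice, the following minimal Python sketch partitions a dataset so that no group of related molecules appears in more than one subset. It is not TDC's implementation or API: the function `group_split`, the `scaffold` field, and the toy records are hypothetical names chosen for illustration, and the sketch assumes each record already carries a group label.

```python
import random
from collections import defaultdict


def group_split(records, group_key, frac=(0.6, 0.2, 0.2), seed=0):
    """Split record indices into train/valid/test so no group spans two subsets.

    Hypothetical helper for illustration only; not part of any TDC library.
    """
    # Bucket record indices by their group label (e.g. a scaffold identifier).
    groups = defaultdict(list)
    for idx, rec in enumerate(records):
        groups[rec[group_key]].append(idx)

    # Shuffle the group labels deterministically for a reproducible split.
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)

    # Target sizes for train and valid; whatever does not fit goes to test.
    n = len(records)
    train_cap = round(frac[0] * n)
    valid_cap = round(frac[1] * n)

    split = {"train": [], "valid": [], "test": []}
    for key in keys:
        members = groups[key]
        # Assign the whole group to the first subset that still has room.
        if len(split["train"]) + len(members) <= train_cap:
            split["train"].extend(members)
        elif len(split["valid"]) + len(members) <= valid_cap:
            split["valid"].extend(members)
        else:
            split["test"].extend(members)
    return split


# Toy data: 20 records spread evenly over 5 hypothetical scaffold groups.
records = [
    {"drug": f"D{i}", "scaffold": f"S{i % 5}", "y": i * 0.1}
    for i in range(20)
]
split = group_split(records, "scaffold")
```

Because whole groups are assigned together, a model evaluated on the test subset sees only groups that were absent during training, which is typically a stricter check of generalization than a uniformly random split.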