Deep learning's rise since the early 2010s has transformed fields such as computer vision and natural language processing, and it has strongly influenced biomedical research. For drug discovery specifically, a key inflection point arrived in 2015, akin to computer vision's "ImageNet moment", when deep neural networks outperformed traditional approaches in the Tox21 Data Challenge. This milestone accelerated the adoption of deep learning across the pharmaceutical industry, and today most major companies have integrated these methods into their research pipelines. After the Tox21 Challenge concluded, its dataset was incorporated into several established benchmarks, such as MoleculeNet and the Open Graph Benchmark. During these integrations, however, the dataset was altered and labels were imputed or manufactured. As a result, results reported on the different versions can no longer be compared directly across studies. Consequently, it remains unclear how much bioactivity and toxicity prediction methods have actually improved over the past decade. To address this, we introduce a reproducible leaderboard hosted on Hugging Face. It uses the original Tox21 Challenge dataset and comes with a set of baseline and representative methods. The current version of the leaderboard shows that two older methods continue to perform competitively and rank among the top methods for toxicity prediction: the original Tox21 winner, the ensemble-based DeepTox method, and the descriptor-based self-normalizing neural networks introduced in 2017. Whether toxicity prediction has made substantial progress over the past decade is therefore an open question. As part of this work, we make all baselines and evaluated models publicly accessible for inference via standardized API calls to Hugging Face Spaces.