Activity and property prediction models are the central workhorses in drug discovery and materials science, but currently they have to be trained or fine-tuned for each new task. Without training or fine-tuning, scientific language models could be used for such low-data tasks through their claimed zero- and few-shot capabilities. However, their predictive quality for activity prediction is lacking. In this work, we envision a novel type of activity prediction model that can adapt to new prediction tasks at inference time by understanding textual information describing the task. To this end, we propose a new architecture with separate modules for chemical and natural-language inputs, together with a contrastive pre-training objective on data from large biochemical databases. In extensive experiments, we show that our method, CLAMP, yields improved predictive performance on few-shot learning benchmarks and zero-shot problems in drug discovery. We attribute these advances to the modularized architecture and to our pre-training objective.
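To make the contrastive pre-training objective concrete, below is a minimal sketch in PyTorch, assuming a CLIP-style symmetric InfoNCE loss over paired molecule and assay-text embeddings produced by the two separate encoder modules. The function name, the temperature value, and the exact loss form are illustrative assumptions, not necessarily the objective used by CLAMP.

```python
import torch
import torch.nn.functional as F

def clamp_style_contrastive_loss(mol_emb: torch.Tensor,
                                 text_emb: torch.Tensor,
                                 temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE-style loss over a batch of paired molecule/assay-text embeddings."""
    # L2-normalize the outputs of the two separate encoder modules.
    mol_emb = F.normalize(mol_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    # Pairwise similarity matrix; the diagonal holds the matching
    # molecule/assay-description pairs drawn from the biochemical database.
    logits = mol_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    # Symmetric cross-entropy: molecules-to-texts and texts-to-molecules.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

# Example with random embeddings (batch of 32, embedding dim 256):
loss = clamp_style_contrastive_loss(torch.randn(32, 256), torch.randn(32, 256))
```

Under this framing, zero-shot adaptation falls out naturally: a new task's textual description is embedded once, and candidate molecules are scored by their similarity to that embedding, with no training or fine-tuning on the new task.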