Transformer models have been developed in molecular science with excellent performance in applications including quantitative structure-activity relationship (QSAR) modeling and virtual screening (VS). Compared with other types of models, however, they are large, so substantial hardware is required to keep both training and inference times short. In this work, cross-layer parameter sharing (CLPS) and knowledge distillation (KD) are used to reduce the sizes of transformers in molecular science. Both methods not only achieve QSAR predictive performance competitive with the original BERT model but are also more parameter-efficient. Furthermore, by integrating CLPS and KD into a two-stage chemical network, we introduce a new deep lite chemical transformer model, DeLiCaTe. DeLiCaTe captures general-domain as well as task-specific knowledge, which leads to 4x faster training and inference owing to a 10-fold reduction in the number of parameters and a 3-fold reduction in the number of layers. Meanwhile, it achieves comparable performance in QSAR and VS modeling. Moreover, we anticipate that this model compression strategy provides a pathway to the creation of effective generative transformer models for organic drug and material design.
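The following is a minimal, illustrative sketch (not the authors' code) of the two compression ideas named above: cross-layer parameter sharing, in which a single transformer encoder layer is reused at every depth so the parameter count does not grow with the number of layers, and knowledge distillation, in which a small student is trained to match a larger teacher's softened outputs. All module sizes, names, and the toy QSAR head are hypothetical assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedLayerEncoder(nn.Module):
    """Cross-layer parameter sharing: one TransformerEncoderLayer is applied
    `depth` times, so adding depth adds no new parameters."""
    def __init__(self, d_model=256, nhead=4, depth=12):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.depth = depth

    def forward(self, x):
        for _ in range(self.depth):
            x = self.layer(x)  # same weights reused at every depth
        return x

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Standard KD objective: KL divergence against the teacher's
    temperature-scaled distribution plus hard cross-entropy on the labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage on random "token embeddings" (batch of 8 sequences, length 32)
x = torch.randn(8, 32, 256)
student = SharedLayerEncoder()
pooled = student(x).mean(dim=1)              # simple mean pooling
student_logits = nn.Linear(256, 2)(pooled)   # hypothetical 2-class QSAR head
teacher_logits = torch.randn(8, 2)           # stand-in for a frozen teacher model
labels = torch.randint(0, 2, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
```

In practice the teacher logits would come from a large pretrained chemical transformer rather than random tensors; the sketch only shows how sharing and distillation combine to shrink the student while preserving supervision from the teacher.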