Recent advances in transfer learning have made it a promising approach for domain adaptation via the transfer of learned representations. This is especially relevant when the alternate task has limited samples of well-defined, labeled data, as is common in the molecular data domain, making transfer learning an ideal approach for molecular learning tasks. While adversarial reprogramming has proven successful for repurposing neural networks for alternate tasks, most prior work considers source and alternate tasks within the same domain. In this work, we propose a new algorithm, Representation Reprogramming via Dictionary Learning (R2DL), for adversarially reprogramming pretrained language models for molecular learning tasks, motivated by leveraging the representations learned by massive state-of-the-art language models. The adversarial program learns a linear transformation between a dense source-model input space (language data) and a sparse target-model input space (e.g., chemical and biological molecule data), using a k-SVD solver to approximate a sparse representation of the encoded data via dictionary learning. R2DL matches the baseline established by state-of-the-art toxicity prediction models trained on domain-specific data and outperforms the baseline in the limited training-data setting, thereby establishing avenues for domain-agnostic transfer learning for tasks with molecule data.
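A minimal sketch of the sparse-coding idea behind such a mapping, assuming the source model's token embeddings serve as a fixed dictionary and each target token is approximated as a sparse combination of its atoms. This is not the authors' implementation: the variable names are hypothetical, the data is random, and scikit-learn's orthogonal matching pursuit stands in for the sparse-coding step of a full k-SVD solver.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

rng = np.random.default_rng(0)

# Hypothetical embeddings: 50 source-vocabulary (language) token embeddings of
# dimension 16, treated as dictionary atoms, and 10 target (molecule) tokens.
V_source = rng.normal(size=(50, 16))
X_target = rng.normal(size=(10, 16))

# Sparse-coding step: express each target embedding as a combination of at
# most 5 source atoms. orthogonal_mp solves y ~ X @ w with ||w||_0 <= k,
# where atoms are the columns of X, hence the transposes.
Theta = orthogonal_mp(V_source.T, X_target.T, n_nonzero_coefs=5).T  # (10, 50)

# Each target token now has a sparse linear image in the source model's
# input space; feeding X_approx rows to the frozen source model would be
# the reprogramming step.
X_approx = Theta @ V_source
```

In a full k-SVD loop, the sparse-coding step above would alternate with dictionary-update steps; here the dictionary is held fixed purely for illustration.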