Neural processes (NPs) are models for transfer learning with properties reminiscent of Gaussian Processes (GPs). They are adept at modelling data consisting of few observations of many related functions on the same input space and are trained by minimizing a variational objective, which is computationally much less expensive than the Bayesian updating required by GPs. So far, most studies of NPs have focused on low-dimensional datasets which are not representative of realistic transfer learning tasks. Drug discovery is one application area that is characterized by datasets consisting of many chemical properties or functions which are sparsely observed, yet depend on shared features or representations of the molecular inputs. This paper applies the conditional neural process (CNP) to DOCKSTRING, a dataset of docking scores for benchmarking ML models. CNPs show competitive performance in few-shot learning tasks relative to supervised learning baselines common in chemoinformatics, as well as an alternative model for transfer learning based on pre-training and refining neural network regressors. We present a Bayesian optimization experiment which showcases the probabilistic nature of CNPs and discuss shortcomings of the model in uncertainty quantification.
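To make the conditioning mechanism concrete, here is a minimal sketch of a CNP forward pass in plain NumPy: each observed (x, y) context pair is embedded by an encoder, the embeddings are mean-pooled into a single permutation-invariant representation, and a decoder conditioned on that representation outputs a predictive mean and variance at each query input. The layer sizes, weight initialization, and helper names are illustrative assumptions, not the architecture used in the paper; a trained CNP would learn these weights by maximizing predictive likelihood over many related functions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(sizes):
    """Return (weights, biases) for a small MLP with ReLU hidden layers (toy init)."""
    return [(rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)  # ReLU on hidden layers only
    return x

# Illustrative dimensions: 2-d inputs, scalar outputs, 16-d representation.
d_x, d_y, d_r = 2, 1, 16
encoder = mlp([d_x + d_y, 32, d_r])      # embeds each (x, y) context pair
decoder = mlp([d_r + d_x, 32, 2 * d_y])  # predicts mean and log-variance

def cnp_predict(x_ctx, y_ctx, x_tgt):
    # Encode every context point, then mean-pool into one representation r.
    r_i = forward(encoder, np.concatenate([x_ctx, y_ctx], axis=-1))
    r = r_i.mean(axis=0, keepdims=True)  # permutation-invariant aggregation
    # Condition the decoder on r at each target input.
    inp = np.concatenate([np.repeat(r, len(x_tgt), axis=0), x_tgt], axis=-1)
    out = forward(decoder, inp)
    mean, log_var = out[:, :d_y], out[:, d_y:]
    return mean, np.exp(log_var)  # predictive mean and variance per target

x_ctx = rng.standard_normal((5, d_x))   # 5 observed context points (few-shot)
y_ctx = rng.standard_normal((5, d_y))
x_tgt = rng.standard_normal((10, d_x))  # 10 query points
mean, var = cnp_predict(x_ctx, y_ctx, x_tgt)
print(mean.shape, var.shape)  # (10, 1) (10, 1)
```

Because the aggregation is a mean over context embeddings, prediction costs scale linearly in the number of observations, in contrast to the cubic cost of exact GP posterior updates.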