Neural processes (NPs) are models for transfer learning with properties reminiscent of Gaussian processes (GPs). They are adept at modelling data consisting of few observations of many related functions on the same input space, and are trained by minimizing a variational objective, which is computationally much less expensive than the Bayesian updating required by GPs. So far, most studies of NPs have focused on low-dimensional datasets which are not representative of realistic transfer learning tasks. Drug discovery is one application area that is characterized by datasets consisting of many chemical properties or functions which are sparsely observed, yet depend on shared features or representations of the molecular inputs. This paper applies the conditional neural process (CNP) to DOCKSTRING, a dataset of docking scores for benchmarking ML models. CNPs show competitive performance in few-shot learning tasks relative to supervised learning baselines common in QSAR modelling, as well as to an alternative model for transfer learning based on pre-training and refining neural network regressors. We present a Bayesian optimization experiment which showcases the probabilistic nature of CNPs, and discuss shortcomings of the model in uncertainty quantification.