Conditional neural processes (CNPs; Garnelo et al., 2018a) are attractive meta-learning models which produce well-calibrated predictions and are trainable via a simple maximum likelihood procedure. Although CNPs have many advantages, they are unable to model dependencies in their predictions. Various works propose solutions to this, but these come at the cost of either requiring approximate inference or being limited to Gaussian predictions. In this work, we instead propose to change how CNPs are deployed at test time, without any modifications to the model or training procedure. Instead of making predictions independently for every target point, we autoregressively define a joint predictive distribution using the chain rule of probability, taking inspiration from the neural autoregressive density estimator (NADE) literature. We show that this simple procedure allows factorised Gaussian CNPs to model highly dependent, non-Gaussian predictive distributions. Perhaps surprisingly, in an extensive range of tasks with synthetic and real data, we show that CNPs in autoregressive (AR) mode not only significantly outperform non-AR CNPs, but are also competitive with more sophisticated models that are significantly more computationally expensive and challenging to train. This performance is remarkable given that AR CNPs are not trained to model joint dependencies. Our work provides an example of how ideas from neural distribution estimation can benefit neural processes, and motivates research into the AR deployment of other neural process models.
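To make the autoregressive deployment concrete, the following is a minimal sketch of the chain-rule sampling procedure described above. It assumes a hypothetical `cnp(xc, yc, xt)` callable (not part of the paper's code) that returns the mean and standard deviation of a factorised Gaussian CNP's prediction at target inputs `xt`; the model and its training are left entirely unchanged.

```python
import numpy as np

def ar_sample(cnp, x_context, y_context, x_target, rng=None):
    """Sample a joint trajectory from a CNP deployed autoregressively.

    `cnp(xc, yc, xt)` is an assumed interface returning (mean, std) of the
    CNP's factorised Gaussian predictive at target inputs `xt`.
    """
    rng = rng or np.random.default_rng()
    xc, yc = list(x_context), list(y_context)
    samples = []
    # Chain rule of probability: p(y_1, ..., y_n | context)
    #   = prod_i p(y_i | y_1, ..., y_{i-1}, context)
    for xt in x_target:
        mean, std = cnp(np.array(xc), np.array(yc), np.array([xt]))
        y = rng.normal(mean[0], std[0])  # draw from the 1-D predictive
        samples.append(y)
        xc.append(xt)  # feed the sample back in as if it were observed,
        yc.append(y)   # so later predictions depend on earlier samples
    return np.array(samples)
```

Because each sampled value is appended to the context before the next prediction, the resulting joint samples exhibit dependencies across target points even though every individual prediction is a factorised Gaussian.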