Deep kernel learning and related techniques promise to combine the representational power of neural networks with the reliable uncertainty estimates of Gaussian processes. One crucial aspect of these models is an expectation that, because they are treated as Gaussian process models optimized using the marginal likelihood, they are protected from overfitting. However, we identify pathological behavior, including overfitting, on a simple toy example. We explore this pathology, explaining its origins and considering how it applies to real datasets. Through careful experimentation on UCI datasets, CIFAR-10, and the UTKFace dataset, we find that the overfitting from overparameterized deep kernel learning, in which the model is "somewhat Bayesian", can in certain scenarios be worse than that from not being Bayesian at all. However, we find that a fully Bayesian treatment of deep kernel learning can rectify this overfitting and obtain the desired performance improvements over standard neural networks and Gaussian processes.
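For concreteness, below is a minimal sketch of the kind of model the abstract refers to, written with GPyTorch: a small neural network warps the inputs before a standard GP kernel, and the network weights are optimized jointly with the kernel hyperparameters by maximizing the exact marginal likelihood. The architecture, synthetic data, and optimizer settings are illustrative assumptions only, not the experimental setup used in the paper.

```python
import torch
import gpytorch


class FeatureExtractor(torch.nn.Sequential):
    """Small neural network that maps inputs to a low-dimensional feature space."""

    def __init__(self, in_dim, out_dim=2):
        super().__init__(
            torch.nn.Linear(in_dim, 64),
            torch.nn.ReLU(),
            torch.nn.Linear(64, out_dim),
        )


class DKLModel(gpytorch.models.ExactGP):
    """Exact GP whose kernel is applied to neural-network features (deep kernel learning)."""

    def __init__(self, train_x, train_y, likelihood, feature_extractor):
        super().__init__(train_x, train_y, likelihood)
        self.feature_extractor = feature_extractor
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        z = self.feature_extractor(x)  # warp inputs through the network
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(z), self.covar_module(z)
        )


# Toy data (illustrative assumption, not from the paper).
train_x = torch.randn(50, 3)
train_y = torch.sin(train_x.sum(-1)) + 0.1 * torch.randn(50)

likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = DKLModel(train_x, train_y, likelihood, FeatureExtractor(in_dim=3))

# Joint optimization of network weights and GP hyperparameters by
# maximizing the exact GP marginal likelihood.
model.train()
likelihood.train()
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for _ in range(100):
    optimizer.zero_grad()
    loss = -mll(model(train_x), train_y)  # negative marginal log likelihood
    loss.backward()
    optimizer.step()
```

Because the many network weights are treated as kernel hyperparameters and fitted by (type-II) maximum likelihood rather than marginalized, this "somewhat Bayesian" treatment is the setting in which the abstract's overfitting concern arises.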