Deep kernel learning (DKL) and related techniques aim to combine the representational power of neural networks with the reliable uncertainty estimates of Gaussian processes. One crucial aspect of these models is the expectation that, because they are treated as Gaussian process models optimized using the marginal likelihood, they are protected from overfitting. However, we identify situations where this is not the case. We explore this behavior, explain its origins, and consider how it applies to real datasets. Through careful experimentation on the UCI, CIFAR-10, and UTKFace datasets, we find that the overfitting from overparameterized maximum marginal likelihood, in which the model is "somewhat Bayesian", can in certain scenarios be worse than that from not being Bayesian at all. By investigating optimization dynamics, we explain how and when DKL can still be successful. We also find that the failures of DKL can be rectified by a fully Bayesian treatment, which leads to the desired performance improvements over standard neural networks and Gaussian processes.