Deep kernel processes (DKPs) generalise Bayesian neural networks, but do not require us to represent either features or weights. Instead, at each hidden layer they represent and optimize a flexible kernel. Here, we develop a Newton-like method for DKPs that converges in around 10 steps, exploiting matrix solvers initially developed in the control theory literature. These are many times faster than the usual gradient descent approach. We generalise to arbitrary DKP architectures by developing "kernel backprop" and algorithms for "kernel autodiff". While these methods are currently not Bayesian, as they give point estimates, and scale poorly, being cubic in the number of datapoints, we hope they will form the basis of a new class of much more efficient approaches to optimizing deep nonlinear function approximators.
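The abstract only names the family of tools involved: direct matrix solvers from the control theory literature, each solve costing cubic time in the matrix size (here, the number of datapoints). As a purely illustrative sketch, and not the paper's actual update, the snippet below calls SciPy's Bartels–Stewart solver for a continuous-time Lyapunov equation, a standard member of that solver family; the matrices `A` and `Q` and their interpretation as a kernel-sized fixed-point problem are placeholder assumptions.

```python
# Illustrative only: a control-theory matrix solver of the kind the abstract
# refers to.  The specific equation solved inside the DKP Newton-like method
# is not given here, so A, Q, and N are assumptions for demonstration.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

rng = np.random.default_rng(0)
N = 200                                              # stand-in for the number of datapoints
A = -np.eye(N) + 0.1 * rng.standard_normal((N, N))  # assumed stable system matrix
Q = rng.standard_normal((N, N))
Q = Q @ Q.T                                          # assumed symmetric PSD right-hand side

# Solve the continuous Lyapunov equation A X + X A^T + Q = 0 in one direct
# O(N^3) step (Bartels-Stewart), rather than by iterating gradient descent on X.
X = solve_continuous_lyapunov(A, -Q)

# Residual check: should be numerically close to zero.
print(np.linalg.norm(A @ X + X @ A.T + Q))
```

The point of the sketch is the cost profile: a single direct O(N^3) solve of this kind can replace many cheap but slowly converging gradient steps, which is consistent with both the claimed ~10-step convergence and the cubic scaling in the number of datapoints noted above.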