Modern approximations to Gaussian processes are suitable for ``tall data'', with a cost that scales well in the number of observations, but they under-perform on ``wide data'', scaling poorly in the number of input features. That is, as the number of input features grows, good predictive performance requires the number of summarising variables, and their associated cost, to grow rapidly. We introduce a kernel that allows the number of summarising variables to grow exponentially with the number of input features, but requires only linear cost in both the number of observations and input features. This scaling is achieved through our introduction of the B\'ezier buttress, which allows approximate inference without computing matrix inverses or determinants. We show that our kernel has close similarities to some of the most used kernels in Gaussian process regression, and empirically demonstrate the kernel's ability to scale to both tall and wide datasets.