When comparing approximate Gaussian process (GP) models, it is useful to be able to generate data from an arbitrary GP. To study how approximate methods perform at scale, we may wish to generate very large synthetic datasets on which to evaluate them. Doing so na\"{i}vely costs \(\mathcal{O}(n^3)\) flops and \(\mathcal{O}(n^2)\) memory for a sample of size \(n\). We demonstrate how to scale such data generation to large \(n\) whilst still guaranteeing that, with high probability, the sample is indistinguishable from a sample from the desired GP.
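To make the quoted costs concrete, the following is a minimal sketch of the na\"{i}ve baseline only, not of the scalable method. It assumes one-dimensional inputs, a squared-exponential kernel, and a small noise term added for numerical stability. The names `rbf_kernel`, `naive_gp_sample`, `lengthscale`, `variance` and `noise_var` are hypothetical and chosen purely for illustration.

```python
import numpy as np


def rbf_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel on 1-D inputs (an assumed kernel choice)."""
    sq_dists = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dists / lengthscale**2)


def naive_gp_sample(x, noise_var=1e-2, seed=0):
    """Draw one sample of size n from a zero-mean GP via a dense Cholesky factor."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    # Dense n-by-n covariance matrix: O(n^2) memory.
    K = rbf_kernel(x, x) + noise_var * np.eye(n)
    # Cholesky factorisation K = L L^T: O(n^3) flops.
    L = np.linalg.cholesky(K)
    # If z ~ N(0, I), then L z ~ N(0, K).
    z = rng.standard_normal(n)
    return L @ z


x = np.linspace(0.0, 10.0, 500)
y = naive_gp_sample(x)
```

Both the storage of `K` and the factorisation become prohibitive as \(n\) grows, which is the bottleneck a scalable generator must avoid.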