Currently, it is hard to reap the benefits of deep learning for Bayesian methods, which allow the explicit specification of prior knowledge and accurately capture model uncertainty. We present Prior-Data Fitted Networks (PFNs). PFNs leverage large-scale machine learning techniques to approximate a large set of posteriors. The only requirement for PFNs to work is the ability to sample from a prior distribution over supervised learning tasks (or functions). Our method restates the objective of posterior approximation as a supervised classification problem with a set-valued input: it repeatedly draws a task (or function) from the prior, draws a set of data points and their labels from it, masks one of the labels and learns to make probabilistic predictions for it based on the set-valued input of the rest of the data points. Presented with a set of samples from a new supervised learning task as input, PFNs make probabilistic predictions for arbitrary other data points in a single forward propagation, having learned to approximate Bayesian inference. We demonstrate that PFNs can near-perfectly mimic Gaussian processes and also enable efficient Bayesian inference for intractable problems, with over 200-fold speedups in multiple setups compared to current methods. We obtain strong results in very diverse areas such as Gaussian process regression, Bayesian neural networks, classification for small tabular data sets, and few-shot image classification, demonstrating the generality of PFNs. Code and trained PFNs are released at https://github.com/automl/TransformersCanDoBayesianInference.
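The training objective described above (draw a task from the prior, draw a dataset from it, mask one label, and predict it from the remaining points) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the linear-function prior, the function names, and all shapes are assumptions chosen for brevity; in the paper the prior would be, e.g., a Gaussian process or a Bayesian neural network, and the predictor a Transformer trained on many such examples.

```python
import numpy as np

def sample_prior_task(rng, dim=1):
    """Hypothetical prior over functions: random noisy linear functions."""
    w = rng.normal(size=dim)
    def f(x):
        return x @ w + 0.1 * rng.normal(size=x.shape[0])
    return f

def make_pfn_example(rng, n_points=10, dim=1):
    """Build one supervised training example for the PFN objective:
    a set-valued context, a query point, and its held-out label."""
    f = sample_prior_task(rng, dim)        # draw a task from the prior
    x = rng.normal(size=(n_points, dim))   # draw inputs for this task
    y = f(x)                               # draw their labels from the task
    i = rng.integers(n_points)             # choose one label to mask
    mask = np.arange(n_points) != i
    context = (x[mask], y[mask])           # set-valued input: the other points
    query, target = x[i], y[i]             # the PFN must predict target from context + query
    return context, query, target
```

Training then repeats this sampling loop, fitting a model (a Transformer in the paper) to predict `target` from `(context, query)`; at test time, a real dataset takes the place of the sampled context.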