We present TabPFN, an AutoML method that is competitive with the state of the art on small tabular datasets while being over 1,000$\times$ faster. Our method is very simple: it is fully entailed in the weights of a single neural network, and a single forward pass directly yields predictions for a new dataset. Our AutoML method is meta-learned using the Transformer-based Prior-Data Fitted Network (PFN) architecture and approximates Bayesian inference with a prior that is based on assumptions of simplicity and causal structures. The prior contains a large space of structural causal models and Bayesian neural networks with a bias for small architectures and thus low complexity. Furthermore, we extend the PFN approach to differentiably calibrate the prior's hyperparameters on real data. By doing so, we separate our abstract prior assumptions from their heuristic calibration on real data. Afterwards, the calibrated hyperparameters are fixed and TabPFN can be applied to any new tabular dataset at the push of a button. Finally, on 30 datasets from the OpenML-CC18 suite we show that our method outperforms boosted trees and performs on par with complex state-of-the-art AutoML systems with predictions produced in less than a second. We provide all our code and our final trained TabPFN in the supplementary materials.
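To make the "single forward pass yields predictions" claim concrete, the following is a minimal sketch of the intended one-call usage pattern. The class name, method names, and the nearest-centroid stand-in predictor are all assumptions for illustration; the real TabPFN is a meta-learned Transformer whose trained weights the authors provide in the supplementary materials, not the toy rule below.

```python
import numpy as np

class TabPFNLikeClassifier:
    """Hypothetical sketch of TabPFN's scikit-learn-style interface:
    fit() stores the training set, and predict() produces labels for a
    new dataset in one shot. A nearest-centroid rule stands in for the
    actual single forward pass of the meta-learned network."""

    def fit(self, X, y):
        # Store one centroid per class; the real method instead feeds
        # (X, y) as context into a pre-trained Transformer.
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack(
            [X[y == c].mean(axis=0) for c in self.classes_]
        )
        return self

    def predict(self, X):
        # One vectorized pass over the test points: assign each point
        # to the class with the closest centroid.
        dists = np.linalg.norm(
            X[:, None, :] - self.centroids_[None, :, :], axis=-1
        )
        return self.classes_[dists.argmin(axis=1)]

# Tiny two-cluster example of the push-of-a-button workflow.
X_train = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
X_test = np.array([[0.05, 0.05], [1.0, 0.9]])

clf = TabPFNLikeClassifier().fit(X_train, y_train)
preds = clf.predict(X_test)
print(preds.tolist())  # → [0, 1]
```

The design point the sketch mirrors is that no per-dataset training loop or hyperparameter search appears between `fit` and `predict`; in TabPFN that entire step is replaced by one forward pass through fixed, meta-learned weights.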