The first tabular foundation model, TabPFN, and its successor TabPFNv2 have impacted tabular AI substantially, with dozens of methods building on them and hundreds of applications across different use cases. This report introduces TabPFN-2.5, the next generation of our tabular foundation model, built for datasets with up to 50,000 data points and 2,000 features, a 20x increase in data cells compared to TabPFNv2. TabPFN-2.5 is now the leading method on the industry-standard benchmark TabArena (which contains datasets with up to 100,000 training data points), substantially outperforming tuned tree-based models and matching the accuracy of AutoGluon 1.4, a complex ensemble tuned for four hours that even includes the previous TabPFNv2. Remarkably, default TabPFN-2.5 has a 100% win rate against default XGBoost on small to medium-sized classification datasets (up to 10,000 data points and 500 features) and an 87% win rate on larger datasets with up to 100,000 samples and 2,000 features (85% for regression). For production use cases, we introduce a new distillation engine that converts TabPFN-2.5 into a compact MLP or tree ensemble, preserving most of its accuracy while delivering orders-of-magnitude lower latency and plug-and-play deployment. This new release will immediately strengthen the performance of the many applications and methods already built on the TabPFN ecosystem.
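The "20x increase in data cells" can be checked with a short calculation. The sketch below assumes that the TabPFNv2 limits were 10,000 data points and 500 features; this figure is not stated explicitly above and is taken here from the small to medium-sized range quoted in the abstract. The constant and function names are illustrative only.

```python
# Assumption: TabPFNv2 targeted up to 10,000 rows and 500 features
# (the "small to medium-sized" range quoted in the abstract).
TABPFN_V2_MAX_ROWS = 10_000
TABPFN_V2_MAX_FEATURES = 500

# Limits stated in the abstract for TabPFN-2.5.
TABPFN_25_MAX_ROWS = 50_000
TABPFN_25_MAX_FEATURES = 2_000


def max_cells(rows: int, features: int) -> int:
    """Number of data cells in a table with the given shape."""
    return rows * features


v2_cells = max_cells(TABPFN_V2_MAX_ROWS, TABPFN_V2_MAX_FEATURES)
v25_cells = max_cells(TABPFN_25_MAX_ROWS, TABPFN_25_MAX_FEATURES)
ratio = v25_cells // v2_cells

print(f"TabPFNv2 cells:   {v2_cells:,}")
print(f"TabPFN-2.5 cells: {v25_cells:,}")
print(f"Increase:         {ratio}x")
```

Under that assumption, the supported table size grows from 5 million to 100 million cells, which matches the 20x figure.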