Transformer-based models for amortized probabilistic inference, such as neural processes, prior-fitted networks, and tabular foundation models, excel at single-pass marginal prediction. However, many real-world applications, from signal interpolation to multi-column tabular prediction, require coherent joint distributions that capture dependencies between predictions. While purely autoregressive architectures generate such distributions efficiently, they sacrifice the flexible set-conditioning that makes set-based models powerful for meta-learning. Conversely, the standard approach to obtaining joint distributions from set-based models requires expensive re-encoding of the entire augmented conditioning set at each autoregressive step. We introduce a causal autoregressive buffer that preserves the advantages of both paradigms. Our approach decouples context encoding from updates to the conditioning set: the model processes the context once and caches it, and a dynamic buffer then captures target dependencies. As targets are incorporated, they enter the buffer and attend to both the cached context and previously buffered targets. This enables efficient batched autoregressive generation and one-pass joint log-likelihood evaluation. A unified training strategy allows seamless integration of the set-based and autoregressive modes at minimal additional cost. Across synthetic functions, EEG signals, cognitive models, and tabular data, our method matches the predictive accuracy of strong baselines while delivering up to 20 times faster joint sampling. Our approach combines the efficiency of autoregressive generative models with the representational power of set-based conditioning, making joint prediction practical for transformer-based probabilistic models.
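To make the mechanism concrete, the sketch below illustrates the causal autoregressive buffer idea in PyTorch: the context set is encoded once and cached, and target tokens in the buffer attend to the cached context plus all earlier buffered targets under a causal mask, so joint log-likelihoods can be evaluated in one pass. This is a minimal sketch under our own assumptions, not the paper's implementation; names such as `CausalBufferLayer`, the stand-in linear encoders, and all hyperparameters are illustrative.

```python
# Minimal sketch (illustrative, not the authors' code) of a causal
# autoregressive buffer: context is encoded once and cached; buffered
# targets attend to [cached context | earlier targets] via a causal mask.
import torch
import torch.nn as nn


class CausalBufferLayer(nn.Module):
    """One attention layer over the cached context and the target buffer."""

    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_model), nn.GELU(),
                                nn.Linear(d_model, d_model))

    def forward(self, buffer_tokens, cached_context):
        # Keys/values: cached context (encoded once) followed by the buffer.
        kv = torch.cat([cached_context, buffer_tokens], dim=1)
        n_ctx, n_buf = cached_context.size(1), buffer_tokens.size(1)
        # Causal mask: buffer token i may attend to every context token and
        # to buffer tokens j <= i, but never to later targets (True = masked).
        mask = torch.zeros(n_buf, n_ctx + n_buf, dtype=torch.bool)
        mask[:, n_ctx:] = torch.triu(
            torch.ones(n_buf, n_buf, dtype=torch.bool), diagonal=1)
        out, _ = self.attn(buffer_tokens, kv, kv, attn_mask=mask)
        return out + self.ff(out)


# Usage sketch: encode the context once, then process all targets in a
# single causally masked pass instead of re-encoding the augmented
# conditioning set at every autoregressive step.
if __name__ == "__main__":
    d_model = 64
    context_encoder = nn.Linear(2, d_model)        # stand-in set encoder
    target_embed = nn.Linear(2, d_model)           # stand-in target embedding
    layer = CausalBufferLayer(d_model)

    context_xy = torch.randn(1, 10, 2)             # 10 observed (x, y) pairs
    cached_context = context_encoder(context_xy)   # computed once, reused

    targets = torch.randn(1, 5, 2)                 # 5 targets in AR order
    buffer_tokens = target_embed(targets)
    joint_repr = layer(buffer_tokens, cached_context)
    print(joint_repr.shape)                        # torch.Size([1, 5, 64])
```

In autoregressive sampling, the same cached context would be reused while newly generated targets are appended to the buffer, which is what avoids re-encoding the conditioning set at each step.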