Causal discovery is fundamental to multiple scientific domains, yet extracting causal information from real-world data remains a significant challenge. Given its recent success on real data, we investigate whether TabPFN, a transformer-based tabular foundation model pre-trained on synthetic datasets generated from structural causal models, encodes causal information in its internal representations. We develop an adapter framework that uses a learnable decoder and causal tokens to extract causal signals from TabPFN's frozen embeddings and decode them into adjacency matrices for causal discovery. Our evaluations demonstrate that TabPFN's embeddings contain causal information, outperforming several traditional causal discovery algorithms, and that this causal information is concentrated in the mid-range layers. These findings establish a new direction for interpretable and adaptable foundation models and demonstrate the potential of leveraging pre-trained tabular models for causal discovery.
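For intuition, the sketch below shows one way such an adapter could be wired up in PyTorch: learnable causal tokens cross-attend to frozen per-feature embeddings, and bilinear pairwise scores between the resulting token states yield a soft adjacency matrix. The class name `CausalAdapter`, the bilinear scorer, and all hyperparameters are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn


class CausalAdapter(nn.Module):
    """Minimal sketch (assumed design): decode frozen tabular-foundation-model
    embeddings into a soft adjacency matrix via learnable causal tokens."""

    def __init__(self, d_model: int = 192, n_heads: int = 4):
        super().__init__()
        # Cross-attention block acting as the learnable decoder
        self.decoder = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.src_proj = nn.Linear(d_model, d_model)  # projects edge sources
        self.dst_proj = nn.Linear(d_model, d_model)  # projects edge targets

    def forward(self, feat_emb: torch.Tensor, causal_tokens: torch.Tensor) -> torch.Tensor:
        # feat_emb:      (batch, n_features, d_model), frozen embeddings
        # causal_tokens: (batch, n_features, d_model), learnable queries,
        #                one token per observed variable
        h, _ = self.decoder(causal_tokens, feat_emb, feat_emb)
        # Bilinear pairwise scoring: logits[i, j] scores a candidate edge i -> j
        logits = self.src_proj(h) @ self.dst_proj(h).transpose(-1, -2)
        return torch.sigmoid(logits)  # soft adjacency matrix in [0, 1]


# Usage with stand-in tensors in place of actual TabPFN embeddings
n_vars, d = 5, 192
adapter = CausalAdapter(d_model=d)
tokens = nn.Parameter(torch.randn(1, n_vars, d))  # learnable causal tokens
emb = torch.randn(1, n_vars, d)                   # frozen embeddings (stand-in)
adj = adapter(emb, tokens)                        # (1, n_vars, n_vars)
```

Only the adapter's parameters would be trained against ground-truth graphs; the foundation model itself stays frozen, which is what lets the probe test whether causal structure is already present in the embeddings.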