Randomized experiments have become a cornerstone of evidence-based decision-making in contexts ranging from online platforms to public health. However, in experimental settings with network interference, a unit's treatment can influence the outcomes of other units, which complicates both causal effect estimation and its validation. Classical validation approaches fail because outcomes are observable under only a single treatment scenario and exhibit complex correlation patterns due to interference. To address these challenges, we introduce a framework that facilitates the use of machine learning tools for both estimation and validation in causal inference. Central to our approach is a new distribution-preserving network bootstrap, a theoretically grounded technique that generates multiple statistically valid subpopulations from a single experiment's data. This amplification of experimental samples enables our second contribution: a counterfactual cross-validation procedure. This procedure adapts the principles of model validation to the unique constraints of causal settings, providing a rigorous, data-driven method for selecting and evaluating estimators. We extend recent causal message-passing developments by incorporating heterogeneous unit-level characteristics and varying local interactions, and we establish reliable finite-sample performance through non-asymptotic analysis. Additionally, we develop and publicly release a comprehensive benchmark toolbox featuring diverse experimental environments, from networks of interacting AI agents to ride-sharing applications. These environments provide known ground-truth values while maintaining realistic complexities, enabling systematic evaluation of causal inference methods. Extensive testing across these environments demonstrates our method's robustness to diverse forms of network interference.
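To make the interference problem concrete, the following is a minimal illustrative sketch, not the framework described above. It assumes a hypothetical toy model: units sit on a ring, each unit's outcome depends on its own treatment (`direct`) and on the fraction of its two neighbors that are treated (`spillover`), and treatment is assigned by independent coin flips. The function name `simulate` and all parameter values are assumptions made for illustration only. Under this model, the naive difference in means recovers roughly the direct effect and misses the spillover component of the total treatment effect (everyone treated versus no one treated).

```python
import random


def simulate(n=20000, direct=1.0, spillover=0.5, p=0.5, seed=0):
    """Toy ring-network experiment with interference (hypothetical model).

    Returns (difference_in_means, total_treatment_effect).
    """
    rng = random.Random(seed)

    # Bernoulli(p) treatment assignment, independent across units.
    w = [1 if rng.random() < p else 0 for _ in range(n)]

    # Outcome = direct effect of own treatment
    #         + spillover from the fraction of treated ring neighbors
    #         + small Gaussian noise.
    y = []
    for i in range(n):
        frac_treated_neighbors = (w[(i - 1) % n] + w[(i + 1) % n]) / 2
        y.append(direct * w[i]
                 + spillover * frac_treated_neighbors
                 + rng.gauss(0, 0.1))

    treated = [y[i] for i in range(n) if w[i] == 1]
    control = [y[i] for i in range(n) if w[i] == 0]

    # Naive estimator: ignores that neighbors' treatments shift outcomes.
    diff_in_means = sum(treated) / len(treated) - sum(control) / len(control)

    # Ground truth in this toy model: all treated vs. none treated.
    total_effect = direct + spillover
    return diff_in_means, total_effect


if __name__ == "__main__":
    dim, tte = simulate()
    print(f"difference in means: {dim:.3f}")
    print(f"total treatment effect (toy ground truth): {tte:.3f}")
```

Because a unit's neighbors are treated with the same probability whether or not the unit itself is treated, the spillover term cancels in the treated-minus-control comparison. This is one simple way in which standard estimators, and validation schemes built on them, can be misleading under interference.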