Causal inference studies using textual social media data can provide actionable insights into human behavior. Making accurate causal inferences with text requires controlling for confounding, which could otherwise bias estimates. Recently, many different methods for adjusting for confounders have been proposed, and we show that these existing methods disagree with one another on two datasets inspired by previous social media studies. Evaluating causal methods is challenging, as ground-truth counterfactuals are almost never available. Presently, no empirical evaluation framework for causal methods using text exists, and as such, practitioners must select their methods without guidance. We contribute the first such framework, which consists of five tasks drawn from real-world studies. Our framework enables the evaluation of any causal inference method using text. Across 648 experiments and two datasets, we evaluate every commonly used causal inference method, identifying their strengths and weaknesses to inform social media researchers seeking to use such methods and to guide future improvements. We make all tasks, data, and models public to inform applications and encourage further research.