Existing causal text mining datasets vary in objectives, data coverage, and annotation schemes. These inconsistencies hinder the development of generalizable models and prevent fair comparisons of model performance. Furthermore, few datasets include cause-effect span annotations, which are needed for end-to-end causal relation extraction. To address these issues, we propose UniCausal, a unified benchmark for causal text mining covering three tasks: (I) Causal Sequence Classification, (II) Cause-Effect Span Detection, and (III) Causal Pair Classification. We consolidated and aligned the annotations of six high-quality, mainly human-annotated corpora, yielding 58,720, 12,144, and 69,165 examples for the three tasks, respectively. Because the definition of causality can be subjective, our framework is designed to let researchers work on some or all of the datasets and tasks. To establish an initial benchmark, we fine-tuned BERT pre-trained language models on each task, achieving 70.10% Binary F1, 52.42% Macro F1, and 84.68% Binary F1, respectively.
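To make the three task formulations concrete, the following is a minimal sketch of how they could be framed with BERT using the Hugging Face transformers library. This is an illustrative assumption, not the authors' implementation; the model checkpoint, label counts, and BIO-style span scheme are hypothetical choices for exposition.

```python
# Illustrative sketch (not the UniCausal authors' code): framing the three tasks
# with Hugging Face transformers. Checkpoint and label choices are assumptions.
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    AutoModelForTokenClassification,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

# (I) Causal Sequence Classification: is the sentence causal? (binary labels)
seq_model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased", num_labels=2
)

# (II) Cause-Effect Span Detection: tag each token as belonging to a cause span,
# an effect span, or neither (a BIO-style label set is assumed here).
span_model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased", num_labels=5  # O, B-Cause, I-Cause, B-Effect, I-Effect
)

# (III) Causal Pair Classification: given a sentence with a marked argument pair,
# decide whether the pair is causally related (binary labels).
pair_model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased", num_labels=2
)

# Example forward pass for task (I)
inputs = tokenizer(
    "The bridge collapsed because of the heavy rain.", return_tensors="pt"
)
logits = seq_model(**inputs).logits  # shape: (1, 2), causal vs. non-causal
```

Each model above would then be fine-tuned on the corresponding consolidated dataset before evaluation.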