Existing causal text mining datasets vary in objectives, data coverage, and annotation schemes. These inconsistencies hinder modeling efforts and fair comparisons of model performance. Few datasets include cause-effect span annotations, which are needed for end-to-end causal extraction. Therefore, we propose UniCausal, a unified benchmark for causal text mining across three tasks: Causal Sequence Classification, Cause-Effect Span Detection, and Causal Pair Classification. We consolidated and aligned the annotations of six high-quality, human-annotated corpora, yielding a total of 58,720, 12,144, and 69,165 examples for the three tasks, respectively. Because the definition of causality can be subjective, our framework is designed to let researchers work on some or all of the datasets and tasks. As an initial benchmark, we adapted BERT pre-trained models to our tasks and generated baseline scores, achieving 70.10% Binary F1 for Sequence Classification, 52.42% Macro F1 for Span Detection, and 84.68% Binary F1 for Pair Classification.
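The three tasks differ in their input and output formats: sentence-level binary labels, token-level cause/effect spans, and pair-level binary labels. The following is a minimal sketch of how examples for each task might be represented; the data structures, field names, and BIO-style tag scheme shown here are illustrative assumptions, not the UniCausal API.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SequenceExample:
    """Causal Sequence Classification: does the sentence express causality?"""
    text: str
    is_causal: bool

@dataclass
class SpanExample:
    """Cause-Effect Span Detection: token-level tags marking cause/effect spans.

    Assumed tag set (BIO-style): B-C/I-C for cause, B-E/I-E for effect, O otherwise.
    """
    tokens: List[str]
    tags: List[str]

@dataclass
class PairExample:
    """Causal Pair Classification: are two marked argument spans causally related?

    Argument markers (e.g. <ARG0>, <ARG1>) are an illustrative convention.
    """
    text: str
    is_causal: bool

# Example instances for each task, built from one toy sentence.
seq = SequenceExample("The storm caused widespread flooding.", is_causal=True)

span = SpanExample(
    tokens=["The", "storm", "caused", "widespread", "flooding", "."],
    tags=["B-C", "I-C", "O", "B-E", "I-E", "O"],
)
# Token-level annotation requires one tag per token.
assert len(span.tokens) == len(span.tags)

pair = PairExample(
    "<ARG0>The storm</ARG0> caused <ARG1>widespread flooding</ARG1>.",
    is_causal=True,
)
```

Representing all three tasks over shared sentences is what allows a single consolidated benchmark: the same corpus sentence can contribute an example to each task wherever the source annotations support it.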