The research field of Legal Natural Language Processing (NLP) has been very active recently, with Legal Judgment Prediction (LJP) becoming one of the most extensively studied tasks. To date, most publicly released LJP datasets originate from civil law countries. In this work, we release, for the first time, a challenging LJP dataset focused on class action cases in the US. It is the first common law dataset that targets the harder and more realistic task of using the complaints as input instead of the commonly used facts summaries written by the court. Additionally, we study the difficulty of the task by collecting expert human predictions, showing that even human experts reach only 53% accuracy on this dataset. Our Longformer model clearly outperforms this human baseline, reaching 63% accuracy despite considering only the first 2,048 tokens. Furthermore, we perform a detailed error analysis and find that the Longformer model is significantly better calibrated than the human experts. Finally, we publicly release the dataset and the code used for the experiments.
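The abstract compares calibration between the model and human experts but does not name the calibration metric. As a hedged illustration of what "better calibrated" means, here is a minimal sketch of expected calibration error (ECE), one common binned calibration measure. It assumes each prediction comes with a confidence in [0, 1] and a correctness flag. The function name and the 10-bin default are hypothetical choices for illustration, not necessarily what the experiments used.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Binned ECE: size-weighted mean |accuracy - mean confidence| per bin.

    Hypothetical helper for illustration; the metric actually used in the
    paper's error analysis is an assumption here.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Group predictions into equal-width confidence bins over [0, 1].
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        acc = sum(1 for _, ok in b if ok) / len(b)
        # Weight each bin's calibration gap by its share of all predictions.
        ece += (len(b) / n) * abs(acc - avg_conf)
    return ece


if __name__ == "__main__":
    # Well calibrated: 75% confidence, 3 of 4 correct.
    print(expected_calibration_error([0.75] * 4, [True, True, True, False]))
    # Overconfident: 95% confidence, both wrong.
    print(expected_calibration_error([0.95, 0.95], [False, False]))
```

A lower ECE means stated confidence tracks observed accuracy more closely. A predictor can be less accurate overall yet better calibrated, which is why calibration is reported separately from accuracy.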