Reading comprehension of legal text can be a particularly challenging task due to the length and complexity of legal clauses and a shortage of expert-annotated datasets. To address this challenge, we introduce the Merger Agreement Understanding Dataset (MAUD), an expert-annotated reading comprehension dataset based on the American Bar Association's 2021 Public Target Deal Points Study, with over 39,000 examples and over 47,000 total annotations. Our fine-tuned Transformer baselines show promising results, with models performing well above random on most questions. However, on a large subset of questions, there is still room for significant improvement. As the only expert-annotated merger agreement dataset, MAUD is valuable as a benchmark for both the legal profession and the NLP community.