We present a new NLP task and dataset from the domain of U.S. civil procedure. Each instance of the dataset consists of a general introduction to the case, a particular question, and a possible solution argument, accompanied by a detailed analysis of why the argument applies in that case. Since the dataset is based on a book aimed at law students, we believe that it represents a genuinely complex task for benchmarking modern legal language models. Our baseline evaluation shows that fine-tuning a legal transformer provides some advantage over random baseline models, but our analysis reveals that the actual ability to infer legal arguments remains a challenging open research question.