Citing legal opinions is a key part of legal argumentation, an expert task that requires retrieving, extracting, and summarizing information from court decisions. Identifying the legally salient parts of an opinion for the purpose of citation can be seen as a domain-specific formulation of a highlight extraction or passage retrieval task. While similar tasks in other domains, such as web search, have received significant attention and seen improvement, progress in the legal domain is hindered by the lack of resources for training and evaluation. This paper presents a new dataset consisting of the citation graph of court opinions, which cite previously published court opinions in support of their arguments. In particular, we focus on verbatim quotes, i.e., cases where the text of the original opinion is directly reused. With this approach, we explain the relative importance of different text spans of a court opinion by showcasing their usage in citations and measuring their contribution to the relations between opinions in the citation graph. We release VerbCL, a large-scale dataset derived from CourtListener, and introduce the task of highlight extraction as a single-document summarization task based on the citation graph, establishing the first baseline results for this task on the VerbCL dataset.