Legal texts routinely use concepts that are difficult to understand. Lawyers elaborate on the meaning of such concepts by, among other things, carefully investigating how they have been used in the past. Finding text snippets that mention a particular concept in a useful way is tedious, time-consuming, and hence expensive. We assembled a dataset of 26,959 sentences from legal case decisions and labeled each sentence for its usefulness in explaining selected legal concepts. Using this dataset, we study how effectively transformer-based models pre-trained on large language corpora can detect which of the sentences are useful. In light of the models' predictions, we analyze various linguistic properties of the explanatory sentences, as well as their relationship to the legal concept that needs to be explained. We show that transformer-based models are capable of learning surprisingly sophisticated features and outperform prior approaches to the task.