Assessing the Performance Gap Between Lexical and Semantic Models for Information Retrieval With Formulaic Legal Language

Legal passage retrieval is an important task that assists legal practitioners in the time-intensive process of finding relevant precedents to support legal arguments. This study investigates the task of retrieving legal passages or paragraphs from decisions of the Court of Justice of the European Union (CJEU), whose language is highly structured and formulaic, leading to repetitive patterns. Understanding when lexical or semantic models are more effective at handling the repetitive nature of legal language is key to developing retrieval systems that are more accurate, efficient, and transparent for specific legal domains. To this end, we explore when this routinized legal language is better suited to retrieval with methods that rely on lexical and statistical features, such as BM25, and when it is better suited to dense retrieval models trained to capture semantic and contextual information. A qualitative and quantitative analysis with three complementary metrics shows that both lexical and dense models perform well in scenarios with more repetitive usage of language, whereas BM25 performs better than the dense models in more nuanced scenarios where repetition and verbatim quotes are less prevalent, and in longer queries. Our experiments also show that BM25 is a strong baseline, surpassing off-the-shelf dense models in 4 out of 7 performance metrics. However, fine-tuning a dense model on domain-specific data led to improved performance, surpassing BM25 in most metrics, and we analyze the effect of the amount of data used in fine-tuning on the model's performance and temporal robustness. The code, dataset, and appendix related to this work are available at: https://github.com/larimo/lexsem-legal-ir.
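
To make the lexical side of this comparison concrete, the following is a minimal sketch of BM25 scoring. It is not the authors' implementation (that lives in the linked repository). It assumes whitespace tokenization, the non-negative IDF variant popularized by Lucene, and common default values for `k1` and `b`; the function names `tokenize` and `bm25_scores` are hypothetical and chosen for illustration only.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; a real system would
    # likely use a proper tokenizer and possibly stemming or stopword removal.
    return text.lower().split()


def bm25_scores(query, passages, k1=1.5, b=0.75):
    """Return one BM25 score per passage for the given query.

    k1 controls term-frequency saturation and b controls length
    normalization; both defaults are illustrative, not taken from the paper.
    """
    docs = [tokenize(p) for p in passages]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs

    # Document frequency: number of passages containing each term.
    doc_freq = Counter()
    for d in docs:
        doc_freq.update(set(d))

    query_terms = tokenize(query)
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Non-negative IDF variant: ln(1 + (N - n + 0.5) / (n + 0.5)).
            idf = math.log(
                (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0
            )
            # Term frequency, saturated by k1 and normalized by passage length.
            denom = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores
```

Because the score depends only on exact term overlap, term rarity, and passage length, a query that quotes a passage verbatim shares most of its terms with that passage and scores highly against it, while a paraphrase with no shared terms scores zero. A dense model, by contrast, compares learned vector representations and does not require exact term overlap.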
