Legal passage retrieval assists legal practitioners in the time-intensive process of finding relevant precedents to support legal arguments. This study investigates the retrieval of passages (paragraphs) from decisions of the Court of Justice of the European Union (CJEU), whose language is highly structured and formulaic, leading to repetitive patterns. Understanding when lexical or semantic models handle this repetitive language more effectively is key to developing retrieval systems that are more accurate, efficient, and transparent for specific legal domains. To this end, we explore when this routinized legal language is better suited to retrieval with methods that rely on lexical and statistical features, such as BM25, and when it is better suited to dense retrieval models trained to capture semantic and contextual information. A qualitative and quantitative analysis with three complementary metrics shows that both lexical and dense models perform well in scenarios with more repetitive language use, whereas BM25 outperforms the dense models in more nuanced scenarios, where repetition and verbatim quotes are less prevalent, and on longer queries. Our experiments also show that BM25 is a strong baseline, surpassing off-the-shelf dense models in 4 out of 7 performance metrics. However, a dense model fine-tuned on domain-specific data performs better, surpassing BM25 in most metrics, and we analyze how the amount of fine-tuning data affects the model's performance and temporal robustness. The code, dataset, and appendix for this work are available at: https://github.com/larimo/lexsem-legal-ir.
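To make the lexical side of this comparison concrete, the following is a minimal sketch of BM25 scoring in plain Python. It is not the implementation used in this work: the function names, the regex tokenizer, and the parameter values `k1=1.5` and `b=0.75` (commonly used defaults) are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase word tokens; a real system may stem or remove stop words.
    return re.findall(r"\w+", text.lower())


def bm25_scores(query, passages, k1=1.5, b=0.75):
    """Score each passage against the query with the Okapi BM25 formula.

    k1 and b are common default values, not settings taken from the paper.
    Assumes a non-empty list of non-empty passages.
    """
    docs = [tokenize(p) for p in passages]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs

    # Document frequency: number of passages that contain each term.
    df = Counter()
    for d in docs:
        df.update(set(d))

    query_terms = tokenize(query)
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF variant that stays non-negative for frequent terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with passage-length normalization.
            denom = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores
```

Because the score depends only on exact term overlap, a query that quotes a passage verbatim shares many terms with it and scores highly, while a passage with no query terms scores zero. A dense retriever would instead compare learned embeddings of the query and the passage, so it can match text that shares no surface terms.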