This paper summarizes our approaches submitted to the case law retrieval task of the Competition on Legal Information Extraction/Entailment (COLIEE) 2022. Our methodology consists of four steps. First, given a legal case as a query, we reformulate it by extracting various meaningful sentences or n-grams. Second, we use the pre-processed query case to retrieve an initial set of possibly relevant legal cases. Third, we re-rank this initial set. Lastly, we aggregate the relevance scores obtained by the first-stage and re-ranking models to improve retrieval effectiveness. In each step of our methodology, we explore various well-known and novel methods. In particular, to reformulate the query cases and make them shorter, we extract unigrams using three statistical methods (KLI, PLM, and IDF-r) as well as embedding-based models (e.g., KeyBERT). Moreover, we investigate whether automatic summarization with the Longformer-Encoder-Decoder (LED) can produce an effective query representation for this retrieval task. Furthermore, we propose a novel cluster-driven re-ranking approach that leverages Sentence-BERT models, pre-tuned on large amounts of data, to embed sentences from the query and candidate documents. Finally, we employ a linear aggregation method to combine the relevance scores of traditional IR models and neural models, aiming to incorporate both the semantic understanding of the neural models and the statistically measured topical relevance. We show that aggregating these relevance scores can improve overall retrieval effectiveness.
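The final aggregation step can be illustrated with a minimal Python sketch of linear score fusion. It assumes that each model's scores are min-max normalized before interpolation and that the interpolation weight is a hypothetical parameter `alpha`; the abstract specifies neither the normalization nor the weight, and the function names `min_max_normalize` and `linear_aggregate` are illustrative only, not the authors' implementation.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one model's scores to [0, 1] (assumed normalization).

    If every document has the same score, all are mapped to 0.0.
    """
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def linear_aggregate(
    first_stage: Dict[str, float],
    reranker: Dict[str, float],
    alpha: float = 0.5,  # hypothetical weight; would be tuned on training data
) -> List[Tuple[str, float]]:
    """Fuse scores as alpha * first_stage + (1 - alpha) * reranker.

    A document missing from one model's results contributes 0.0 from it.
    Returns (doc_id, fused_score) pairs, best first; ties break by doc_id.
    """
    a = min_max_normalize(first_stage)
    b = min_max_normalize(reranker)
    fused = {
        doc: alpha * a.get(doc, 0.0) + (1 - alpha) * b.get(doc, 0.0)
        for doc in set(a) | set(b)
    }
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Toy scores: a lexical first-stage model and a neural re-ranker.
    lexical = {"case_a": 12.0, "case_b": 8.0, "case_c": 4.0}
    neural = {"case_a": 0.2, "case_b": 0.9, "case_c": 0.5}
    for doc, score in linear_aggregate(lexical, neural, alpha=0.5):
        print(doc, round(score, 3))
```

With `alpha=0.5`, the toy example ranks `case_b` first (0.75), ahead of `case_a` (0.5), which the lexical model alone preferred. Normalizing first keeps a raw lexical score such as BM25, which is unbounded, from dominating a bounded similarity score.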