LeiBi@COLIEE 2022: Aggregating Tuned Lexical Models with a Cluster-driven BERT-based Model for Case Law Retrieval

This paper summarizes the approaches we submitted to the case law retrieval task of the Competition on Legal Information Extraction/Entailment (COLIEE) 2022. Our methodology consists of four steps. First, given a legal case as a query, we reformulate it by extracting meaningful sentences or n-grams. Second, we use the reformulated query case to retrieve an initial set of potentially relevant legal cases. Third, we re-rank this initial set. Finally, we aggregate the relevance scores produced by the first-stage and re-ranking models to improve retrieval effectiveness. In each step, we explore both well-known and novel methods. In particular, to shorten the query cases, we extract unigrams using three statistical methods (KLI, PLM, and IDF-r) as well as embedding-based models (e.g., KeyBERT). We also investigate whether automatic summarization with the Longformer-Encoder-Decoder (LED) can produce an effective query representation for this retrieval task. Furthermore, we propose a novel cluster-driven re-ranking approach that uses Sentence-BERT models, pre-tuned on large amounts of data, to embed sentences from query and candidate documents. For the last step, we employ a linear aggregation method to combine the relevance scores of traditional IR models and neural models, aiming to pair the semantic understanding of neural models with statistically measured topical relevance. We show that aggregating these relevance scores can improve overall retrieval effectiveness.
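The linear aggregation step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify how scores are normalized or weighted, so the min-max normalization, the single weight `alpha`, and the function names `min_max_normalize`, `aggregate`, and `rank` are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Dict, List


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] (assumed normalization, not from the paper).

    If all scores are equal, every document maps to 0.0.
    """
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 0.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def aggregate(
    lexical: Dict[str, float],
    neural: Dict[str, float],
    alpha: float = 0.5,
) -> Dict[str, float]:
    """Linearly combine normalized lexical and neural relevance scores.

    alpha is a hypothetical interpolation weight: 1.0 keeps only the
    lexical score, 0.0 keeps only the neural score. A document missing
    from one ranking contributes 0.0 for that model.
    """
    lex = min_max_normalize(lexical)
    neu = min_max_normalize(neural)
    doc_ids = set(lex) | set(neu)
    return {
        d: alpha * lex.get(d, 0.0) + (1.0 - alpha) * neu.get(d, 0.0)
        for d in doc_ids
    }


def rank(scores: Dict[str, float]) -> List[str]:
    """Return document ids sorted by descending score, ties broken by id."""
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    # Toy scores for three candidate cases (made-up values).
    lexical_scores = {"case_a": 10.0, "case_b": 5.0, "case_c": 0.0}
    neural_scores = {"case_a": 0.2, "case_b": 0.9, "case_c": 0.1}
    combined = aggregate(lexical_scores, neural_scores, alpha=0.5)
    print(rank(combined))
```

In practice the weight would be tuned on validation queries; the sketch only shows how the two score lists are put on a common scale before being interpolated.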

