This study addresses the problem of information retrieval (IR) for Vietnamese legal texts. Although IR has been well studied for many languages, it has received little attention from the Vietnamese research community, particularly for legal documents, which are difficult to process. This study proposes a new approach to IR for Vietnamese legal documents based on sentence-transformers. In addition, various experiments are conducted to compare different transformer models, different ranking scores, and syllable-level versus word-level training. The experimental results show that the proposed model outperforms the models used in existing research on IR for Vietnamese documents.
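The abstract does not say how candidate legal articles are scored. Below is a minimal sketch of one common setup. It assumes a bi-encoder in which a sentence-transformer maps the query and each article to a fixed-size vector, so that retrieval reduces to sorting articles by a similarity score. The toy vectors, the document IDs, and the helper names `dot`, `cosine`, and `rank` are all hypothetical. Cosine similarity and dot product are shown only as an illustrative pair of ranking scores; they are not necessarily the ones compared in this study.

```python
import math
from typing import Callable, Dict, List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    # Unnormalized dot product: rewards vectors with a large norm.
    return sum(a * b for a, b in zip(u, v))


def cosine(u: Vector, v: Vector) -> float:
    # Cosine similarity: compares direction only, ignoring vector length.
    denom = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / denom if denom else 0.0


def rank(query: Vector, docs: Dict[str, Vector],
         score: Callable[[Vector, Vector], float] = cosine) -> List[Tuple[str, float]]:
    # Score every (hypothetical) article embedding against the query, best first.
    scored = [(doc_id, score(query, vec)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings standing in for sentence-transformer outputs.
query = [1.0, 0.0]
docs = {
    "article_a": [0.9, 0.1],  # almost the same direction as the query
    "article_b": [3.0, 3.0],  # larger norm, 45 degrees away from the query
}

print([d for d, _ in rank(query, docs, cosine)])  # ['article_a', 'article_b']
print([d for d, _ in rank(query, docs, dot)])     # ['article_b', 'article_a']
```

The two scores disagree here because the dot product favours the long vector `article_b`, while cosine similarity favours `article_a`, whose direction is closer to the query. This is one reason the choice of ranking score can affect retrieval results.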