Legal artificial intelligence (LegalAI) aims to benefit legal systems with artificial intelligence technology, especially natural language processing (NLP). Recently, inspired by the success of pre-trained language models (PLMs) in the generic domain, many LegalAI researchers have devoted their efforts to applying PLMs to legal tasks. However, utilizing PLMs for legal tasks remains challenging, as legal documents usually consist of thousands of tokens, far longer than the length that mainstream PLMs can process. In this paper, we release a Longformer-based pre-trained language model, named Lawformer, for Chinese legal long-document understanding. We evaluate Lawformer on a variety of LegalAI tasks, including judgment prediction, similar case retrieval, legal reading comprehension, and legal question answering. The experimental results demonstrate that our model achieves promising improvements on tasks that take long documents as input.
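The length bottleneck motivating Lawformer can be illustrated with a back-of-the-envelope comparison: standard Transformer self-attention costs grow quadratically in sequence length, while Longformer-style sliding-window attention grows linearly. The sketch below assumes an illustrative 4096-token document and a 512-token attention window; these numbers are for illustration only and are not claimed to be the paper's exact configuration.

```python
# Sketch: why sliding-window ("Longformer-style") attention scales to long
# legal documents. The token count n and window size w below are illustrative
# assumptions, not the paper's configuration.

def full_attention_pairs(n: int) -> int:
    # Standard Transformer self-attention: every token attends to every token,
    # so the number of attention pairs grows quadratically with length.
    return n * n

def sliding_window_pairs(n: int, w: int) -> int:
    # Sliding-window local attention: each token attends only to a local
    # window of w tokens, so the cost grows linearly with length.
    return n * w

n, w = 4096, 512
print(full_attention_pairs(n))     # 16777216 pairs
print(sliding_window_pairs(n, w))  # 2097152 pairs, an 8x reduction here
```

At 4096 tokens the windowed variant already computes 8x fewer attention pairs, and the gap widens as documents get longer, which is what makes processing multi-thousand-token legal documents feasible.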