Legal artificial intelligence (LegalAI) aims to benefit legal systems with artificial intelligence technology, especially natural language processing (NLP). Recently, inspired by the success of pre-trained language models (PLMs) in the generic domain, many LegalAI researchers have devoted their efforts to applying PLMs to legal tasks. However, utilizing PLMs for legal tasks remains challenging, as legal documents usually consist of thousands of tokens, far longer than the length that mainstream PLMs can process. In this paper, we release a Longformer-based pre-trained language model, named Lawformer, for Chinese legal long-document understanding. We evaluate Lawformer on a variety of LegalAI tasks, including judgment prediction, similar case retrieval, legal reading comprehension, and legal question answering. The experimental results demonstrate that our model achieves promising improvements on tasks that take long documents as input.
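The length bottleneck motivating Lawformer can be illustrated with a back-of-the-envelope comparison: standard Transformer self-attention costs grow quadratically in sequence length, while Longformer-style sliding-window attention grows linearly. The sketch below assumes an illustrative 4096-token document and a 512-token attention window; these numbers are for illustration only and are not claimed to be the paper's exact configuration.

```python
# Sketch: why sliding-window ("Longformer-style") attention scales to long
# legal documents. The token count n and window size w below are illustrative
# assumptions, not the paper's configuration.

def full_attention_pairs(n: int) -> int:
    # Standard Transformer self-attention: every token attends to every token,
    # so the number of attention pairs grows quadratically with length.
    return n * n

def sliding_window_pairs(n: int, w: int) -> int:
    # Sliding-window local attention: each token attends only to a local
    # window of w tokens, so the cost grows linearly with length.
    return n * w

n, w = 4096, 512
print(full_attention_pairs(n))     # 16777216 pairs
print(sliding_window_pairs(n, w))  # 2097152 pairs, an 8x reduction here
```

At 4096 tokens the windowed variant already computes 8x fewer attention pairs, and the gap widens as documents get longer, which is what makes processing multi-thousand-token legal documents feasible.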