Language models (LMs) have been instrumental to the rapid advance of natural language processing. This paper studies the continual pre-training of LMs, in particular, continual domain-adaptive pre-training (or continual DAP-training). Existing research has shown that further pre-training an LM on a domain corpus to adapt it to the domain improves end-task performance in that domain. This paper proposes a novel method to continually DAP-train an LM with a sequence of unlabeled domain corpora so that the LM adapts to each domain and improves its end-task performance. The key novelty of the method is a soft-masking mechanism that directly controls the update to the LM. A novel proxy is also proposed to preserve the general knowledge in the original LM. Additionally, the method contrasts the representations of the previously learned domain knowledge (including the general knowledge in the pre-trained LM) with the knowledge from the current full network to achieve knowledge integration. The method not only overcomes catastrophic forgetting but also achieves knowledge transfer that improves end-task performance. Empirical evaluation demonstrates the effectiveness of the proposed method.
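To make the soft-masking idea concrete, below is a minimal PyTorch-style sketch, not the paper's exact implementation: it assumes a hypothetical per-parameter `importance` dictionary with values in [0, 1] (how important each parameter is to previously acquired knowledge, including the general knowledge in the original LM) and shows how gradients can be soft-masked so that important parameters are updated less while unimportant ones remain fully plastic. All names (`soft_mask_gradients`, `importance`, `mlm_loss`) are illustrative assumptions.

```python
import torch


def soft_mask_gradients(model: torch.nn.Module, importance: dict) -> None:
    """Scale each parameter's gradient by (1 - importance).

    importance[name] is assumed to lie in [0, 1]:
    1.0 -> fully protect the parameter (no update),
    0.0 -> leave the parameter free to change.
    This is a soft control of the update, not a hard freeze.
    """
    for name, param in model.named_parameters():
        if param.grad is not None and name in importance:
            param.grad.mul_(1.0 - importance[name])


# Hypothetical usage inside a standard masked-LM training step:
#   loss = mlm_loss(model, batch)        # domain-adaptive pre-training loss
#   loss.backward()
#   soft_mask_gradients(model, importance)  # soft-mask before the optimizer step
#   optimizer.step()
#   optimizer.zero_grad()
```

The design choice illustrated here is that protection is applied to the gradients rather than the parameters themselves, which allows partial updates everywhere in the network instead of the all-or-nothing behavior of hard parameter masks.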