Advances in Representation Learning for Natural Language Processing: From Transformer to BERT (Xipeng Qiu, Fudan University)
A survey of deep learning models for NLP
A survey of pre-trained language models
A survey of language models and pre-training in NLP
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin: Attention is All you Need. NIPS 2017: 5998-6008
Andrew M. Dai, Quoc V. Le. Semi-supervised Sequence Learning. NIPS 2015.
Oren Melamud, Jacob Goldberger, Ido Dagan. context2vec: Learning Generic Context Embedding with Bidirectional LSTM. CoNLL 2016.
Prajit Ramachandran, Peter J. Liu, Quoc V. Le. Unsupervised Pretraining for Sequence to Sequence Learning. EMNLP 2017. (Pre-trained seq2seq)
Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee and Luke Zettlemoyer. Deep contextualized word representations. NAACL 2018. (ELMo)
Jeremy Howard and Sebastian Ruder. Universal Language Model Fine-tuning for Text Classification. ACL 2018. (ULMFiT)
Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever. Improving Language Understanding by Generative Pre-Training. Preprint. (GPT)
Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL 2019.
Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei and Ilya Sutskever. Language Models are Unsupervised Multitask Learners. Preprint. (GPT-2)
Zhengyan Zhang, Xu Han, Zhiyuan Liu, Xin Jiang, Maosong Sun and Qun Liu. ERNIE: Enhanced Language Representation with Informative Entities. ACL 2019.
Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian and Hua Wu. ERNIE: Enhanced Representation through Knowledge Integration. Preprint.
Rowan Zellers, Ari Holtzman, Hannah Rashkin, Yonatan Bisk, Ali Farhadi, Franziska Roesner, Yejin Choi. Defending Against Neural Fake News. NeurIPS 2019. (Grover)
Guillaume Lample, Alexis Conneau. Cross-lingual Language Model Pretraining. NeurIPS 2019. (XLM)
Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao. Multi-Task Deep Neural Networks for Natural Language Understanding. ACL 2019.
Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu. MASS: Masked Sequence to Sequence Pre-training for Language Generation. ICML 2019.
Li Dong, Nan Yang, Wenhui Wang, Furu Wei, Xiaodong Liu, Yu Wang, Jianfeng Gao, Ming Zhou, Hsiao-Wuen Hon. Unified Language Model Pre-training for Natural Language Understanding and Generation. Preprint. (UniLM)
Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le. XLNet: Generalized Autoregressive Pretraining for Language Understanding. NeurIPS 2019.
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov. RoBERTa: A Robustly Optimized BERT Pretraining Approach. Preprint.
Mandar Joshi, Danqi Chen, Yinhan Liu, Daniel S. Weld, Luke Zettlemoyer, Omer Levy. SpanBERT: Improving Pre-training by Representing and Predicting Spans.
Matthew E. Peters, Mark Neumann, Robert L. Logan IV, Roy Schwartz, Vidur Joshi, Sameer Singh, Noah A. Smith. Knowledge Enhanced Contextual Word Representations. EMNLP 2019. (KnowBert)
Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang. VisualBERT: A Simple and Performant Baseline for Vision and Language
Jiasen Lu, Dhruv Batra, Devi Parikh, Stefan Lee. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks. NeurIPS 2019
Chen Sun, Austin Myers, Carl Vondrick, Kevin Murphy, Cordelia Schmid. VideoBERT: A Joint Model for Video and Language Representation Learning. ICCV 2019
Hao Tan, Mohit Bansal. LXMERT: Learning Cross-Modality Encoder Representations from Transformers. EMNLP 2019
Weijie Su, Xizhou Zhu, Yue Cao, Bin Li, Lewei Lu, Furu Wei, Jifeng Dai. VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Gen Li, Nan Duan, Yuejian Fang, Ming Gong, Daxin Jiang, Ming Zhou. Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training
Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Qi Ju, Haotang Deng, Ping Wang. K-BERT: Enabling Language Representation with Knowledge Graph
Chris Alberti, Jeffrey Ling, Michael Collins, David Reitter. Fusion of Detected Objects in Text for Visual Question Answering. EMNLP 2019
Chen Sun, Fabien Baradel, Kevin Murphy, Cordelia Schmid. Contrastive Bidirectional Transformer for Temporal Representation Learning
Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Hao Tian, Hua Wu, Haifeng Wang. ERNIE 2.0: A Continual Pre-training Framework for Language Understanding.
Dan Kondratyuk, Milan Straka. 75 Languages, 1 Model: Parsing Universal Dependencies Universally. EMNLP 2019
Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Ziqing Yang, Shijin Wang, Guoping Hu. Pre-Training with Whole Word Masking for Chinese BERT
Yen-Chun Chen, Linjie Li, Licheng Yu, Ahmed El Kholy, Faisal Ahmed, Zhe Gan, Yu Cheng, Jingjing Liu. UNITER: Learning UNiversal Image-TExt Representations
Anonymous authors. HUBERT Untangles BERT to Improve Transfer across NLP Tasks. ICLR 2020 under review
Julian Eisenschlos, Sebastian Ruder, Piotr Czapla, Marcin Kardas, Sylvain Gugger, Jeremy Howard. MultiFiT: Efficient Multi-lingual Language Model Fine-tuning. EMNLP 2019
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. (T5)
Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, Luke Zettlemoyer. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
Anonymous authors. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators. ICLR 2020 under review
Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu. TinyBERT: Distilling BERT for Natural Language Understanding
Raphael Tang, Yao Lu, Linqing Liu, Lili Mou, Olga Vechtomova, Jimmy Lin. Distilling Task-Specific Knowledge from BERT into Simple Neural Networks
Siqi Sun, Yu Cheng, Zhe Gan, Jingjing Liu. Patient Knowledge Distillation for BERT Model Compression. EMNLP 2019
Ze Yang, Linjun Shou, Ming Gong, Wutao Lin, Daxin Jiang. Model Compression with Multi-Task Knowledge Distillation for Web-scale Question Answering System
Wei Zhu, Xiaofeng Zhou, Keqiang Wang, Xun Luo, Xiepeng Li, Yuan Ni, Guotong Xie. PANLP at MEDIQA 2019: Pre-trained Language Models, Transfer Learning and Knowledge Distillation. The 18th BioNLP workshop
Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao. Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding
Iulia Turc, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. Well-Read Students Learn Better: The Impact of Student Initialization on Knowledge Distillation
Henry Tsai, Jason Riesa, Melvin Johnson, Naveen Arivazhagan, Xin Li, Amelia Archer. Small and Practical BERT Models for Sequence Labeling. EMNLP 2019
Sheng Shen, Zhen Dong, Jiayu Ye, Linjian Ma, Zhewei Yao, Amir Gholami, Michael W. Mahoney, Kurt Keutzer. Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
Anonymous authors. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. ICLR 2020 under review
Sanqiang Zhao, Raghav Gupta, Yang Song, Denny Zhou. Extreme Language Model Compression with Optimal Subwords and Shared Projections
Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Olga Kovaleva, Alexey Romanov, Anna Rogers, Anna Rumshisky. Revealing the Dark Secrets of BERT. EMNLP 2019
Betty van Aken, Benjamin Winter, Alexander Löser, Felix A. Gers. How Does BERT Answer Questions? A Layer-Wise Analysis of Transformer Representations
Paul Michel, Omer Levy, Graham Neubig. Are Sixteen Heads Really Better than One?
Di Jin, Zhijing Jin, Joey Tianyi Zhou, Peter Szolovits. Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment
Alex Wang, Kyunghyun Cho. BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model. NeuralGen 2019
Nelson F. Liu, Matt Gardner, Yonatan Belinkov, Matthew E. Peters, Noah A. Smith. Linguistic Knowledge and Transferability of Contextual Representations. NAACL 2019.
Kevin Clark, Urvashi Khandelwal, Omer Levy, Christopher D. Manning. What Does BERT Look At? An Analysis of BERT's Attention. BlackBoxNLP 2019.
Yongjie Lin, Yi Chern Tan, Robert Frank. Open Sesame: Getting Inside BERT's Linguistic Knowledge. BlackBoxNLP 2019
Jesse Vig, Yonatan Belinkov. Analyzing the Structure of Attention in a Transformer Language Model. BlackBoxNLP 2019
Samira Abnar, Lisa Beinborn, Rochelle Choenni, Willem Zuidema. Blackbox meets blackbox: Representational Similarity and Stability Analysis of Neural Language Models and Brains. BlackBoxNLP 2019
Ian Tenney, Dipanjan Das, Ellie Pavlick. BERT Rediscovers the Classical NLP Pipeline. ACL 2019
Telmo Pires, Eva Schlinger, Dan Garrette. How multilingual is Multilingual BERT? ACL 2019
Ganesh Jawahar, Benoît Sagot, Djamé Seddah. What Does BERT Learn about the Structure of Language? ACL 2019
Shijie Wu, Mark Dredze. Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT. EMNLP 2019
Kawin Ethayarajh. How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings. EMNLP 2019
Timothy Niven, Hung-Yu Kao. Probing Neural Network Comprehension of Natural Language Arguments. ACL 2019
Eric Wallace, Shi Feng, Nikhil Kandpal, Matt Gardner, Sameer Singh. Universal Adversarial Triggers for Attacking and Analyzing NLP. EMNLP 2019
Elena Voita, Rico Sennrich, Ivan Titov. The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives. EMNLP 2019
Eric Wallace, Yizhong Wang, Sujian Li, Sameer Singh, Matt Gardner. Do NLP Models Know Numbers? Probing Numeracy in Embeddings. EMNLP 2019
Alex Warstadt, Yu Cao, Ioana Grosu, Wei Peng, Hagen Blix, Yining Nie, Anna Alsop, Shikha Bordia, Haokun Liu, Alicia Parrish, Sheng-Fu Wang, Jason Phang, Anhad Mohananey, Phu Mon Htut, Paloma Jeretič, Samuel R. Bowman. Investigating BERT's Knowledge of Language: Five Analysis Methods with NPIs
Yaru Hao, Li Dong, Furu Wei, Ke Xu. Visualizing and Understanding the Effectiveness of BERT. EMNLP 2019
Andy Coenen, Emily Reif, Ann Yuan, Been Kim, Adam Pearce, Fernanda Viégas, Martin Wattenberg. Visualizing and Measuring the Geometry of BERT. NeurIPS 2019
Gino Brunner, Yang Liu, Damián Pascual, Oliver Richter, Roger Wattenhofer. On the Validity of Self-Attention as Explanation in Transformer Models. Preprint.
Yao-Hung Hubert Tsai, Shaojie Bai, Makoto Yamada, Louis-Philippe Morency, Ruslan Salakhutdinov. Transformer Dissection: An Unified Understanding for Transformer's Attention via the Lens of Kernel. EMNLP 2019.
Fabio Petroni, Tim Rocktäschel, Patrick Lewis, Anton Bakhtin, Yuxiang Wu, Alexander H. Miller, Sebastian Riedel. Language Models as Knowledge Bases? EMNLP 2019.
Matthew E. Peters, Sebastian Ruder, Noah A. Smith. To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks. RepL4NLP 2019.
Mikel Artetxe, Sebastian Ruder, Dani Yogatama. On the Cross-lingual Transferability of Monolingual Representations. Preprint.
John Hewitt, Christopher D. Manning. A Structural Probe for Finding Syntax in Word Representations. NAACL 2019.
Yoav Goldberg. Technical Report. Assessing BERT’s Syntactic Abilities.
Ian Tenney, Patrick Xia, Berlin Chen, Alex Wang, Adam Poliak, R. Thomas McCoy, Najoung Kim, Benjamin Van Durme, Samuel R. Bowman, Dipanjan Das, and Ellie Pavlick. What do you learn from context? Probing for sentence structure in contextualized word representations. ICLR 2019.
Alex Wang, Jan Hula, Patrick Xia, Raghavendra Pappagari, R. Thomas McCoy, Roma Patel, Najoung Kim, Ian Tenney, Yinghui Huang, Katherin Yu, Shuning Jin, Berlin Chen, Benjamin Van Durme, Edouard Grave, Ellie Pavlick, Samuel R. Bowman. Can You Tell Me How to Get Past Sesame Street? Sentence-Level Pretraining Beyond Language Modeling. ACL 2019.
Language model pre-training methods in NLP (ELMo, GPT, and BERT)
Understanding language models (Language Model) in depth
Language models (language model) in NLP
The Transformer model for language understanding
Transformer-Attention Is All You Need
BERT-Pre-training of Deep Bidirectional Transformers for Language Understanding
GPT2-Language Models are Unsupervised Multitask Learners
ERNIE-Enhanced Language Representation with Informative Entities
XLM-Cross-lingual Language Model Pretraining
MASS-Masked Sequence to Sequence Pre-training for Language Generation
XLNet-Generalized Autoregressive Pretraining for Language Understanding
LAMA-Language Models as Knowledge Bases?
Investigating BERT's Knowledge of Language: Five Analysis Methods with NPIs
LXMERT-Learning Cross-Modality Encoder Representations from Transformers
MT-DNN-Multi-Task Deep Neural Networks for Natural Language Understanding