Advances in Representation Learning for Natural Language Processing: From Transformer to BERT (Xipeng Qiu, Fudan University)
A survey of deep learning models for NLP
A survey of pre-trained language models
A survey of language models and pre-training in NLP
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin: Attention is All you Need. NIPS 2017: 5998-6008
Andrew M. Dai, Quoc V. Le. Semi-supervised Sequence Learning. NIPS 2015.
Oren Melamud, Jacob Goldberger, Ido Dagan. context2vec: Learning Generic Context Embedding with Bidirectional LSTM. CoNLL 2016.
Prajit Ramachandran, Peter J. Liu, Quoc V. Le. Unsupervised Pretraining for Sequence to Sequence Learning. EMNLP 2017. (Pre-trained seq2seq)
Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee and Luke Zettlemoyer. Deep contextualized word representations. NAACL 2018. (ELMo)
Jeremy Howard and Sebastian Ruder. Universal Language Model Fine-tuning for Text Classification. ACL 2018. (ULMFiT)
Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever. Improving Language Understanding by Generative Pre-Training. Preprint. (GPT)
Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL 2019.
Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei and Ilya Sutskever. Language Models are Unsupervised Multitask Learners. Preprint. (GPT-2)
Zhengyan Zhang, Xu Han, Zhiyuan Liu, Xin Jiang, Maosong Sun and Qun Liu. ERNIE: Enhanced Language Representation with Informative Entities. ACL 2019.
Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian and Hua Wu. ERNIE: Enhanced Representation through Knowledge Integration. Preprint.
Rowan Zellers, Ari Holtzman, Hannah Rashkin, Yonatan Bisk, Ali Farhadi, Franziska Roesner, Yejin Choi. Defending Against Neural Fake News. NeurIPS 2019. (Grover)
Guillaume Lample, Alexis Conneau. Cross-lingual Language Model Pretraining. NeurIPS 2019. (XLM)
Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao. Multi-Task Deep Neural Networks for Natural Language Understanding. ACL 2019.
Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu. MASS: Masked Sequence to Sequence Pre-training for Language Generation. ICML 2019.
Li Dong, Nan Yang, Wenhui Wang, Furu Wei, Xiaodong Liu, Yu Wang, Jianfeng Gao, Ming Zhou, Hsiao-Wuen Hon. Unified Language Model Pre-training for Natural Language Understanding and Generation. Preprint. (UniLM)
Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le. XLNet: Generalized Autoregressive Pretraining for Language Understanding. NeurIPS 2019.
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov. RoBERTa: A Robustly Optimized BERT Pretraining Approach. Preprint.
Mandar Joshi, Danqi Chen, Yinhan Liu, Daniel S. Weld, Luke Zettlemoyer, Omer Levy. SpanBERT: Improving Pre-training by Representing and Predicting Spans.
Matthew E. Peters, Mark Neumann, Robert L. Logan IV, Roy Schwartz, Vidur Joshi, Sameer Singh, Noah A. Smith. Knowledge Enhanced Contextual Word Representations. EMNLP 2019. (KnowBert)
Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang. VisualBERT: A Simple and Performant Baseline for Vision and Language
Jiasen Lu, Dhruv Batra, Devi Parikh, Stefan Lee. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks. NeurIPS 2019
Chen Sun, Austin Myers, Carl Vondrick, Kevin Murphy, Cordelia Schmid. VideoBERT: A Joint Model for Video and Language Representation Learning. ICCV 2019
Hao Tan, Mohit Bansal. LXMERT: Learning Cross-Modality Encoder Representations from Transformers. EMNLP 2019
Weijie Su, Xizhou Zhu, Yue Cao, Bin Li, Lewei Lu, Furu Wei, Jifeng Dai. VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Gen Li, Nan Duan, Yuejian Fang, Ming Gong, Daxin Jiang, Ming Zhou. Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training
Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Qi Ju, Haotang Deng, Ping Wang. K-BERT: Enabling Language Representation with Knowledge Graph
Chris Alberti, Jeffrey Ling, Michael Collins, David Reitter. Fusion of Detected Objects in Text for Visual Question Answering. EMNLP 2019
Chen Sun, Fabien Baradel, Kevin Murphy, Cordelia Schmid. Contrastive Bidirectional Transformer for Temporal Representation Learning
Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Hao Tian, Hua Wu, Haifeng Wang. ERNIE 2.0: A Continual Pre-training Framework for Language Understanding.
Dan Kondratyuk, Milan Straka. 75 Languages, 1 Model: Parsing Universal Dependencies Universally. EMNLP 2019
Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Ziqing Yang, Shijin Wang, Guoping Hu. Pre-Training with Whole Word Masking for Chinese BERT
Yen-Chun Chen, Linjie Li, Licheng Yu, Ahmed El Kholy, Faisal Ahmed, Zhe Gan, Yu Cheng, Jingjing Liu. UNITER: Learning UNiversal Image-TExt Representations
Anonymous authors. HUBERT Untangles BERT to Improve Transfer across NLP Tasks. ICLR 2020 under review
Julian Eisenschlos, Sebastian Ruder, Piotr Czapla, Marcin Kardas, Sylvain Gugger, Jeremy Howard. MultiFiT: Efficient Multi-lingual Language Model Fine-tuning. EMNLP 2019
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. (T5)
Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, Luke Zettlemoyer. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
Anonymous authors. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators. ICLR 2020 under review
Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu. TinyBERT: Distilling BERT for Natural Language Understanding
Raphael Tang, Yao Lu, Linqing Liu, Lili Mou, Olga Vechtomova, Jimmy Lin. Distilling Task-Specific Knowledge from BERT into Simple Neural Networks
Siqi Sun, Yu Cheng, Zhe Gan, Jingjing Liu. Patient Knowledge Distillation for BERT Model Compression. EMNLP 2019
Ze Yang, Linjun Shou, Ming Gong, Wutao Lin, Daxin Jiang. Model Compression with Multi-Task Knowledge Distillation for Web-scale Question Answering System
Wei Zhu, Xiaofeng Zhou, Keqiang Wang, Xun Luo, Xiepeng Li, Yuan Ni, Guotong Xie. PANLP at MEDIQA 2019: Pre-trained Language Models, Transfer Learning and Knowledge Distillation. The 18th BioNLP workshop
Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao. Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding
Iulia Turc, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. Well-Read Students Learn Better: The Impact of Student Initialization on Knowledge Distillation
Henry Tsai, Jason Riesa, Melvin Johnson, Naveen Arivazhagan, Xin Li, Amelia Archer. Small and Practical BERT Models for Sequence Labeling. EMNLP 2019
Sheng Shen, Zhen Dong, Jiayu Ye, Linjian Ma, Zhewei Yao, Amir Gholami, Michael W. Mahoney, Kurt Keutzer. Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
Anonymous authors. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. ICLR 2020 under review
Sanqiang Zhao, Raghav Gupta, Yang Song, Denny Zhou. Extreme Language Model Compression with Optimal Subwords and Shared Projections
Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Olga Kovaleva, Alexey Romanov, Anna Rogers, Anna Rumshisky. Revealing the Dark Secrets of BERT. EMNLP 2019
Betty van Aken, Benjamin Winter, Alexander Löser, Felix A. Gers. How Does BERT Answer Questions? A Layer-Wise Analysis of Transformer Representations
Paul Michel, Omer Levy, Graham Neubig. Are Sixteen Heads Really Better than One?
Di Jin, Zhijing Jin, Joey Tianyi Zhou, Peter Szolovits. Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment
Alex Wang, Kyunghyun Cho. BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model. NeuralGen 2019
Nelson F. Liu, Matt Gardner, Yonatan Belinkov, Matthew E. Peters, Noah A. Smith. Linguistic Knowledge and Transferability of Contextual Representations. NAACL 2019.
Kevin Clark, Urvashi Khandelwal, Omer Levy, Christopher D. Manning. What Does BERT Look At? An Analysis of BERT's Attention. BlackBoxNLP 2019.
Yongjie Lin, Yi Chern Tan, Robert Frank. Open Sesame: Getting Inside BERT's Linguistic Knowledge. BlackBoxNLP 2019
Jesse Vig, Yonatan Belinkov. Analyzing the Structure of Attention in a Transformer Language Model. BlackBoxNLP 2019
Samira Abnar, Lisa Beinborn, Rochelle Choenni, Willem Zuidema. Blackbox meets blackbox: Representational Similarity and Stability Analysis of Neural Language Models and Brains. BlackBoxNLP 2019
Ian Tenney, Dipanjan Das, Ellie Pavlick. BERT Rediscovers the Classical NLP Pipeline. ACL 2019
Telmo Pires, Eva Schlinger, Dan Garrette. How multilingual is Multilingual BERT? ACL 2019
Ganesh Jawahar, Benoît Sagot, Djamé Seddah. What Does BERT Learn about the Structure of Language? ACL 2019
Shijie Wu, Mark Dredze. Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT. EMNLP 2019
Kawin Ethayarajh. How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings. EMNLP 2019
Timothy Niven, Hung-Yu Kao. Probing Neural Network Comprehension of Natural Language Arguments. ACL 2019
Eric Wallace, Shi Feng, Nikhil Kandpal, Matt Gardner, Sameer Singh. Universal Adversarial Triggers for Attacking and Analyzing NLP. EMNLP 2019
Elena Voita, Rico Sennrich, Ivan Titov. The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives. EMNLP 2019
Eric Wallace, Yizhong Wang, Sujian Li, Sameer Singh, Matt Gardner. Do NLP Models Know Numbers? Probing Numeracy in Embeddings. EMNLP 2019
Alex Warstadt, Yu Cao, Ioana Grosu, Wei Peng, Hagen Blix, Yining Nie, Anna Alsop, Shikha Bordia, Haokun Liu, Alicia Parrish, Sheng-Fu Wang, Jason Phang, Anhad Mohananey, Phu Mon Htut, Paloma Jeretič, Samuel R. Bowman. Investigating BERT's Knowledge of Language: Five Analysis Methods with NPIs
Yaru Hao, Li Dong, Furu Wei, Ke Xu. Visualizing and Understanding the Effectiveness of BERT. EMNLP 2019
Andy Coenen, Emily Reif, Ann Yuan, Been Kim, Adam Pearce, Fernanda Viégas, Martin Wattenberg. Visualizing and Measuring the Geometry of BERT. NeurIPS 2019
Gino Brunner, Yang Liu, Damián Pascual, Oliver Richter, Roger Wattenhofer. On the Validity of Self-Attention as Explanation in Transformer Models. Preprint.
Yao-Hung Hubert Tsai, Shaojie Bai, Makoto Yamada, Louis-Philippe Morency, Ruslan Salakhutdinov. Transformer Dissection: An Unified Understanding for Transformer's Attention via the Lens of Kernel. EMNLP 2019.
Fabio Petroni, Tim Rocktäschel, Patrick Lewis, Anton Bakhtin, Yuxiang Wu, Alexander H. Miller, Sebastian Riedel. Language Models as Knowledge Bases? EMNLP 2019.
Matthew E. Peters, Sebastian Ruder, Noah A. Smith. To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks. RepL4NLP 2019.
Mikel Artetxe, Sebastian Ruder, Dani Yogatama. On the Cross-lingual Transferability of Monolingual Representations. Preprint.
John Hewitt, Christopher D. Manning. A Structural Probe for Finding Syntax in Word Representations. NAACL 2019.
Yoav Goldberg. Technical Report. Assessing BERT’s Syntactic Abilities.
Ian Tenney, Patrick Xia, Berlin Chen, Alex Wang, Adam Poliak, R. Thomas McCoy, Najoung Kim, Benjamin Van Durme, Samuel R. Bowman, Dipanjan Das, and Ellie Pavlick. What do you learn from context? Probing for sentence structure in contextualized word representations. ICLR 2019.
Alex Wang, Jan Hula, Patrick Xia, Raghavendra Pappagari, R. Thomas McCoy, Roma Patel, Najoung Kim, Ian Tenney, Yinghui Huang, Katherin Yu, Shuning Jin, Berlin Chen, Benjamin Van Durme, Edouard Grave, Ellie Pavlick, Samuel R. Bowman. Can You Tell Me How to Get Past Sesame Street? Sentence-Level Pretraining Beyond Language Modeling. ACL 2019.
Language model pre-training methods in NLP (ELMo, GPT, and BERT)
Understanding language models (Language Model) in depth
Language models (language model) in NLP
The Transformer model for language understanding
Transformer-Attention Is All You Need
BERT-Pre-training of Deep Bidirectional Transformers for Language Understanding
GPT2-Language Models are Unsupervised Multitask Learners
ERNIE-Enhanced Language Representation with Informative Entities
XLM-Cross-lingual Language Model Pretraining
MASS-Masked Sequence to Sequence Pre-training for Language Generation
XLNet-Generalized Autoregressive Pretraining for Language Understanding
LAMA-Language Models as Knowledge Bases?
Investigating BERT's Knowledge of Language: Five Analysis Methods with NPIs
LXMERT-Learning Cross-Modality Encoder Representations from Transformers
MT-DNN-Multi-Task Deep Neural Networks for Natural Language Understanding