A Comprehensive Collection of Common NLP Datasets and Papers

January 26, 2019, 深度学习与NLP (Deep Learning and NLP)

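This article collects a number of benchmark datasets for natural language processing, along with recent papers on common NLP tasks such as machine translation, reading comprehension and question answering, sequence labeling, knowledge graphs and social computing, and sentiment analysis and text classification.

Thanks to IsaacChanghau for compiling and generously sharing this list. The original is available at: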

https://github.com/IsaacChanghau/DL-NLP-Readings

NLP Datasets

Sequence Labeling

· [2002 CoNLL] Introduction to the CoNLL-2002 Shared Task: Language-Independent Named Entity Recognition, [paper], [bibtex], [dataset].

· [2003 CoNLL] Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition, [paper], [bibtex], [dataset].

· [2017 CoNLL] CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, [paper], [bibtex], [homepage].

· [2017 ACL] Cross-lingual Name Tagging and Linking for 282 Languages, [paper], [bibtex], [homepage].

Machine Reading Comprehension and Question Answering

· [2013 EMNLP] MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text, [paper], [bibtex], [homepage], source: [mcobzarenco/mctest].

· [2015 NIPS] CNN/DailyMail: Teaching Machines to Read and Comprehend, [paper], [bibtex], [homepage], sources: [thomasmesnard/DeepMind-Teaching-Machines-to-Read-and-Comprehend].

· [2016 EMNLP] SQuAD: 100,000+ Questions for Machine Comprehension of Text, [paper], [bibtex], [homepage].

· [2016 ICLR] bAbI: Towards AI-Complete Question Answering: a Set of Prerequisite Toy Tasks, [paper], [bibtex], [homepage], sources: [facebook/bAbI-tasks].

· [2017 EMNLP] World Knowledge for Reading Comprehension: Rare Entity Prediction with Hierarchical LSTMs Using External Descriptions, [paper], [bibtex], [homepage].

· [2017 EMNLP] RACE: Large-scale ReAding Comprehension Dataset From Examinations, [paper], [bibtex], [homepage], sources: [qizhex/RACE_AR_baselines].

· [2017 ACL] TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension, [paper], [bibtex], [homepage], sources: [mandarjoshi90/triviaqa].

· [2018 TACL] QAngaroo: Constructing Datasets for Multi-hop Reading Comprehension Across Documents, [paper], [bibtex], [homepage].

· [2018 ICLR] CLOTH: Large-scale Cloze Test Dataset Designed by Teachers, [paper], [bibtex], [homepage], sources: [qizhex/Large-scale-Cloze-Test-Dataset-Designed-by-Teachers].

· [2018 NAACL] MultiRC: Looking Beyond the Surface - A Challenge Set for Reading Comprehension over Multiple Sentences, [paper], [bibtex], [homepage], sources: [CogComp/multirc].

· [2018 EMNLP] HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering, [paper], [bibtex], [attachment], [homepage], sources: [hotpotqa/hotpot].

Commonsense Knowledge Bases

· [2017 AAAI] ConceptNet 5.5: An Open Multilingual Graph of General Knowledge, [paper], [bibtex], sources: [GitHub page], [commonsense/conceptnet5], [commonsense/conceptnet-numberbatch].

Recurrent Neural Networks (RNN)

· [2001 PhD Thesis] Long Short-Term Memory in Recurrent Neural Networks, [Gers' Ph.D. Thesis].

· [2014 ArXiv] Recurrent Neural Network Regularization, [paper].

· [2015 ArXiv] Grid Long Short-Term Memory, [paper], sources: [Tensorflow-GridLSTMCell].

· [2016 ArXiv] Visualizing and Understanding Curriculum Learning for Long Short-Term Memory Networks, [paper].

· [2016 ICLR] Visualizing and Understanding Recurrent Networks, [paper].

· [2016 NIPS] Phased LSTM: Accelerating Recurrent Network Training for Long or Event-based Sequences, [paper], sources: [Tensorflow-PhasedLSTMCell].

· [2017 ACML] Nested LSTMs, [paper], sources: [hannw/nlstm], [titu1994/Nested-LSTM].

· [2017 ICLR] Variable Computation in Recurrent Neural Networks, [paper].

· [2018 EMNLP] Simple Recurrent Units for Highly Parallelizable Recurrence, [paper], [bibtex], sources: [taolei87/sru].

· [2018 ICLR] Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks, [paper], [homepage], sources: [imatge-upc/skiprnn-2017-telecombcn].

Machine Translation

· [2014 SSST] On the Properties of Neural Machine Translation: Encoder-Decoder Approaches, [paper].

· [2015 ICLR] Neural Machine Translation by Jointly Learning to Align and Translate, [paper], sources: [lisa-groundhog/GroundHog], [tensorflow/nmt].

· [2015 EMNLP] Effective Approaches to Attention-based Neural Machine Translation, [paper], [HarvardNLP homepage], sources: [dillonalaird/Attention], [tensorflow/nmt].

· [2016 ACL] Neural Machine Translation of Rare Words with Subword Units, [paper], [bibtex], [software], sources: [rsennrich/subword-nmt], [soaxelbrooke/python-bpe].

· [2017 ACL] A Convolutional Encoder Model for Neural Machine Translation, [paper], sources: [facebookresearch/fairseq].

· [2017 NIPS] Attention is All You Need, [paper], [Chinese blog], sources: [Kyubyong/transformer], [jadore801120/attention-is-all-you-need-pytorch], [DongjunLee/transformer-tensorflow].

· [2017 EMNLP] Neural Machine Translation with Word Predictions, [paper].

· [2017 EMNLP] Massive Exploration of Neural Machine Translation Architectures, [paper], [homepage], sources: [google/seq2seq].

· [2017 EMNLP] Efficient Attention using a Fixed-Size Memory Representation, [paper].

· [2018 AMTA] Context Models for OOV Word Translation in Low-Resource Languages, [paper].

· [2018 NAACL] Self-Attention with Relative Position Representations, [paper].

· [2018 COLING] Double Path Networks for Sequence to Sequence Learning, [paper].

Machine Reading Comprehension and Question Answering

· [2014 NIPS] Deep Learning for Answer Sentence Selection, [paper], sources: [brmson/Sentence-selection].

· [2014 ACL] Freebase QA: Information Extraction or Semantic Parsing?, [paper].

· [2015 IJCAI] Convolutional Neural Tensor Network Architecture for Community-based Question Answering, [paper], [bibtex], sources: [GauravBh1010tt/DeepLearn], [SongRb/Seq2SeqLearning].

· [2015 NIPS] Pointer Networks, [paper], [blog], sources: [devsisters/pointer-network-tensorflow], [ikostrikov/TensorFlow-Pointer-Networks], [keon/pointer-networks], [pemami4911/neural-combinatorial-rl-pytorch], [shiretzet/PointerNet].

· [2016 ACL] Question Answering on Freebase via Relation Extraction and Textual Evidence, [paper], sources: [syxu828/QuestionAnsweringOverFB].

· [2016 EMNLP] Long Short-Term Memory-Networks for Machine Reading, [paper], sources: [cheng6076/SNLI-attention], [vsitzmann/snli-attention-tensorflow].

· [2016 ICLR] LSTM-based Deep Learning Models for Non-factoid Answer Selection, [paper], sources: [Alan-Lee123/answer-selection], [tambetm/allenAI].

· [2016 ICML] Ask Me Anything: Dynamic Memory Networks for Natural Language Processing, [paper], sources: [DongjunLee/dmn-tensorflow].

· [2016 ACL] A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task, [paper], sources: [danqi/rc-cnn-dailymail].

· [2016 ICML] Dynamic Memory Networks for Visual and Textual Question Answering, [paper], [blog], sources: [therne/dmn-tensorflow], [barronalex/Dynamic-Memory-Networks-in-TensorFlow], [ethancaballero/Improved-Dynamic-Memory-Networks-DMN-plus], [dandelin/Dynamic-memory-networks-plus-Pytorch], [DeepRNN/visual_question_answering].

· [2017 ICLR] Query-Reduction Networks for Question Answering, [paper], [homepage], sources: [uwnlp/qrn].

· [2017 ICLR] Bi-Directional Attention Flow for Machine Comprehension, [paper], [homepage], [demo], sources: [allenai/bi-att-flow].

· [2017 ACL] Learning to Skim Text, [paper], [notes].

· [2017 ACL] R-Net: Machine Reading Comprehension with Self-matching Networks, [paper], [blog], sources: [HKUST-KnowComp/R-Net], [YerevaNN/R-NET-in-Keras], [minsangkim142/R-net].

· [2017 ICLR] Machine Comprehension Using Match-LSTM and Answer Pointer, [paper], sources: [shuohangwang/SeqMatchSeq], [MurtyShikhar/Question-Answering], [InnerPeace-Wu/reading_comprehension-cs224n].

· [2017 EMNLP] Accurate Supervised and Semi-Supervised Machine Reading for Long Documents, [paper], [bibtex].

· [2017 ArXiv] Simple and Effective Multi-Paragraph Reading Comprehension, [paper], sources: [allenai/document-qa].

· [2017 CoNLL] Making Neural QA as Simple as Possible but not Simpler, [paper], [homepage], [github-page], sources: [georgwiese/biomedical-qa].

· [2017 EMNLP] Two-Stage Synthesis Networks for Transfer Learning in Machine Comprehension, [paper], sources: [davidgolub/QuestionGeneration].

· [2017 ACL] Attention-over-Attention Neural Networks for Reading Comprehension, [paper], sources: [OlavHN/attention-over-attention], [marshmelloX/attention-over-attention].

· [2017 EMNLP] Identifying Where to Focus in Reading Comprehension for Neural Question Generation, [paper], [bibtex].

· [2017 ACL] Improved Neural Relation Detection for Knowledge Base Question Answering, [paper].

· [2017 ACL] An End-to-End Model for Question Answering over Knowledge Base with Cross-Attention Combining Global Knowledge, [paper], [homepage], [blog].

· [2017 EMNLP] Learning what to read: Focused machine reading, [paper], [bibtex].

· [2017 ACL] Reading Wikipedia to Answer Open-Domain Questions, [paper], sources: [facebookresearch/DrQA], [hitvoice/DrQA].

· [2018 ICLR] MaskGAN: Better Text Generation via Filling in the ______, [paper].

· [2018 AAAI] Multi-attention Recurrent Network for Human Communication Comprehension, [paper].

· [2018 ArXiv] An Attention-Based Word-Level Interaction Model: Relation Detection for Knowledge Base Question Answering, [paper].

· [2018 ICLR] FusionNet: Fusing via Fully-aware Attention with Application to Machine Comprehension, [paper], sources: [exe1023/FusionNet], [momohuang/FusionNet-NLI].

· [2018 NAACL] Contextualized Word Representations for Reading Comprehension, [paper], sources: [shimisalant/CWR].

· [2018 ICLR] QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension, [paper], sources: [hengruo/QANet-pytorch], [NLPLearn/QANet].

· [2018 ICLR] Neural Speed Reading via Skim-RNN, [paper], sources: [schelotto/Neural_Speed_Reading_via_Skim-RNN_PyTorch].

· [2018 SemEval] Yuanfudao at SemEval-2018 Task 11: Three-way Attention and Relational Knowledge for Commonsense Machine Comprehension, [paper], sources: [intfloat/commonsense-rc].

· [2018 ACL] Knowledgeable Reader: Enhancing Cloze-Style Reading Comprehension with External Commonsense Knowledge, [paper].

· [2018 ACL] Stochastic Answer Networks for Machine Reading Comprehension, [paper], [bibtex], sources: [kevinduh/san_mrc].

Dialogue, Chatbots, and NLG Systems

Covers dialogue systems, chatbots, dialogue algorithms, natural language generation methods, and more.

· [2013 IEEE] POMDP-based Statistical Spoken Dialogue Systems: a Review, [paper].

· [2014 NIPS] Sequence to Sequence Learning with Neural Networks, [paper], sources: [farizrahman4u/seq2seq], [ma2rten/seq2seq], [JayParks/tf-seq2seq], [macournoyer/neuralconvo].

· [2015 CIKM] A Hierarchical Recurrent Encoder-Decoder for Generative Context-Aware Query Suggestion, [paper], sources: [sordonia/hred-qs].

· [2015 EMNLP] Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems, [paper], sources: [shawnwun/RNNLG], [hit-computer/SC-LSTM].

· [2015 ArXiv] Attention with Intention for a Neural Network Conversation Model, [paper].

· [2015 ACL] Neural Responding Machine for Short-Text Conversation, [paper].

· [2016 AAAI] Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models, [paper], sources: [suriyadeepan/augmented_seq2seq], [julianser/hed-dlg], [sordonia/hed-dlg], [julianser/hred-latent-piecewise], [julianser/hed-dlg-truncated].

· [2016 ACL] On-line Active Reward Learning for Policy Optimisation in Spoken Dialogue Systems, [paper].

· [2016 EMNLP] Deep Reinforcement Learning for Dialogue Generation, [paper], sources: [liuyuemaicha/Deep-Reinforcement-Learning-for-Dialogue-Generation-in-tensorflow].

· [2016 EMNLP] Multi-view Response Selection for Human-Computer Conversation, [paper].

· [2017 ACM] A Survey on Dialogue Systems: Recent Advances and New Frontiers, [paper], sources: [shawnspace/survey-in-dialog-system].

· [2017 EMNLP] Adversarial Learning for Neural Dialogue Generation, [paper], sources: [jiweil/Neural-Dialogue-Generation], [liuyuemaicha/Adversarial-Learning-for-Neural-Dialogue-Generation-in-Tensorflow].

· [2017 ACL] Sequential Matching Network: A New Architecture for Multi-turn Response Selection in Retrieval-Based Chatbots, [paper], sources: [MarkWuNLP/MultiTurnResponseSelection], [krayush07/sequential-match-network].

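Sequence Labeling (POS, NER, SRL, RE, Dependency Parsing, Entity Linking, Punctuation Restoration, etc.)

Covers part-of-speech tagging, chunking, named entity recognition (NER), semantic role labeling (SRL), punctuation restoration, sentence segmentation, dependency parsing, relation extraction, entity linking, and more.

Part-of-Speech Tagging and Named Entity Recognition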

· [2010 ACL] On Jointly Recognizing and Aligning Bilingual Named Entities, [paper], [bibtex].

· [2012 CIKM] Joint Bilingual Name Tagging for Parallel Corpora, [paper], [bibtex].

· [2012 Springer] Supervised Sequence Labelling with Recurrent Neural Networks, [Alex Graves's Ph.D. Thesis].

· [2015 ArXiv] Bidirectional LSTM-CRF Models for Sequence Tagging, [paper], [bibtex], [blog], sources: [Hironsan/anago], [guillaumegenthial/sequence_tagging].

· [2015 Cheminformatics] Enhancing of chemical compound and drug name recognition using representative tag scheme and fine-grained tokenization, [paper], [bibtex].

· [2016 ArXiv] Multi-Task Cross-Lingual Sequence Tagging from Scratch, [paper], [bibtex].

· [2016 EMNLP] Improving Multilingual Named Entity Recognition with Wikipedia Entity Type Mapping, [paper], [bibtex].

· [2016 NAACL] Name Tagging for Low-resource Incident Languages based on Expectation-driven Learning, [paper], [bibtex].

· [2016 ICLR] Multi-task Sequence to Sequence Learning, [paper], [bibtex].

· [2016 ACL] Multilingual Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Models and Auxiliary Loss, [paper], [bibtex], sources: [bplank/bilstm-aux].

· [2016 ACL] Named Entity Recognition with Bidirectional LSTM-CNNs, [paper], [bibtex], sources: [ThanhChinhBK/Ner-BiLSTM-CNNs].

· [2016 NAACL] Neural Architectures for Named Entity Recognition, [paper], [bibtex], sources: [clab/stack-lstm-ner], [glample/tagger], [marekrei/sequence-labeler].

· [2016 ACL] End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF, [paper], [bibtex], sources: [LopezGG/NN_NER_tensorFlow].

· [2017 IJCNLP] Segment-Level Neural Conditional Random Fields for Named Entity Recognition, [paper], [bibtex].

· [2017 IJCNLP] Low-Resource Named Entity Recognition with Cross-Lingual, Character-Level Neural Conditional Random Fields, [paper], [bibtex].

· [2017 WNUT] A Multi-task Approach for Named Entity Recognition in Social Media Data, [paper], [bibtex], sources: [tavo91/NER-WNUT17].

· [2017 ACL] Weakly Supervised Cross-Lingual Named Entity Recognition via Effective Annotation and Representation Projection, [paper], [bibtex].

· [2017 RLNLP] Multi-task Domain Adaptation for Sequence Tagging, [paper], [bibtex].

· [2017 EMNLP] Cheap Translation for Cross-Lingual Named Entity Recognition, [paper], [bibtex].

· [2017 ACL] Semi-supervised Multitask Learning for Sequence Labeling, [paper], [bibtex].

· [2017 EMNLP] Part-of-Speech Tagging for Twitter with Adversarial Neural Networks, [paper], [bibtex], sources: [guitaowufeng/TPANN].

· [2017 EMNLP] Fast and Accurate Entity Recognition with Iterated Dilated Convolutions, [paper], [bibtex], sources: [iesl/dilated-cnn-ner].

· [2017 ICLR] Transfer Learning for Sequence Tagging with Hierarchical Recurrent Networks, [paper], [bibtex], sources: [kimiyoung/transfer].

· [2017 ArXiv] Optimal Hyperparameters for Deep LSTM-Networks for Sequence Labeling Tasks, [paper], [bibtex], sources: [UKPLab/emnlp2017-bilstm-cnn-crf].

· [2017 EMNLP] Reporting Score Distributions Makes a Difference: Performance Study of LSTM-networks for Sequence Tagging, [paper], [bibtex], sources: [UKPLab/emnlp2017-bilstm-cnn-crf].

· [2017 InterSpeech] Label-dependency coding in Simple Recurrent Networks for Spoken Language Understanding, [paper], [bibtex].

· [2017 ACL] Model Transfer for Tagging Low-resource Languages using a Bilingual Dictionary, [paper], [bibtex], sources: [mengf1/trpos].

· [2017 EMNLP] Semi-Supervised Structured Prediction with Neural CRF Autoencoder, [paper], [bibtex], sources: [cosmozhang/NCRF-AE].

· [2017 EMNLP] Cross-Lingual Transfer Learning for POS Tagging without Cross-Lingual Resources, [paper], [bibtex].

· [2017 ACL] Semi-supervised Sequence Tagging with Bidirectional Language Models, [paper], [bibtex].

· [2018 LREC] Transfer Learning for Named-Entity Recognition with Neural Networks, [paper], [bibtex], sources: [Franck-Dernoncourt/NeuroNER].

· [2018 ICLR] Deep Active Learning for Named Entity Recognition, [paper], [bibtex].

· [2018 AAAI] Empower Sequence Labeling with Task-Aware Neural Language Model, [paper], [bibtex], sources: [LiyuanLucasLiu/LM-LSTM-CRF].

· [2018 NAACL] Robust Multilingual Part-of-Speech Tagging via Adversarial Training, [paper], [bibtex], sources: [michiyasunaga/pos_adv].

· [2018 ArXiv] Improving Part-of-speech Tagging Via Multi-task Learning and Character-level Word Representations, [paper], [bibtex].

· [2018 NAACL] Label-aware Double Transfer Learning for Cross-Specialty Medical Named Entity Recognition, [paper], [bibtex], sources: [felixwzh/La-DTL].

· [2018 NAACL] Zero-shot Sequence Labeling: Transferring Knowledge from Sentences to Tokens, [paper], [bibtex].

· [2018 ACL] Morphosyntactic Tagging with a Meta-BiLSTM Model over Context Sensitive Token Encodings, [paper], [bibtex], sources: [google/meta_tagger].

· [2018 ACL] Named Entity Recognition With Parallel Recurrent Neural Networks, [paper], [bibtex].

· [2018 ACL] Chinese NER Using Lattice LSTM, [paper], [bibtex], sources: [jiesutd/LatticeLSTM].

· [2018 ACL] Hybrid semi-Markov CRF for Neural Sequence Labeling, [paper], [bibtex], sources: [ZhixiuYe/HSCRF-pytorch].

· [2018 ACL] A Multi-lingual Multi-task Architecture for Low-resource Sequence Labeling, [paper], [bibtex], sources: [limteng-rpi/mlmt].

· [2018 AAAI] Adversarial Learning for Chinese NER from Crowd Annotations, [paper], [bibtex], sources: [SUDA-HLT/ALCrowd].

· [2018 IJCAI] Improving Low Resource Named Entity Recognition using Cross-lingual Knowledge Transfer, [paper], [bibtex], sources: [scir-code/lrner].

· [2018 COLING] Contextual String Embeddings for Sequence Labeling, [paper], [bibtex], sources: [zalandoresearch/flair].

Semantic Role Labeling (SRL)

· [2015 ACL] End-to-end Learning of Semantic Role Labeling using Recurrent Neural Networks, [paper], [bibtex], sources: [sanjaymeena/semantic_role_labeling_deep_learning], [hiroki13/neural-semantic-role-labeler].

· [2016 ACL] Neural Semantic Role Labeling with Dependency Path Embeddings, [paper], [bibtex], sources: [microth/PathLSTM].

· [2017 ACL] Deep Semantic Role Labeling: What Works and What's Next, [paper], [bibtex], sources: [luheng/deep_srl].

· [2018 AAAI] Deep Semantic Role Labeling with Self-Attention, [paper], [bibtex], sources: [XMUNLP/Tagger].

· [2018 EMNLP] Linguistically-Informed Self-Attention for Semantic Role Labeling, [paper], [Supplemental Material], [bibtex], [author], [slides], [slides w/ notes], sources: [strubell/LISA].

Punctuation Restoration and Sentence Segmentation

· [2016 Interspeech] Bidirectional Recurrent Neural Network with Attention Mechanism for Punctuation Restoration, [paper], [bibtex], sources: [ottokart/punctuator2].

· [2017 ICASSP] Sequence-to-Sequence Models for Punctuated Transcription Combining Lexical and Acoustic Features, [paper], [bibtex], sources: [choko/acoustic_punctuation].

· [2017 SLSP] Attentional Parallel RNNs for Generating Punctuation in Transcribed Speech, [paper], [bibtex], [dataset], sources: [alpoktem/punkProse].

· [2017 EACL] Sentence Segmentation in Narrative Transcripts from Neuropsychological Tests using Recurrent Convolutional Neural Networks, [paper], [bibtex].

Dependency Parsing

· [2014 EMNLP] A Fast and Accurate Dependency Parser using Neural Networks, [paper], [bibtex], sources: [akjindal53244/dependency_parsing_tf], [ljj314zz/dependency_parsing_tf-master].

· [2016 TACL] Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representation, [paper], [bibtex], sources: [elikip/bist-parser].

· [2017 ICLR] Deep Biaffine Attention for Neural Dependency Parsing, [paper], [bibtex], sources: [tdozat/Parser-v1], [tdozat/Parser-v2].

· [2018 ACL] Simpler but More Accurate Semantic Dependency Parsing, [paper], [bibtex].

Relation Extraction and Entity Linking

· [2018 ACL] DSGAN: Generative Adversarial Training for Distant Supervision Relation Extraction, [paper], [bibtex].

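Knowledge Graph and Social Network Representation

Covers knowledge graph completion/representation, social network representation, and more.

Knowledge Graph Completion/Representation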

· [2013 NIPS] Reasoning With Neural Tensor Networks for Knowledge Base Completion, [paper], sources: [siddharth-agrawal/Neural-Tensor-Network], [dddoss/tensorflow-socher-ntn].

· [2013 NIPS] TransE: Translating Embeddings for Modeling Multi-relational Data, [paper], sources: [thunlp/TensorFlow-TransX].

· [2014 AAAI] TransH: Knowledge Graph Embedding by Translating on Hyperplanes, [paper], sources: [thunlp/TensorFlow-TransX].

· [2015 EMNLP] PTransE: Modeling Relation Paths for Representation Learning of Knowledge Bases, [paper], [homepage], sources: [thunlp/Fast-TransX].

· [2015 AAAI] TransR: Learning Entity and Relation Embeddings for Knowledge Graph Completion, [paper], sources: [thunlp/TensorFlow-TransX].

· [2015 ACL] TransD: Knowledge Graph Embedding via Dynamic Mapping Matrix, [paper], sources: [thunlp/TensorFlow-TransX].

· [2016 AAAI] Knowledge Graph Completion with Adaptive Sparse Transfer Matrix, [paper], sources: [FrankWork/transparse], [thunlp/Fast-TransX].

· [2016 ACL] Commonsense Knowledge Base Completion, [paper], [homepage], sources: [Lorraine333/ACL_CKBC].

· [2017 AKBC] RelNet: End-to-End Modeling of Entities & Relations, [paper], [homepage].

· [2017 EMNLP] Context-Aware Representations for Knowledge Base Relation Extraction, [paper], sources: [UKPLab/emnlp2017-relation-extraction].

· [2018 AAAI] SenticNet 5: Discovering Conceptual Primitives for Sentiment Analysis by Means of Context Embeddings, [paper].

Graph/Social Network Representation Learning

· [2014 KDD] DeepWalk: Online Learning of Social Representations, [paper], sources: [phanein/deepwalk].

· [2016 NIPS] Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering, [paper], [bibtex], sources: [mdeff/cnn_graph], [xbresson/spectral_graph_convnets].

· [2018 ICLR] Graph Attention Networks, [paper], [bibtex], sources: [PetarV-/GAT], [Diego999/pyGAT], [danielegrattarola/keras-gat].

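Sentiment Analysis and Text Classification

Covers sentiment analysis, stance detection, and text classification.

Text and Paragraph Classification

· [2014 EMNLP] Convolutional Neural Networks for Sentence Classification, [paper], [bibtex], sources: [yoonkim/CNN_sentence], [dennybritz/cnn-text-classification-tf].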

· [2015 ACL] Deep Unordered Composition Rivals Syntactic Methods for Text Classification, [paper], [bibtex], [slides], sources: [miyyer/dan].

· [2015 AAAI] Recurrent Convolutional Neural Networks for Text Classification, [paper], [bibtex], sources: [knok/rcnn-text-classification], [airalcorn2/Recurrent-Convolutional-Neural-Network-Text-Classifier].

· [2016 NAACL] Hierarchical Attention Networks for Document Classification, [paper], [bibtex], sources: [richliao/textClassifier], [ematvey/hierarchical-attention-networks].

· [2017 EACL] Bag of Tricks for Efficient Text Classification, [paper], [bibtex], sources: [facebookresearch/fastText].

· [2017 ArXiv] Which Encoding is the Best for Text Classification in Chinese, English, Japanese and Korean?, [paper], [bibtex], sources: [zhangxiangxiao/glyph].

· [2017 ArXiv] Multi-Task Label Embedding for Text Classification, [paper], [bibtex], [blog].

· [2017 ICLR] Adversarial Training Methods For Semi-Supervised Text Classification, [paper], [bibtex], sources: [TobiasLee/Text-Classification].

· [2017 IJCNLP] PubMed 200k RCT: a Dataset for Sequential Sentence Classification in Medical Abstracts, [paper], [bibtex], sources: [Franck-Dernoncourt/pubmed-rct].

· [2017 ACL] Adversarial Multi-task Learning for Text Classification, [paper], [bibtex], sources: [FrankWork/fudan_mtl_reviews].

· [2018 ArXiv] Densely Connected Bidirectional LSTM with Applications to Sentence Classification, [paper], [bibtex], source: [IsaacChanghau/Dense_BiLSTM].

· [2018 NAACL] Multinomial Adversarial Networks for Multi-Domain Text Classification, [paper], [bibtex], sources: [ccsasuke/man].

· [2018 EMNLP] Hierarchical Neural Networks for Sequential Sentence Classification in Medical Scientific Abstracts, [paper], [bibtex].

Sentiment Analysis

· Introduction to Sentiment Analysis, [slides], [blog].

· [2013 EMNLP] Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank, [paper], sources: [rksltnl/RNTN], [awni/semantic-rntn], [rgobbel/rntn].

· [2014 ACL] A Convolutional Neural Network for Modelling Sentences, [paper], sources: [hritik25/Dynamic-CNN-for-Modelling-Sentences], [FredericGodin/DynamicCNN].

· [2015 EMNLP] Deep Convolutional Neural Network Textual Features and Multiple Kernel Learning for Utterance-Level Multimodal Sentiment Analysis, [paper].

· [2015 ACL] Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks (covers semantic relatedness and sentiment classification tasks), [paper], sources: [stanfordnlp/treelstm], [nicolaspi/treelstm], [sapruash/RecursiveNN], [dallascard/TreeLSTM].

· [2016 EMNLP] A Hierarchical Model of Reviews for Aspect-based Sentiment Analysis, [paper].

· [2016 EMNLP] Attention-based LSTM for Aspect-level Sentiment Classification, [paper], sources: [scaufengyang/TD-LSTM].

· [2016 ICDM] Convolutional MKL Based Multimodal Emotion Recognition and Sentiment Analysis, [paper], sources: [SenticNet/multimodal-sentiment-detection].

· [2017 ICME] Select-additive Learning: Improving Generalization in Multimodal Sentiment Analysis, [paper], sources: [HaohanWang/SelectAdditiveLearning].

· [2017 ICMI] Multimodal Sentiment Analysis with Word-Level Fusion and Reinforcement Learning, [paper].

· [2017 ACM SIGIR] Multitask Learning for Fine-Grained Twitter Sentiment Analysis, [paper], sources: [balikasg/sigir2017].

· [2017 EMNLP] Tensor Fusion Network for Multimodal Sentiment Analysis, [paper], sources: [A2Zadeh/TensorFusionNetwork].

· [2017 ACL] Context-Dependent Sentiment Analysis in User-Generated Videos, [paper], sources: [SenticNet/contextual-sentiment-analysis].

· [2018 ACL] Multiple Instance Learning Networks for Fine-Grained Sentiment Analysis, [paper], [data].

· [2018 AAAI] Targeted Aspect-Based Sentiment Analysis via Embedding Commonsense Knowledge into an Attentive LSTM, [paper].

· [2018 Cognitive Computation] Sentic LSTM: a Hybrid Network for Targeted Aspect-Based Sentiment Analysis, [paper], sources: [SenticNet/sentic-lstm].

Stance Detection

· [2016 ACM] Stance and Sentiment in Tweets, [paper].

· [2016 SemEval] SemEval-2016 Task 6: Detecting Stance in Tweets, [paper], [homepage], [The SemEval-2016 Stance Dataset].

· [2016 SemEval] DeepStance at SemEval-2016 Task 6: Detecting Stance in Tweets Using Character and Word-Level CNNs, [paper].

· [2016 SEM@ACL] Detecting Stance in Tweets And Analyzing its Interaction with Sentiment, [paper], sources: [vishaalmohan/twitter-stance-detection].

· [2018 ECIR] Topical Stance Detection for Twitter: A Two-Phase LSTM Model Using Attention, [paper].

Character/Word Embeddings and Baseline Systems

Character Embeddings

· [2016 AAAI] Char2Vec: Character-Aware Neural Language Models, [paper], sources: [carpedm20/lstm-char-cnn-tensorflow], [yoonkim/lstm-char-cnn].

Word Embeddings

· [2008 NIPS] HLBL: A Scalable Hierarchical Distributed Language Model, [paper], sources: [wenjieguan/Log-bilinear-language-models].

· [2010 INTERSPEECH] RNNLM: Recurrent neural network based language model, [paper], [Ph.D. Thesis], [slides], sources: [mspandit/rnnlm].

· [2013 NIPS] Word2Vec: Distributed Representations of Words and Phrases and their Compositionality, [paper], [word2vec explained], [params explained], [blog], sources: [word2vec], [dav/word2vec], [yandex/faster-rnnlm], [tf-word2vec], [zake7749/word2vec-tutorial].

· [2013 CoNLL] Better Word Representations with Recursive Neural Networks for Morphology, [paper].

· [2014 ACL] Word2Vecf: Dependency-Based Word Embeddings, [paper], [blog], sources: [Yoav Goldberg/word2vecf], [IsaacChanghau/Word2VecfJava].

· [2014 EMNLP] GloVe: Global Vectors for Word Representation, [paper], [homepage], sources: [stanfordnlp/GloVe].

· [2014 ICML] Compositional Morphology for Word Representations and Language Modelling, [paper], sources: [thompsonb/comp-morph], [claravania/subword-lstm-lm].

· [2015 ACL] Hyperword: Improving Distributional Similarity with Lessons Learned from Word Embeddings, [paper], sources: [Omer Levy/hyperwords].

· [2016 ICLR] Exploring the Limits of Language Modeling, [paper], [slides], sources: [tensorflow/models/lm_1b].

· [2016 CoNLL] Context2Vec: Learning Generic Context Embedding with Bidirectional LSTM, [paper], sources: [orenmel/context2vec].

· [2016 IEEE Intelligent Systems] How to Generate a Good Word Embedding?, [paper], [基于神经网络的词和文档语义向量表示方法研究 (Word and Document Embeddings Based on Neural Network Approaches)], [blog], sources: [licstar/compare].

· [2016 ArXiv] Linear Algebraic Structure of Word Senses, with Applications to Polysemy, [paper], [slides], sources: [YingyuLiang/SemanticVector].

· [2017 ACL] FastText: Enriching Word Vectors with Subword Information, [paper], sources: [facebookresearch/fastText], [salestock/fastText.py].

· [2017 ArXiv] Implicitly Incorporating Morphological Information into Word Embedding, [paper].

· [2017 AAAI] Improving Word Embeddings with Convolutional Feature Learning and Subword Information, [paper], sources: [ShelsonCao/IWE].

· [2018 ICML] Learning K-way D-dimensional Discrete Codes for Compact Embedding Representations, [paper], [supplementary], sources: [chentingpc/kdcode-lm].

· [2018 ICLR] Compressing Word Embeddings via Deep Compositional Code Learning, [paper], [bibtex], sources: [msobroza/compositional_code_learning].

Baseline Systems

· [2017 NIPS] Learned in Translation: Contextualized Word Vectors, [paper], sources: [salesforce/cove].

· [2018 NAACL] Deep contextualized word representations, [paper], [homepage], sources: [allenai/bilm-tf].

· [2018 ArXiv] GLoMo: Unsupervisedly Learned Relational Graphs as Transferable Representations, [paper], [bibtex].

· [2018 ArXiv] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, [paper], [bibtex], sources: [google-research/bert], [huggingface/pytorch-pretrained-BERT].

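Sentence Representation, Natural Language Inference, and Summarization

Covers sentence embeddings/representations, natural language inference, sentence matching, textual entailment, text summarization, and more.

Sentence Embeddings / Representations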

· [2015 NIPS] Skip Thought Vectors, [paper], [bibtex], sources: [ryankiros/skip-thoughts].

· [2017 ICLR] A Simple But Tough-to-beat Baseline for Sentence Embeddings, [paper], [bibtex], sources: [PrincetonML/SIF].

· [2017 ICLR] A Structured Self-attentive Sentence Embedding, [paper], [bibtex], sources: [ExplorerFreda/Structured-Self-Attentive-Sentence-Embedding], [flrngel/Self-Attentive-tensorflow], [kaushalshetty/Structured-Self-Attention].

· [2017 EMNLP] Supervised Learning of Universal Sentence Representations from Natural Language Inference Data, [paper], [bibtex], sources: [facebookresearch/InferSent].

· [2018 ICLR] Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning, [paper], [bibtex], sources: [Maluuba/gensen].

· [2018 ArXiv] Universal Sentence Encoder, [paper], [bibtex], sources: [TensorFlow Hub/universal-sentence-encoder], [helloeve/universal-sentence-encoder-fine-tune].

· [2018 ArXiv] Evaluation of Sentence Embeddings in Downstream and Linguistic Probing Tasks, [paper], [bibtex].

· [2018 EMNLP] XNLI: Evaluating Cross-lingual Sentence Representations, [paper], [bibtex], sources: [facebookresearch/XNLI].

· [2018 EMNLP] Dynamic Meta-Embeddings for Improved Sentence Representations, [paper], [bibtex], sources: [facebookresearch/DME].

Natural Language Inference (Textual Entailment, Sentence Matching)

· [2016 NAACL] Learning Natural Language Inference with LSTM, [paper], [bibtex], source: [shuohangwang/SeqMatchSeq].

· [2017 IJCAI] BiMPM: Bilateral Multi-Perspective Matching for Natural Language Sentences, [paper], [bibtex], sources: [zhiguowang/BiMPM].

· [2017 ArXiv] Distance-based Self-Attention Network for Natural Language Inference, [paper], [bibtex].

· [2018 AAAI] DiSAN: Directional Self-Attention Network for RNN/CNN-free Language Understanding, [paper], [bibtex], sources: [taoshen58/DiSAN].

· [2018 IJCAI] Reinforced Self-Attention Network: a Hybrid of Hard and Soft Attention for Sequence Modeling, [paper], [bibtex].

Text Summarization

· [2017 ACL] Get To The Point: Summarization with Pointer-Generator Networks, [paper], [bibtex], sources: [abisee/pointer-generator], [abisee/cnn-dailymail], [JafferWilson/Process-Data-of-CNN-DailyMail].

· [2018 ICLR] Generating Wikipedia by Summarizing Long Sequences, [paper], [bibtex], sources: [tensorflow/tensor2tensor · wikisum].

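Interpretability, Disambiguation, Anaphora, and Discourse

Covers interpretability, disambiguation, anaphora, discourse relation representation, and related topics.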

Interpretability

· [2012 COLING] Learning Effective and Interpretable Semantic Models using Non-Negative Sparse Embedding, [paper], [homepage].

· [2015 NAACL] A Compositional and Interpretable Semantic Space, [paper].

· [2015 EMNLP] Online Learning of Interpretable Word Embeddings, [paper].

· [2015 ACL] SPOWV: Sparse Overcomplete Word Vector Representations, [paper], sources: [mfaruqui/sparse-coding].

· [2016 IJCAI] Sparse Word Embeddings Using l1 Regularized Online Learning, [paper], [slides].

· [2017 ArXiv] Semantic Structure and Interpretability of Word Embeddings, [paper].

· [2017 EMNLP] Rotated Word Vector Representations and their Interpretability, [paper], [poster], sources: [SungjoonPark/factor_rotation], [mvds314/factor_rotation].

· [2018 AAAI] SPINE: SParse Interpretable Neural Embeddings, [paper], sources: [harsh19/SPINE].

· [2018 ACL] Interpretable and Compositional Relation Learning by Joint Training with an Autoencoder, [paper], [bibtex], sources: [tianran/glimvec].

Disambiguation

· [2015 VSM] A Simple Word Embedding Model for Lexical Substitution, [paper], sources: [orenmel/lexsub].

· [2017 EMNLP] Deep Joint Entity Disambiguation with Local Neural Attention, [paper], sources: [dalab/deep-ed].

Coreference and Anaphora Resolution

· [2012 EMNLP] Joint Entity and Event Coreference Resolution across Documents, [paper], [bibtex].

· [2016 EMNLP] Deep Reinforcement Learning for Mention-Ranking Coreference Models, [paper], [bibtex], [blog], [demo], sources: [huggingface/neuralcoref], [clarkkev/deep-coref].

· [2016 ACL] Improving Coreference Resolution by Learning Entity-Level Distributed Representations, [paper], [bibtex], sources: [clarkkev/deep-coref].

· [2017 ArXiv] Linguistic Knowledge as Memory for Recurrent Neural Networks, [paper], [bibtex].

Discourse Relation Representation and Identification

· [2017 EMNLP] Multi-task Attention-based Neural Networks for Implicit Discourse Relationship Representation and Identification, [paper], [bibtex].

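Multi-Task Learning and Other Uncategorized Work

Covers multi-task learning, NLP surveys, NLP optimization methods, grammatical error correction, and more.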

Multi-Task Learning

· [2011 JMLR] Natural Language Processing (Almost) from Scratch, covers tagging, chunking, parsing, NER, SRL, and other tasks, [paper], [bibtex], sources: [attardi/deepnl].

· [2017 EMNLP] A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks, covers tagging, chunking, parsing, relatedness, and entailment tasks, [paper], [bibtex], [blog], sources: [rubythonode/joint-many-task-model].

· [2018 ICLR] Bi-Directional Block Self-Attention for Fast and Memory-Efficient Sequence Modeling, [paper], [bibtex], sources: [taoshen58/BiBloSA].

· [2018 CoNLL] Sequence Classification with Human Attention, [paper], [bibtex], sources: [coastalcph/Sequence_classification_with_human_attention].

· [2018 ArXiv] Improving Language Understanding by Generative Pre-Training, [paper], [bibtex], [homepage], sources: [openai/finetune-transformer-lm].

NLP Surveys

· [2018 JAIR] Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation, [paper].

· [2018 CIM] Recent Trends in Deep Learning Based Natural Language Processing, [paper].

Grammatical Error Correction

· [2014 CoNLL] The CoNLL-2014 Shared Task on Grammatical Error Correction, [paper], [bibtex], [homepage].

· [2018 AAAI] A Multilayer Convolutional Encoder-Decoder Neural Network for Grammatical Error Correction, [paper], [bibtex], sources: [nusnlp/mlconvgec2018].

Others

· [2016 EMNLP] How Transferable are Neural Networks in NLP Applications?, [paper].

· [2017 ICML] Language Modeling with Gated Convolutional Networks, [paper], sources: [anantzoid/Language-Modeling-GatedCNN], [jojonki/Gated-Convolutional-Networks].

· [2017 CIKM] Commonsense for Machine Intelligence: Text to Knowledge and Knowledge to Text, [slides], [CIKM 2017 Singapore Tutorials], [Commonsense for Machine Intelligence, Allen Institute, CIKM 2017 TUTORIAL], [Allen Institute].

· [2017 ICLR] An Actor-Critic Algorithm for Sequence Prediction, [paper], [bibtex], sources: [rizar/actor-critic-public].

· [2017 ACL] Learning When to Skim and When to Read, [paper], [blog].

· [2018 ArXiv] Fast Directional Self-Attention Mechanism, [paper], [bibtex], sources: [taoshen58/Fast-DiSA].

· [2018 ICLR] Regularizing and Optimizing LSTM Language Models, [paper], [bibtex], sources: [salesforce/awd-lstm-lm], author page: [Nitish Shirish Keskar].