The slot context vector and the intent context vector are combined through a gate structure (where both $v$ and $W$ are trainable):

$$g = \sum v \cdot \tanh\left(c_i^S + W \cdot c^I\right)$$

Here $v \in \mathbb{R}^d$ and $W \in \mathbb{R}^{d \times d}$, where $d$ is the dimension of the input vector $h$; the sum runs over the $d$ elements, so $g$ is a scalar weight for the slot context vector. The paper's released code computes it the same way: an element-wise product with $v$ after the $\tanh$, followed by a sum over dimensions. Using $g$ as the weight, the slot label $y_i^S$ is predicted as:

$$y_i^S = \mathrm{softmax}\left(W_{hy}^S\left(h_i + c_i^S \cdot g\right)\right)$$
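As a concrete illustration, here is a minimal PyTorch sketch of the gate following the two equations above; the class and argument names are my own, not from the paper or its code.

```python
import torch
import torch.nn as nn

class SlotGate(nn.Module):
    """Minimal sketch of the slot gate from Slot-Gated Modeling [16].

    The combination h_i + c_i^S * g follows the paper's equations;
    the class/variable names here are illustrative.
    """
    def __init__(self, d: int, num_slot_labels: int):
        super().__init__()
        self.v = nn.Parameter(torch.randn(d))      # trainable vector v ∈ R^d
        self.W = nn.Linear(d, d, bias=False)       # trainable matrix W ∈ R^{d×d}
        self.W_hy = nn.Linear(d, num_slot_labels)  # slot projection W_hy^S

    def forward(self, h, slot_ctx, intent_ctx):
        # h, slot_ctx: (batch, seq_len, d); intent_ctx: (batch, d)
        # g = sum_d v * tanh(c_i^S + W c^I) -> one scalar gate per token
        g = (self.v * torch.tanh(slot_ctx + self.W(intent_ctx).unsqueeze(1))).sum(-1)
        # y_i^S = softmax(W_hy^S (h_i + c_i^S * g))
        logits = self.W_hy(h + slot_ctx * g.unsqueeze(-1))
        return logits.log_softmax(-1)
```

Note that $g$ collapses to one scalar per token, which is exactly what lets the intent context modulate how strongly the slot context contributes to each slot prediction.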
Stack-Propagation
A Stack-Propagation Framework with Token-level Intent Detection for Spoken Language Understanding [18] Git: https://github.com/LeePleased/StackPropagation-SLU
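The core idea of [18] is to make intent detection token-level and feed ("stack") its output directly into the slot-filling decoder, with the utterance-level intent decided by voting over the token predictions. Below is a minimal sketch of that propagation step, assuming a plain BiLSTM encoder in place of the paper's self-attentive encoder; all class and variable names are illustrative.

```python
import torch
import torch.nn as nn

class StackPropagationSLU(nn.Module):
    """Minimal sketch of Stack-Propagation [18]: token-level intent
    predictions are embedded and stacked into the slot decoder's input.
    Only the propagation idea is kept; the real model uses a
    self-attentive encoder and LSTM decoders.
    """
    def __init__(self, vocab, d, num_intents, num_slots):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)
        self.encoder = nn.LSTM(d, d // 2, bidirectional=True, batch_first=True)
        self.intent_head = nn.Linear(d, num_intents)    # intent at every token
        self.intent_emb = nn.Embedding(num_intents, d)  # embed predicted intent
        self.slot_head = nn.Linear(2 * d, num_slots)    # encoder ⊕ intent

    def forward(self, tokens):
        h, _ = self.encoder(self.emb(tokens))           # (B, T, d)
        intent_logits = self.intent_head(h)             # (B, T, num_intents)
        token_intent = intent_logits.argmax(-1)         # (B, T)
        # Stack the embedded token-level intent onto the slot input.
        slot_in = torch.cat([h, self.intent_emb(token_intent)], dim=-1)
        slot_logits = self.slot_head(slot_in)
        # Utterance intent = majority vote over token-level predictions.
        utt_intent = token_intent.mode(dim=-1).values
        return intent_logits, slot_logits, utt_intent
```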
Here the first term is the loss over positive (annotated) samples and the second term is the loss over negative samples, i.e. the sampled set of negative spans. The method is quite simple, and I find it more effective than PU learning. The authors also prove that with negative sampling, the probability of not treating any unlabeled entity as a negative sample is greater than (1 - 2/(n-5)), which alleviates the unlabeled-entity problem.
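To make the sampling step concrete, here is a minimal sketch of span-level negative sampling in the spirit of [38]; the function name, the max span length, and the 0.35 sampling ratio are assumptions (the ratio matches my reading of the paper), not details given in this post.

```python
import random

def sample_negative_spans(seq_len, gold_spans, ratio=0.35, max_span_len=10):
    """Sketch of negative sampling for the unlabeled-entity problem [38].

    Every span not annotated as an entity is a candidate negative; we
    uniformly sample roughly ratio * n of them instead of using them all,
    so an unlabeled entity is unlikely to be picked as a negative.
    """
    candidates = [
        (i, j)                                   # inclusive span boundaries
        for i in range(seq_len)
        for j in range(i, min(seq_len, i + max_span_len))
        if (i, j) not in gold_spans
    ]
    k = min(len(candidates), max(1, round(ratio * seq_len)))
    return random.sample(candidates, k)

# Training then scores gold spans against their entity labels (positive
# loss) and the sampled spans against the non-entity label "O" (negative
# loss), instead of penalizing every unannotated span.
negatives = sample_negative_spans(20, gold_spans={(3, 5), (10, 10)})
```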
2.4 Pre-trained Language Models
This part is mainly about BERT-related optimizations. They also bring gains on downstream tasks, including NER; I won't go into detail here, see the figure:
References
[1] Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
[2] Bidirectional LSTM-CRF Models for Sequence Tagging: https://arxiv.org/abs/1508.01991v1
[3] Neural Architectures for Named Entity Recognition: https://arxiv.org/abs/1603.01360
[4] Transition-based dependency parsing with stack long-short-term memory: http://www.oalib.com/paper/4074644
[5] End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF: https://www.aclweb.org/anthology/P16-1101.pdf
[6] Fast and Accurate Entity Recognition with Iterated Dilated Convolutions: https://arxiv.org/abs/1702.02098
[7] Joint Slot Filling and Intent Detection via Capsule Neural Networks: https://arxiv.org/abs/1812.09471
[8] Dynamic Routing Between Capsules: http://papers.nips.cc/paper/6975-dynamic-routing-between-capsules.pdf
[9] Neural Architectures for Named Entity Recognition: https://arxiv.org/abs/1603.01360
[10] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding: https://arxiv.org/abs/1810.04805
[11] Neural Architectures for Named Entity Recognition: https://arxiv.org/abs/1603.01360
[12] Attending to Characters in Neural Sequence Labeling Models: https://arxiv.org/abs/1611.04361
[13] Character-Based LSTM-CRF with Radical-Level Features for Chinese Named Entity Recognition: http://www.nlpr.ia.ac.cn/cip/ZongPublications/2016/13董传海Character-Based%20LSTM-CRF%20with%20Radical-Level%20Features%20for%20Chinese%20Named%20Entity%20Recognition.pdf
[14] Named Entity Recognition with Character-Level Models: https://nlp.stanford.edu/manning/papers/conll-ner.pdf
[15] Improving Named Entity Recognition for Chinese Social Media with Word Segmentation Representation Learning: https://www.aclweb.org/anthology/P16-2025
[16] Slot-Gated Modeling for Joint Slot Filling and Intent Prediction: https://aclanthology.org/N18-2118
[17] Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling: https://blog.csdn.net/shine19930820/article/details/83052232
[18] A Stack-Propagation Framework with Token-level Intent Detection for Spoken Language Understanding: https://www.aclweb.org/anthology/D19-1214/
[19] BERT for Joint Intent Classification and Slot Filling: https://arxiv.org/abs/1902.10909
[20] SpanNER: Named Entity Re-/Recognition as Span Prediction: https://arxiv.org/pdf/2106.00641v1.pdf
[21] Coarse-to-Fine Pre-training for Named Entity Recognition: https://aclanthology.org/2020.emnlp-main.514.pdf
[22] A Unified MRC Framework for Named Entity Recognition: https://arxiv.org/pdf/1910.11476v6.pdf
[23] Span-Level Model for Relation Extraction: https://aclanthology.org/P19-1525.pdf
[24] Instance-Based Learning of Span Representations: https://aclanthology.org/2020.acl-main.575
[25] SpERT: Span-based Joint Entity and Relation Extraction with Transformer Pre-training: https://arxiv.org/abs/1909.07755
[26] https://medium.com/jasonwu0731/pre-finetuning-domain-adaptive-pre-training-of-language-models-db8fa9747668
[27] https://arxiv.org/pdf/2108.00801.pdf
[28] https://arxiv.org/pdf/1911.04474.pdf
[29] https://arxiv.org/pdf/2004.11795.pdf
[30] ZEN: Pre-training Chinese Text Encoder Enhanced by N-gram Representations: https://arxiv.org/abs/1911.00720
[31] https://arxiv.org/pdf/2008.11869.pdf
[32] K-BERT: Enabling Language Representation with Knowledge Graph: https://arxiv.org/pdf/1909.07606.pdf
[33] Learning Named Entity Tagger using Domain-Specific Dictionary: https://arxiv.org/abs/1809.03599
[34] Better Modeling of Incomplete Annotations for Named Entity Recognition: https://aclanthology.org/N19-1079.pdf
[35] https://arxiv.org/abs/1702.04457
[36] Training Named Entity Tagger from Imperfect Annotations: https://arxiv.org/abs/1909.01441
[37] Distantly Supervised Named Entity Recognition using Positive-Unlabeled Learning: https://arxiv.org/abs/1906.01378
[38] Empirical Analysis of Unlabeled Entity Problem in Named Entity Recognition: https://arxiv.org/pdf/2012.05426