Linear-chain conditional random fields (CRFs) combined with contextual word embeddings have achieved state-of-the-art performance on sequence labeling tasks. In many of these tasks, the identity of the neighboring words is often the most useful contextual information for predicting the label of a given word. However, contextual embeddings are usually trained in a task-agnostic manner, so although they may encode information about the neighboring words, they are not guaranteed to do so. It can therefore be beneficial to design the sequence labeling architecture to extract this information directly from the embeddings. We propose locally-contextual nonlinear CRFs for sequence labeling. Our approach directly incorporates information from the neighboring embeddings when predicting the label for a given word, and parametrizes the potential functions using deep neural networks. Our model serves as a drop-in replacement for the linear-chain CRF and consistently outperforms it in our ablation study. On a variety of tasks, our results are competitive with those of the best published methods. In particular, we outperform the previous state of the art on chunking on CoNLL 2000 and on named entity recognition on OntoNotes 5.0 English.
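To make the idea concrete, the following is a minimal NumPy sketch of one plausible reading of the description above, not the authors' implementation. It assumes that the score of each label at position t is the sum of three small feed-forward networks applied to the embeddings of words t-1, t, and t+1, plus a learned label-transition matrix, and that decoding uses the standard Viterbi algorithm. All names (`mlp`, `init_mlp`, `local_context_scores`, `viterbi`, `sequence_score`), the zero padding at the sequence boundaries, the network sizes, and the random weights are hypothetical choices made for illustration.

```python
import numpy as np

def mlp(x, W1, b1, W2, b2):
    """Two-layer network mapping embeddings (T, d) to label scores (T, K)."""
    return np.tanh(x @ W1 + b1) @ W2 + b2

def init_mlp(rng, d, hidden, K):
    """Random weights for illustration only; a real model would learn these."""
    return (rng.normal(0, 0.1, (d, hidden)), np.zeros(hidden),
            rng.normal(0, 0.1, (hidden, K)), np.zeros(K))

def local_context_scores(H, params_prev, params_cur, params_next):
    """Label scores at each position from the previous, current, and next embeddings."""
    T, d = H.shape
    pad = np.zeros((1, d))              # assumption: zero vector beyond the sequence ends
    H_prev = np.vstack([pad, H[:-1]])   # row t holds the embedding of word t-1
    H_next = np.vstack([H[1:], pad])    # row t holds the embedding of word t+1
    return (mlp(H_prev, *params_prev)
            + mlp(H, *params_cur)
            + mlp(H_next, *params_next))

def viterbi(scores, trans):
    """Highest-scoring label sequence for scores (T, K) and transitions (K, K)."""
    T, K = scores.shape
    best = scores[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = best[:, None] + trans    # cand[i, j]: best path ending in i, then i -> j
        back[t] = cand.argmax(axis=0)
        best = cand.max(axis=0) + scores[t]
    path = [int(best.argmax())]
    for t in range(T - 1, 0, -1):       # follow back-pointers from the last position
        path.append(int(back[t, path[-1]]))
    return path[::-1], float(best.max())

def sequence_score(scores, trans, labels):
    """Unnormalized log-score of one label sequence."""
    s = scores[0, labels[0]]
    for t in range(1, len(labels)):
        s += trans[labels[t - 1], labels[t]] + scores[t, labels[t]]
    return float(s)

# Toy usage: 5 words, 8-dimensional embeddings, 3 labels.
rng = np.random.default_rng(0)
T, d, hidden, K = 5, 8, 16, 3
H = rng.normal(size=(T, d))             # stand-in for contextual embeddings
params = [init_mlp(rng, d, hidden, K) for _ in range(3)]
trans = rng.normal(0, 0.1, (K, K))
scores = local_context_scores(H, *params)
path, best_score = viterbi(scores, trans)
```

Because the neighboring embeddings enter only through the per-position score matrix, decoding in this sketch is the same as for an ordinary linear-chain CRF, which is consistent with the drop-in replacement claim above.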