Natural language understanding (NLU) converts sentences into structured semantic forms. The scarcity of annotated training samples remains a fundamental challenge for NLU. To address this data sparsity problem, previous semi-supervised work has mainly focused on exploiting unlabeled sentences. In this work, we introduce a dual task of NLU, semantic-to-sentence generation (SSG), and propose a new framework for semi-supervised NLU built on the corresponding dual model. The framework consists of dual pseudo-labeling and dual learning methods, which enable an NLU model to make full use of both labeled and unlabeled data through a closed loop of the primal and dual tasks. By incorporating the dual task, the framework can exploit pure semantic forms as well as unlabeled sentences, and iteratively improve the NLU and SSG models within the closed loop. We evaluate the proposed approaches on two public datasets, ATIS and SNIPS. Experiments in the semi-supervised setting show that our methods significantly outperform various baselines, and extensive ablation studies verify the effectiveness of the framework. Finally, our method also achieves state-of-the-art performance on both datasets in the supervised setting. Our code is available at \url{https://github.com/rhythmcao/slu-dual-learning.git}.
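The closed loop described in the abstract can be illustrated with a deliberately tiny sketch of the dual pseudo-labeling idea. Everything here is an assumption made for illustration and is not taken from the released code: `ToyNLU` is a hypothetical lexicon lookup standing in for a neural NLU model, `ToySSG` is a hypothetical template filler standing in for a neural generator, semantic forms are simplified to flat slot-value dictionaries with single-word values, and in each round the pseudo pairs produced by one model are used to retrain the other. The reward-driven dual learning component is not sketched.

```python
class ToyNLU:
    """Hypothetical NLU stand-in: looks up known slot values word by word."""

    def __init__(self):
        self.lexicon = {}

    def fit(self, pairs):
        # Learn a value -> slot lexicon from (sentence, semantic form) pairs.
        self.lexicon = {v: slot for _, form in pairs for slot, v in form.items()}

    def predict(self, sentence):
        return {self.lexicon[w]: w for w in sentence.split() if w in self.lexicon}


class ToySSG:
    """Hypothetical SSG stand-in: fills the first template seen for a slot set."""

    def __init__(self):
        self.templates = {}

    def fit(self, pairs):
        self.templates = {}
        for sentence, form in pairs:
            if not form:
                continue
            value_to_slot = {v: s for s, v in form.items()}
            template = " ".join(
                "<%s>" % value_to_slot[w] if w in value_to_slot else w
                for w in sentence.split()
            )
            # Key by the set of slot names; keep the first template per key.
            self.templates.setdefault(frozenset(form), template)

    def generate(self, form):
        template = self.templates.get(frozenset(form))
        if template is None:
            return None
        for slot, value in form.items():
            template = template.replace("<%s>" % slot, value)
        return template


def dual_pseudo_labeling(labeled, unlabeled_sentences, pure_forms, rounds=2):
    """Assumed loop: each model pseudo-labels data that retrains the other."""
    nlu, ssg = ToyNLU(), ToySSG()
    nlu_data, ssg_data = list(labeled), list(labeled)
    for _ in range(rounds):
        nlu.fit(nlu_data)
        ssg.fit(ssg_data)
        # Primal task: NLU assigns semantic forms to unlabeled sentences.
        from_nlu = [(s, nlu.predict(s)) for s in unlabeled_sentences]
        # Dual task: SSG writes sentences for pure semantic forms.
        from_ssg = [(ssg.generate(f), f) for f in pure_forms]
        from_ssg = [(s, f) for s, f in from_ssg if s is not None]
        nlu_data = list(labeled) + from_ssg
        ssg_data = list(labeled) + from_nlu
    nlu.fit(nlu_data)
    ssg.fit(ssg_data)
    return nlu, ssg


labeled = [("fly to boston", {"city": "boston"})]
unlabeled_sentences = ["book a ticket to boston"]
pure_forms = [{"city": "denver"}]

nlu, ssg = dual_pseudo_labeling(labeled, unlabeled_sentences, pure_forms)
print(nlu.predict("flights to denver"))
print(ssg.generate({"city": "seattle"}))
```

In this toy setting, the value "denver" appears only in a pure semantic form, so an NLU model trained on the single labeled pair cannot recognize it; it becomes recognizable only after the SSG stand-in turns that form into a pseudo-labeled sentence. This mirrors, in miniature, how the dual task lets the framework draw on pure semantic forms in addition to unlabeled sentences.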