Semantic parsing is an important NLP problem, particularly for voice assistants such as Alexa and Google Assistant. State-of-the-art (SOTA) semantic parsers are seq2seq architectures based on large language models that have been pretrained on vast amounts of text. To better leverage that pretraining, recent work has explored a reformulation of semantic parsing in which the output sequences are themselves natural language sentences, albeit in a controlled fragment of natural language. This approach delivers strong results, particularly for few-shot semantic parsing, which is of key practical importance and the focus of our paper. We push this line of work forward by introducing an automated methodology that delivers significant additional improvements by utilizing modest amounts of unannotated data, which are typically easy to obtain. Our method is based on a novel synthesis of four techniques: joint training with auxiliary unsupervised tasks; constrained decoding; self-training; and paraphrasing. We show that this method delivers new SOTA few-shot performance on the Overnight dataset, particularly in very low-resource settings, and very compelling few-shot results on a new semantic parsing dataset.
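To make the constrained-decoding component concrete, the sketch below restricts a pretrained seq2seq model's outputs to a small controlled fragment of natural language by walking a token-prefix trie during generation. This is a minimal illustration, not the paper's implementation: the model choice (t5-small) and the toy set of allowed canonical utterances are assumptions made for the example.

```python
# A minimal sketch of constrained decoding, one of the four techniques
# named above. Assumes HuggingFace transformers; the model and the toy
# controlled fragment below are illustrative, not the paper's setup.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Build a prefix trie over the token ids of every sentence in the
# controlled fragment (here, a toy set of canonical utterances).
allowed_outputs = [
    "list flights from boston to denver",
    "list flights from denver to boston",
]
trie = {}
for text in allowed_outputs:
    node = trie
    for tok in tokenizer(text).input_ids:
        node = node.setdefault(tok, {})

def prefix_allowed_tokens_fn(batch_id, generated_ids):
    # Walk the trie along the tokens generated so far; only children of
    # the current node may be emitted next, so decoding can never leave
    # the controlled fragment.
    node = trie
    for tok in generated_ids.tolist()[1:]:  # skip the decoder start token
        node = node.get(tok, {})
    return list(node.keys()) or [tokenizer.eos_token_id]

inputs = tokenizer("show me boston to denver flights", return_tensors="pt")
out = model.generate(**inputs, prefix_allowed_tokens_fn=prefix_allowed_tokens_fn)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Because the constraint is applied as a per-step logits mask, it composes freely with beam search (pass `num_beams` to `generate`) and with whatever naturalized output vocabulary the trie is built from.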