We introduce a novel setup for low-resource task-oriented semantic parsing that incorporates several constraints likely to arise in real-world scenarios: (1) a lack of similar datasets or models from a related domain, (2) an inability to sample useful logical forms directly from a grammar, and (3) privacy requirements on unlabeled natural utterances. Our goal is to improve a low-resource semantic parser using utterances collected through user interactions. In this highly challenging but realistic setting, we investigate data augmentation approaches that first generate a set of structured canonical utterances corresponding to logical forms, then simulate the corresponding natural language, and finally filter the resulting pairs. We find that such approaches are effective despite our restrictive setup: in a low-resource setting on the complex SMCalFlow calendaring dataset (Andreas et al., 2020), we observe a 33% relative improvement in top-1 match over a non-data-augmented baseline.
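The three augmentation stages named above (canonical utterance generation, natural language simulation, and filtering) can be pictured with a minimal sketch. It is only an illustration under stated assumptions, not the method evaluated in the paper: `augment`, `toy_canonical`, `toy_simulate`, and `toy_keep` are hypothetical names, and the toy stand-ins are simple string functions where a real system would presumably use learned models.

```python
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]  # (natural utterance, logical form)


def augment(
    logical_forms: Iterable[str],
    to_canonical: Callable[[str], str],
    simulate_natural: Callable[[str], List[str]],
    keep: Callable[[str, str], bool],
) -> List[Pair]:
    """Hypothetical pipeline: canonicalize, simulate, then filter."""
    pairs: List[Pair] = []
    for lf in logical_forms:
        # Stage 1: map the logical form to a structured canonical utterance.
        canonical = to_canonical(lf)
        # Stage 2: simulate natural-language variants of the canonical form.
        for utterance in simulate_natural(canonical):
            # Stage 3: keep only the pairs that pass the filter.
            if keep(utterance, lf):
                pairs.append((utterance, lf))
    return pairs


# Toy stand-ins (assumptions made purely for illustration).
def toy_canonical(lf: str) -> str:
    # Drop the outer parentheses and turn underscores into spaces.
    return lf.strip("()").replace("_", " ")


def toy_simulate(canonical: str) -> List[str]:
    # Two variants plus one deliberately empty (bad) candidate.
    return [canonical, "please " + canonical, ""]


def toy_keep(utterance: str, lf: str) -> bool:
    # Reject empty utterances; a real filter would be far stricter.
    return bool(utterance.strip())


augmented = augment(
    ["(create_event lunch)"], toy_canonical, toy_simulate, toy_keep
)
```

The stages are passed in as callables so that each one can be replaced independently, which mirrors the way the abstract describes them as separate steps.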