As the creation of task-oriented conversational data is costly, data augmentation techniques have been proposed to create synthetic data that improves model performance in new domains. To date, these learning-based techniques (e.g., paraphrasing) still require a moderate amount of data, making them infeasible in low-resource settings. To tackle this problem, we introduce an augmentation framework that creates synthetic task-oriented dialogues, operating with as few as 5 shots. Our framework utilizes belief state annotations to define the dialogue function of each turn pair. It then creates templates of turn pairs through de-lexicalization, where the dialogue function codifies the allowable incoming and outgoing links of each template. To generate new dialogues, our framework composes allowable adjacent templates in a bottom-up manner. We evaluate our framework using TRADE as the base DST model, observing significant improvements when fine-tuning in low-resource settings. We conclude that this end-to-end dialogue augmentation framework can be a practical tool for improving natural language understanding performance in emerging task-oriented dialogue domains.
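The two core operations described above, de-lexicalizing a turn using its belief state and chaining templates whose dialogue functions link up, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the slot names, the `in`/`out` link fields, and the example templates are all hypothetical.

```python
def delexicalize(utterance, belief_state):
    """Replace slot values found in the belief state with [slot] placeholders."""
    template = utterance
    for slot, value in belief_state.items():
        template = template.replace(value, f"[{slot}]")
    return template

def compose(templates, start_fn, end_fn):
    """Chain turn-pair templates bottom-up: each template's outgoing dialogue
    function must match the incoming function of the next template."""
    dialogue, fn = [], start_fn
    while fn != end_fn:
        # Pick the first template whose incoming link matches the current state
        # (a real system would sample among all allowable candidates).
        nxt = next(t for t in templates if t["in"] == fn)
        dialogue.append(nxt["text"])
        fn = nxt["out"]
    return dialogue

# De-lexicalization of a single user turn (illustrative belief state).
turn = "I want a cheap hotel in the north"
belief = {"price": "cheap", "area": "north"}
print(delexicalize(turn, belief))  # I want a [price] hotel in the [area]

# Composing two templates whose links are compatible (hypothetical functions).
templates = [
    {"in": "start", "out": "request_area", "text": "I need a [price] hotel."},
    {"in": "request_area", "out": "end", "text": "Which [area] do you prefer?"},
]
print(compose(templates, "start", "end"))
```

New dialogues would then be re-lexicalized by filling the placeholders with slot values from the target domain's ontology.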