The collection and annotation of task-oriented conversational data is a costly and time-consuming process. Many augmentation techniques have been proposed to improve the performance of state-of-the-art (SOTA) systems in new domains that lack sufficient training data. However, these augmentation techniques (e.g., paraphrasing) are learning-based and therefore still require a moderate amount of data themselves, which makes deploying SOTA systems in emerging low-resource domains infeasible. To tackle this problem, we introduce a framework that creates synthetic task-oriented dialogues in a fully automatic manner and operates on inputs as small as a few dialogues. Our framework builds on the simple observation that each turn-pair in a task-oriented dialogue serves a certain function, and exploits this observation by mixing turn-pairs to create new dialogues. We evaluate our framework in a low-resource setting by integrating it with the SOTA model TRADE on the dialogue state tracking task, and observe significant improvements in fine-tuning scenarios across several domains. We conclude that this end-to-end dialogue augmentation framework can be a crucial tool for natural language understanding in emerging task-oriented dialogue domains.
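The core idea of mixing function-tagged turn-pairs can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the paper's actual implementation: we assume each (user, system) turn-pair carries a function label (e.g. "request", "confirm"), pool pairs with the same label across seed dialogues, and fill a seed dialogue's function sequence with pairs drawn from those pools.

```python
import random

def mix_dialogues(seed_dialogues, rng=random.Random(0)):
    """Create one synthetic dialogue by recombining turn-pairs.

    seed_dialogues: list of dialogues; each dialogue is a list of
    (function_label, user_turn, system_turn) triples. Labels are
    illustrative placeholders, not the paper's taxonomy.
    """
    # Pool turn-pairs from every seed dialogue by their function label.
    pools = {}
    for dialogue in seed_dialogues:
        for label, user, system in dialogue:
            pools.setdefault(label, []).append((user, system))
    # Use one seed dialogue's sequence of functions as a template and
    # fill each slot with a randomly chosen pair of the same function.
    template = [label for label, _, _ in rng.choice(seed_dialogues)]
    return [(label, *rng.choice(pools[label])) for label in template]

seeds = [
    [("request", "I need a cheap hotel.", "Sure, any area preference?"),
     ("confirm", "Book it for two nights.", "Done, reference A1.")],
    [("request", "Find me an Italian restaurant.", "There are 3 options."),
     ("confirm", "Reserve a table at 7pm.", "Booked, reference B2.")],
]
synthetic = mix_dialogues(seeds)
print([label for label, _, _ in synthetic])
```

Even with only two seed dialogues, crossing the pairs already yields combinations that appear in neither seed, which is why the approach can operate with inputs as small as a few dialogues.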