This paper studies the exposure bias problem in task-oriented dialog (TOD) systems, where the model's generated content over multiple turns drives the dialog context away from the ground-truth distribution seen at training time, introducing error propagation and damaging the robustness of the TOD system. To bridge the gap between training and inference for multi-turn task-oriented dialogs, we propose session-level sampling, which explicitly exposes the model to its own sampled dialog-context content during training. Additionally, we employ a dropout-based consistency regularization with the masking strategy R-Mask to further improve the robustness and performance of the model. The proposed UBARv2 achieves state-of-the-art performance on the standardized evaluation benchmark MultiWOZ, and extensive experiments show the effectiveness of the proposed methods.
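The core idea of session-level sampling can be illustrated with a minimal, hypothetical sketch: as a dialog session is assembled turn by turn for training, the ground-truth system turn in the running context is sometimes replaced by a model-sampled one, so training-time contexts resemble what the model sees at inference. The names `build_training_context`, `sample_turn`, and `sample_prob` below are illustrative assumptions, not the paper's implementation.

```python
import random

def build_training_context(gt_turns, sample_turn, sample_prob=0.5, seed=0):
    """Sketch of session-level sampling (hypothetical simplification).

    gt_turns:    list of (user_utterance, ground_truth_system_turn) pairs
    sample_turn: callable that generates a system turn from the running
                 context (stands in for the model's own sampled output)
    sample_prob: chance of exposing the model to its own generation
    """
    rng = random.Random(seed)
    context = []
    for user_utt, gt_sys in gt_turns:
        context.append(user_utt)
        if rng.random() < sample_prob:
            # Expose the model to its own sampled content, as at inference.
            context.append(sample_turn(context))
        else:
            # Standard teacher forcing: keep the ground-truth system turn.
            context.append(gt_sys)
    return context
```

With `sample_prob=0.0` this reduces to ordinary teacher forcing on ground-truth context; with `sample_prob=1.0` every system turn in the context is model-generated, matching the inference-time distribution.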