As large dialogue models become commonplace in practice, the problems of high compute requirements for training and inference and of a large memory footprint still persist. In this work, we present AUTODIAL, a multi-task dialogue model that addresses the challenges of deploying dialogue models. AUTODIAL utilizes parallel decoders to perform tasks such as dialogue act prediction, domain prediction, intent prediction, and dialogue state tracking. Using classification decoders instead of generative decoders allows AUTODIAL to significantly reduce its memory footprint and achieve faster inference times than an existing generative approach, namely SimpleTOD. We demonstrate that AUTODIAL provides 3-6x speedups during inference on three dialogue tasks while having 11x fewer parameters than SimpleTOD. Our results show that extending current dialogue models with parallel decoders can be a viable alternative for deploying them in resource-constrained environments.
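The parallel-decoder idea described above can be sketched as a shared encoder feeding several independent classification heads, one per task, so a single forward pass answers every task at once. The sketch below is a minimal NumPy illustration under assumed dimensions; the hidden size, class counts, and task names are placeholders, not AUTODIAL's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

HIDDEN = 64  # illustrative hidden size, not AUTODIAL's actual dimension

# One classification head (weight matrix) per task; class counts are
# placeholders chosen only for illustration.
HEADS = {
    "dialogue_act": rng.standard_normal((HIDDEN, 12)),
    "domain": rng.standard_normal((HIDDEN, 7)),
    "intent": rng.standard_normal((HIDDEN, 20)),
}

def encode(token_ids):
    """Stand-in for the shared encoder: returns one pooled vector."""
    return rng.standard_normal(HIDDEN)

def predict_all(token_ids):
    """Run every classification head on one shared encoding.

    Unlike a generative decoder that emits labels token by token,
    each head here is a single matrix product followed by argmax,
    so all tasks are answered in one pass over the input.
    """
    h = encode(token_ids)
    return {task: int(np.argmax(h @ W)) for task, W in HEADS.items()}

preds = predict_all([101, 2054, 2003])
print(preds)  # one predicted label index per task
```

Because each head is a fixed-size classifier rather than an autoregressive decoder, inference cost and parameter count stay small, which is the trade-off the abstract highlights against SimpleTOD's generative formulation.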