As large dialogue models become commonplace in practice, the problems of high compute requirements for training and inference and a large memory footprint still persist. In this work, we present AUTODIAL, a multi-task dialogue model that addresses the challenges of deploying dialogue models. AUTODIAL utilizes parallel decoders to perform tasks such as dialogue act prediction, domain prediction, intent prediction, and dialogue state tracking. Using classification decoders instead of generative decoders allows AUTODIAL to significantly reduce its memory footprint and achieve faster inference times than an existing generative approach, namely SimpleTOD. We demonstrate that AUTODIAL provides 3-6x speedups during inference while having 11x fewer parameters than SimpleTOD on three dialogue tasks. Our results show that extending current dialogue models with parallel decoders can be a viable alternative for deploying them in resource-constrained environments.
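The core idea of parallel classification decoders over a shared encoder can be illustrated with a minimal sketch. This is a hypothetical toy illustration, not the paper's implementation: the dimensions, head names, and the use of random weights are all assumptions, and a simple pooled vector stands in for a transformer encoder's output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes -- not taken from the paper.
HIDDEN = 16
NUM_ACTS, NUM_DOMAINS, NUM_INTENTS = 5, 7, 11

# Stand-in for a shared encoder's pooled representation of one
# dialogue turn (in practice this would come from a transformer).
h = rng.normal(size=HIDDEN)

# One lightweight classification head per task. The heads share the
# encoder and each produces its label in a single matrix-vector
# product, so they can run in parallel -- unlike an autoregressive
# generative decoder, which must emit its output token by token.
heads = {
    "act": rng.normal(size=(NUM_ACTS, HIDDEN)),
    "domain": rng.normal(size=(NUM_DOMAINS, HIDDEN)),
    "intent": rng.normal(size=(NUM_INTENTS, HIDDEN)),
}

# Each head independently picks the highest-scoring class index.
predictions = {task: int(np.argmax(W @ h)) for task, W in heads.items()}
print(predictions)
```

The memory argument follows from the same structure: each head costs only `num_classes * hidden` parameters, versus the full vocabulary-sized output layer and extra decoding steps a generative decoder needs.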