To explore the limit of dialogue generation pre-training, we present PLATO-XL, with models of up to 11 billion parameters trained on both Chinese and English social media conversations. To train such large models, we adopt the unified transformer architecture for its high computation and parameter efficiency. In addition, we carry out multi-party aware pre-training to better distinguish the characteristic information in social media conversations. With these designs, PLATO-XL achieves superior performance compared with other approaches in both Chinese and English chitchat. We further explore the capacity of PLATO-XL on other conversational tasks, such as knowledge grounded dialogue and task-oriented conversation. The experimental results indicate that PLATO-XL obtains state-of-the-art results across multiple conversational tasks, verifying its potential as a foundation model of conversational AI.
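The abstract names two design choices without giving implementation details: a unified transformer shared between dialogue understanding and response generation, and multi-party aware pre-training. The sketch below illustrates one plausible reading of both ideas, not the released PLATO-XL code: speaker-role embeddings added to the input representation, and a single attention mask that is bidirectional over the context and causal over the response. All names and sizes (`UnifiedDialogueEmbeddings`, `unified_attention_mask`, `num_roles`, `hidden_size`) are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Hedged sketch: not the official PLATO-XL implementation.
# (1) Role embeddings tag each token with its speaker, so the model can
#     distinguish participants in multi-party social media threads.
# (2) A unified attention mask lets one set of transformer parameters serve
#     both bidirectional context encoding and left-to-right generation.

class UnifiedDialogueEmbeddings(nn.Module):
    def __init__(self, vocab_size=8000, hidden_size=256, max_len=512, num_roles=8):
        super().__init__()
        self.token = nn.Embedding(vocab_size, hidden_size)
        self.position = nn.Embedding(max_len, hidden_size)
        self.role = nn.Embedding(num_roles, hidden_size)  # speaker identity

    def forward(self, token_ids, role_ids):
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        # Sum of token, position, and speaker-role embeddings.
        return self.token(token_ids) + self.position(positions) + self.role(role_ids)


def unified_attention_mask(context_len, response_len):
    """Bidirectional attention over the context, causal over the response."""
    total = context_len + response_len
    mask = torch.zeros(total, total, dtype=torch.bool)
    mask[:, :context_len] = True                    # all tokens see the full context
    causal = torch.tril(torch.ones(response_len, response_len, dtype=torch.bool))
    mask[context_len:, context_len:] = causal       # response tokens are left-to-right
    return mask  # True = attention allowed


# Example: a 6-token context with two speakers, generating a 4-token response.
emb = UnifiedDialogueEmbeddings()
token_ids = torch.randint(0, 8000, (1, 10))
role_ids = torch.tensor([[0, 0, 0, 1, 1, 1, 0, 0, 0, 0]])  # per-token speaker tags
hidden = emb(token_ids, role_ids)
attn_mask = unified_attention_mask(context_len=6, response_len=4)
```

Sharing one transformer for both attention regimes is what gives the "high computation and parameter efficiency" the abstract mentions: no separate encoder and decoder stacks are needed, only a different mask at training and inference time.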