The emergence of pretrained large language models has led to the deployment of a range of social chatbots for chitchat. Although these chatbots demonstrate linguistic ability and fluency, they are not guaranteed to be engaging and can struggle to retain users. This work investigates the development of social chatbots that prioritize user engagement to enhance retention, specifically examining the use of human feedback to develop highly engaging chatbots efficiently. The proposed approach uses automatic pseudo-labels collected from user interactions to train a reward model that can be used to reject low-scoring sample responses generated by the chatbot model at inference time. Intuitive evaluation metrics, such as mean conversation length (MCL), are introduced as proxies to measure the level of engagement of deployed chatbots. A/B testing on groups of 10,000 new daily chatbot users on the Chai Research platform shows that this approach increases the MCL by up to 70%, which translates to a more than 30% increase in user retention for a GPT-J 6B model. Future work aims to use the reward model to realize a data flywheel, where the latest user conversations can be used to alternately fine-tune the language model and the reward model.
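To make the inference-time procedure concrete, the following is a minimal sketch of best-of-N rejection sampling with a reward model, together with the MCL proxy. It is an illustration under stated assumptions rather than the authors' implementation: the reward-model checkpoint name, the pairwise scoring interface, and the sampling hyperparameters (N, top-p, temperature) are all hypothetical.

```python
# Sketch: best-of-N rejection sampling with a reward model.
# Checkpoint names and hyperparameters are illustrative assumptions.
import torch
from transformers import (AutoModelForCausalLM,
                          AutoModelForSequenceClassification, AutoTokenizer)

# Chatbot language model (the paper deploys GPT-J 6B; any causal LM works here).
lm_name = "EleutherAI/gpt-j-6B"
lm_tok = AutoTokenizer.from_pretrained(lm_name)
lm = AutoModelForCausalLM.from_pretrained(
    lm_name, torch_dtype=torch.float16, device_map="auto")

# Reward model: a regressor trained on pseudo-labels from user interactions.
# Hypothetical checkpoint name; higher score = more engaging reply.
rm_name = "my-org/engagement-reward-model"  # assumption, not a real checkpoint
rm_tok = AutoTokenizer.from_pretrained(rm_name)
rm = AutoModelForSequenceClassification.from_pretrained(rm_name, num_labels=1)

def best_of_n_reply(context: str, n: int = 8, max_new_tokens: int = 64) -> str:
    """Sample n candidate replies; reject all but the highest-scoring one."""
    inputs = lm_tok(context, return_tensors="pt").to(lm.device)
    outputs = lm.generate(
        **inputs,
        do_sample=True,
        top_p=0.9,
        temperature=1.0,
        max_new_tokens=max_new_tokens,
        num_return_sequences=n,
        pad_token_id=lm_tok.eos_token_id,
    )
    prompt_len = inputs["input_ids"].shape[1]
    candidates = [
        lm_tok.decode(out[prompt_len:], skip_special_tokens=True)
        for out in outputs
    ]
    # Score each (context, reply) pair with the reward model.
    scores = []
    for reply in candidates:
        rm_inputs = rm_tok(context, reply, return_tensors="pt", truncation=True)
        with torch.no_grad():
            scores.append(rm(**rm_inputs).logits.squeeze().item())
    return candidates[max(range(n), key=lambda i: scores[i])]

def mean_conversation_length(conversations: list[list[str]]) -> float:
    """MCL: average number of messages per conversation (engagement proxy)."""
    return sum(len(c) for c in conversations) / len(conversations)
```

Rejection sampling of this kind leaves the language model untouched and filters only at decoding time, which is what makes the data flywheel mentioned above possible: the reward model can be retrained on fresh conversations without redeploying the base model.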