The emergence of pretrained large language models has led to the deployment of a range of social chatbots for chitchat. Although these chatbots demonstrate linguistic ability and fluency, they are not guaranteed to be engaging and can struggle to retain users. This work investigates the development of social chatbots that prioritize user engagement to enhance retention, specifically examining the use of human feedback to efficiently develop highly engaging chatbots. The proposed approach uses automatic pseudo-labels collected from user interactions to train a reward model, which is then used to reject low-scoring sampled responses generated by the chatbot model at inference time. Intuitive evaluation metrics, such as mean conversation length (MCL), are introduced as proxies for the engagement of deployed chatbots. A/B testing on groups of 10,000 new daily chatbot users on the Chai Research platform shows that this approach increases the MCL by up to 70%, which translates to a more than 30% increase in user retention for a GPT-J 6B model. Future work aims to use the reward model to realise a data flywheel, where the latest user conversations can be used to alternately fine-tune the language model and the reward model.
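To make the inference-time rejection step concrete, the sketch below illustrates one plausible reading of it as best-of-N sampling: the chatbot model samples several candidate replies, a reward model scores each one against the conversation context, and only the highest-scoring candidate is served. The reward-model checkpoint name, the best-of-N value, and the helper `best_of_n_response` are illustrative assumptions, not the paper's released artifacts.

```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)

lm_name = "EleutherAI/gpt-j-6b"       # chatbot language model, as in the paper
rm_name = "my-org/engagement-reward"  # hypothetical reward-model checkpoint

lm_tokenizer = AutoTokenizer.from_pretrained(lm_name)
lm = AutoModelForCausalLM.from_pretrained(lm_name)

rm_tokenizer = AutoTokenizer.from_pretrained(rm_name)
# A single regression head yields one scalar engagement score per response.
rm = AutoModelForSequenceClassification.from_pretrained(rm_name, num_labels=1)


def best_of_n_response(context: str, n: int = 4, max_new_tokens: int = 64) -> str:
    """Sample n candidate replies; keep the one the reward model scores highest."""
    inputs = lm_tokenizer(context, return_tensors="pt")
    outputs = lm.generate(
        **inputs,
        do_sample=True,
        top_p=0.9,
        max_new_tokens=max_new_tokens,
        num_return_sequences=n,
        pad_token_id=lm_tokenizer.eos_token_id,
    )
    # Strip the prompt tokens so only the newly generated reply remains.
    prompt_len = inputs["input_ids"].shape[1]
    candidates = [
        lm_tokenizer.decode(seq[prompt_len:], skip_special_tokens=True)
        for seq in outputs
    ]
    # Score each (context, candidate) pair; low scorers are implicitly rejected.
    scores = []
    for cand in candidates:
        rm_inputs = rm_tokenizer(context, cand, return_tensors="pt", truncation=True)
        with torch.no_grad():
            scores.append(rm(**rm_inputs).logits.squeeze().item())
    return candidates[scores.index(max(scores))]
```

Under this reading, the language model itself is never modified at serving time; engagement gains come entirely from filtering its samples, which is what would let the same reward model later drive the flywheel of alternating fine-tuning described above.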