We present GLM-Dialog, a large-scale language model (LLM) with 10B parameters, capable of knowledge-grounded conversation in Chinese using a search engine to access Internet knowledge. GLM-Dialog offers a series of applicable techniques for exploiting various external knowledge, both helpful and noisy, enabling the creation of robust knowledge-grounded dialogue LLMs with limited proper datasets. To evaluate GLM-Dialog more fairly, we also propose a novel evaluation method that allows humans to converse with multiple deployed bots simultaneously and compare their performance implicitly, instead of rating them explicitly using multidimensional metrics. Comprehensive evaluations, from automatic to human perspectives, demonstrate the advantages of GLM-Dialog compared with existing open-source Chinese dialogue models. We release both the model checkpoint and source code, and also deploy the model as a WeChat application to interact with users. We offer our evaluation platform online to promote the development of open-source models and reliable dialogue evaluation systems. An additional easy-to-use toolkit comprising short-text entity linking, query generation, and helpful-knowledge classification is also released to enable diverse applications. All the source code is available on GitHub.