Knowledgeable FAQ chatbots are a valuable resource for any organization. Unlike traditional call centers or FAQ web pages, they provide instant responses and are always available. Our experience running a COVID19 chatbot revealed a lack of resources for FAQ answering in non-English languages. While powerful and efficient retrieval-based models exist for English, this is rarely the case for other languages, which do not have the same amount of training data available. In this work, we propose a novel pretraining procedure to adapt ConveRT, a state-of-the-art (SOTA) English conversational agent, to languages with less training data available. We apply it for the first time to the task of Dutch FAQ answering related to the COVID19 vaccine. We show that it performs better than an open-source alternative in both a low-data and a high-data regime.