Chatbots are expected to dramatically change the future of Software Engineering, allowing practitioners to chat about and inquire into their software projects and to interact with different services using natural language. At the heart of every chatbot is a Natural Language Understanding (NLU) component that enables the chatbot to understand natural language input. Recently, many NLU platforms have become available as off-the-shelf NLU components for chatbots; however, selecting the best NLU for Software Engineering chatbots remains an open challenge. Therefore, in this paper, we evaluate four of the most commonly used NLUs, namely IBM Watson, Google Dialogflow, Rasa, and Microsoft LUIS, to shed light on which NLU should be used in Software Engineering chatbots. Specifically, we examine the NLUs' performance in classifying intents, the stability of their confidence scores, and their ability to extract entities. To evaluate the NLUs, we use two datasets that reflect two common tasks performed by Software Engineering practitioners: 1) chatting with a chatbot to ask questions about software repositories, and 2) asking development questions on Q&A forums (e.g., Stack Overflow). According to our findings, IBM Watson is the best-performing NLU when all three aspects (intent classification, confidence scores, and entity extraction) are considered together. However, the individual aspects show a more nuanced picture: IBM Watson performs best in intent classification, with an F1-measure above 84%, whereas Rasa comes out on top in confidence scores, with a median confidence score higher than 0.91. Our results also show that all NLUs except Dialogflow generally provide trustworthy confidence scores. For entity extraction, Microsoft LUIS and IBM Watson outperform the other NLUs on both SE tasks. Our results provide guidance to software engineering practitioners when deciding which NLU to use in their chatbots.
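To make the two headline metrics concrete, the sketch below computes an F1-measure for intent classification and the median of the confidence scores an NLU returns. It is a minimal illustration under stated assumptions, not the paper's evaluation code: it assumes F1 is averaged across intents weighted by each intent's number of gold examples, and that the confidence statistic is a plain median over all returned scores. The intent names and scores in the usage example are hypothetical.

```python
from collections import Counter
from statistics import median


def f1_per_intent(gold, predicted):
    """Return {intent: F1} computed from parallel lists of gold and predicted intents."""
    scores = {}
    for label in set(gold) | set(predicted):
        tp = sum(1 for g, p in zip(gold, predicted) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, predicted) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, predicted) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[label] = (
            2 * precision * recall / (precision + recall) if precision + recall else 0.0
        )
    return scores


def weighted_f1(gold, predicted):
    """Average per-intent F1, weighted by how often each intent occurs in the gold labels.

    Assumption: the weighting scheme is illustrative; other averages (e.g. macro)
    are equally common in NLU evaluations.
    """
    per_intent = f1_per_intent(gold, predicted)
    support = Counter(gold)
    return sum(per_intent[label] * count / len(gold) for label, count in support.items())


if __name__ == "__main__":
    # Hypothetical repository-question intents and NLU outputs.
    gold = ["ask_commits", "ask_commits", "ask_bugs", "ask_author"]
    predicted = ["ask_commits", "ask_bugs", "ask_bugs", "ask_author"]
    confidences = [0.95, 0.62, 0.88, 0.97]

    print(f"weighted F1: {weighted_f1(gold, predicted):.3f}")
    print(f"median confidence: {median(confidences):.3f}")
```

Weighting by gold support keeps frequent intents from being swamped by rare ones; a macro average would instead treat every intent equally, which matters when a dataset's intent distribution is skewed.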