Software chat messages contain valuable information, but disentangling them into distinct conversations is an essential prerequisite for any in-depth analysis that uses this information. To better understand the current state of the art, we evaluate five popular dialog disentanglement approaches on software-related chat. We find that existing approaches do not perform well on software-related dialogs that discuss technical and complex topics. Further investigation into how well existing disentanglement measures reflect human satisfaction shows that these measures cannot correctly indicate human satisfaction with disentanglement results. Therefore, in this paper, we introduce and evaluate a novel measure, named DLD. Using the human satisfaction results, we further summarize the four most frequent bad disentanglement cases on software-related chat to inform future improvements. These cases are (i) ignoring interaction patterns; (ii) ignoring contextual information; (iii) mixing up topics; and (iv) ignoring user relationships. We believe that our findings provide valuable insights into the effectiveness of existing dialog disentanglement approaches and will promote better application of dialog disentanglement in software engineering.