Collecting and annotating task-oriented dialog data is difficult, especially for highly specific domains that require expert knowledge. At the same time, informal communication channels such as instant messengers are increasingly being used at work. As a result, a large amount of work-relevant information is disseminated through those channels and must be post-processed manually by employees. To alleviate this problem, we present TexPrax, a messaging system to collect and annotate problems, causes, and solutions that occur in work-related chats. TexPrax uses a chatbot that directly engages employees to provide lightweight annotations on their conversations and eases their documentation work. To comply with data privacy and security regulations, we use end-to-end message encryption and give our users full control over their data, which has various advantages over conventional annotation tools. We evaluate TexPrax in a user study with German factory employees who ask their colleagues for solutions to problems that arise during their daily work. Overall, we collect 202 task-oriented German dialogues containing 1,027 sentences with sentence-level expert annotations. Our data analysis also reveals that real-world conversations frequently contain code-switching, varying abbreviations for the same entity, and dialects, all of which NLP systems should be able to handle.