End-to-End Natural Language Understanding Pipeline for Bangla Conversational Agents

Chatbots are software agents designed to stand in for human interaction. Existing studies offer little support for low-resource languages such as Bangla. With the growing popularity of social media, native Bangla speakers also increasingly interact in transliterated Bangla (mostly in English). In this paper, we propose a novel approach to building a Bangla chatbot intended as a business assistant that can communicate consistently and with high confidence in both Bangla and Bangla transliterated into English. Because no annotated data was available for this purpose, we had to cover the whole machine learning life cycle (data preparation, machine learning modeling, and model deployment), using the Rasa Open Source Framework, fastText embeddings, Polyglot embeddings, Flask, and other systems as building blocks. Working with the skewed annotated dataset, we try out different components and pipelines to evaluate which works best and offer possible explanations for the observed results. Finally, we present a pipeline for intent classification and entity extraction that achieves reasonable performance (accuracy: 83.02%, precision: 80.82%, recall: 83.02%, F1-score: 80%).
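The reported recall equals the reported accuracy (both 83.02%). That is what support-weighted averaging produces for single-label intent classification, although the abstract does not state which averaging scheme was used, so treating the figures as weighted averages is an assumption. The minimal sketch below computes accuracy and support-weighted precision, recall, and F1 on a small skewed toy dataset. The intent names (`greet`, `order_status`, `complaint`), the labels, and the helper `weighted_metrics` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true examples per intent
    total = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / total

    precision = recall = f1 = 0.0
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        actual = support[label]
        p_l = tp / predicted if predicted else 0.0
        r_l = tp / actual if actual else 0.0
        f_l = 2 * p_l * r_l / (p_l + r_l) if (p_l + r_l) else 0.0
        weight = actual / total  # each class counts in proportion to its support
        precision += weight * p_l
        recall += weight * r_l
        f1 += weight * f_l

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical skewed toy data: one frequent intent, one rare intent.
y_true = ["greet"] * 6 + ["order_status"] * 3 + ["complaint"]
y_pred = ["greet"] * 5 + ["order_status"] + ["order_status"] * 3 + ["greet"]

metrics = weighted_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In this toy run the rare `complaint` intent is never predicted, which lowers weighted precision and F1 below accuracy while weighted recall stays equal to accuracy. The reported figures show the same pattern, which would be consistent with a skewed dataset under weighted averaging.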
