Chatbots are intelligent software built to stand in for human interaction. Existing studies typically provide little support for low-resource languages such as Bangla. With the growing popularity of social media, native Bangla speakers also increasingly interact in transliterated Bangla (mostly in English). In this paper, we propose a novel approach to building a Bangla chatbot, intended for use as a business assistant, that can communicate in both Bangla and Bangla transliterated in English with consistently high confidence. Since annotated data was not available for this purpose, we had to work through the whole machine learning life cycle (data preparation, machine learning modeling, and model deployment), using the Rasa Open Source Framework, fastText embeddings, Polyglot embeddings, Flask, and other systems as building blocks. Working with the skewed annotated dataset, we try out different setups and pipelines to evaluate which works best and offer possible reasoning behind the observed results. Finally, we present a pipeline for intent classification and entity extraction that achieves reasonable performance (accuracy: 83.02%, precision: 80.82%, recall: 83.02%, F1-score: 80%).
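The reported accuracy and recall are identical (83.02%), which is consistent with support-weighted averaging over intent classes; for single-label classification, weighted recall always equals accuracy. The abstract does not state the averaging scheme, so this is an assumption. The following minimal sketch illustrates the relationship on a hypothetical, skewed toy label set (the `weighted_metrics` helper and the intent names are illustrative and not taken from the paper).

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall, and F1."""
    n = len(y_true)
    support = Counter(y_true)  # number of true examples per class
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n

    precision = recall = f1 = 0.0
    for label in sorted(support):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        p_c = tp / predicted if predicted else 0.0  # per-class precision
        r_c = tp / support[label]                   # per-class recall
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = support[label] / n                 # class share of the data
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return accuracy, precision, recall, f1


# Hypothetical skewed intent labels: one frequent class, two rare ones.
y_true = ["greet"] * 6 + ["order"] * 3 + ["refund"] * 1
y_pred = ["greet"] * 6 + ["order", "order", "greet"] + ["greet"]

acc, prec, rec, f1 = weighted_metrics(y_true, y_pred)
print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f} f1={f1:.4f}")
```

On skewed data the weighted scores are dominated by the frequent classes, so a rare intent that is never predicted (here `refund`) lowers the totals only slightly. Per-class or macro-averaged scores would expose such failures more clearly.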