Bangla Natural Language Processing: A Comprehensive Review of Classical, Machine Learning, and Deep Learning Based Methods

The Bangla language is the seventh most spoken language, with 265 million native and non-native speakers worldwide. However, English remains the predominant language of online resources, technical knowledge, journals, and documentation. Consequently, many Bangla speakers with a limited command of English face hurdles in using English-language resources. To bridge the gap between limited support and increasing demand, researchers have conducted many experiments and developed valuable tools and techniques to create and process Bangla language materials. Many efforts are also ongoing to make the Bangla language easier to use in online and technical domains. Several review papers examine past, present, and future trends in Bangla Natural Language Processing (BNLP), but they concentrate mainly on specific domains of BNLP, such as sentiment analysis, speech recognition, optical character recognition, and text summarization. Resources that offer a comprehensive study of recent BNLP tools and methods are scarce. Therefore, in this paper, we present a thorough review of 71 BNLP research papers and group them into 11 categories, namely Information Extraction, Machine Translation, Named Entity Recognition, Parsing, Parts of Speech Tagging, Question Answering System, Sentiment Analysis, Spam and Fake Detection, Text Summarization, Word Sense Disambiguation, and Speech Processing and Recognition. We study articles published between 1999 and 2021, 50% of which were published after 2015. We discuss Classical, Machine Learning, and Deep Learning approaches with different datasets while addressing the limitations and the current and future trends of BNLP.
