Text classification is a broad and well-studied area of natural language processing (NLP). In short, the text classification problem is to determine which of a set of predefined classes a given text belongs to, and past studies have produced successful results in this field. This study uses Bidirectional Encoder Representations from Transformers (BERT), a method frequently preferred for solving classification problems in NLP. By solving several classification problems with a single model to be used in a chatbot architecture, the aim is to alleviate the server load that would otherwise be created by deploying a separate model for each classification problem. To this end, a masking method is applied at prediction time to a single BERT model trained for classification across multiple subjects, so that the model's predictions are restricted on a per-problem basis. Three separate datasets covering fields distinct from one another were split by various methods in order to make the problem harder, and in this way classification problems that are very close to each other in terms of field were also included. The resulting dataset comprises five classification problems with 154 classes in total. A single BERT model covering all classification problems was compared with separate BERT models trained for the individual problems, both in terms of performance and the space they occupy on the server.
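To illustrate the prediction-time masking described above, the following is a minimal sketch of how logits from a single shared classification head over all 154 classes could be restricted to one problem's classes. The problem names, index ranges, and the helper function are hypothetical and for illustration only; the abstract does not specify how classes are partitioned.

```python
import torch

# Hypothetical mapping from each classification problem to the indices of
# its classes within the shared 154-class output head (illustrative ranges).
PROBLEM_CLASSES = {
    "problem_a": list(range(0, 40)),
    "problem_b": list(range(40, 90)),
    "problem_c": list(range(90, 154)),
}

def masked_predict(logits: torch.Tensor, problem: str) -> torch.Tensor:
    """Restrict the prediction to one problem's classes by masking the
    logits of all other classes to -inf before the softmax."""
    mask = torch.full_like(logits, float("-inf"))
    active = torch.tensor(PROBLEM_CLASSES[problem])
    mask[..., active] = 0.0            # keep only the active problem's logits
    probs = torch.softmax(logits + mask, dim=-1)
    return probs.argmax(dim=-1)        # class index within the full head

# Usage: logits produced by the single shared BERT classifier.
logits = torch.randn(1, 154)
print(masked_predict(logits, "problem_a"))   # always falls in indices 0..39
```

Under this sketch, one set of BERT weights serves every problem, and only the cheap masking step differs per request, which is consistent with the stated goal of reducing the server footprint relative to hosting one model per problem.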