Understanding customer feedback is becoming a necessity for companies that want to identify problems and improve their products and services. Text classification and sentiment analysis, using a variety of machine learning and deep learning approaches, can play a major role in analyzing this data. In this work, different transformer-based models are applied to a German customer feedback dataset to explore how efficient they are. In addition, these pre-trained models are further analyzed to determine whether adapting them to a specific domain using unlabeled data yields better results than off-the-shelf pre-trained models. To evaluate the models, two downstream tasks from the GermEval 2017 shared task are considered. The experimental results show that transformer-based models achieve significant improvements over a fastText baseline and outperform the published scores and previous models. For the subtask Relevance Classification, the best models achieve a micro-averaged $F_1$ score of 96.1 % on the first test set and 95.9 % on the second; for the subtask Polarity Classification, they achieve 85.1 % and 85.3 %, respectively.
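As a reading aid for the reported metric, the following is a minimal sketch of how a micro-averaged $F_1$ score can be computed: true positives, false positives, and false negatives are pooled over all classes before precision and recall are combined. The function name `micro_f1` and the toy labels are illustrative assumptions, not the evaluation code or data used in this work.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP/FP/FN over all classes, then compute F1."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = fp = fn = 0
    # Treat each class in turn as the "positive" class and accumulate counts.
    for label in set(y_true) | set(y_pred):
        for truth, pred in zip(y_true, y_pred):
            if pred == label and truth == label:
                tp += 1
            elif pred == label and truth != label:
                fp += 1
            elif pred != label and truth == label:
                fn += 1

    denominator = 2 * tp + fp + fn
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there is nothing to score.
    return 2 * tp / denominator if denominator else 0.0


# Toy polarity labels, invented purely for illustration.
y_true = ["positive", "neutral", "negative", "neutral", "neutral"]
y_pred = ["positive", "neutral", "neutral", "neutral", "negative"]

score = micro_f1(y_true, y_pred)
print(f"micro-averaged F1: {score:.3f}")
```

When every document receives exactly one label, as in this toy example, the micro-averaged $F_1$ score coincides with plain accuracy (here 3 of 5 predictions are correct).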