In this paper, I present a classifier for the automatic identification of linguistic politeness in Hindi texts. I use a manually annotated corpus of over 25,000 blog comments to train a support vector machine (SVM). Drawing on the discursive and interactional approaches to politeness, the paper describes the normative, conventionalised politeness structures of Hindi. Using these manually identified structures as features when training the SVM significantly improves the classifier's performance on the test set. The trained system achieves an accuracy of over 77%, which is within 2% of human accuracy.