With the widespread use of the internet, the volume of text data grows day by day, and poetry is one example of this growing body of text. In this study, we aim to classify poems according to their poet. First, a data set of English-language poems by three different poets was constructed. Text categorization techniques were then applied to it. The Chi-Square technique was used for feature selection, and five classification algorithms were tried: sequential minimal optimization, Naive Bayes, the C4.5 decision tree, Random Forest, and k-nearest neighbors. Although the classifiers produced very different results, sequential minimal optimization achieved a classification success rate of over 70%.
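As an illustration of the Chi-Square feature selection step mentioned above, the following is a minimal sketch rather than the procedure actually used in the study. It assumes the common document-frequency formulation of the statistic for text categorization, in which each term is scored against each class from a 2x2 contingency table and the maximum over classes is kept; the abstract does not state which variant was applied. The function names `chi_square_scores` and `select_top_k`, the labels `poet_a` to `poet_c`, and the toy lines are all hypothetical placeholders and are not drawn from the data set.

```python
from collections import Counter


def chi_square_scores(docs, labels):
    """Score each term by its maximum chi-square statistic over all classes.

    For a term t and class c, with n documents in total:
      a = documents in c that contain t
      b = documents outside c that contain t
      c_ = documents in c that do not contain t
      d = documents outside c that do not contain t
    chi2 = n * (a*d - c_*b)^2 / ((a+c_) * (b+d) * (a+b) * (c_+d))
    """
    n = len(docs)
    # Assumption: simple lowercase whitespace tokenization, presence only.
    doc_terms = [set(doc.lower().split()) for doc in docs]
    vocab = set().union(*doc_terms)
    class_sizes = Counter(labels)

    scores = {}
    for term in vocab:
        best = 0.0
        for cls, size in class_sizes.items():
            a = sum(1 for terms, y in zip(doc_terms, labels)
                    if term in terms and y == cls)
            b = sum(1 for terms, y in zip(doc_terms, labels)
                    if term in terms and y != cls)
            c_ = size - a
            d = (n - size) - b
            denom = (a + c_) * (b + d) * (a + b) * (c_ + d)
            if denom:  # a term present in every document has no signal
                best = max(best, n * (a * d - c_ * b) ** 2 / denom)
        scores[term] = best
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Invented placeholder lines, two per hypothetical poet.
docs = [
    "the raven sat at midnight",
    "the raven spoke at midnight",
    "the daffodils danced in the breeze",
    "the daffodils shone in the breeze",
    "the road diverged in the wood",
    "the road bent in the wood",
]
labels = ["poet_a", "poet_a", "poet_b", "poet_b", "poet_c", "poet_c"]

scores = chi_square_scores(docs, labels)
selected = select_top_k(scores, 8)
print(selected)
```

In a pipeline of the kind the abstract describes, the selected terms would form the feature vector passed to each of the five classifiers. A word such as "the", which occurs in every toy line, receives a score of zero and is dropped, which is the behavior one would expect from a feature selection step.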