Determining the author of a text is a difficult task. Here we compare several AI techniques for classifying literary texts by author, using only a limited set of parts of speech (prepositions, adverbs, and conjunctions) as features. We also introduce a new dataset of texts written in Romanian, on which we ran the algorithms. The compared methods are Artificial Neural Networks, Support Vector Machines, Multi Expression Programming, Decision Trees with C5.0, and k-Nearest Neighbour. Numerical experiments show, first of all, that the problem is difficult, but that some algorithms achieve reasonably low error rates on the test set.
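As a rough illustration of the kind of pipeline described above, the following minimal sketch turns a text into relative frequencies of a few function words and classifies it with a plain k-Nearest Neighbour vote. The word list `FUNCTION_WORDS`, the helper names, and the Euclidean distance are assumptions made for illustration only; they are not the feature set or configuration used in the paper.

```python
import math
import re
from collections import Counter

# Hypothetical list of Romanian conjunctions, prepositions, and adverbs,
# chosen for illustration only (not the paper's actual feature set).
FUNCTION_WORDS = ["și", "dar", "sau", "în", "pe", "cu", "de", "la", "foarte", "acum"]


def function_word_features(text):
    """Return the relative frequency of each word in FUNCTION_WORDS."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[word] / total for word in FUNCTION_WORDS]


def knn_predict(train, query, k=3):
    """Predict an author by majority vote among the k nearest neighbours.

    train is a list of (feature_vector, author) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(author for _, author in nearest)
    return votes.most_common(1)[0][0]
```

In use, one would presumably build `train` by applying `function_word_features` to texts of known authorship and then call `knn_predict` on the features of an unseen text. Any of the other compared methods could be substituted for the k-NN step.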