Latin writing styles analysis with Machine Learning: New approach to old questions

In the Middle Ages, texts were learned by heart and passed orally from generation to generation. Adapting the arts of prose and poetry made it possible to preserve particular descriptions and compositions characteristic of many literary genres. Given this specific construction of literature composed in Latin, we can search for and indicate probable patterns pointing to the familiar sources of specific narrative texts. Natural Language Processing tools allowed us to transform textual objects into numerical ones and then apply machine learning algorithms to extract information from the dataset. We put these concepts and observations into practice by creating a tool for analyzing narrative texts based on open-source databases. The tool focuses on building search resources that enable detailed searching throughout a text. The main objectives of the study are to find similarities between sentences and between documents. We then applied machine learning algorithms to selected texts to predict specific features (for instance, authorship or century) and to identify the sources of anonymous texts with a certain probability.
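As a rough illustration of the pipeline the abstract describes (text turned into numbers, then compared), the sketch below builds TF-IDF vectors over character trigrams and ranks candidate sentences by cosine similarity. The abstract does not say which features or similarity measure the authors used, so the trigram features, the smoothing formula, and the function names `char_ngrams`, `tfidf_vectors`, `cosine`, and `most_similar` are all assumptions made for this example. The Latin lines are well-known openings (Caesar and Virgil) chosen for illustration, not taken from the study's dataset.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Lowercase the text, collapse whitespace, return overlapping character n-grams."""
    cleaned = " ".join(text.lower().split())
    return [cleaned[i:i + n] for i in range(len(cleaned) - n + 1)]


def tfidf_vectors(texts, n=3):
    """Turn each text into a sparse TF-IDF vector (dict: n-gram -> weight).

    Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1, a common choice
    and not necessarily the weighting used in the paper.
    """
    counts = [Counter(char_ngrams(t, n)) for t in texts]
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())
    total = len(texts)
    vectors = []
    for c in counts:
        length = sum(c.values()) or 1  # avoid division by zero for empty texts
        vectors.append({
            gram: (freq / length)
            * (math.log((1 + total) / (1 + doc_freq[gram])) + 1.0)
            for gram, freq in c.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(query, candidates, n=3):
    """Return (index, score) of the candidate sentence closest to the query."""
    vectors = tfidf_vectors([query] + list(candidates), n)
    scores = [cosine(vectors[0], v) for v in vectors[1:]]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Illustrative sentences only (hypothetical mini-corpus).
candidates = [
    "Gallia est omnis divisa in partes tres",
    "Arma virumque cano, Troiae qui primus ab oris",
]
index, score = most_similar("Gallia omnis in tres partes divisa est", candidates)
print(index, round(score, 3))
```

Character n-grams are used here because they tolerate Latin inflection and spelling variation without a lemmatizer. The same vectors could feed a classifier for authorship or century, which this sketch does not attempt.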

