In the Middle Ages, texts were learned by heart and passed on orally from generation to generation. Adapting the art of prose and poetry made it possible to preserve particular descriptions and compositions characteristic of many literary genres. Given this specific construction of literature composed in Latin, we can search for and indicate probable patterns pointing to the common sources of specific narrative texts. Natural Language Processing tools allowed us to transform textual objects into numerical ones and then to apply machine learning algorithms to extract information from the dataset. We put these concepts and observations into practice by creating a tool for analyzing narrative texts based on open-source databases. The tool focuses on providing search resources that enable detailed searching throughout a text. The main objectives of the study are to find similarities between sentences and between documents. We then applied machine learning algorithms to selected texts to estimate specific features (for instance, authorship or century of origin) and to identify the sources of anonymous texts with a certain degree of probability.
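To make the step from textual to numerical objects concrete, the following is a minimal sketch of one common way to compare sentences: TF-IDF weighting followed by cosine similarity. The abstract does not state which vectorization or similarity measure the tool uses, so the method, the function names (`tokenize`, `tfidf_vectors`, `cosine_similarity`), and the three sample sentences are all assumptions made for illustration only; they are not drawn from the study's dataset.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            # Relative term frequency times a smoothed inverse document frequency.
            term: (count / total) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Illustrative Latin sentences (assumed examples, not the study's data).
sentences = [
    "In principio erat Verbum et Verbum erat apud Deum",
    "In principio creavit Deus caelum et terram",
    "Gallia est omnis divisa in partes tres",
]
vectors = tfidf_vectors(sentences)

for i in range(len(sentences)):
    for j in range(i + 1, len(sentences)):
        score = cosine_similarity(vectors[i], vectors[j])
        print(f"sentence {i} vs sentence {j}: {score:.3f}")
```

The same functions apply unchanged to whole documents instead of sentences. For an inflected language such as Latin, a practical pipeline would probably also need lemmatization before counting terms, which this sketch omits.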