This article presents the results of a topic-modeling investigation of the Voynich Manuscript (Beinecke MS408). Topic modeling is a family of computational methods used to identify clusters of subjects within a text. We use latent Dirichlet allocation, latent semantic analysis, and nonnegative matrix factorization to cluster Voynich pages into "topics". We then compare the topics derived from the computational models to clusters derived from the Voynich illustrations and from paleographic analysis. We find that the computationally derived clusters closely match a conjunction of scribe and subject matter (the latter as indicated by the illustrations), providing further evidence that the Voynich Manuscript contains meaningful text.
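The comparison between computationally derived topics and reference clusterings can be illustrated with a minimal sketch. It is not the article's actual procedure: the agreement measure (a plain Rand index), the function name `rand_index`, and all labels below are assumptions chosen for illustration, and the toy data is invented rather than taken from the manuscript. The sketch scores how well one page clustering agrees with another, first against scribe alone, then subject alone, then the conjunction of the two.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of page pairs on which two clusterings agree.

    A pair counts as an agreement when both clusterings put the two pages
    in the same cluster, or both put them in different clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    if len(labels_a) < 2:
        raise ValueError("need at least two pages to form a pair")

    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += same_a == same_b  # True counts as 1
        total += 1
    return agree / total


# Hypothetical toy data for eight pages (NOT real Voynich assignments).
topics = [0, 0, 0, 1, 1, 2, 2, 2]  # e.g. dominant topic per page
scribes = ["s1", "s1", "s1", "s1", "s1", "s2", "s2", "s2"]
subjects = ["herbal", "herbal", "herbal", "astro", "astro",
            "herbal", "herbal", "herbal"]

# Conjunction of scribe and subject: one label per (scribe, subject) pair.
conjunction = list(zip(scribes, subjects))

score_scribe = rand_index(topics, scribes)
score_subject = rand_index(topics, subjects)
score_conjunction = rand_index(topics, conjunction)

print(f"scribe only:      {score_scribe:.3f}")
print(f"subject only:     {score_subject:.3f}")
print(f"scribe + subject: {score_conjunction:.3f}")
```

In this constructed example the topic labels agree best with the combined scribe and subject labels, which is the pattern the abstract reports. A chance-corrected measure such as the adjusted Rand index would be a more careful choice for real data.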