In many areas of scientific research, it is often necessary to determine whether pairs of entities are similar. Standard approaches to this problem include the Jaccard, Sørensen-Dice, and Simpson indices. Recently, a better index for the analysis of co-occurrence and similarity (alpha) was developed; it reversed all the results obtained with the standard indices and supported theoretical predictions. In this paper, we propose a new similarity method that uses MLE, PCA, LDA, and clustering. Before randomness in prevalence is introduced, our index depends strongly on the data: it is strongly dependent on prevalence and hence follows the pattern of the Jaccard index. We then propose a new randomization technique, which changes the whole pattern of the results: after randomization, the results reverse and follow those of alpha. We also show some limitations of alpha, which we try to resolve through different pathways.
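
For reference, the three standard indices named above can be sketched for a pair of presence/absence vectors as follows. This is a minimal illustration, not the method proposed in this paper: the function names are hypothetical, "Simpson" is assumed to mean the overlap coefficient (shared occurrences divided by the smaller of the two occurrence counts), and returning 0.0 when an index is undefined is a convention chosen here for simplicity.

```python
def cooccurrence_counts(x, y):
    """Count sites where both entities occur (a), only x occurs (b), only y occurs (c)."""
    a = sum(1 for xi, yi in zip(x, y) if xi and yi)
    b = sum(1 for xi, yi in zip(x, y) if xi and not yi)
    c = sum(1 for xi, yi in zip(x, y) if yi and not xi)
    return a, b, c


def jaccard(x, y):
    """Jaccard index: a / (a + b + c)."""
    a, b, c = cooccurrence_counts(x, y)
    denom = a + b + c
    return a / denom if denom else 0.0  # assumed convention when neither entity occurs


def sorensen_dice(x, y):
    """Sørensen-Dice index: 2a / (2a + b + c)."""
    a, b, c = cooccurrence_counts(x, y)
    denom = 2 * a + b + c
    return 2 * a / denom if denom else 0.0  # assumed convention, as above


def simpson(x, y):
    """Simpson (overlap) index: a / min(a + b, a + c)."""
    a, b, c = cooccurrence_counts(x, y)
    denom = a + min(b, c)
    return a / denom if denom else 0.0  # assumed convention, as above


# Hypothetical presence/absence data for two entities across six sites.
x = [1, 1, 1, 0, 0, 1]
y = [1, 0, 1, 1, 0, 0]

print(jaccard(x, y), sorensen_dice(x, y), simpson(x, y))
```

All three indices are built from the same counts, and all of them change when the number of sites an entity occupies changes, which is the prevalence dependence discussed above.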