A similarity index is a scientific tool frequently used to determine whether pairs of entities are similar with respect to some prespecified characteristics. Standard similarity indices include the Jaccard index, the S{\o}rensen-Dice index, and Simpson's index. Recently, an improved index ($\hat{\alpha}$) for co-occurrence and/or similarity has been developed; it outperforms the standard indices and gives reasonable, theoretically supported predictions. However, $\hat{\alpha}$ is not data dependent. In this article we propose a new similarity measure that, before randomness in prevalence is introduced, depends strongly on the data. We then propose a new method of randomization that changes the whole pattern of results: before randomization our measure is similar to the Jaccard index, while after randomization it is close to $\hat{\alpha}$. We consider a popular ecological dataset from the Tuscan Archipelago, Italy, and compare the performance of the proposed index with that of other measures. Because the proposed index is data dependent, it has some interesting properties, which we illustrate through numerical studies.
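For orientation, the three standard indices named above can be sketched in a few lines of Python using their usual textbook definitions for presence/absence data. This is only an illustrative sketch: the function name `similarity_indices` and the species labels are hypothetical, "Simpson's index" is assumed here to mean the Szymkiewicz-Simpson overlap coefficient, and neither $\hat{\alpha}$ nor the index proposed in this article is implemented.

```python
def similarity_indices(site_a, site_b):
    """Return (jaccard, sorensen_dice, simpson) for two collections of species.

    Illustrative helper, not taken from the article. Both sites must
    contain at least one species, otherwise the indices are undefined.
    """
    a_set, b_set = set(site_a), set(site_b)
    if not a_set or not b_set:
        raise ValueError("each site must contain at least one species")

    shared = len(a_set & b_set)   # species present at both sites
    only_a = len(a_set - b_set)   # species present only at site A
    only_b = len(b_set - a_set)   # species present only at site B

    jaccard = shared / (shared + only_a + only_b)
    sorensen_dice = 2 * shared / (2 * shared + only_a + only_b)
    # Overlap coefficient: shared species relative to the smaller site.
    simpson = shared / min(len(a_set), len(b_set))
    return jaccard, sorensen_dice, simpson


# Hypothetical species lists for two sites.
site_a = {"sp1", "sp2", "sp3", "sp4"}
site_b = {"sp3", "sp4", "sp5"}
jaccard, sorensen_dice, simpson = similarity_indices(site_a, site_b)
```

With two shared species, two found only at the first site, and one found only at the second, the three indices already disagree on how similar the sites are, which is the kind of discrepancy that motivates comparing indices on real data.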