We propose the Fuzzy Jaccard Index (FUJI) -- a scale-invariant score for assessing the similarity between two ranked/ordered lists. FUJI improves upon the Jaccard index by incorporating a membership function that takes the particular ranks into account, thus producing more stable and more accurate similarity estimates. We provide theoretical insights into the properties of the FUJI score and propose an efficient algorithm for computing it. We also present empirical evidence of its performance in different synthetic scenarios. Finally, we demonstrate its utility in a typical machine learning setting -- comparing feature ranking lists relevant to a given machine learning task. In real-life domains, and high-dimensional ones in particular, where only a small percentage of the whole feature space might be relevant, a robust and confident feature ranking leads to interpretable findings as well as efficient computation and good predictive performance. In such cases, FUJI correctly distinguishes between existing feature ranking approaches, while being more robust and efficient than the benchmark similarity scores.
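For orientation, the crisp baseline that FUJI builds on can be sketched as follows. This is only the classical Jaccard index applied to the top-k prefixes of two ranked lists, not FUJI itself: the abstract does not specify the membership function, so none is implemented here. The function name `jaccard_top_k`, the cutoff `k`, and the feature names are hypothetical and chosen purely for illustration.

```python
def jaccard_top_k(ranking_a, ranking_b, k):
    """Classical (crisp) Jaccard index of the top-k prefixes of two ranked lists.

    Illustrative baseline only; this is NOT the FUJI score.
    """
    top_a = set(ranking_a[:k])
    top_b = set(ranking_b[:k])
    union = top_a | top_b
    if not union:
        return 1.0  # assumed convention: two empty prefixes count as identical
    return len(top_a & top_b) / len(union)


# Hypothetical feature rankings, best feature first.
a = ["f1", "f2", "f3", "f4", "f5"]
b = ["f3", "f2", "f1", "f5", "f4"]  # same top-3 items as a, different order
c = ["f1", "f2", "f4", "f3", "f5"]  # f3 and f4 swapped across the k=3 cutoff

score_ab = jaccard_top_k(a, b, 3)  # reordering inside the top-3 is invisible
score_ac = jaccard_top_k(a, c, 3)  # one adjacent swap at the cutoff changes the score
print(score_ab, score_ac)
```

Under these assumptions, the crisp index ignores the order within the top-k set and reacts sharply to a single swap at the cutoff, which is the kind of instability a rank-aware membership function is meant to address.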