Background: Deriving feature rankings is essential in bioinformatics studies because the ordered features guide subsequent research. Feature rankings may be distorted by influential points (IPs), but such effects are rarely mentioned in previous studies. This study aimed to investigate the impact of IPs on feature rankings and to propose a new method for detecting IPs.
Method: We used a case-deletion (i.e., leave-one-out) approach to assess the impact of each case. The influence of a case was measured by comparing the feature ranks before and after that case was deleted. We proposed a rank comparison method using adaptive top-prioritized weights that emphasize rank changes among the top-ranked features. The weights adapt to the distribution of rank changes.
Results: Potential IPs were observed in several datasets. The presence of IPs could significantly alter the results of downstream analyses (e.g., enriched pathways), suggesting that IP detection is necessary when deriving feature rankings. Compared with existing methods, the new rank comparison method could identify rank changes of important (top-ranked) features because its adaptive weights are adjusted to the distribution of rank changes.
Conclusions: IP detection should be performed routinely when deriving feature rankings. The new IP detection method showed favorable properties compared with existing methods.
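The case-deletion idea in the Method paragraph can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the adaptive weights, so this sketch substitutes a fixed weight of 1/rank (a simple top-prioritized weighting, assumed here for illustration). The scoring function `abs_mean`, the helper names, and the toy data are likewise hypothetical.

```python
from statistics import mean


def rank_features(cases, score_fn):
    """Rank features by score_fn over all cases; rank 1 = highest score."""
    n_features = len(cases[0])
    scores = [score_fn([case[j] for case in cases]) for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: -scores[j])
    return {j: position + 1 for position, j in enumerate(order)}


def top_weighted_rank_change(full_ranks, reduced_ranks):
    """Weighted mean absolute rank change.

    ASSUMPTION: weights are fixed at 1 / original rank, so changes among
    top-ranked features count more. The paper's weights are adaptive to the
    distribution of rank changes; that adaptation is not reproduced here.
    """
    weights = {j: 1.0 / r for j, r in full_ranks.items()}
    total = sum(weights.values())
    changed = sum(w * abs(full_ranks[j] - reduced_ranks[j])
                  for j, w in weights.items())
    return changed / total


def case_influence(cases, score_fn):
    """Leave-one-out influence of each case on the feature ranking."""
    full_ranks = rank_features(cases, score_fn)
    influences = []
    for i in range(len(cases)):
        reduced = cases[:i] + cases[i + 1:]  # delete case i
        reduced_ranks = rank_features(reduced, score_fn)
        influences.append(top_weighted_rank_change(full_ranks, reduced_ranks))
    return influences


def abs_mean(values):
    """Hypothetical feature score: absolute mean across cases."""
    return abs(mean(values))


# Toy data: rows are cases, columns are features. The last case is unusual.
cases = [
    [5.0, 3.0, 1.0],
    [5.2, 3.1, 0.9],
    [4.9, 2.9, 1.1],
    [5.1, 3.0, 1.0],
    [0.0, 9.0, 1.0],
]

influences = case_influence(cases, abs_mean)
most_influential = max(range(len(cases)), key=lambda i: influences[i])
print(influences, most_influential)
```

In this toy example only the last case changes the ranking when deleted (the top two features swap), so it receives the largest influence value. In practice the scoring function would be whatever statistic the study uses to rank features, and a threshold or distributional criterion would be needed to decide which influence values flag a case as an IP.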