Conformal prediction and other randomized model-free inference techniques are attracting increasing attention as general solutions to rigorously calibrate the output of any machine learning algorithm for novelty detection. This paper contributes to the field by developing a novel method for mitigating their algorithmic randomness, leading to an even more interpretable and reliable framework for powerful novelty detection under false discovery rate control. The idea is to leverage suitable conformal e-values instead of p-values to quantify the significance of each finding, which allows the evidence gathered from multiple mutually dependent analyses of the same data to be seamlessly aggregated. Further, the proposed method can reduce randomness without much loss of power, partly thanks to an innovative way of weighting conformal e-values based on additional side information carefully extracted from the same data. Simulations with synthetic and real data confirm that this solution can be effective at eliminating random noise from the inferences obtained with state-of-the-art alternative techniques, sometimes also leading to higher power.
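To make the core idea concrete, the following is a minimal sketch rather than the paper's exact procedure: it computes standard split-conformal p-values from a one-class model, converts them to e-values with a generic p-to-e calibrator (the construction e(p) = kappa * p^(kappa-1), an assumption here, not the weighted e-values proposed in the paper), averages the e-values across several dependent analyses of the same data to damp algorithmic randomness, and applies the e-BH procedure for false discovery rate control. All function names, parameters, and the simulated data are illustrative.

```python
# Illustrative sketch: derandomizing conformal novelty detection by averaging
# e-values across repeated analyses and applying e-BH for FDR control.
import numpy as np
from sklearn.ensemble import IsolationForest

def conformal_pvalues(train_x, calib_x, test_x, seed):
    """Split-conformal p-values for novelty detection with a one-class model."""
    model = IsolationForest(random_state=seed).fit(train_x)
    calib_scores = -model.score_samples(calib_x)   # higher score = more anomalous
    test_scores = -model.score_samples(test_x)
    n = len(calib_scores)
    # p_i = (1 + #{calibration scores >= test score i}) / (n + 1)
    return (1.0 + (calib_scores[None, :] >= test_scores[:, None]).sum(1)) / (n + 1.0)

def p_to_e(p, kappa=0.5):
    """Generic valid p-to-e calibrator e(p) = kappa * p^(kappa - 1), kappa in (0, 1)."""
    return kappa * np.power(p, kappa - 1.0)

def ebh_rejections(e, alpha=0.1):
    """e-BH: reject the k hypotheses with the largest e-values, where k is the
    largest index such that the k-th largest e-value is >= m / (alpha * k)."""
    m = len(e)
    order = np.argsort(-e)
    thresholds = m / (alpha * np.arange(1, m + 1))
    passed = np.where(e[order] >= thresholds)[0]
    k = passed.max() + 1 if passed.size else 0
    return order[:k]

# Average e-values over several dependent analyses (different random splits) of
# the same data; an average of e-values is still an e-value, so e-BH retains its
# FDR guarantee after aggregation.
rng = np.random.default_rng(0)
inliers = rng.normal(size=(2000, 5))
test_x = np.vstack([rng.normal(size=(180, 5)), rng.normal(3.0, 1.0, size=(20, 5))])
e_avg = np.zeros(len(test_x))
K = 10
for k in range(K):
    perm = rng.permutation(len(inliers))
    train_x, calib_x = inliers[perm[:1000]], inliers[perm[1000:]]
    e_avg += p_to_e(conformal_pvalues(train_x, calib_x, test_x, seed=k)) / K
print("rejections:", ebh_rejections(e_avg, alpha=0.1))
```

Unlike averaging p-values, averaging e-values requires no dependence adjustment, which is what allows the evidence from the K mutually dependent splits above to be aggregated seamlessly.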