This paper presents a mathematical analysis of the Isolation Random Forest method (IRF method) for anomaly detection. We show that the IRF space can be endowed with a probability measure induced by the Isolation Tree algorithm (iTree). In this setting, we prove the convergence of the IRF method using the Law of Large Numbers. A couple of counterexamples show that the original method is inconclusive as a means of detecting anomalies, and that no quality certificate can be given for it. We therefore propose an alternative version of IRF and fully justify both its mathematical foundation and its limitations. Finally, numerical experiments compare the performance of the classic IRF with that of the proposed version.
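To make the convergence statement concrete, the following minimal Python sketch illustrates the classic isolation idea only; it is not the construction analyzed in the paper, nor the proposed alternative. It assumes a simplified iTree that splits on a uniformly chosen feature at a uniformly chosen threshold, with no subsampling and no path-length normalization. The names `isolation_depth`, `mean_depth`, and `make_data`, as well as the toy data set, are hypothetical. Averaging the isolation depth of a point over many independent random trees gives a sample mean which, by the Law of Large Numbers, stabilizes as the number of trees grows.

```python
import random
from statistics import fmean


def isolation_depth(point, data, rng, max_depth=50):
    """Number of random splits needed to isolate `point` within `data`.

    Simplified iTree path (an assumption of this sketch): at each step pick
    a feature that still varies, pick a threshold uniformly in its range,
    and keep only the side of the split that contains `point`.
    `point` is assumed to be an element of `data`.
    """
    current = data
    depth = 0
    while len(current) > 1 and depth < max_depth:
        # Features on which the remaining points still differ.
        dims = [d for d in range(len(point))
                if min(r[d] for r in current) < max(r[d] for r in current)]
        if not dims:
            break  # all remaining points coincide; no further split possible
        d = rng.choice(dims)
        lo = min(r[d] for r in current)
        hi = max(r[d] for r in current)
        split = rng.uniform(lo, hi)
        if point[d] < split:
            current = [r for r in current if r[d] < split]
        else:
            current = [r for r in current if r[d] >= split]
        depth += 1
    return depth


def mean_depth(point, data, n_trees, seed):
    """Sample mean of the isolation depth over `n_trees` independent trees."""
    rng = random.Random(seed)
    return fmean([isolation_depth(point, data, rng) for _ in range(n_trees)])


def make_data(seed=0, n_cluster=62):
    """Toy data (hypothetical): a Gaussian cluster, its center, one far point."""
    rng = random.Random(seed)
    cluster = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n_cluster)]
    inlier = (0.0, 0.0)
    outlier = (8.0, 8.0)
    return cluster + [inlier, outlier], inlier, outlier


if __name__ == "__main__":
    data, inlier, outlier = make_data()
    for n in (10, 100, 1000):
        print(n, mean_depth(inlier, data, n, seed=1), mean_depth(outlier, data, n, seed=1))
```

On this toy data the far point tends to be isolated after fewer splits than the central point, which is the heuristic behind isolation-based scoring. As the counterexamples mentioned above indicate, this behavior should not be read as a general guarantee.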