The isolation forest algorithm for outlier detection exploits a simple yet effective observation: when multivariate data are split recursively by uniformly random cuts across the feature space, an outlier is left alone in its subspace after fewer cuts than a regular observation. The original proposal scores outliers by the tree depth (the number of random cuts) required for isolation. Experiments here show that scores based on the size of the feature space region reached and the number of points assigned to it can give better results in many situations, without any modification to the tree structure, especially in the presence of categorical features.
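The depth-based observation can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `isolation_depth` and `mean_isolation_depth`, the synthetic two-dimensional data, the fixed seeds, and the choice to cut on the full data set rather than on subsamples are all assumptions made for illustration. It covers only the original depth-based idea, not the region-size and point-count scores that the experiments evaluate.

```python
import random


def isolation_depth(points, target, rng, max_depth=50):
    """Count the random axis-aligned cuts needed to leave `target` alone.

    `target` must be one of `points`. Hypothetical helper for illustration.
    """
    subset = points
    depth = 0
    while len(subset) > 1 and depth < max_depth:
        # Only features that still vary within the current region can be cut.
        dims = [d for d in range(len(target))
                if min(p[d] for p in subset) < max(p[d] for p in subset)]
        if not dims:
            break  # remaining points are identical; no cut can separate them
        d = rng.choice(dims)
        lo = min(p[d] for p in subset)
        hi = max(p[d] for p in subset)
        cut = rng.uniform(lo, hi)  # uniformly random cut within the current range
        # Keep only the side of the cut that contains the target.
        if target[d] < cut:
            subset = [p for p in subset if p[d] < cut]
        else:
            subset = [p for p in subset if p[d] >= cut]
        depth += 1
    return depth


def mean_isolation_depth(points, target, n_trees=200, seed=0):
    """Average the isolation depth of `target` over several random cut sequences."""
    rng = random.Random(seed)
    total = sum(isolation_depth(points, target, rng) for _ in range(n_trees))
    return total / n_trees


# Synthetic data (assumed): a Gaussian cluster, one central point, one far point.
data_rng = random.Random(42)
cluster = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
inlier = (0.0, 0.0)
outlier = (8.0, 8.0)
points = cluster + [inlier, outlier]

depth_inlier = mean_isolation_depth(points, inlier)
depth_outlier = mean_isolation_depth(points, outlier)
print(f"mean depth, inlier:  {depth_inlier:.2f}")
print(f"mean depth, outlier: {depth_outlier:.2f}")
```

Under these assumptions the far point should be isolated after only a few cuts on average, while the central point needs many more. This gap is what a depth-based outlier score relies on.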