Data cubes are multidimensional databases, often built from several separate databases, that serve as a flexible basis for data analysis. Surprisingly, outlier detection on data cubes has not yet been treated extensively. In this work, we provide the first framework to evaluate robust outlier detection methods in data cubes (RODD). We introduce a novel random forest-based outlier detection approach (RODD-RF) and compare it with more traditional methods based on robust location estimators. We propose a general type of test data and examine all methods in a simulation study. Moreover, we apply RODD-RF to real-world data. The results show that RODD-RF can lead to improved outlier detection.
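To make the idea of a baseline built on a robust location estimator concrete, the following is a minimal sketch and not the RODD-RF method or any procedure taken from this work. It assumes a data cube stored as a dictionary from coordinate tuples to a numeric measure, uses the median as the robust location estimate and the median absolute deviation (MAD) as the scale, and flags cells by the modified z-score. The function name `mad_outliers`, the grouping by a single dimension, the threshold of 3.5, and the example cube are all illustrative assumptions.

```python
from collections import defaultdict
from statistics import median


def mad_outliers(cube, group_dim, threshold=3.5):
    """Flag cells whose measure lies far from the median of their group.

    cube: dict mapping coordinate tuples to a numeric measure (assumed layout).
    group_dim: index of the dimension used to form comparison groups.
    threshold: cutoff on the absolute modified z-score (assumed default).
    """
    # Collect the measures of all cells that share a value on group_dim.
    groups = defaultdict(list)
    for coords, value in cube.items():
        groups[coords[group_dim]].append(value)

    flagged = []
    for coords, value in cube.items():
        values = groups[coords[group_dim]]
        center = median(values)  # robust location estimate
        mad = median(abs(v - center) for v in values)  # robust scale estimate
        if mad == 0:
            continue  # no spread in this group; skip to avoid dividing by zero
        score = 0.6745 * (value - center) / mad  # modified z-score
        if abs(score) > threshold:
            flagged.append(coords)
    return flagged


# Hypothetical cube with dimensions (product, region, month) and a sales measure.
example_cube = {
    ("A", "north", 1): 10.0,
    ("A", "north", 2): 11.0,
    ("A", "south", 1): 9.0,
    ("A", "south", 2): 12.0,
    ("A", "north", 3): 95.0,  # injected anomaly
    ("B", "north", 1): 20.0,
    ("B", "north", 2): 21.0,
    ("B", "south", 1): 19.0,
    ("B", "south", 2): 22.0,
}

print(mad_outliers(example_cube, group_dim=0))
```

Grouping by one dimension is the simplest possible choice. A fuller treatment would compare each cell against expectations derived from several dimensions at once.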