The outlier detection (OD) literature exhibits numerous algorithms, as OD applies to diverse domains. However, given a new detection task, it is unclear how to choose an algorithm, or how to set its hyperparameter(s) (HPs) in unsupervised settings. HP tuning is an ever-growing problem with the arrival of many new detectors based on deep learning, which usually come with a long list of HPs. Surprisingly, the issue of model selection in the outlier mining literature has been "the elephant in the room": a significant factor in unlocking the utmost potential of deep methods, yet little has been said or done to systematically tackle it. In the first part of this paper, we conduct the first large-scale analysis of the HP sensitivity of deep OD methods, and through more than 35,000 trained models, quantitatively demonstrate that model selection is inevitable. Next, we design an HP-robust and scalable deep hyper-ensemble model called ROBOD that assembles models with varying HP configurations, bypassing the choice paralysis. Importantly, we introduce novel strategies to speed up ensemble training, such as parameter sharing, batch/simultaneous training, and data subsampling, which allow us to train fewer models with fewer parameters. Extensive experiments on both image and tabular datasets show that ROBOD achieves and retains robust, state-of-the-art detection performance compared to its modern counterparts, while taking only $2$-$10$\% of the time of the naive hyper-ensemble with independent training.
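To make the hyper-ensemble idea concrete, below is a minimal, hypothetical sketch (not the authors' ROBOD implementation) of combining an outlier detector's scores across several HP configurations, with each member fit on a data subsample and the per-member scores z-normalized before averaging. The toy kNN-distance detector, the HP grid `ks`, and the `subsample` ratio are all illustrative assumptions.

```python
import random
import statistics

def knn_score(train, x, k):
    # Toy 1-D outlier score: mean distance from x to its k nearest
    # training points (illustrative stand-in for a real detector).
    dists = sorted(abs(x - t) for t in train)
    return sum(dists[:k]) / k

def hyper_ensemble_scores(data, ks=(2, 3, 5), subsample=0.6, seed=0):
    """Average z-normalized outlier scores over a grid of HP settings
    (here: k), fitting each ensemble member on a random subsample."""
    rng = random.Random(seed)
    combined = [0.0] * len(data)
    for k in ks:
        n = max(k + 1, int(subsample * len(data)))
        train = rng.sample(data, n)          # data subsampling per member
        scores = [knn_score(train, x, k) for x in data]
        mu = statistics.mean(scores)
        sd = statistics.pstdev(scores) or 1.0  # guard against zero spread
        for i, s in enumerate(scores):
            combined[i] += (s - mu) / sd     # z-normalize, then accumulate
    return [c / len(ks) for c in combined]   # average across HP configs

data = [0.1, 0.2, 0.15, 0.3, 0.25, 0.18, 5.0]  # 5.0 is an obvious outlier
scores = hyper_ensemble_scores(data)
```

Because each member's scores are normalized before averaging, no single HP setting dominates the combination; this is the sense in which a hyper-ensemble is robust to any one (possibly poor) HP choice. The naive version above still trains every member independently, which is exactly the cost that strategies like parameter sharing and batch/simultaneous training aim to cut.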