Automated machine learning has been widely researched and adopted for supervised classification and regression, but progress in unsupervised settings has been limited. We propose a novel approach to automating outlier detection based on meta-learning from previous datasets with outliers. Our premise is that the choice of the optimal outlier detection technique depends on the inherent properties of the data distribution. In particular, we leverage optimal transport to find the previous dataset with the most similar underlying distribution, and then apply the outlier detection techniques that proved to work best for that data distribution. We evaluate the robustness of our approach and find that it outperforms state-of-the-art methods in unsupervised outlier detection. This approach can also be easily generalized to automate other unsupervised settings.
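The selection step described above (find the most similar previous dataset, then reuse the detector that worked best on it) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say which optimal transport distance is used, so the sketch swaps in a sliced Wasserstein approximation that needs only NumPy. The names `sliced_wasserstein_sq`, `recommend_detector`, and `meta_base`, the synthetic datasets, and the detector labels attached to them are all hypothetical placeholders, and all datasets are assumed to share the same feature dimension.

```python
import numpy as np


def sliced_wasserstein_sq(X, Y, n_projections=64, n_quantiles=100, seed=0):
    """Approximate squared sliced 2-Wasserstein distance between two samples.

    Assumption: X and Y are (n_samples, n_features) arrays with the same
    number of features. Sample sizes may differ.
    """
    rng = np.random.default_rng(seed)
    # Draw random directions and normalise them to unit length.
    dirs = rng.normal(size=(n_projections, X.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    # In one dimension, optimal transport matches quantiles to quantiles,
    # so we compare the empirical quantile functions of each projection.
    qs = np.linspace(0.0, 1.0, n_quantiles)
    qx = np.quantile(X @ dirs.T, qs, axis=0)
    qy = np.quantile(Y @ dirs.T, qs, axis=0)
    return float(np.mean((qx - qy) ** 2))


def recommend_detector(new_X, meta_base):
    """Return (name, best_detector) of the closest dataset in the meta-base."""
    nearest = min(meta_base, key=lambda e: sliced_wasserstein_sq(new_X, e["X"]))
    return nearest["name"], nearest["best_detector"]


# Hypothetical meta-base: synthetic data with placeholder detector labels.
# The labels are illustrative only and do not report any real result.
rng = np.random.default_rng(42)
meta_base = [
    {"name": "centered_gaussian",
     "X": rng.normal(0.0, 1.0, size=(300, 3)),
     "best_detector": "isolation_forest"},
    {"name": "shifted_gaussian",
     "X": rng.normal(5.0, 1.0, size=(200, 3)),
     "best_detector": "local_outlier_factor"},
]

# A new, unlabeled dataset whose distribution resembles the second entry.
new_X = rng.normal(4.8, 1.0, size=(250, 3))
print(recommend_detector(new_X, meta_base))
```

In this toy setup the new dataset is drawn close to the second entry, so the lookup returns that entry's stored detector label. A real meta-base would hold benchmark datasets with labeled outliers and the detectors ranked by measured performance on each.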