Unsupervised anomaly detection tackles the problem of finding anomalies in datasets when labels are unavailable; since labeled data is typically hard or expensive to obtain, such approaches have been widely applied in recent years. In this context, Isolation Forest is a popular algorithm that defines an anomaly score by means of an ensemble of special trees called isolation trees. These are built using a random partitioning procedure that is extremely fast and cheap to train. However, we find that the standard algorithm can be improved in terms of memory requirements, latency, and detection performance; this is particularly important in low-resource scenarios and in TinyML implementations on ultra-constrained microprocessors. Moreover, anomaly detection approaches currently do not take advantage of weak supervision: since they are typically deployed in Decision Support Systems, feedback from users, even if rare, can be a valuable source of information that is currently unexplored. Besides showing the limitations of iForest training, we propose TiWS-iForest, an approach that, by leveraging weak supervision, reduces the complexity of Isolation Forest and enhances detection performance. We show the effectiveness of TiWS-iForest on real-world datasets and share the code in a public repository to enhance reproducibility.
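To make the random partitioning idea concrete, the following is a minimal pure-Python sketch of a standard Isolation Forest: each tree splits a subsample with random axis-aligned cuts, and the anomaly score is 2^(-E[h(x)] / c(n)), where h(x) is the path length of x and c(n) normalizes by the subsample size n. The function names (`fit_forest`, `anomaly_score`, `select_trees`) are illustrative assumptions and do not come from the authors' repository. In particular, `select_trees` is only one hypothetical way of using a few labeled points to shrink the ensemble; it should not be read as the TiWS-iForest procedure itself.

```python
import math
import random


def c_factor(n):
    """Normalization term c(n): average path length of an unsuccessful BST search."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # H(n-1) approximated via Euler's constant
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively partition points with random axis-aligned splits."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # cannot split on this feature
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(tree, x, depth=0):
    """Depth at which x reaches a leaf, plus c(size) for unresolved leaves."""
    if "size" in tree:
        return depth + c_factor(tree["size"])
    child = tree["left"] if x[tree["dim"]] < tree["split"] else tree["right"]
    return path_length(child, x, depth + 1)


def fit_forest(data, n_trees=100, sample_size=256, seed=0):
    """Build n_trees isolation trees; returns (forest, effective subsample size)."""
    rng = random.Random(seed)
    sample_size = max(2, min(sample_size, len(data)))  # assumes len(data) >= 2
    max_depth = math.ceil(math.log2(sample_size))
    forest = [
        build_tree(rng.sample(data, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    return forest, sample_size


def anomaly_score(forest, x, sample_size):
    """Score in (0, 1); values closer to 1 indicate more anomalous points."""
    mean_depth = sum(path_length(t, x) for t in forest) / len(forest)
    return 2.0 ** (-mean_depth / c_factor(sample_size))


def select_trees(forest, labeled, sample_size, keep):
    """Hypothetical weak-supervision step: keep the trees that best separate
    the few labeled anomalies from the labeled normal points.
    Assumes `labeled` holds at least one anomaly and one normal point."""
    anomalies = [x for x, is_anomaly in labeled if is_anomaly]
    normals = [x for x, is_anomaly in labeled if not is_anomaly]

    def separation(tree):
        a = sum(anomaly_score([tree], x, sample_size) for x in anomalies) / len(anomalies)
        n = sum(anomaly_score([tree], x, sample_size) for x in normals) / len(normals)
        return a - n

    return sorted(forest, key=separation, reverse=True)[:keep]
```

As a usage sketch, one could call `forest, n = fit_forest(data)` on a list of equal-length tuples, score a point with `anomaly_score(forest, x, n)`, and then pass a handful of user-labeled points to `select_trees` to obtain a smaller ensemble. Fewer trees mean less memory and fewer traversals per query, which is the kind of saving that matters on constrained devices.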