TiWS-iForest: Isolation Forest in Weakly Supervised and Tiny ML scenarios

Unsupervised anomaly detection tackles the problem of finding anomalies in datasets when no labels are available; since labels are typically hard or expensive to obtain, such approaches have been widely applied in recent years. In this context, Isolation Forest is a popular algorithm that defines an anomaly score by means of an ensemble of special trees called isolation trees. These are built using a random partitioning procedure that is extremely fast and cheap to train. However, we find that the standard algorithm can be improved in terms of memory requirements, latency and detection performance; this is of particular importance in low-resource scenarios and in TinyML implementations on ultra-constrained microprocessors. Moreover, anomaly detection approaches currently do not take advantage of weak supervision: since they are typically deployed in Decision Support Systems, feedback from users, even if rare, can be a valuable source of information that is currently unexplored. Besides showing the limitations of iForest training, we propose TiWS-iForest, an approach that, by leveraging weak supervision, is able to reduce Isolation Forest complexity and to enhance detection performance. We show the effectiveness of TiWS-iForest on real-world datasets and we share the code in a public repository to enhance reproducibility.
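To make the abstract's two ideas concrete, here is a minimal pure-Python sketch, not the authors' implementation. The first part follows the standard Isolation Forest recipe: random axis-aligned splits, with the anomaly score computed as 2^(-E[h(x)]/c(psi)) from the mean path length E[h(x)] and the normalizing constant c(psi) for subsample size psi. The `select_trees` helper is a hypothetical illustration of how a handful of labels could be used to keep only the most useful trees; its ranking criterion (a simple depth margin) is an assumption made here and is not taken from the paper. All function names and parameter defaults are illustrative.

```python
import math
import random

def c_factor(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # H(n-1) via Euler-Mascheroni constant
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Recursively partition points with random axis-aligned splits."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}  # leaf: remember how many points ended here
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return {"size": len(points)}  # cannot split identical values
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {"dim": dim, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}

def path_length(tree, x, depth=0):
    """Depth at which x reaches a leaf, plus the usual leaf-size correction."""
    if "split" not in tree:
        return depth + c_factor(tree["size"])
    child = tree["left"] if x[tree["dim"]] < tree["split"] else tree["right"]
    return path_length(child, x, depth + 1)

def fit_forest(data, n_trees=100, sample_size=64, seed=0):
    """Build n_trees isolation trees, each on a random subsample (needs >= 2 points)."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(data, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, sample_size

def anomaly_score(trees, sample_size, x):
    """Score in (0, 1); values closer to 1 indicate points that are isolated quickly."""
    mean_depth = sum(path_length(t, x) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / c_factor(sample_size))

def select_trees(trees, labeled, keep):
    """Hypothetical weak-supervision step (an assumption, not the paper's rule):
    keep the trees whose depth margin between labeled normal and anomalous
    points is largest. `labeled` must contain at least one point of each class."""
    def margin(tree):
        normal = [path_length(tree, x) for x, is_anomaly in labeled if not is_anomaly]
        anomalous = [path_length(tree, x) for x, is_anomaly in labeled if is_anomaly]
        return sum(normal) / len(normal) - sum(anomalous) / len(anomalous)
    return sorted(trees, key=margin, reverse=True)[:keep]

# Toy data: a Gaussian cluster plus three far-away points.
rng = random.Random(42)
normal = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(300)]
outliers = [(8.0, 8.0), (-7.0, 9.0), (9.0, -8.0)]
trees, psi = fit_forest(normal + outliers)

# A few labels stand in for rare user feedback.
labeled = [(outliers[0], True), (outliers[1], True)] + [(p, False) for p in normal[:6]]
small = select_trees(trees, labeled, keep=10)

for name, forest in (("full forest", trees), ("10 selected trees", small)):
    print(name, anomaly_score(forest, psi, (8.0, 8.0)), anomaly_score(forest, psi, (0.0, 0.0)))
```

Keeping 10 of 100 trees cuts both the stored model and the per-query work by roughly a factor of ten, which is the kind of saving that matters on a memory-constrained microcontroller. Whether detection quality holds up after pruning depends on the data and on the selection criterion, so it should be checked on held-out labels.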


Related content

Anomaly detection


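In data mining, anomaly detection is the identification of items, events or observations that do not conform to an expected pattern or to the other items in a dataset. Anomalous items typically translate into some kind of problem, such as bank fraud, a structural defect, a medical problem or an error in a text. Anomalies are also referred to as outliers, novelties, noise, deviations and exceptions. In the context of abuse and network intrusion detection in particular, the interesting objects are often not rare objects but unexpected bursts of activity. This pattern does not fit the common statistical definition of an outlier as a rare object, so many anomaly detection methods (unsupervised ones in particular) will fail on such data unless it has been aggregated appropriately. A cluster analysis algorithm, by contrast, may be able to detect the micro-clusters formed by these patterns. There are three broad categories of anomaly detection methods. [1] Unsupervised anomaly detection methods detect anomalies in unlabeled test data, under the assumption that most instances in the dataset are normal, by looking for the instances that fit the rest of the data least well. Supervised anomaly detection methods require a dataset labeled as "normal" and "abnormal" and involve training a classifier (the key difference from many other statistical classification problems is the inherent class imbalance of anomaly detection). Semi-supervised anomaly detection methods build a model of normal behavior from a given normal training dataset and then test the likelihood that a test instance was generated by the learned model.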
