Anomaly detection, which identifies data instances whose feature patterns differ from those of the majority, plays a fundamental role in various applications. However, existing methods struggle when the instances are systems whose characteristics are not readily observed as data. In such scenarios, the detector must interact with the systems and identify those with abnormal responses. Detecting system-wise anomalies is challenging for several reasons: how to formally define the system-wise anomaly detection problem; how to find effective activation signals for interacting with systems so as to progressively collect data and learn the detector; and how to guarantee stable training in such a non-stationary scenario with real-time interactions. To address these challenges, we propose InterSAD (Interactive System-wise Anomaly Detection). Specifically, we first adopt a Markov decision process to model the interactive systems, and define anomalous systems as anomalous transition systems and anomalous reward systems. Then, we develop an end-to-end approach that includes an encoder-decoder module, which learns system embeddings, and a policy network, which generates effective activation signals for separating the embeddings of normal and anomalous systems. Finally, we design a training method to stabilize the learning process, which includes a replay buffer that stores historical interaction data and allows them to be re-sampled. Experiments on two benchmark environments, covering the identification of anomalous robotic systems and the detection of user data poisoning in recommendation models, demonstrate the superiority of InterSAD over state-of-the-art baseline methods.
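The replay buffer mentioned in the abstract can be illustrated with a minimal sketch. This is not the InterSAD implementation: the `Interaction` record, its field names, the fixed capacity, and uniform sampling are all assumptions made here for illustration. It shows only the general idea of storing historical interaction data and re-sampling it, so that training batches are not limited to the most recent interactions.

```python
import random
from collections import deque
from typing import Deque, List, NamedTuple


class Interaction(NamedTuple):
    """One recorded exchange with a system (hypothetical field names)."""
    system_id: int
    state: List[float]
    action: List[float]       # the activation signal sent to the system
    next_state: List[float]   # the system's response
    reward: float


class ReplayBuffer:
    """Fixed-capacity store of past interactions with uniform re-sampling."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        # A deque with maxlen drops the oldest entry once capacity is reached.
        self._items: Deque[Interaction] = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, item: Interaction) -> None:
        self._items.append(item)

    def sample(self, batch_size: int) -> List[Interaction]:
        # Sample without replacement; return fewer items if the buffer is small.
        k = min(batch_size, len(self._items))
        return self._rng.sample(list(self._items), k)

    def __len__(self) -> int:
        return len(self._items)
```

In a training loop under these assumptions, each new interaction would be passed to `add`, and each update step would draw a batch with `sample` instead of using only the latest data.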