Anomaly detection is essential in many application domains, such as cybersecurity, law enforcement, medicine, and fraud protection. However, the decision-making of current deep learning approaches is notoriously hard to understand, which often limits their practical applicability. To overcome this limitation, we propose a framework for learning inherently interpretable anomaly detectors from sequential data. More specifically, we consider the task of learning a deterministic finite automaton (DFA) from a given multiset of unlabeled sequences. We show that this problem is computationally hard and develop two learning algorithms based on constraint optimization. Moreover, we introduce novel regularization schemes for our optimization problems that improve the overall interpretability of the learned DFAs. Using a prototype implementation, we demonstrate that our approach achieves promising accuracy and F1 scores.
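To make the idea of a DFA as an interpretable anomaly detector concrete, the following minimal sketch shows how a given DFA could flag sequences: a sequence is treated as normal if the automaton accepts it and as anomalous otherwise. This is only an illustration of the detection step, not the learning algorithms based on constraint optimization. The `DFA` class, the state names, the event alphabet (`login`, `read`, `logout`), and the hand-written transition table are all hypothetical and chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class DFA:
    """A small DFA; all names here are illustrative assumptions."""
    initial: str
    accepting: frozenset
    transitions: dict  # maps (state, symbol) -> next state

    def accepts(self, sequence):
        state = self.initial
        for symbol in sequence:
            key = (state, symbol)
            # A missing transition is treated as an implicit rejecting sink
            # state, so the transition table does not need to be total.
            if key not in self.transitions:
                return False
            state = self.transitions[key]
        return state in self.accepting

    def is_anomaly(self, sequence):
        # Assumption for this sketch: rejected sequences are the anomalies.
        return not self.accepts(sequence)


# Hand-written example automaton (not learned from data): a session must
# start with "login", may contain any number of "read" events, and must
# end with "logout".
detector = DFA(
    initial="logged_out",
    accepting=frozenset({"logged_out"}),
    transitions={
        ("logged_out", "login"): "logged_in",
        ("logged_in", "read"): "logged_in",
        ("logged_in", "logout"): "logged_out",
    },
)

sequences = [
    ["login", "read", "logout"],  # well-formed session
    ["read", "logout"],           # reads without logging in
    ["login", "read"],            # session never closed
]

flags = [detector.is_anomaly(s) for s in sequences]
for seq, flag in zip(sequences, flags):
    print(seq, "anomalous" if flag else "normal")
```

Because the detector is just a transition table, each decision can be traced state by state, which is the kind of interpretability the framework aims for. A smaller automaton with fewer states and transitions is easier to read in this way.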