Institutions are increasingly relying on machine learning models to identify and alert on abnormal events, such as fraud, cyber attacks and system failures. These alerts often need to be manually investigated by specialists. Given the operational cost of manual inspections, the suspicious events are selected by alerting systems with carefully designed thresholds. In this paper, we consider an imbalanced binary classification problem, where events arrive sequentially and only a limited number of suspicious events can be inspected. We model the event arrivals as a non-homogeneous Poisson process, and compare various suspicious event selection methods including those based on static and adaptive thresholds. For each method, we analytically characterize the tradeoff between the minority-class detection rate and the inspection capacity as a function of the data class imbalance and the classifier confidence score densities. We implement the selection methods on a real public fraud detection dataset and compare the empirical results with analytical bounds. Finally, we investigate how class imbalance and the choice of classifier impact the tradeoff.
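The setting described in the abstract can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the paper's method: the sinusoidal arrival rate, the Beta-distributed confidence scores, the 2% minority-class rate, and the pacing rule in `adaptive_select` are all hypothetical choices made for illustration, and every function name is invented here. It samples a non-homogeneous Poisson process by thinning, then compares a static threshold with a simple adaptive one under a fixed inspection capacity.

```python
import math
import random


def simulate_arrivals(rate_fn, rate_max, horizon, rng):
    """Sample arrival times of a non-homogeneous Poisson process by thinning.

    rate_max must be an upper bound on rate_fn over [0, horizon).
    """
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate_max)  # candidate from a homogeneous process
        if t >= horizon:
            return times
        if rng.random() < rate_fn(t) / rate_max:  # keep with prob. rate(t)/rate_max
            times.append(t)


def draw_events(times, minority_rate, rng):
    """Attach a label and a classifier score to each arrival.

    Assumption: scores follow Beta(5, 2) for the minority class and
    Beta(2, 5) for the majority class.
    """
    events = []
    for t in times:
        is_minority = rng.random() < minority_rate
        score = rng.betavariate(5, 2) if is_minority else rng.betavariate(2, 5)
        events.append((t, score, is_minority))
    return events


def static_select(events, threshold, capacity):
    """Inspect events scoring at or above a fixed threshold until capacity runs out."""
    picked = []
    for ev in events:
        if len(picked) >= capacity:
            break
        if ev[1] >= threshold:
            picked.append(ev)
    return picked


def adaptive_select(events, capacity, horizon, base=0.5, gain=0.5):
    """Hypothetical pacing rule: raise the threshold when inspections are
    being used faster than a uniform spend of the budget over the horizon."""
    picked = []
    for ev in events:
        if len(picked) >= capacity:
            break
        t, score, _ = ev
        ahead = len(picked) / capacity - t / horizon  # > 0 means spending too fast
        threshold = min(0.99, max(0.0, base + gain * ahead))
        if score >= threshold:
            picked.append(ev)
    return picked


def detection_rate(picked, events):
    """Fraction of all minority-class events that were selected for inspection."""
    total = sum(1 for e in events if e[2])
    return sum(1 for e in picked if e[2]) / total if total else 0.0


rng = random.Random(0)
horizon = 24.0  # e.g. hours in one day
rate_fn = lambda t: 50 + 40 * math.sin(2 * math.pi * t / horizon)  # assumed rate
times = simulate_arrivals(rate_fn, rate_max=90.0, horizon=horizon, rng=rng)
events = draw_events(times, minority_rate=0.02, rng=rng)

capacity = 50
static_picked = static_select(events, threshold=0.7, capacity=capacity)
adaptive_picked = adaptive_select(events, capacity=capacity, horizon=horizon)

print("events:", len(events))
print("static detection rate:", detection_rate(static_picked, events))
print("adaptive detection rate:", detection_rate(adaptive_picked, events))
```

Varying `capacity`, `minority_rate`, or the two Beta parameter pairs gives a rough feel for how inspection capacity, class imbalance, and score separation interact; the numbers it prints depend entirely on these assumed inputs and say nothing about the results reported in the paper.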