In fraud detection applications, the investigator can typically examine only a limited number k of cases. The most efficient way to allocate resources is then to select the k cases with the highest probability of being fraudulent. The prediction model used for this purpose must normally be regularized to avoid overfitting and the poor predictive performance that results from it. A new loss function, called the fraud loss, is proposed for selecting the model complexity via a tuning parameter. A simulation study is performed to find the optimal settings for validation. Further, the performance of the proposed procedure is compared with that of the most relevant competing procedure, which is based on the area under the receiver operating characteristic curve (AUC), both in a set of simulations and on a VAT fraud dataset. In most cases, choosing the model complexity according to the fraud loss gave performance better than, or comparable to, choosing it according to the AUC, as measured by the fraud loss.
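The selection idea can be illustrated with a minimal sketch. The abstract does not define the fraud loss, so the code below uses a simple stand-in, the share of attainable frauds missed among the k highest-scoring cases, which is an assumption made here for illustration and not the paper's definition. The function names `top_k_indices`, `top_k_miss_rate`, and `select_tuning_parameter`, as well as the toy scores and labels, are likewise hypothetical.

```python
from typing import Dict, List, Sequence


def top_k_indices(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring cases (ties keep original order)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return order[:k]


def top_k_miss_rate(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Illustrative stand-in loss (NOT the paper's fraud loss):
    fraction of the frauds attainable with k investigations that are missed."""
    found = sum(labels[i] for i in top_k_indices(scores, k))
    attainable = min(k, sum(labels))
    return 0.0 if attainable == 0 else 1.0 - found / attainable


def select_tuning_parameter(
    scores_by_param: Dict[float, Sequence[float]],
    labels: Sequence[int],
    k: int,
) -> float:
    """Pick the tuning parameter whose validation scores give the lowest loss."""
    return min(
        scores_by_param,
        key=lambda p: top_k_miss_rate(scores_by_param[p], labels, k),
    )


# Toy validation data: 1 marks a fraudulent case (made-up values).
labels = [1, 0, 0, 1, 1]
scores_by_param = {
    0.01: [0.9, 0.1, 0.8, 0.4, 0.7],  # weakly regularized model
    1.0: [0.9, 0.1, 0.3, 0.8, 0.7],   # more strongly regularized model
}
best = select_tuning_parameter(scores_by_param, labels, k=2)
```

In practice the scores would come from models fitted on training folds and evaluated on held-out folds; the sketch only shows how a top-k criterion, rather than a ranking criterion such as the AUC, can drive the choice of the tuning parameter.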