Unsupervised Machine Learning for Explainable Medicare Fraud Detection

The US federal government spends more than a trillion dollars per year on health care, largely provided by private third parties and reimbursed by the government. A major concern in this system is overbilling, waste and fraud by providers, who face incentives to misreport on their claims in order to receive higher payments. In this paper, we develop novel machine learning tools to identify providers that overbill Medicare, the US federal health insurance program for elderly adults and the disabled. Using large-scale Medicare claims data, we identify patterns consistent with fraud or overbilling among inpatient hospitalizations. Our proposed approach for Medicare fraud detection is fully unsupervised, not relying on any labeled training data, and is explainable to end users, providing reasoning and interpretable insights into the potentially suspicious behavior of the flagged providers. Data from the Department of Justice on providers facing anti-fraud lawsuits and several case studies validate our approach and findings both quantitatively and qualitatively.
