Impact of the composition of feature extraction and class sampling in Medicare fraud detection

With healthcare being a critical aspect of life, health insurance has become an important means of minimizing medical expenses. As insurance coverage has grown, the healthcare industry has seen a significant increase in fraudulent activity, and fraud has become a significant contributor to rising medical care expenses, although its impact can be mitigated with fraud detection techniques. Machine learning techniques are used to detect fraud. This study uses the "Medicare Part D" insurance claims data released by the Centers for Medicare & Medicaid Services (CMS) of the United States federal government to develop a fraud detection system. Applying machine learning algorithms to a class-imbalanced, high-dimensional Medicare dataset is a challenging task. To combat these challenges, the present work performs feature extraction followed by data sampling and then applies various classification algorithms to obtain better performance. Feature extraction is a dimensionality reduction approach that converts the original attributes into linear or non-linear combinations of those attributes, generating a smaller and more diversified attribute set and thus reducing the number of dimensions. Data sampling is commonly used to address class imbalance, either by increasing the frequency of the minority class or by reducing the frequency of the majority class, so that both classes have approximately equal numbers of occurrences. The proposed approach is evaluated with standard performance metrics. To detect fraud efficiently, this study applies an autoencoder as the feature extraction technique, the synthetic minority oversampling technique (SMOTE) as the data sampling technique, and various gradient boosted decision tree-based classifiers as the classification algorithms. The experimental results show that the combination of autoencoders followed by SMOTE with the LightGBM classifier achieved the best results.
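The oversampling step described above can be illustrated with a minimal sketch, which is not the study's implementation. It shows only the core SMOTE idea in plain Python: each synthetic minority sample is an interpolation between a minority point and one of its nearest minority neighbours. The function names `smote` and `balance`, the parameter defaults, and the toy data are all assumptions made for illustration. The autoencoder and LightGBM stages are not shown.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples.

    Hypothetical helper: a simplified SMOTE-style sketch, not a library API.
    """
    if n_synthetic == 0:
        return []
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # A random position on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(majority, minority, k=5, seed=0):
    """Oversample the minority class until it matches the majority class size."""
    n_needed = max(0, len(majority) - len(minority))
    return minority + smote(minority, n_needed, k=k, seed=seed)


# Toy data (made up): 10 majority rows and 3 minority rows with 2 features each.
majority = [[float(i), float(i % 3)] for i in range(10)]
minority = [[0.0, 10.0], [1.0, 11.0], [2.0, 10.5]]
balanced_minority = balance(majority, minority, k=2)
print(len(majority), len(balanced_minority))
```

In a pipeline like the one the abstract describes, this step would presumably operate on the features produced by the autoencoder rather than on the raw attributes, and a maintained implementation such as the `SMOTE` class in imbalanced-learn would normally be preferred over a hand-written one.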
