The costs of government corruption range from impaired economic growth to harm to citizens' well-being and safety. Public procurement, that is, contracting between government agencies and private-sector entities, is fertile ground for corrupt practices and generates substantial monetary losses worldwide. Identifying and deterring corrupt activity between the government and the private sector is therefore paramount. However, several factors make corruption in public procurement hard to identify and track, so corrupt practices often go unnoticed. This paper proposes a machine learning model based on an ensemble of random forest classifiers, which we call hyper-forest, to identify and predict corrupt contracts in México's public procurement data. The method correctly classifies most of the corrupt and non-corrupt contracts evaluated in the dataset. Furthermore, we find that the most important predictors in the model are those describing the relationship between buyers and suppliers, rather than features of individual contracts. The method is also general enough to be trained on data from other countries. Overall, our work presents a tool that can support decision-making to identify, predict, and analyze corruption in public procurement contracts.