In this paper, we propose a robust election simulation model and an independently developed election anomaly detection algorithm that demonstrates the simulation's utility. The simulation generates artificial elections whose properties and trends resemble those of real-world elections, while giving users control over, and knowledge of, all the important components of each election. We generate a clean election results dataset without fraud, as well as datasets with varying degrees of fraud, and then measure how well the algorithm detects the level of fraud present. The algorithm determines how closely actual election results match the results predicted by polling and by a regression model built from other regions with similar demographics. We use k-means to partition electoral regions into clusters such that demographic homogeneity within each cluster is maximized. We then apply a novelty detection algorithm, implemented as a one-class Support Vector Machine, in which the clean data is provided in the form of polling predictions and regression predictions. The regression predictions are built from the actual data in such a way that the data supervises itself. We show the effectiveness of both the simulation technique and the machine learning model, which succeeds in identifying fraudulent regions.
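The pipeline described above (cluster regions by demographics, predict each region's result from polling and from its demographic peers, then flag results that match neither prediction) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. The region data, the names `regions`, `kmeans`, `peer_prediction`, and `TOLERANCE`, and the fixed initial centroids are all assumptions made for illustration. Two deliberate simplifications are swapped in: the median result of the other regions in the same cluster stands in for the regression model, and a plain tolerance rule stands in for the one-class Support Vector Machine.

```python
from statistics import mean, median

# Toy regions. All values are invented for illustration, not taken from the paper.
# name -> (demographic features scaled to [0, 1], polled vote share, reported vote share)
regions = {
    "A": ((0.30, 0.90), 0.61, 0.62),
    "B": ((0.60, 0.20), 0.40, 0.41),
    "C": ((0.32, 0.80), 0.61, 0.60),
    "D": ((0.58, 0.30), 0.40, 0.39),
    "E": ((0.31, 0.85), 0.60, 0.61),
    "F": ((0.61, 0.25), 0.41, 0.40),
    "G": ((0.29, 0.90), 0.61, 0.80),  # reported share inflated to mimic fraud
    "H": ((0.59, 0.20), 0.41, 0.42),
}


def sq_dist(p, q):
    """Squared Euclidean distance between two feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=10):
    """Plain Lloyd's k-means. Assumption: the first k points seed the
    centroids, which keeps this sketch deterministic."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(mean(dim) for dim in zip(*members))
    return labels


names = list(regions)
labels = kmeans([regions[n][0] for n in names], k=2)


def peer_prediction(i):
    """Stand-in for the regression model: the median reported share of the
    OTHER regions in the same demographic cluster. Requires at least two
    regions per cluster."""
    peers = [regions[m][2] for j, m in enumerate(names)
             if j != i and labels[j] == labels[i]]
    return median(peers)


# Stand-in for the one-class SVM: flag a region only when its reported
# result is far from BOTH the polling and the peer-based prediction.
TOLERANCE = 0.10  # hypothetical threshold
flagged = []
for i, n in enumerate(names):
    _, poll, actual = regions[n]
    if (abs(actual - poll) > TOLERANCE
            and abs(actual - peer_prediction(i)) > TOLERANCE):
        flagged.append(n)

print(labels)   # [0, 1, 0, 1, 0, 1, 0, 1]
print(flagged)  # ['G']
```

The median is used for the peer prediction because a single manipulated region would pull a mean toward itself and partly mask its own neighbours. A fuller version would replace the tolerance rule with a novelty detector trained on the two deviation values, for example a one-class SVM as the abstract describes.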