Data collected in criminal investigations may suffer from: (i) incompleteness, due to the covert nature of criminal organisations; (ii) incorrectness, caused by either unintentional data collection errors or intentional deception by criminals; (iii) inconsistency, when the same information is entered into law enforcement databases multiple times or in different formats. In this paper we analyse nine real criminal networks of different types (i.e., Mafia networks, criminal street gangs and terrorist organisations) in order to quantify the impact of incomplete data and to determine which network type is most affected by it. The networks are first pruned using two methods: (i) random edge removal, simulating the scenario in which Law Enforcement Agencies (LEAs) fail to intercept some calls or to spot sporadic meetings among suspects; (ii) node removal, modelling the scenario in which some suspects cannot be intercepted or investigated. Finally, we compute spectral distances (i.e., Adjacency, Laplacian and Normalised Laplacian Spectral Distances) and a matrix distance (i.e., Root Euclidean Distance) between the complete and pruned networks, and compare them using statistical analysis. Our investigation yields two main findings: first, the overall understanding of the criminal networks remains high even with incomplete data on criminal interactions (i.e., 10% removed edges); second, failing to investigate even a small fraction of suspects (i.e., 2% removed nodes) may lead to significant misinterpretation of the overall network.
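The pruning-and-distance procedure described above can be illustrated with a minimal sketch. This is not the authors' code: it assumes NumPy, uses a random toy graph in place of the real criminal networks, takes each spectral distance to be the Euclidean distance between the sorted eigenvalue vectors, and takes the Root Euclidean Distance to be the Frobenius norm of the difference between the square roots of the two Laplacians. Node removal is modelled here by isolating nodes (zeroing their rows and columns) so that both matrices keep the same size, which is also an assumption. All function names are hypothetical.

```python
import numpy as np

def random_graph(n, p, rng):
    """Symmetric 0/1 adjacency matrix of a random toy graph (stand-in for real data)."""
    upper = np.triu(rng.random((n, n)) < p, k=1).astype(float)
    return upper + upper.T

def remove_edges(A, fraction, rng):
    """Delete a random fraction of the existing edges."""
    A = A.copy()
    rows, cols = np.nonzero(np.triu(A, k=1))
    k = int(round(fraction * len(rows)))
    drop = rng.choice(len(rows), size=k, replace=False)
    A[rows[drop], cols[drop]] = 0.0
    A[cols[drop], rows[drop]] = 0.0
    return A

def remove_nodes(A, fraction, rng):
    """Isolate a random fraction of nodes (assumption: matrix size is kept)."""
    A = A.copy()
    n = A.shape[0]
    drop = rng.choice(n, size=int(round(fraction * n)), replace=False)
    A[drop, :] = 0.0
    A[:, drop] = 0.0
    return A

def laplacian(A):
    """Combinatorial Laplacian L = D - A."""
    return np.diag(A.sum(axis=1)) - A

def normalised_laplacian(A):
    """D^{-1/2} L D^{-1/2}, with zero rows/columns for isolated nodes."""
    deg = A.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[deg > 0] = 1.0 / np.sqrt(deg[deg > 0])
    return inv_sqrt[:, None] * laplacian(A) * inv_sqrt[None, :]

def spectral_distance(M1, M2):
    """Euclidean distance between the ascending eigenvalue vectors of two symmetric matrices."""
    return float(np.linalg.norm(np.linalg.eigvalsh(M1) - np.linalg.eigvalsh(M2)))

def psd_sqrt(M):
    """Square root of a symmetric positive semidefinite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def root_euclidean_distance(A1, A2):
    """Frobenius norm of the difference between the Laplacian square roots."""
    return float(np.linalg.norm(psd_sqrt(laplacian(A1)) - psd_sqrt(laplacian(A2)), ord="fro"))

rng = np.random.default_rng(0)
A = random_graph(60, 0.1, rng)
A_e = remove_edges(A, 0.10, rng)   # 10% of edges missed
A_n = remove_nodes(A, 0.02, rng)   # 2% of suspects not investigated

for label, B in [("edges", A_e), ("nodes", A_n)]:
    print(label,
          spectral_distance(A, B),
          spectral_distance(laplacian(A), laplacian(B)),
          spectral_distance(normalised_laplacian(A), normalised_laplacian(B)),
          root_euclidean_distance(A, B))
```

The toy graph and the 10% and 2% fractions only mirror the thresholds quoted in the abstract; the printed values say nothing about the real networks and are not a reproduction of the reported results.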