In fingerprint-based systems, database size grows considerably with the population. In developing countries, because a central system is difficult to use when enrolling voters, several regional voter databases are often created and then merged into a central database. A deduplication process is then run to remove duplicates and ensure that each voter appears only once. Until now, companies specializing in biometrics have used several costly computing servers to perform large-scale fingerprint-based deduplication. Their algorithms take considerable time because of their O(n²) complexity, where n is the size of the database. This article presents an algorithm that performs this operation in O(2n), that is, in linear time, on a single computer. It is based on an index obtained by applying a 5 × 5 matrix to each fingerprint. This index makes it possible to build clusters of O(1) size within which fingerprints are compared. The approach has been evaluated on close to 114 000 fingerprints, and the results show a penetration rate of less than 1%, near-O(1) identification, and O(n) deduplication. A database of 10 000 000 fingerprints can be deduplicated on a single computer in less than two hours, whereas the usual tools need several days and several servers.
Keywords: fingerprint, cluster, index, deduplication.
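The complexity argument in the abstract can be illustrated with a minimal Python sketch of index-based deduplication. It is not the authors' implementation: the function `deduplicate`, the callbacks `index_of` and `is_match`, and the toy string "templates" are all hypothetical, and the index here is an arbitrary key rather than the 5 × 5 matrix index of the article. The sketch also assumes that two impressions of the same finger always receive the same index, which a real fingerprint index would have to ensure despite noise. Under these assumptions, each record is compared only with the records already kept in its own cluster, so if clusters stay O(1) in size the total work is O(n) instead of O(n²).

```python
from collections import defaultdict


def deduplicate(records, index_of, is_match):
    """Cluster records by index key and compare only inside each cluster.

    index_of(record) -> hashable key (stand-in for the fingerprint index).
    is_match(a, b)   -> True if a and b are the same finger (stand-in matcher).
    Returns (unique_records, duplicate_pairs, number_of_comparisons).
    """
    clusters = defaultdict(list)   # index key -> records kept so far
    duplicates = []                # (kept_record, duplicate_record) pairs
    comparisons = 0

    for rec in records:
        bucket = clusters[index_of(rec)]
        for kept in bucket:
            comparisons += 1
            if is_match(kept, rec):
                duplicates.append((kept, rec))
                break
        else:
            # No match in this cluster: keep the record as a new unique entry.
            bucket.append(rec)

    unique = [r for bucket in clusters.values() for r in bucket]
    return unique, duplicates, comparisons


if __name__ == "__main__":
    # Toy data: (voter_id, template). Real templates are not strings.
    voters = [("A", "aa01"), ("B", "ab02"), ("C", "aa01"),
              ("D", "aa03"), ("E", "ab02")]
    unique, dups, n_cmp = deduplicate(
        voters,
        index_of=lambda r: r[1][:2],        # hypothetical index: 2-char prefix
        is_match=lambda a, b: a[1] == b[1]  # hypothetical matcher: equality
    )
    print([v[0] for v in unique])  # ['A', 'D', 'B']
    print(n_cmp)                   # 3, versus up to 10 pairwise comparisons
```

The fraction of the database that each record is actually compared against corresponds to the penetration rate mentioned in the abstract; the smaller the clusters, the lower that rate.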