Large scale deduplication based on fingerprints

13 January 2021

Jean Aymar Biyiha Nlend, Ibrahim Moukouop Nguena, Thomas Bouetou Bouetou

From arXiv, 18 pages, 12 figures

In fingerprint-based systems, database size grows considerably with the population. In developing countries, because it is difficult to use a central system when registering voters, several regional voter databases are often created and then merged into a central database. A deduplication process then removes duplicates so that each voter appears only once. Until now, companies specializing in biometrics have used several costly computing servers and algorithms to perform large-scale fingerprint-based deduplication. These algorithms take considerable time because their complexity is O(n²), where n is the size of the database. This article presents an algorithm that can perform this operation in O(2n) on a single computer. It relies on an index obtained from a 5 × 5 matrix computed on each fingerprint. This index makes it possible to build clusters of O(1) size within which fingerprints are compared. The approach has been evaluated on close to 114 000 fingerprints, and the results show a penetration rate of less than 1%, almost O(1) identification, and O(n) deduplication. A database of 10 000 000 fingerprints can be deduplicated on a single computer in less than two hours, compared with several days on multiple servers for the usual tools. Keywords: fingerprint, cluster, index, deduplication.
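The general idea of index-based deduplication can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify how the 5 × 5 matrix index is computed, so `index_key` and `is_match` below are hypothetical stand-ins (a coarse quantization of toy feature tuples and a tolerance check), and the sample records are invented for illustration. The sketch only shows why grouping records by an index key, and comparing each record solely against its own cluster, keeps the number of comparisons close to linear when clusters stay small.

```python
from collections import defaultdict

def deduplicate(records, index_key, is_match):
    """Keep the first record of each duplicate group.

    Each record is compared only with kept records that share its
    index key, so the cost is roughly n times the cluster size.
    Returns (unique_records, number_of_comparisons).
    """
    buckets = defaultdict(list)  # index key -> kept records in that cluster
    unique = []
    comparisons = 0
    for rec in records:
        bucket = buckets[index_key(rec)]
        duplicate = False
        for kept in bucket:
            comparisons += 1
            if is_match(rec, kept):
                duplicate = True
                break
        if not duplicate:
            bucket.append(rec)
            unique.append(rec)
    return unique, comparisons

# Hypothetical stand-ins: a record is (id, feature tuple).
def index_key(rec):
    # Coarse quantization of the features (assumption, not the paper's index).
    return tuple(v // 10 for v in rec[1])

def is_match(a, b):
    # Toy matcher: every feature within a tolerance of 2 (assumption).
    return all(abs(x - y) <= 2 for x, y in zip(a[1], b[1]))

records = [
    ("a", (12, 47)),
    ("b", (55, 81)),
    ("c", (13, 46)),  # near-copy of "a"
    ("d", (18, 41)),  # same cluster as "a", but not a match
    ("e", (55, 80)),  # near-copy of "b"
]

unique, comparisons = deduplicate(records, index_key, is_match)
print([rec[0] for rec in unique], comparisons)
```

On this toy input only three comparisons are made, against ten for an all-pairs scan of five records. A simple quantized key like this one can place two near-identical records in different clusters when a feature sits close to a bucket boundary, so a practical index has to be robust to that; the sketch does not attempt to address it.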
