LES3:基于学习的精确相似点搜索 (LES3: Learning-based Exact Set Similarity Search) - 专知论文

会员服务 ·

0

相似度 · 情景 · 确切的 · Extensibility · 剪枝 ·

2021 年 7 月 22 日

LES3: Learning-based Exact Set Similarity Search

翻译：LES3:基于学习的精确相似点搜索

Yifan Li,Xiaohui Yu,Nick Koudas

from arxiv, Accepted by VLDB2021

Set similarity search is a problem of central interest to a wide variety of applications such as data cleaning and web search. Past approaches on set similarity search utilize either heavy indexing structures, incurring large search costs or indexes that produce large candidate sets. In this paper, we design a learning-based exact set similarity search approach, LES3. Our approach first partitions sets into groups, and then utilizes a light-weight bitmap-like indexing structure, called token-group matrix (TGM), to organize groups and prune out candidates given a query set. In order to optimize pruning using the TGM, we analytically investigate the optimal partitioning strategy under certain distributional assumptions. Using these results, we then design a learning-based partitioning approach called L2P and an associated data representation encoding, PTR, to identify the partitions. We conduct extensive experiments on real and synthetic datasets to fully study LES3, establishing the effectiveness and superiority over other applicable approaches.

翻译：相近搜索是数据清理和网络搜索等多种应用中心关注的一个问题。以往的相近搜索方法使用重指数结构、导致大量搜索成本或产生大型候选数据集的索引。在本文中,我们设计了基于学习的精确相似搜索方法,即LES3。我们的第一个分区将组合成一组,然后使用轻量位图相似的索引结构,称为象征性群体矩阵(TGM)来组织群体和对被查询的候选人进行筛选。为了优化利用TGM, 我们根据某些分配假设对最佳分割战略进行了分析研究。然后,我们利用这些结果设计了一个称为L2P的基于学习的分割方法和一个相关的数据表示编码,即PTR,以确定分区。我们对真实的和合成的数据集进行了广泛的实验,以充分研究LES3, 确定相对于其他适用方法的有效性和优越性。

0

相关内容

相似度

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

【如何做研究】How to research ，22页ppt

【如何做研究】How to research ，22页ppt

专知会员服务

113+阅读 · 2021年4月17日

深度学习搜索，Exploring Deep Learning for Search

深度学习搜索，Exploring Deep Learning for Search

专知会员服务

61+阅读 · 2020年5月9日

用于大型遥感影像检索的深度学习，Deep Learning for Image Search and Retrieval in Large Remote Sensing Archives

用于大型遥感影像检索的深度学习，Deep Learning for Image Search and Retrieval in Large Remote Sensing Archives

专知会员服务

39+阅读 · 2020年4月6日

【伯克利】元学习的元基线，A New Meta-Baseline for Few-Shot Learning

【伯克利】元学习的元基线，A New Meta-Baseline for Few-Shot Learning

专知会员服务

67+阅读 · 2020年3月28日

【论文推荐】逆问题，深度学习，对称性破缺，Inverse Problems, Deep Learning, and Symmetry Breaking

【论文推荐】逆问题，深度学习，对称性破缺，Inverse Problems, Deep Learning, and Symmetry Breaking

专知会员服务

26+阅读 · 2020年3月27日

【深度图相似学习综述】Deep Graph Similarity Learning: A Survey，29页pdf，117条参考文献

【深度图相似学习综述】Deep Graph Similarity Learning: A Survey，29页pdf，117条参考文献

专知会员服务

98+阅读 · 2019年12月31日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

已删除

将门创投

12+阅读 · 2018年6月25日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

随波逐流：Similarity-Adaptive and Discrete Optimization

随波逐流：Similarity-Adaptive and Discrete Optimization

我爱读PAMI

5+阅读 · 2018年2月6日

[DLdigest-8] 每日一道算法

[DLdigest-8] 每日一道算法

深度学习每日摘要

4+阅读 · 2017年11月2日

Distributed Principal Subspace Analysis for Partitioned Big Data: Algorithms, Analysis, and Implementation

Arxiv

0+阅读 · 2021年9月22日

Learning by Examples Based on Multi-level Optimization

Arxiv

0+阅读 · 2021年9月22日

The Causal Learning of Retail Delinquency

Arxiv

15+阅读 · 2020年12月17日

Embedding-based Retrieval in Facebook Search

Arxiv

12+阅读 · 2020年6月20日

Learning to Weight for Text Classification

Learning to Weight for Text Classification

Arxiv

8+阅读 · 2019年3月28日

Large-Scale Stochastic Sampling from the Probability Simplex

Arxiv

3+阅读 · 2018年6月19日

Large Scale Local Online Similarity/Distance Learning Framework based on Passive/Aggressive

Arxiv

5+阅读 · 2018年4月5日

Efficient and Deep Person Re-Identification using Multi-Level Similarity

Arxiv

4+阅读 · 2018年4月2日

Instance Similarity Deep Hashing for Multi-Label Image Retrieval

Arxiv

5+阅读 · 2018年3月19日

Triplet-based Deep Similarity Learning for Person Re-Identification

Arxiv

3+阅读 · 2018年2月9日

VIP会员

文章信息

相关主题

相关VIP内容

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

【如何做研究】How to research ，22页ppt

【如何做研究】How to research ，22页ppt

专知会员服务

113+阅读 · 2021年4月17日

深度学习搜索，Exploring Deep Learning for Search

深度学习搜索，Exploring Deep Learning for Search

专知会员服务

61+阅读 · 2020年5月9日

用于大型遥感影像检索的深度学习，Deep Learning for Image Search and Retrieval in Large Remote Sensing Archives

用于大型遥感影像检索的深度学习，Deep Learning for Image Search and Retrieval in Large Remote Sensing Archives

专知会员服务

39+阅读 · 2020年4月6日

【伯克利】元学习的元基线，A New Meta-Baseline for Few-Shot Learning

【伯克利】元学习的元基线，A New Meta-Baseline for Few-Shot Learning

专知会员服务

67+阅读 · 2020年3月28日

【论文推荐】逆问题，深度学习，对称性破缺，Inverse Problems, Deep Learning, and Symmetry Breaking

【论文推荐】逆问题，深度学习，对称性破缺，Inverse Problems, Deep Learning, and Symmetry Breaking

专知会员服务

26+阅读 · 2020年3月27日

【深度图相似学习综述】Deep Graph Similarity Learning: A Survey，29页pdf，117条参考文献

【深度图相似学习综述】Deep Graph Similarity Learning: A Survey，29页pdf，117条参考文献

专知会员服务

98+阅读 · 2019年12月31日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

热门VIP内容

开通专知VIP会员享更多权益服务

【NeurIPS2025】迈向鲁棒的零样本强化学习

一种基于视觉算法生成三维场景重建的多任务系统 | 2025最新200页

【普林斯顿博士论文】量化、评估与缓解现代机器学习系统中的风险

遥感中基于深度学习的领域自适应方法：全面综述

相关资讯

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

已删除

将门创投

12+阅读 · 2018年6月25日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

随波逐流：Similarity-Adaptive and Discrete Optimization

随波逐流：Similarity-Adaptive and Discrete Optimization

我爱读PAMI

5+阅读 · 2018年2月6日

[DLdigest-8] 每日一道算法

[DLdigest-8] 每日一道算法

深度学习每日摘要

4+阅读 · 2017年11月2日

相关论文

Distributed Principal Subspace Analysis for Partitioned Big Data: Algorithms, Analysis, and Implementation

Arxiv

0+阅读 · 2021年9月22日

Learning by Examples Based on Multi-level Optimization

Arxiv

0+阅读 · 2021年9月22日

The Causal Learning of Retail Delinquency

Arxiv

15+阅读 · 2020年12月17日

Embedding-based Retrieval in Facebook Search

Arxiv

12+阅读 · 2020年6月20日

Learning to Weight for Text Classification

Learning to Weight for Text Classification

Arxiv

8+阅读 · 2019年3月28日

Large-Scale Stochastic Sampling from the Probability Simplex

Arxiv

3+阅读 · 2018年6月19日

Large Scale Local Online Similarity/Distance Learning Framework based on Passive/Aggressive

Arxiv

5+阅读 · 2018年4月5日

Efficient and Deep Person Re-Identification using Multi-Level Similarity

Arxiv

4+阅读 · 2018年4月2日

Instance Similarity Deep Hashing for Multi-Label Image Retrieval

Arxiv

5+阅读 · 2018年3月19日

Triplet-based Deep Similarity Learning for Person Re-Identification

Arxiv

3+阅读 · 2018年2月9日

微信扫码咨询专知VIP会员