KOIOS: Top-k Semantic Overlap Set Search (KOIOS: Top-k Semantic Overlap Set Search) - 专知论文

会员服务 ·

0

相似度 · 图匹配 · 情景 · 搜索 · 高效的算法 ·

2023 年 4 月 20 日

KOIOS: Top-k Semantic Overlap Set Search

翻译：KOIOS: Top-k Semantic Overlap Set Search

Pranay Mundra,Jianhao Zhang,Fatemeh Nargesian,Nikolaus Augsten

We study the top-k set similarity search problem using semantic overlap. While vanilla overlap requires exact matches between set elements, semantic overlap allows elements that are syntactically different but semantically related to increase the overlap. The semantic overlap is the maximum matching score of a bipartite graph, where an edge weight between two set elements is defined by a user-defined similarity function, e.g., cosine similarity between embeddings. Common techniques like token indexes fail for semantic search since similar elements may be unrelated at the character level. Further, verifying candidates is expensive (cubic versus linear for syntactic overlap), calling for highly selective filters. We propose KOIOS, the first exact and efficient algorithm for semantic overlap search. KOIOS leverages sophisticated filters to minimize the number of required graph-matching calculations. Our experiments show that for medium to large sets less than 5% of the candidate sets need verification, and more than half of those sets are further pruned without requiring the expensive graph matching. We show the efficiency of our algorithm on four real datasets and demonstrate the improved result quality of semantic over vanilla set similarity search.

翻译：KOIOS: 基于语义重叠的 top-k 集合相似度搜索问题的研究。虽然传统的重叠要求集合元素的完全匹配，但是语义重叠允许语法上不同但语义相关的元素增加重叠。语义重叠是一个二分图的最大匹配分数，其中两个集合元素之间的边权值由用户定义的相似度函数（例如嵌入之间的余弦相似度）定义。传统的技术如标记索引在语义搜索上失灵，因为相似的元素可以在字符级别上毫不相关。此外，候选集的验证非常昂贵（对于语法上的重叠，为立方级别，而对于语义重叠，为线性级别），需要高度选择性的过滤器。我们提出了 KOIOS，这是一种基于精心设计的过滤器的确切且高效的算法，以最小化所需的图匹配计算数量。我们的实验证明，对于中大型集合，不到 5% 的候选集需要验证，其中超过一半的集合在不需要进行昂贵的图匹配的情况下被进一步修剪。我们展示了我们的算法在四个真实数据集上的效率，并演示了语义优于传统的集合相似度搜索的结果质量提高。

0

相关内容

相似度

【CVPR 2022】基于元内存传输的跨域少镜头语义分割，Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer

【CVPR 2022】基于元内存传输的跨域少镜头语义分割，Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer

专知会员服务

13+阅读 · 2022年3月12日

ICLR 2021杰出论文奖出炉，8篇论文上榜！

专知会员服务

26+阅读 · 2021年4月2日

【2020 最新论文】节点邻近的图池化的层次表示学习 Graph Pooling with Node Proximity for Hierarchical Representation Learning

【2020 最新论文】节点邻近的图池化的层次表示学习 Graph Pooling with Node Proximity for Hierarchical Representation Learning

专知会员服务

43+阅读 · 2020年7月19日

【Google】平滑对抗训练，Smooth Adversarial Training

【Google】平滑对抗训练，Smooth Adversarial Training

专知会员服务

49+阅读 · 2020年7月4日

基于Transformer嵌入模型的个性化产品搜索，A Transformer-based Embedding Model for Personalized Product Search

基于Transformer嵌入模型的个性化产品搜索，A Transformer-based Embedding Model for Personalized Product Search

专知会员服务

31+阅读 · 2020年5月20日

【ACL2020】命名实体识别即依存解析，Named Entity Recognition as Dependency Parsing

【ACL2020】命名实体识别即依存解析，Named Entity Recognition as Dependency Parsing

专知会员服务

61+阅读 · 2020年5月15日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

专知会员服务

28+阅读 · 2020年2月12日

【亚马逊-WWW2020】不解析,生成!用于面向任务的语义分析的序列到序列体系结构，Don't Parse, Generate! A Sequence to Sequence Architecture for Task-Oriented Semantic Parsing

【亚马逊-WWW2020】不解析,生成!用于面向任务的语义分析的序列到序列体系结构，Don't Parse, Generate! A Sequence to Sequence Architecture for Task-Oriented Semantic Parsing

专知会员服务

15+阅读 · 2020年2月1日

【NLP| 推荐文章】基于文本和知识库的语义搜索（Semantic search on text and knowledge bases）

专知会员服务

46+阅读 · 2019年11月24日

做目标检测，这6篇就够了：CVPR 2020目标检测论文盘点

做目标检测，这6篇就够了：CVPR 2020目标检测论文盘点

机器之心

23+阅读 · 2020年7月27日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

论文浅尝 | Global Relation Embedding for Relation Extraction

论文浅尝 | Global Relation Embedding for Relation Extraction

开放知识图谱

12+阅读 · 2019年3月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

AI界的State of the Art都在这里了

AI界的State of the Art都在这里了

机器之心

12+阅读 · 2018年12月10日

【论文推荐】最新八篇图像检索相关论文—三元组、深度特征图、判别式、卷积特征聚合、视觉-关系知识图谱、大规模图像检索

【论文推荐】最新八篇图像检索相关论文—三元组、深度特征图、判别式、卷积特征聚合、视觉-关系知识图谱、大规模图像检索

专知

33+阅读 · 2018年4月23日

【论文推荐】最新7篇条件随机场（CRF）相关论文—图像标注、对抗学习、端到端、注意力机制、三维人体姿态、图像分割、行为分割和识别

【论文推荐】最新7篇条件随机场（CRF）相关论文—图像标注、对抗学习、端到端、注意力机制、三维人体姿态、图像分割、行为分割和识别

专知

15+阅读 · 2018年2月13日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

【论文】图上的表示学习综述

【论文】图上的表示学习综述

机器学习研究会

15+阅读 · 2017年9月24日

Zakharov系统的解的动力学行为研究

国家自然科学基金

0+阅读 · 2015年12月31日

函数空间、几何和Mahler测度

国家自然科学基金

0+阅读 · 2014年12月31日

EB病毒ncRNA在Burkitt淋巴瘤发病中的作用及机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

Dicer在先天性巨结肠发病中的作用机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于组合地图模型的图像检索算法研究

国家自然科学基金

0+阅读 · 2013年12月31日

牛肉中残留Zeranol诱导人乳腺癌细胞增殖及激活STAT3信号通路的机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

拟Frobenius-Lusztig核

国家自然科学基金

0+阅读 · 2012年12月31日

基于Caco-2细胞模型的蛋清肽结构与完整吸收关系研究

国家自然科学基金

0+阅读 · 2012年12月31日

miR-135a调控TRPC1在糖尿病肾病发病中的作用及机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

线性积分方程的Galerkin快速谱方法

国家自然科学基金

0+阅读 · 2009年12月31日

HIPA: Hierarchical Patch Transformer for Single Image Super Resolution

Arxiv

0+阅读 · 2023年6月7日

Representation-agnostic distance-driven perturbation for optimizing ill-conditioned problems

Arxiv

0+阅读 · 2023年6月5日

LiDAR2Map: In Defense of LiDAR-Based Semantic Map Construction Using Online Camera Distillation

Arxiv

0+阅读 · 2023年6月5日

NNSplitter: An Active Defense Solution for DNN Model via Automated Weight Obfuscation

Arxiv

0+阅读 · 2023年6月3日

Syntax-Aware Graph-to-Graph Transformer for Semantic Role Labelling

Arxiv

0+阅读 · 2023年6月2日

Invertible residual networks in the context of regularization theory for linear inverse problems

Arxiv

0+阅读 · 2023年6月2日

Multi-Agent Simulation for AI Behaviour Discovery in Operations Research

Arxiv

39+阅读 · 2021年8月30日

Learning Hierarchy-Aware Knowledge Graph Embeddings for Link Prediction

Arxiv

18+阅读 · 2019年12月25日

Differentiable Reasoning on Large Knowledge Bases and Natural Language

Arxiv

12+阅读 · 2019年12月17日

Global Relation Embedding for Relation Extraction

Arxiv

10+阅读 · 2018年4月19日

VIP会员

文章信息

相关主题

高效的算法

相关VIP内容

【CVPR 2022】基于元内存传输的跨域少镜头语义分割，Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer

【CVPR 2022】基于元内存传输的跨域少镜头语义分割，Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer

专知会员服务

13+阅读 · 2022年3月12日

ICLR 2021杰出论文奖出炉，8篇论文上榜！

专知会员服务

26+阅读 · 2021年4月2日

【2020 最新论文】节点邻近的图池化的层次表示学习 Graph Pooling with Node Proximity for Hierarchical Representation Learning

【2020 最新论文】节点邻近的图池化的层次表示学习 Graph Pooling with Node Proximity for Hierarchical Representation Learning

专知会员服务

43+阅读 · 2020年7月19日

【Google】平滑对抗训练，Smooth Adversarial Training

【Google】平滑对抗训练，Smooth Adversarial Training

专知会员服务

49+阅读 · 2020年7月4日

基于Transformer嵌入模型的个性化产品搜索，A Transformer-based Embedding Model for Personalized Product Search

基于Transformer嵌入模型的个性化产品搜索，A Transformer-based Embedding Model for Personalized Product Search

专知会员服务

31+阅读 · 2020年5月20日

【ACL2020】命名实体识别即依存解析，Named Entity Recognition as Dependency Parsing

【ACL2020】命名实体识别即依存解析，Named Entity Recognition as Dependency Parsing

专知会员服务

61+阅读 · 2020年5月15日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

专知会员服务

28+阅读 · 2020年2月12日

【亚马逊-WWW2020】不解析,生成!用于面向任务的语义分析的序列到序列体系结构，Don't Parse, Generate! A Sequence to Sequence Architecture for Task-Oriented Semantic Parsing

【亚马逊-WWW2020】不解析,生成!用于面向任务的语义分析的序列到序列体系结构，Don't Parse, Generate! A Sequence to Sequence Architecture for Task-Oriented Semantic Parsing

专知会员服务

15+阅读 · 2020年2月1日

【NLP| 推荐文章】基于文本和知识库的语义搜索（Semantic search on text and knowledge bases）

专知会员服务

46+阅读 · 2019年11月24日

热门VIP内容

开通专知VIP会员享更多权益服务

【CMU博士论文】数据驱动决策中的激励、信息与不确定性

DGP双粒度提示框架：图增强大模型助力欺诈检测

【ICCV2025】ESSENTIAL：用于视频类增量学习的情景记忆与语义记忆整合

唯快不破：大型语言模型高效架构综述

相关资讯

做目标检测，这6篇就够了：CVPR 2020目标检测论文盘点

做目标检测，这6篇就够了：CVPR 2020目标检测论文盘点

机器之心

23+阅读 · 2020年7月27日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

论文浅尝 | Global Relation Embedding for Relation Extraction

论文浅尝 | Global Relation Embedding for Relation Extraction

开放知识图谱

12+阅读 · 2019年3月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

AI界的State of the Art都在这里了

AI界的State of the Art都在这里了

机器之心

12+阅读 · 2018年12月10日

【论文推荐】最新八篇图像检索相关论文—三元组、深度特征图、判别式、卷积特征聚合、视觉-关系知识图谱、大规模图像检索

【论文推荐】最新八篇图像检索相关论文—三元组、深度特征图、判别式、卷积特征聚合、视觉-关系知识图谱、大规模图像检索

专知

33+阅读 · 2018年4月23日

【论文推荐】最新7篇条件随机场（CRF）相关论文—图像标注、对抗学习、端到端、注意力机制、三维人体姿态、图像分割、行为分割和识别

【论文推荐】最新7篇条件随机场（CRF）相关论文—图像标注、对抗学习、端到端、注意力机制、三维人体姿态、图像分割、行为分割和识别

专知

15+阅读 · 2018年2月13日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

【论文】图上的表示学习综述

【论文】图上的表示学习综述

机器学习研究会

15+阅读 · 2017年9月24日

相关论文

HIPA: Hierarchical Patch Transformer for Single Image Super Resolution

Arxiv

0+阅读 · 2023年6月7日

Representation-agnostic distance-driven perturbation for optimizing ill-conditioned problems

Arxiv

0+阅读 · 2023年6月5日

LiDAR2Map: In Defense of LiDAR-Based Semantic Map Construction Using Online Camera Distillation

Arxiv

0+阅读 · 2023年6月5日

NNSplitter: An Active Defense Solution for DNN Model via Automated Weight Obfuscation

Arxiv

0+阅读 · 2023年6月3日

Syntax-Aware Graph-to-Graph Transformer for Semantic Role Labelling

Arxiv

0+阅读 · 2023年6月2日

Invertible residual networks in the context of regularization theory for linear inverse problems

Arxiv

0+阅读 · 2023年6月2日

Multi-Agent Simulation for AI Behaviour Discovery in Operations Research

Arxiv

39+阅读 · 2021年8月30日

Learning Hierarchy-Aware Knowledge Graph Embeddings for Link Prediction

Arxiv

18+阅读 · 2019年12月25日

Differentiable Reasoning on Large Knowledge Bases and Natural Language

Arxiv

12+阅读 · 2019年12月17日

Global Relation Embedding for Relation Extraction

Arxiv

10+阅读 · 2018年4月19日

相关基金

Zakharov系统的解的动力学行为研究

国家自然科学基金

0+阅读 · 2015年12月31日

函数空间、几何和Mahler测度

国家自然科学基金

0+阅读 · 2014年12月31日

EB病毒ncRNA在Burkitt淋巴瘤发病中的作用及机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

Dicer在先天性巨结肠发病中的作用机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于组合地图模型的图像检索算法研究

国家自然科学基金

0+阅读 · 2013年12月31日

牛肉中残留Zeranol诱导人乳腺癌细胞增殖及激活STAT3信号通路的机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

拟Frobenius-Lusztig核

国家自然科学基金

0+阅读 · 2012年12月31日

基于Caco-2细胞模型的蛋清肽结构与完整吸收关系研究

国家自然科学基金

0+阅读 · 2012年12月31日

miR-135a调控TRPC1在糖尿病肾病发病中的作用及机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

线性积分方程的Galerkin快速谱方法

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员