Ultra-High Dimensional Sparse Representations with Binarization for Efficient Text Retrieval - Zhuanzhi Papers


Sparsity · Sparse coding · Synonymy · Polysemy · MoDELS

October 15, 2021

Ultra-High Dimensional Sparse Representations with Binarization for Efficient Text Retrieval


Kyoung-Rok Jang, Junmo Kang, Giwon Hong, Sung-Hyon Myaeng, Joohee Park, Taewon Yoon, Heecheol Seo

From arXiv; to appear at EMNLP 2021

The semantic matching capabilities of neural information retrieval can ameliorate synonymy and polysemy problems of symbolic approaches. However, neural models' dense representations are more suitable for re-ranking, due to their inefficiency. Sparse representations, either in symbolic or latent form, are more efficient with an inverted index. Taking the merits of the sparse and dense representations, we propose an ultra-high dimensional (UHD) representation scheme equipped with directly controllable sparsity. UHD's large capacity and minimal noise and interference among the dimensions allow for binarized representations, which are highly efficient for storage and search. Also proposed is a bucketing method, where the embeddings from multiple layers of BERT are selected/merged to represent diverse linguistic aspects. We test our models with MS MARCO and TREC CAR, showing that our models outperform other sparse models.
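To make the abstract's efficiency claim concrete, the following is a minimal sketch of how a binarized sparse representation can be searched with an inverted index. It is not the authors' implementation. The function names (`binarize_top_k`, `build_inverted_index`, `search`), the toy 6-dimensional vectors, and the use of top-k selection as the sparsity control are all assumptions made for illustration; the abstract only states that sparsity is directly controllable and that the representations are binarized.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def binarize_top_k(dense: List[float], k: int) -> Set[int]:
    """Return the indices of the k largest activations.

    A binary sparse vector is fully described by the set of its active
    dimensions, so no weights are stored. Top-k selection is an assumed
    stand-in for the paper's sparsity control.
    """
    if k <= 0:
        return set()
    ranked = sorted(range(len(dense)), key=lambda i: dense[i], reverse=True)
    return set(ranked[:k])


def build_inverted_index(docs: Dict[str, Set[int]]) -> Dict[int, List[str]]:
    """Map each active dimension to the documents in which it is set."""
    index: Dict[int, List[str]] = defaultdict(list)
    for doc_id, active_dims in docs.items():
        for dim in active_dims:
            index[dim].append(doc_id)
    return index


def search(index: Dict[int, List[str]], query: Set[int]) -> List[Tuple[str, int]]:
    """Score documents by the number of active dimensions shared with the query.

    Only the posting lists of the query's active dimensions are visited,
    which is why a sparse representation pairs well with an inverted index.
    """
    scores: Dict[str, int] = defaultdict(int)
    for dim in query:
        for doc_id in index.get(dim, []):
            scores[doc_id] += 1
    # Highest overlap first; ties broken by document id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical encoder outputs; a real UHD vector would have far more dimensions.
doc_vectors = {
    "d1": [0.9, 0.1, 0.8, 0.0, 0.2, 0.7],
    "d2": [0.0, 0.9, 0.1, 0.8, 0.7, 0.2],
}
docs = {doc_id: binarize_top_k(vec, k=3) for doc_id, vec in doc_vectors.items()}
index = build_inverted_index(docs)

query = binarize_top_k([0.8, 0.0, 0.9, 0.6, 0.1, 0.2], k=3)
print(search(index, query))  # [('d1', 2), ('d2', 1)]
```

With binary vectors the dot product reduces to counting shared active dimensions, so the index stores only document ids per dimension and scoring needs only integer increments.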

