OPORP: 一次变换+一次随机投影 (OPORP: One Permutation + One Random Projection) - 专知论文

会员服务 ·

0

估计/估计量 · 向量化 · 规范化的 · Projection · 方差 ·

2023 年 2 月 7 日

OPORP: One Permutation + One Random Projection

翻译：OPORP: 一次变换+一次随机投影

Ping Li,Xiaoyun Li

Consider two $D$-dimensional data vectors (e.g., embeddings): $u, v$. In many embedding-based retrieval (EBR) applications where the vectors are generated from trained models, $D=256\sim 1024$ are common. In this paper, OPORP (one permutation + one random projection) uses a variant of the ``count-sketch'' type of data structures for achieving data reduction/compression. With OPORP, we first apply a permutation on the data vectors. A random vector $r$ is generated i.i.d. with moments: $E(r_i) = 0, E(r_i^2)=1, E(r_i^3) =0, E(r_i^4)=s$. We multiply (as dot product) $r$ with all permuted data vectors. Then we break the $D$ columns into $k$ equal-length bins and aggregate (i.e., sum) the values in each bin to obtain $k$ samples from each data vector. One crucial step is to normalize the $k$ samples to the unit $l_2$ norm. We show that the estimation variance is essentially: $(s-1)A + \frac{D-k}{D-1}\frac{1}{k}\left[ (1-\rho^2)^2 -2A\right]$, where $A\geq 0$ is a function of the data ($u,v$). This formula reveals several key properties: (1) We need $s=1$. (2) The factor $\frac{D-k}{D-1}$ can be highly beneficial in reducing variances. (3) The term $\frac{1}{k}(1-\rho^2)^2$ is actually the asymptotic variance of the classical correlation estimator. We illustrate that by letting the $k$ in OPORP to be $k=1$ and repeat the procedure $m$ times, we exactly recover the work of ``very spars random projections'' (VSRP). This immediately leads to a normalized estimator for VSRP which substantially improves the original estimator of VSRP. In summary, with OPORP, the two key steps: (i) the normalization and (ii) the fixed-length binning scheme, have considerably improved the accuracy in estimating the cosine similarity, which is a routine (and crucial) task in modern embedding-based retrieval (EBR) applications.

翻译：考虑两个维基数据矢量 (例如, 嵌入) : 美元 : 美元 : 美元 : 在许多嵌入式的矢量回收( EBR) 应用中, 由经过训练的模型生成的矢量, 美元=256\sim 1024美元是常见的。在本文中, OPORP (一个变换 + 一个随机投影) 使用“ 计算- 获取” 数据结构的变式来减少/ 压缩数据。有了 OPORP, 我们首先在数据矢量矢量矢量矢量上应用一个变异性 : 美元美元 = 美元美元 = 美元美元。美元美元 = 美元美元 = 美元 = 美元 = 1, 美元美元 = 1 。原始值= 0, E (r_ i_ 4) 以所有数据矢量变数。然后, 我们将美元 = =xxxxxxxxx 。。

0

相关内容

估计/估计量

估计/估计量

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

【硬核书】树与网络上的概率，716页pdf

【硬核书】树与网络上的概率，716页pdf

专知会员服务

77+阅读 · 2021年12月8日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

专知会员服务

15+阅读 · 2019年10月23日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

图机器学习 2.2-2.4 Properties of Networks, Random Graph

图机器学习 2.2-2.4 Properties of Networks, Random Graph

图与推荐

10+阅读 · 2020年3月28日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

深度自进化聚类：Deep Self-Evolution Clustering

深度自进化聚类：Deep Self-Evolution Clustering

我爱读PAMI

15+阅读 · 2019年4月13日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新十篇度量学习相关论文—可量化表示、非线性度量学习、在线深度量学习、大间隔最近邻、判别深度度量、域自适应

【论文推荐】最新十篇度量学习相关论文—可量化表示、非线性度量学习、在线深度量学习、大间隔最近邻、判别深度度量、域自适应

专知

12+阅读 · 2018年5月18日

【论文推荐】最新六篇知识图谱相关论文—Zero-shot识别、卷积二维知识图谱、变分知识图谱推理、张量分解、推荐

【论文推荐】最新六篇知识图谱相关论文—Zero-shot识别、卷积二维知识图谱、变分知识图谱推理、张量分解、推荐

专知

50+阅读 · 2018年4月25日

【推荐】深度学习目标检测概览

【推荐】深度学习目标检测概览

机器学习研究会

10+阅读 · 2017年9月1日

抗MRSA活性rhodomyrtosone B类似物的合成和构效关系研究

国家自然科学基金

0+阅读 · 2015年12月31日

蛋白激酶ERK1/2介导GLP-1改善胰岛β细胞功能障碍作用研究

国家自然科学基金

0+阅读 · 2013年12月31日

Calderon问题和边界刚性问题

国家自然科学基金

0+阅读 · 2013年12月31日

混凝土开裂状态对其内部钢筋锈蚀的影响研究

国家自然科学基金

0+阅读 · 2012年12月31日

海洋吡咯生物碱的设计、合成与活性研究

国家自然科学基金

0+阅读 · 2011年12月31日

层topos中的拓扑结构与序结构

国家自然科学基金

0+阅读 · 2011年12月31日

复形范畴中的Gorenstein同调维数

国家自然科学基金

0+阅读 · 2009年12月31日

OFDM光信号传输及信号处理基础理论与关键技术研究

国家自然科学基金

0+阅读 · 2009年12月31日

平行因子分析模型基础理论及其在非平稳信号处理中的应用

国家自然科学基金

0+阅读 · 2009年12月31日

氮杂双环手性有序介孔有机硅催化剂的设计、合成及催化反应研究

国家自然科学基金

1+阅读 · 2009年12月31日

On the $α$-spectral radius of hypergraphs

Arxiv

0+阅读 · 2023年3月29日

Quadratic Gradient: Combining Gradient Algorithms and Newton's Method as One

Arxiv

0+阅读 · 2023年3月29日

Size-effects of metamaterial beams subjected to pure bending: on boundary conditions and parameter identification in the relaxed micromorphic model

Arxiv

0+阅读 · 2023年3月28日

A Tractable Online Learning Algorithm for the Multinomial Logit Contextual Bandit

A Tractable Online Learning Algorithm for the Multinomial Logit Contextual Bandit

Arxiv

0+阅读 · 2023年3月27日

Unsourced Random Access with the MIMO Receiver: Projection Decoding Analysis

Arxiv

0+阅读 · 2023年3月27日

On the Convergence of Numerical Integration as a Finite Matrix Approximation to Multiplication Operator

Arxiv

0+阅读 · 2023年3月27日

Right-left asymmetry of the eigenvector method: A simulation study

Arxiv

0+阅读 · 2023年3月27日

On the Convergence of AdaGrad on $\R^{d}$: Beyond Convexity, Non-Asymptotic Rate and Acceleration

Arxiv

0+阅读 · 2023年3月24日

Optimal Permutation Estimation in Crowd-Sourcing problems

Arxiv

0+阅读 · 2023年3月24日

Estimating Maximal Symmetries of Regression Functions via Subgroup Lattices

Arxiv

0+阅读 · 2023年3月23日

VIP会员

文章信息

相关主题

估计/估计量

相关VIP内容

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

【硬核书】树与网络上的概率，716页pdf

【硬核书】树与网络上的概率，716页pdf

专知会员服务

77+阅读 · 2021年12月8日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

专知会员服务

15+阅读 · 2019年10月23日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《俄罗斯核条令演变趋势》最新56页报告

【CMU博士论文】以人为中心的强化学习

《"牧羊人网格"拦截策略：实现无人机集群可靠拦截的新范式》

认知优势：人工智能在国家安全决策中的核心作用

相关资讯

图机器学习 2.2-2.4 Properties of Networks, Random Graph

图机器学习 2.2-2.4 Properties of Networks, Random Graph

图与推荐

10+阅读 · 2020年3月28日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

深度自进化聚类：Deep Self-Evolution Clustering

深度自进化聚类：Deep Self-Evolution Clustering

我爱读PAMI

15+阅读 · 2019年4月13日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新十篇度量学习相关论文—可量化表示、非线性度量学习、在线深度量学习、大间隔最近邻、判别深度度量、域自适应

【论文推荐】最新十篇度量学习相关论文—可量化表示、非线性度量学习、在线深度量学习、大间隔最近邻、判别深度度量、域自适应

专知

12+阅读 · 2018年5月18日

【论文推荐】最新六篇知识图谱相关论文—Zero-shot识别、卷积二维知识图谱、变分知识图谱推理、张量分解、推荐

【论文推荐】最新六篇知识图谱相关论文—Zero-shot识别、卷积二维知识图谱、变分知识图谱推理、张量分解、推荐

专知

50+阅读 · 2018年4月25日

【推荐】深度学习目标检测概览

【推荐】深度学习目标检测概览

机器学习研究会

10+阅读 · 2017年9月1日

相关论文

On the $α$-spectral radius of hypergraphs

Arxiv

0+阅读 · 2023年3月29日

Quadratic Gradient: Combining Gradient Algorithms and Newton's Method as One

Arxiv

0+阅读 · 2023年3月29日

Size-effects of metamaterial beams subjected to pure bending: on boundary conditions and parameter identification in the relaxed micromorphic model

Arxiv

0+阅读 · 2023年3月28日

A Tractable Online Learning Algorithm for the Multinomial Logit Contextual Bandit

A Tractable Online Learning Algorithm for the Multinomial Logit Contextual Bandit

Arxiv

0+阅读 · 2023年3月27日

Unsourced Random Access with the MIMO Receiver: Projection Decoding Analysis

Arxiv

0+阅读 · 2023年3月27日

On the Convergence of Numerical Integration as a Finite Matrix Approximation to Multiplication Operator

Arxiv

0+阅读 · 2023年3月27日

Right-left asymmetry of the eigenvector method: A simulation study

Arxiv

0+阅读 · 2023年3月27日

On the Convergence of AdaGrad on $\R^{d}$: Beyond Convexity, Non-Asymptotic Rate and Acceleration

Arxiv

0+阅读 · 2023年3月24日

Optimal Permutation Estimation in Crowd-Sourcing problems

Arxiv

0+阅读 · 2023年3月24日

Estimating Maximal Symmetries of Regression Functions via Subgroup Lattices

Arxiv

0+阅读 · 2023年3月23日

相关基金

抗MRSA活性rhodomyrtosone B类似物的合成和构效关系研究

国家自然科学基金

0+阅读 · 2015年12月31日

蛋白激酶ERK1/2介导GLP-1改善胰岛β细胞功能障碍作用研究

国家自然科学基金

0+阅读 · 2013年12月31日

Calderon问题和边界刚性问题

国家自然科学基金

0+阅读 · 2013年12月31日

混凝土开裂状态对其内部钢筋锈蚀的影响研究

国家自然科学基金

0+阅读 · 2012年12月31日

海洋吡咯生物碱的设计、合成与活性研究

国家自然科学基金

0+阅读 · 2011年12月31日

层topos中的拓扑结构与序结构

国家自然科学基金

0+阅读 · 2011年12月31日

复形范畴中的Gorenstein同调维数

国家自然科学基金

0+阅读 · 2009年12月31日

OFDM光信号传输及信号处理基础理论与关键技术研究

国家自然科学基金

0+阅读 · 2009年12月31日

平行因子分析模型基础理论及其在非平稳信号处理中的应用

国家自然科学基金

0+阅读 · 2009年12月31日

氮杂双环手性有序介孔有机硅催化剂的设计、合成及催化反应研究

国家自然科学基金

1+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员