This article describes an efficient method to learn distributed representations, also known as embeddings. This is accomplished by minimizing an objective function similar to the one introduced in the Word2Vec algorithm and later adopted in several works. The computational bottleneck of the optimization is the calculation of the softmax normalization constants, which requires a number of operations scaling quadratically with the sample size. This complexity is unsuited to large datasets, and negative sampling is a popular workaround that allows one to obtain distributed representations in time linear in the sample size. Negative sampling, however, amounts to a change of the loss function and hence solves a different optimization problem from the one originally proposed. Our contribution is to show that the softmax normalization constants can be estimated in linear time, allowing us to design an efficient optimization strategy to learn distributed representations. We test our approximation on two popular applications related to word and node embeddings. The results show accuracy competitive with negative sampling, at a markedly lower computational time.
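To make the complexity claim concrete, the sketch below contrasts the exact softmax normalization constant, which costs O(n) per item and hence O(n^2) for all items, with a simple Monte Carlo estimate that costs O(m) per item. This is a minimal illustration of why estimating the constants breaks the quadratic bottleneck; it is not the estimator proposed in the article, and all names and sizes are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 16  # hypothetical vocabulary size and embedding dimension
U = rng.normal(size=(n, d)) / np.sqrt(d)  # "input" embeddings
V = rng.normal(size=(n, d)) / np.sqrt(d)  # "output" embeddings

def exact_Z(i):
    # Exact normalization constant Z_i = sum_j exp(u_i . v_j):
    # O(n) per item, so O(n^2) to normalize every softmax in the objective.
    return np.exp(U[i] @ V.T).sum()

def estimated_Z(i, m=100):
    # Naive Monte Carlo estimate from m uniformly drawn items:
    # Z_i ~ (n/m) * sum_{j in sample} exp(u_i . v_j), only O(m) per item.
    idx = rng.integers(0, n, size=m)
    return n * np.exp(U[i] @ V[idx].T).mean()

print(exact_Z(0), estimated_Z(0))
```

With random embeddings of this scale the uniform estimate typically lands within a few percent of the exact constant; the article's point is that a linear-time estimate of this kind lets one optimize the original softmax objective instead of switching to the negative-sampling loss.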