Recent advances in deep learning have led to significant performance gains on several NLP tasks; however, the models have become increasingly computationally demanding. This paper therefore addresses the domain of computationally efficient algorithms for NLP tasks. In particular, it investigates distributed representations of the n-gram statistics of texts. The representations are formed using an embedding enabled by hyperdimensional computing. These representations then serve as features, which are used as input to standard classifiers. We investigate the applicability of the embedding on one large and three small standard classification datasets using nine classifiers. The embedding achieved on-par F1 scores while reducing time and memory requirements several-fold compared to conventional n-gram statistics; e.g., for one of the classifiers on a small dataset, the memory reduction was 6.18 times, while train and test speed-ups were 4.62 and 3.84 times, respectively. For many classifiers on the large dataset, the memory reduction was ca. 100 times, and the train and test speed-ups were over 100 times. Importantly, using distributed representations formed via hyperdimensional computing decouples the dimensionality of the representation from the n-gram size, thus opening room for tradeoffs.
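To make the mechanism behind the embedding concrete, the following is a minimal sketch of how n-gram statistics can be mapped to a fixed-size distributed representation with hyperdimensional computing: each symbol is assigned a random bipolar hypervector, the symbols of an n-gram are bound by position-wise permutation and elementwise multiplication, and the n-gram hypervectors are bundled by summation into one text vector. The dimensionality, alphabet, and function names below are illustrative assumptions rather than the paper's exact configuration.

```python
import numpy as np

# Illustrative settings (assumptions, not the paper's exact setup).
DIM = 1000                          # dimensionality of the hypervectors
ALPHABET = "abcdefghijklmnopqrstuvwxyz "
rng = np.random.default_rng(0)

# Item memory: one random bipolar (+1/-1) hypervector per symbol.
ITEM_MEMORY = {c: rng.choice([-1.0, 1.0], size=DIM) for c in ALPHABET}

def embed_ngram_stats(text: str, n: int = 3) -> np.ndarray:
    """Embed the n-gram statistics of `text` into a single DIM-dimensional vector."""
    embedding = np.zeros(DIM)
    for i in range(len(text) - n + 1):
        # Bind the n symbols of one n-gram: cyclically shift (permute) each
        # symbol's hypervector by its position, then multiply elementwise.
        ngram_hv = np.ones(DIM)
        for pos, ch in enumerate(text[i:i + n]):
            ngram_hv *= np.roll(ITEM_MEMORY[ch], pos)
        # Bundle (sum) the n-gram hypervectors into the text representation.
        embedding += ngram_hv
    return embedding

# The resulting fixed-size vector serves as the input features to a standard
# classifier; its size stays DIM regardless of n, unlike explicit n-gram counts.
features = embed_ngram_stats("hyperdimensional computing embeds ngram statistics")
print(features.shape)  # (1000,)
```

Note how the representation size is fixed by DIM rather than by the number of possible n-grams, which is what allows trading representation dimensionality against classification performance independently of the n-gram size.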