In many important applications -- such as search engines and relational database systems -- data is stored in the form of arrays of integers. Encoding and, most importantly, decoding of these arrays consumes considerable CPU time. Therefore, substantial effort has been made to reduce costs associated with compression and decompression. In particular, researchers have exploited the superscalar nature of modern processors and SIMD instructions. Nevertheless, we introduce a novel vectorized scheme called SIMD-BP128 that improves over previously proposed vectorized approaches. It is nearly twice as fast as the previously fastest schemes on desktop processors (varint-G8IU and PFOR). At the same time, SIMD-BP128 saves up to 2 bits per integer. For even better compression, we propose another new vectorized scheme (SIMD-FastPFOR) that has a compression ratio within 10% of a state-of-the-art scheme (Simple-8b) while being two times faster during decoding.
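To make the underlying idea concrete, here is a minimal scalar sketch of binary packing, the technique that bit-packing schemes such as SIMD-BP128 build on: when all integers in a block are known to fit in b bits, they can be stored using b bits each instead of 32. This is only an illustrative assumption for exposition, not the authors' implementation; SIMD-BP128 itself packs blocks of 128 integers at a time using 128-bit SIMD instructions rather than the word-at-a-time loop shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>
#include <vector>

// Pack n integers, each assumed to fit in b bits, into a contiguous bit stream.
// Hypothetical scalar illustration of binary packing (not SIMD-BP128 itself).
std::vector<uint32_t> pack(const std::vector<uint32_t>& in, unsigned b) {
    assert(b >= 1 && b <= 32);
    std::vector<uint32_t> out((in.size() * b + 31) / 32, 0);
    size_t bitpos = 0;
    for (uint32_t v : in) {
        size_t word = bitpos / 32, offset = bitpos % 32;
        out[word] |= v << offset;
        if (offset + b > 32)                 // value straddles a word boundary
            out[word + 1] |= v >> (32 - offset);
        bitpos += b;
    }
    return out;
}

// Recover n integers of width b bits from the packed stream.
std::vector<uint32_t> unpack(const std::vector<uint32_t>& in, size_t n, unsigned b) {
    std::vector<uint32_t> out(n);
    uint32_t mask = (b == 32) ? 0xFFFFFFFFu : ((1u << b) - 1);
    size_t bitpos = 0;
    for (size_t i = 0; i < n; ++i) {
        size_t word = bitpos / 32, offset = bitpos % 32;
        uint32_t v = in[word] >> offset;
        if (offset + b > 32)                 // pull the spilled high bits
            v |= in[word + 1] << (32 - offset);
        out[i] = v & mask;
        bitpos += b;
    }
    return out;
}

int main() {
    std::vector<uint32_t> data(128);
    for (size_t i = 0; i < data.size(); ++i) data[i] = i % 32;  // values fit in 5 bits
    auto packed = pack(data, 5);              // 128 * 5 bits = 80 bytes instead of 512
    auto restored = unpack(packed, data.size(), 5);
    std::cout << (restored == data ? "round-trip ok" : "mismatch") << "\n";
}
```

In this sketch the savings come entirely from choosing b per block: a block whose largest value fits in 5 bits costs 5 bits per integer rather than 32. Vectorized schemes keep the same layout idea but decode many packed values per instruction, which is where the reported speedups come from.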