An LSM-based Tuple Compaction Framework for Apache AsterixDB

Storage · Performer · Apache · CouchBase · Inference

October 17, 2019

Wail Y. Alkowaileet, Sattam Alsubaiee, Michael J. Carey

From arXiv; 16 pages, 25 figures.

Document database systems store self-describing records, such as JSON, "as-is" without requiring the users to pre-define a schema. This provides users with the flexibility to change the structure of incoming records without worrying about taking the system offline or hindering the performance of currently running queries. However, the flexibility of such systems does not come without a cost. The large amount of redundancy in the stored records can introduce an unnecessary storage overhead and impact query performance. Our focus in this paper is to address the storage overhead issue by introducing a tuple compactor framework that infers and extracts the schema from self-describing records during the data ingestion process. As many prominent document store systems, such as MongoDB and Couchbase, adopt Log-Structured Merge (LSM) trees in their storage engines, our framework exploits LSM lifecycle events to piggyback the schema inference and extraction operations. We have implemented and empirically evaluated our approach to measure its impact on storage, data ingestion, and query performance in the context of Apache AsterixDB.
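The abstract describes the approach only at a high level: records are ingested as-is, and the schema is inferred and extracted when an LSM lifecycle event, such as flushing the in-memory component to disk, takes place. The following Python sketch illustrates that idea under simplifying assumptions. It is not AsterixDB's tuple compactor or its API, and the names `LSMComponentSketch` and `decompact` are hypothetical. At flush time it infers a per-component schema, replaces each field name with a small integer id so that names are stored once per component instead of once per record, and keeps the union of value types observed for each field. It handles top-level fields only, and it omits the reconciliation of schemas across components during LSM merges.

```python
import json
from typing import Any


class LSMComponentSketch:
    """Hypothetical LSM in-memory component (not AsterixDB's API).

    Records are buffered as self-describing dicts. On flush, the component
    infers a schema and writes compacted rows in which each field name is
    replaced by a small integer id that indexes the component's schema.
    """

    def __init__(self) -> None:
        self.buffer: list[dict[str, Any]] = []

    def insert(self, record: dict[str, Any]) -> None:
        # Ingestion stays schema-less: records are accepted "as-is".
        self.buffer.append(record)

    def flush(self) -> dict[str, Any]:
        # Schema inference piggybacks on the flush lifecycle event.
        field_ids: dict[str, int] = {}
        field_types: dict[str, set[str]] = {}
        rows: list[list[list[Any]]] = []
        for record in self.buffer:
            row = []
            for name, value in record.items():
                # First occurrence of a field name assigns the next id.
                fid = field_ids.setdefault(name, len(field_ids))
                # Accumulate every observed value type (a union if they differ).
                field_types.setdefault(name, set()).add(type(value).__name__)
                row.append([fid, value])
            rows.append(row)
        self.buffer = []
        # The schema is stored once per flushed component, not once per record.
        schema = {
            "fields": list(field_ids),  # list position == field id
            "types": {name: sorted(t) for name, t in field_types.items()},
        }
        return {"schema": schema, "rows": rows}


def decompact(component: dict[str, Any]) -> list[dict[str, Any]]:
    """Rebuild the original self-describing records from a flushed component."""
    names = component["schema"]["fields"]
    return [{names[fid]: value for fid, value in row} for row in component["rows"]]


# Demo: 1,000 records that all repeat the same field names.
records = [
    {"id": i, "username": f"user{i}", "followers_count": i * 10}
    for i in range(1000)
]
component = LSMComponentSketch()
for rec in records:
    component.insert(rec)
flushed = component.flush()
assert decompact(flushed) == records  # the compaction is lossless
raw_bytes = sum(len(json.dumps(r)) for r in records)
compacted_bytes = len(json.dumps(flushed))
print(f"self-describing: {raw_bytes} bytes, compacted: {compacted_bytes} bytes")
```

Because every record in the demo repeats the same three field names, moving them into the component-level schema shrinks the serialized size, and the saving grows as field names get long relative to the values. The byte counts depend on the JSON encoding used in this sketch and say nothing about AsterixDB's on-disk format.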
