Large organizations are seeking to create new architectures and scalable platforms to effectively handle the data management challenges posed by an unprecedented growth in data. These challenges stem largely from streaming data that arrives at high velocity, from diverse sources, and in multiple formats. This shift in the data paradigm has led to the emergence of new data analytics and management architectures. This paper focuses on storing high-volume, high-velocity, and high-variety data in its raw format in a storage architecture known as a data lake. First, we examine the limitations of traditional data warehouses in handling these recent changes in data paradigms. We then discuss and compare open-source and commercial platforms that can be used to develop a data lake. Next, we describe our end-to-end data lake design and implementation approach using the Hadoop Distributed File System (HDFS) on the Hadoop Data Platform (HDP). Finally, we present a real-world data lake development use case for data stream ingestion, staging, and multilevel streaming analytics that combines structured and unstructured data. This study can serve as a guide for individuals or organizations planning to implement a data lake solution for their own use cases.
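As a quick illustration of the kind of pipeline the use case above refers to, the sketch below reads a raw event stream and lands it unchanged in an HDFS raw zone of a data lake. The abstract does not name specific tools, so the use of Kafka as the stream source, Spark Structured Streaming as the ingestion engine, and the topic name, broker address, and HDFS paths are all illustrative assumptions rather than the paper's actual implementation.

```python
# Minimal sketch (illustrative only): ingest a raw event stream into an HDFS
# "raw zone". Requires PySpark with the spark-sql-kafka connector package.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("data-lake-raw-ingestion")
         .getOrCreate())

# Read a stream of raw JSON events from Kafka (hypothetical broker and topic).
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "raw_events")
          .load()
          .selectExpr("CAST(value AS STRING) AS raw_json", "timestamp"))

# Land the data as-is in the raw zone on HDFS (hypothetical paths); downstream
# jobs would refine it into staging and analytics zones.
query = (events.writeStream
         .format("json")
         .option("path", "hdfs:///datalake/raw/events")
         .option("checkpointLocation", "hdfs:///datalake/checkpoints/raw_events")
         .outputMode("append")
         .start())

query.awaitTermination()
```

The design choice illustrated here, persisting data in its raw format first and deferring schema and transformation to later stages, is the defining characteristic of a data lake that the abstract contrasts with traditional data warehouses.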