BigOP: Generating Comprehensive Big Data Workloads as a Benchmarking Framework

Big data is considered a proprietary asset of companies, organizations, and even nations. Turning big data into real treasure requires the support of big data systems. A variety of commercial and open-source products have been released for big data storage and processing. While big data users face the choice of which system best suits their needs, big data system developers face the question of how to evaluate their systems with regard to general big data processing needs. System benchmarking is the classic way of meeting both demands. However, existing big data benchmarks either fail to represent the variety of big data processing requirements or target only one specific platform, e.g., Hadoop. In this paper, with our industrial partners, we present BigOP, an end-to-end system benchmarking framework featuring the abstraction of representative Operation sets, workload Patterns, and prescribed tests. BigOP is part of an open-source big data benchmarking project, BigDataBench. BigOP's abstraction model not only guides the development of BigDataBench but also enables automatic generation of tests with comprehensive workloads. We illustrate the feasibility of BigOP by implementing an automatic test generation tool and benchmarking three widely used big data processing systems: Hadoop, Spark, and MySQL Cluster. Three tests targeting three different application scenarios are prescribed. The tests involve relational data, text data, and graph data, as well as all operations and workload patterns. We report results following the test specifications.

