Big data is considered a proprietary asset of companies, organizations, and even nations. Turning big data into real treasure requires the support of big data systems. A variety of commercial and open-source products have been released for big data storage and processing. While big data users face the choice of which system best suits their needs, big data system developers face the question of how to evaluate their systems against general big data processing needs. System benchmarking is the classic way of meeting both demands. However, existing big data benchmarks either fail to represent the variety of big data processing requirements or target only one specific platform, e.g., Hadoop. In this paper, together with our industrial partners, we present BigOP, an end-to-end system benchmarking framework featuring the abstraction of representative Operation sets, workload Patterns, and prescribed tests. BigOP is part of an open-source big data benchmarking project, BigDataBench (available at http://prof.ict.ac.cn/BigDataBench). BigOP's abstraction model not only guides the development of BigDataBench but also enables automatic generation of tests with comprehensive workloads. We illustrate the feasibility of BigOP by implementing an automatic test generation tool and using it to benchmark three widely used big data processing systems, i.e., Hadoop, Spark, and MySQL Cluster. Three tests targeting three different application scenarios are prescribed. The tests involve relational data, text data, and graph data, as well as all operations and workload patterns. We report results following the test specifications.