The unprecedented growth of data volumes has prompted a re-evaluation of traditional approaches to computing. This has started a transition towards very large-scale clusters of commodity hardware and has given rise to many new languages and paradigms for data processing and analysis. In this paper, we propose a compiler-technology-based alternative to developing many different Big Data application infrastructures. Key to this approach is a single intermediate representation that enables the integration of compiler optimization and query optimization, as well as the re-use of many traditional compiler techniques for parallelization, such as data distribution and loop scheduling. We show how this single intermediate representation can act as a generic intermediate representation for Big Data languages by mapping SQL and MapReduce onto it.