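Project title: Research on a Programming Framework Supporting Multiple Computation Types and Data Sharing
Project number: No.61303060
Project type: Young Scientists Fund project
Year of approval: 2014
Project discipline: Automation technology; computer technology
Project author: Wang Peng (王鹏)
Author affiliation: Institute of Information Engineering, Chinese Academy of Sciences
Project funding: CNY 230,000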
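Chinese abstract: As programming frameworks for big data processing continue to multiply, supporting multiple types of computation within a single platform has become a trend, and building such a platform poses a series of challenges. Current framework managers only solve the problem of multiple frameworks sharing cluster resources; the programming frameworks themselves lack mechanisms for flexibly using multiple kinds of computation within one application and for efficiently sharing intermediate data between computations. To address these problems, this project studies a sequential mechanism and its program structure that support the common DAG and BSP computations within one framework; explores a sharing mechanism based on in-memory datasets that lets multiple computations access intermediate results through an interface; and proposes a system architecture and implementation method that supports both mechanisms, validating the techniques by extending the existing Transformer system. The project has significant academic value for research on new programming frameworks and offers important guidance for developing big data processing platforms in data centers.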
Chinese keywords: Big data; programming framework; hybrid programming
English abstract: With the emergence of various domain-specific frameworks, supporting multiple frameworks has become a trend for powerful data processing platforms. Building such a unified platform poses a series of challenges. Existing framework managers can host a diverse set of frameworks that share the resources of a cluster. However, the frameworks themselves lack built-in support for combining different computations and for sharing data online. This project aims to solve these problems. We investigate a sequential mechanism and program structure for combining two widely used computations (i.e., DAG and BSP) in the same application program. We explore a distributed in-memory data sharing approach that allows computations to access and mutate shared intermediate state via a common interface. We present a system design and implementation that supports the two mechanisms, and evaluate the solutions by extending our Transformer system. The study has academic value for research on new programming frameworks and offers guidance for the development of software infrastructure in data centers.
English keywords: Big data; Programming framework; Hybrid programming