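Project title: Execution Optimization of Scientific Workflows Supporting Mixed Parallelization Patterns in Cloud Environments
Project number: No.61462076
Project type: Regional Science Fund Project
Year of approval: 2015
Discipline: Automation technology, computer technology
Project author: Chen Wanghu (陈旺虎)
Author affiliation: Northwest Normal University (西北师范大学)
Project funding: 440,000 yuan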
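Chinese abstract: Scientific workflows can integrate, compose, and coordinate distributed, heterogeneous data, services, and software. Their tasks can be both data-intensive and computation-intensive, which leads them to adopt the cloud as their computing environment and to support mixed parallelization patterns. Targeting scientific workflows that support mixed parallelization patterns in cloud environments, this proposal aims to improve their execution efficiency and reduce their execution cost, and studies the key problems involved in optimizing their execution. (1) A method for splitting the input data of data-parallel tasks is proposed to raise the reuse rate of multi-version workflow task data and its splits, together with an optimized strategy for caching the multi-version datasets of workflow tasks in the cloud. (2) A method for generating workflow execution plans that adapts to the dynamic pricing models of cloud environments is established, along with a method for determining the number and types of virtual machines required, so as to improve the utilization of virtual machine resources and to form a theory for evaluating scientific workflow execution plans. (3) A dynamic optimization method for scientific workflows with mixed parallelization patterns is proposed. It balances load when tasks and workflow fragments with different parallelization patterns share virtual machine clusters and instances. By allowing tasks to comply elastically with local constraints, and by defining adjustment principles for when a task violates such a constraint with a certain probability, it supports dynamic scaling of virtual machine resources to keep resource utilization high.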
Chinese keywords: Scientific workflow; cloud computing; data parallelism; pipeline parallelism; execution optimization
English abstract: Scientific workflows can improve the automation of scientific processes through their ability to integrate, compose, and coordinate heterogeneous, distributed data, services, and tools. In many domains, tasks in a scientific workflow may now be both data-intensive and computation-intensive. Scientific workflows that use the cloud as their computing environment therefore often mix multiple parallelization patterns. This proposal explores key approaches and theories for optimizing the execution of such workflows, with the aim of improving execution efficiency and reducing the financial cost of scientific workflows with mixed parallelization patterns in the cloud. Contributions include: (1) A data caching policy is proposed, built on approaches that achieve efficient sharing of task data by optimizing data splitting. (2) Approaches are proposed for constructing workflow execution plans and for estimating the number and types of virtual machines required; these adapt to the dynamic cloud environment, especially to dynamic charging policies, and also improve the utilization of virtual machine resources. (3) An approach to the execution optimization of scientific workflows with mixed parallelization patterns is proposed. It balances the workloads of virtual clusters and instances shared by tasks or sub-workflows with various parallelization patterns, and it dynamically scales virtual machine resources using measures that permit a workflow task to violate its local constraints with a certain probability.
English keywords: Scientific Workflow; Cloud Computing; Data Parallelization; Pipeline Parallelization; Execution Optimization