This chapter introduces the state of the art in the emerging area of combining High Performance Computing (HPC) with Big Data analysis. To provide background on this new area, the chapter first surveys existing approaches to integrating HPC with Big Data. It then presents several optimization solutions that aim to minimize both the time needed to transfer data from computation-intensive applications to analysis-intensive applications and the end-to-end time-to-solution. These solutions use Software-Defined Networking (SDN) to adaptively exploit both the high-speed interconnect network and high-performance parallel file systems, thereby optimizing application performance. A computational framework called DataBroker has been designed and developed to enable tight integration of HPC with data analysis. Multiple types of experiments have been conducted to expose different performance issues in both message passing and parallel file systems, and to verify the effectiveness of the proposed approaches.