Matrix computations of increasing size and complexity are widely used in scientific computing and engineering. However, with current matrix language implementations, fully utilizing the compute capacity of the cloud is a challenging task. We present a new framework called cloud matrix machine, which extends the Julia high-performance computing language to automatically parallelize matrix computations for the cloud. With this framework, users are shielded from the complexity of explicitly parallel computations. Instead, users employ a novel matrix data type with lazy evaluation semantics that enables implicit parallelization of matrix operations. A combination of offline profiling, dynamic simulation, and scheduling is used to select optimal tile sizes and to schedule and execute matrix operations. All computations occur in the cloud with minimal user intervention. We conducted an extensive experimental evaluation on a set of eight benchmarks using up to eight nodes (288 vCPUs) in the AWS public cloud. Our framework achieved speedups of up to 3.49x, within 20.5% of the theoretically possible maximum speedup.
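The abstract does not show the framework's Julia API. The sketch below is only a hypothetical Python illustration of the general idea it describes: a matrix type with lazy evaluation, where operations build an expression instead of computing immediately, and evaluation is later split into independent output tiles that can run in parallel. The class `LazyMatrix`, its methods `from_lists` and `evaluate`, and the `tile_size` and `workers` parameters are all invented for illustration. A local thread pool stands in for cloud nodes, and the tile size is passed in by hand. In the framework, tile sizes are chosen by offline profiling and simulation.

```python
from concurrent.futures import ThreadPoolExecutor


class LazyMatrix:
    """Hypothetical lazy matrix: stores how to compute any tile, not the values."""

    def __init__(self, rows, cols, compute_tile):
        self.rows, self.cols = rows, cols
        # compute_tile(r0, r1, c0, c1) -> list of rows for that sub-block
        self._compute_tile = compute_tile

    @staticmethod
    def from_lists(data):
        rows, cols = len(data), len(data[0])

        def tile(r0, r1, c0, c1):
            return [row[c0:c1] for row in data[r0:r1]]

        return LazyMatrix(rows, cols, tile)

    def __add__(self, other):
        assert (self.rows, self.cols) == (other.rows, other.cols)

        def tile(r0, r1, c0, c1):
            # Element-wise sum of the matching tiles of both operands.
            a = self._compute_tile(r0, r1, c0, c1)
            b = other._compute_tile(r0, r1, c0, c1)
            return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

        return LazyMatrix(self.rows, self.cols, tile)

    def __matmul__(self, other):
        assert self.cols == other.rows
        inner = self.cols

        def tile(r0, r1, c0, c1):
            # An output tile needs a row band of self and a column band of other.
            a = self._compute_tile(r0, r1, 0, inner)
            b = other._compute_tile(0, inner, c0, c1)
            return [[sum(ra[k] * b[k][j] for k in range(inner))
                     for j in range(c1 - c0)] for ra in a]

        return LazyMatrix(self.rows, other.cols, tile)

    def evaluate(self, tile_size, workers=4):
        """Force evaluation: compute each output tile independently, in parallel."""
        tiles = [(r0, min(r0 + tile_size, self.rows), c0, min(c0 + tile_size, self.cols))
                 for r0 in range(0, self.rows, tile_size)
                 for c0 in range(0, self.cols, tile_size)]
        out = [[0] * self.cols for _ in range(self.rows)]
        with ThreadPoolExecutor(max_workers=workers) as pool:
            results = pool.map(lambda t: (t, self._compute_tile(*t)), tiles)
            for (r0, r1, c0, c1), block in results:
                for i, row in enumerate(block):
                    out[r0 + i][c0:c1] = row
        return out


if __name__ == "__main__":
    A = LazyMatrix.from_lists([[1, 2], [3, 4]])
    B = LazyMatrix.from_lists([[5, 6], [7, 8]])
    C = (A @ B) + A  # builds the expression only; nothing is computed yet
    print(C.evaluate(tile_size=1))  # [[20, 24], [46, 54]]
```

Because each output tile depends only on the expression and its own index range, the tiles can be dispatched to any worker without coordination. Python threads do not speed up this CPU-bound loop because of the GIL; they only mark where a distributed scheduler would place work.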