Elasticity is an important feature of modern cloud computing systems, and it can cause computation failures or significantly increase computing time. Elasticity means that virtual machines in the cloud can be preempted on short notice (e.g., hours or minutes) when a high-priority job appears; conversely, new virtual machines may become available over time to replenish the computing resources. Coded Storage Elastic Computing (CSEC), introduced by Yang et al. in 2018, is an effective and efficient approach to handling elasticity, with relatively low storage and computation cost. However, one limitation of CSEC is that it may apply only to certain types of computations (e.g., linear ones) and may be difficult to extend to more involved computations, because coded data storage and approximation are often needed. Hence, it may be preferable to use uncoded storage, in which data is copied directly onto the virtual machines. In addition, our own measurements show that virtual machines on Amazon EC2 clusters often have heterogeneous computation speeds even when they have exactly the same configuration (e.g., CPU, RAM, I/O cost). In this paper, we introduce a new optimization framework for Uncoded Storage Elastic Computing (USEC) systems with heterogeneous computing speeds that minimizes the overall computation time. Under this framework, we propose optimal solutions for USEC systems with and without straggler tolerance, using different storage placements. We evaluate the proposed algorithms on power iteration applications running on Amazon EC2.
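To make the role of heterogeneous speeds concrete, the following is a minimal sketch of speed-proportional load assignment. It is not the paper's algorithm. It assumes the simplest possible setting: every machine stores the full dataset (so there are no storage placement constraints), there is no straggler tolerance, and each machine's speed is known in rows per unit time. The function names `proportional_loads` and `completion_time` are hypothetical and introduced only for illustration.

```python
from typing import List


def proportional_loads(speeds: List[float], n_rows: int) -> List[int]:
    """Split n_rows among machines in proportion to their speeds.

    Illustrative assumption: every machine holds all the data, so any
    machine may process any row. Uses largest-remainder rounding so the
    integer loads sum exactly to n_rows.
    """
    if n_rows < 0 or not speeds or any(s <= 0 for s in speeds):
        raise ValueError("need n_rows >= 0 and strictly positive speeds")

    total_speed = sum(speeds)
    # Ideal (fractional) share of rows for each machine.
    ideal = [n_rows * s / total_speed for s in speeds]
    loads = [int(x) for x in ideal]  # round down first

    # Hand the leftover rows to the machines with the largest remainders.
    leftover = n_rows - sum(loads)
    by_remainder = sorted(
        range(len(speeds)), key=lambda i: ideal[i] - loads[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        loads[i] += 1
    return loads


def completion_time(loads: List[int], speeds: List[float]) -> float:
    """Overall time is set by the slowest machine to finish its share."""
    return max(load / speed for load, speed in zip(loads, speeds))


if __name__ == "__main__":
    speeds = [1.0, 2.0, 1.0]  # rows per unit time (made-up values)
    loads = proportional_loads(speeds, 8)
    print(loads, completion_time(loads, speeds))
```

With an equal split, the slowest machine determines the finishing time; the proportional split equalizes finishing times across machines. Once each machine stores only part of the data, the assignment becomes a constrained optimization problem, which is the setting the abstract describes.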