In recent years, workload characterization has been attempted as a way to gain a foothold in the emerging serverless cloud market, especially in the large production cloud clusters of Google, AWS, and others. Although analyzing and characterizing real workloads from a large production cloud cluster benefits cloud providers, researchers, and everyday users, analyzing the workload traces of these clusters has been an arduous task because of the heterogeneous nature of the data. This article proposes a scalable infrastructure based on Google's Dataproc for analyzing the workload traces of cloud environments. We evaluate the proposed infrastructure using the Google cloud cluster-usage-traces-v3 dataset. We then characterize the workload in this dataset, focusing on workload heterogeneity, variations in job duration, resource consumption, and the overall availability of resources provided by the cluster. The findings reported in this article will help cloud infrastructure providers and users manage cloud computing resources, especially on serverless platforms.