With the growing amount of data, data processing workloads and the management of their resource usage are becoming ever more important. Since managing a dedicated infrastructure is in many situations infeasible or uneconomical, users increasingly execute their workloads in the cloud. As configuring workloads and resources is often challenging, various methods have been proposed that either quickly profile towards a good configuration or determine one based on data from previous runs. Still, the performance data needed to train such methods is often lacking and costly to collect. In this paper, we propose a collaborative approach for sharing anonymized workload execution traces among users, mining them for general patterns, and exploiting clusters of historical workloads for future optimizations. We evaluate our prototype implementation for mining workload execution graphs on a publicly available trace dataset and demonstrate the predictive value of workload clusters determined from traces alone.