Because estimating the resources a job needs is difficult, users often overestimate their requirements. Both commercial clouds and academic campus clusters suffer from low resource utilization and long wait times because the resource estimates that users provide for their jobs are inaccurate. We present an approach that statistically estimates the actual resource requirements of a job on a Little cluster before the job runs on a Big cluster. This initial estimate on the Little cluster shows how many resources the job actually requires, which allows us to allocate resources accurately for the pending jobs in the queue and thereby improve throughput and resource utilization. In our experiments, we estimated resource utilization with an average accuracy of 90% for memory and 94% for CPU, and improved memory utilization by an average of 22% and CPU utilization by 53% compared with the default job submission methods on Apache Aurora and Apache Mesos.