Deep learning model inference is a key service in many businesses and scientific discovery processes. This paper introduces RIBBON, a novel deep learning inference serving system that meets two competing objectives: quality-of-service (QoS) targets and cost-effectiveness. The key idea behind RIBBON is to intelligently employ a diverse set of cloud computing instances (heterogeneous instances) to meet the QoS target while maximizing cost savings. RIBBON devises a Bayesian Optimization-driven strategy that helps users build the optimal set of heterogeneous instances for their model inference service needs on cloud computing platforms, and it demonstrates its superiority over existing inference serving systems that rely on homogeneous instance pools. RIBBON saves up to 16% of the inference service cost across different learning models, including emerging deep learning recommender system models and drug-discovery-enabling models.
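To illustrate the kind of Bayesian Optimization-driven search the abstract describes, the sketch below explores mixes of cloud instance types under a cost objective with a QoS penalty. It is a minimal illustration only, assuming the scikit-optimize library; the instance names, the toy latency model (estimate_p99_latency), and the hourly prices are hypothetical placeholders and do not reflect RIBBON's actual implementation.

```python
# Minimal, illustrative sketch of Bayesian Optimization over a heterogeneous
# instance pool. All names and numbers are placeholders, not RIBBON's method.
from skopt import gp_minimize
from skopt.space import Integer

INSTANCE_TYPES = ["c5.xlarge", "m5.2xlarge", "g4dn.xlarge"]   # candidate instance types (hypothetical)
HOURLY_COST = {"c5.xlarge": 0.17, "m5.2xlarge": 0.38, "g4dn.xlarge": 0.53}  # illustrative prices
QOS_TARGET_MS = 100.0                                          # illustrative tail-latency target

def estimate_p99_latency(counts):
    """Toy stand-in for a measured/estimated tail latency of the mixed pool."""
    capacity = sum(n * w for n, w in zip(counts, (1.0, 2.2, 4.0)))  # toy throughput weights
    return 1e6 if capacity == 0 else 400.0 / capacity               # toy latency model

def objective(counts):
    """Hourly cost of the pool plus a penalty for violating the QoS target."""
    cost = sum(n * HOURLY_COST[t] for n, t in zip(counts, INSTANCE_TYPES))
    latency = estimate_p99_latency(counts)
    penalty = 10.0 * max(0.0, latency - QOS_TARGET_MS)
    return cost + penalty

# Search over how many instances of each type to provision (0-8 of each).
space = [Integer(0, 8, name=t) for t in INSTANCE_TYPES]
result = gp_minimize(objective, space, n_calls=30, random_state=0)
print(dict(zip(INSTANCE_TYPES, result.x)), result.fun)
```

In this toy setup the optimizer trades off cheaper but slower instance types against more expensive, higher-throughput ones until the QoS penalty vanishes; a real serving system would replace the toy latency model with measurements or a learned performance model of each instance type.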