With the prevalence of big-data-driven applications, such as face recognition on smartphones and tailored recommendations from Google Ads, we are on the road to a lifestyle with significantly more intelligence than ever before. Various neural-network-powered models run at the back end of these applications to enable quick responses to users. Supporting those models requires substantial cloud-based computational resources, e.g., CPUs and GPUs. Cloud providers charge their clients by the amount of resources they occupy, so clients have to balance their budget against the quality of experience (e.g., response time). The budget depends on individual business owners, and the required Quality of Experience (QoE) depends on the usage scenarios of different applications. For instance, an autonomous vehicle requires a real-time response, whereas unlocking a smartphone can tolerate some delay. However, cloud providers do not offer a QoE-based option to their clients. In this paper, we propose DQoES, a differentiated quality of experience scheduler for deep learning inferences. DQoES accepts clients' specifications of targeted QoEs and dynamically adjusts resources to approach their targets. Through extensive cloud-based experiments, DQoES demonstrates that it can schedule multiple concurrent jobs with respect to various QoEs and achieve up to 8x more satisfied models when compared to the existing system.
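As an illustration only, the idea of dynamically adjusting resources toward client-specified QoE targets can be sketched as a simple feedback loop. This is not DQoES's actual algorithm. The `Job` fields, the multiplicative `step`, the `min_share` floor, and the definition of a "satisfied" model as one whose observed latency meets its target are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    target_ms: float   # client-specified QoE target (response time), assumed
    latency_ms: float  # most recently observed response time
    share: float       # fraction of the node's resources currently assigned


def rebalance(jobs, step=0.1, min_share=0.05):
    """One round of a hypothetical QoE-driven feedback loop.

    Jobs slower than their target get a larger share, jobs faster than
    their target give some back (floored at min_share before
    renormalization), and shares are then rescaled to sum to 1.
    """
    for job in jobs:
        # ratio > 1 means the job misses its target; < 1 means it has slack
        ratio = job.latency_ms / job.target_ms
        if ratio > 1.0:
            job.share *= 1.0 + step
        elif ratio < 1.0:
            job.share = max(min_share, job.share * (1.0 - step))
    total = sum(job.share for job in jobs)
    for job in jobs:
        job.share /= total
    return jobs


def satisfied(jobs):
    """Names of jobs whose observed latency meets their QoE target."""
    return [job.name for job in jobs if job.latency_ms <= job.target_ms]


if __name__ == "__main__":
    jobs = [
        Job("vehicle", target_ms=50, latency_ms=80, share=0.5),
        Job("unlock", target_ms=500, latency_ms=120, share=0.5),
    ]
    rebalance(jobs)
    for job in jobs:
        print(job.name, round(job.share, 3))
    print("satisfied:", satisfied(jobs))
```

In this toy run the latency-critical job, which misses its target, gains resources from the delay-tolerant job, which has slack. A real scheduler would re-measure latencies after each round and repeat until the targets are approached.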