Deep-learning-based intelligent services have become prevalent in cyber-physical applications, including smart cities and healthcare. Deploying deep-learning-based intelligence near the end user enhances privacy protection, responsiveness, and reliability. However, resource-constrained end devices must be carefully managed to meet the latency and energy requirements of computationally intensive deep learning services. Collaborative end-edge-cloud computing for deep learning offers a range of performance and efficiency operating points that can address application requirements through computation offloading. The decision to offload computation is a communication-computation co-optimization problem that varies with both system parameters (e.g., network condition) and workload characteristics (e.g., inputs). In addition, deep learning model optimization introduces a further tradeoff between latency and model accuracy. An end-to-end decision-making solution that accounts for this computation-communication problem is required to jointly find the optimal offloading policy and model for deep learning services. To this end, we propose a reinforcement-learning-based computation offloading solution that learns an optimal offloading policy while considering deep learning model selection techniques, minimizing response time while providing sufficient accuracy. We demonstrate the effectiveness of our solution for edge devices in an end-edge-cloud system and evaluate it with an implementation on a real setup using multiple AWS and ARM core configurations. Our solution provides a 35% speedup in average response time compared to the state of the art with less than 0.9% accuracy reduction, demonstrating the promise of our online learning framework for orchestrating DL inference in end-edge-cloud systems.
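To make the joint decision concrete, the following is a minimal sketch of learning where to run inference and which model variant to use. It is not the method evaluated in this work: it swaps in a simple tabular, single-step (contextual-bandit-style) epsilon-greedy learner, and every name and number in it (the tiers, the "small"/"large" model variants, the latency and accuracy tables, the penalty value) is a hypothetical placeholder chosen only for illustration.

```python
import random
from itertools import product

# Hypothetical action space: execution tier x model variant.
TIERS = ["end", "edge", "cloud"]
MODELS = ["small", "large"]
ACTIONS = list(product(TIERS, MODELS))
STATES = ["good_network", "poor_network"]  # coarse, assumed system state

# Made-up latencies (ms) and accuracies; NOT measurements from the paper.
COMPUTE_MS = {("end", "small"): 120, ("end", "large"): 400,
              ("edge", "small"): 40, ("edge", "large"): 110,
              ("cloud", "small"): 15, ("cloud", "large"): 30}
TRANSFER_MS = {"good_network": {"end": 0, "edge": 20, "cloud": 30},
               "poor_network": {"end": 0, "edge": 150, "cloud": 400}}
ACCURACY = {"small": 0.70, "large": 0.76}
ACCURACY_PENALTY_MS = 1000.0  # assumed penalty for violating the accuracy floor


def response_time_ms(state, action, rng):
    """Simulated response time: compute + transfer + small uniform noise."""
    tier, _ = action
    return COMPUTE_MS[action] + TRANSFER_MS[state][tier] + rng.uniform(-5, 5)


def reward(state, action, min_accuracy, rng):
    """Negative latency, heavily penalized when accuracy is insufficient."""
    _, model = action
    penalty = ACCURACY_PENALTY_MS if ACCURACY[model] < min_accuracy else 0.0
    return -(response_time_ms(state, action, rng) + penalty)


def train(min_accuracy=0.75, episodes=5000, alpha=0.1, epsilon=0.1, seed=0):
    """Learn Q(state, action) online with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        state = rng.choice(STATES)  # observe the current network condition
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)  # explore
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
        r = reward(state, action, min_accuracy, rng)
        # Single-step update: move the estimate toward the observed reward.
        q[(state, action)] += alpha * (r - q[(state, action)])
    return q


def greedy_policy(q):
    """Best known (tier, model) pair for each state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}


if __name__ == "__main__":
    for floor in (0.75, 0.60):
        print(floor, greedy_policy(train(min_accuracy=floor)))
```

Under these assumed numbers, a strict accuracy floor should steer the learner toward the large model (offloaded to the cloud on a good network, to the edge on a poor one), while a relaxed floor should let it pick the faster small model. The reward shaping, a latency term plus an accuracy penalty, is one simple way to encode "minimize response time while providing sufficient accuracy"; other formulations are equally plausible.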