The pervasive integration of Artificial Intelligence (AI) models into contemporary mobile computing is evident across numerous use cases, from virtual assistants to advanced image processing. Delivering a responsive mobile user experience requires low-latency inference from deployed AI models, which raises challenges ranging from choosing execution strategies that satisfy real-time constraints to exploiting heterogeneous hardware architectures. In this paper, we investigate optimal execution configurations for AI models on Android, focusing on two critical tasks: object detection (YOLO family) and image classification (ResNet). The configurations we evaluate combine various model quantization schemes with on-device accelerators, specifically the GPU and NPU. Our core objective is to empirically determine the combination that achieves the best trade-off between minimal accuracy degradation and maximal inference speed-up.
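The trade-off selection described above can be sketched as a simple procedure: given measured accuracy and latency for each (quantization scheme, accelerator) configuration, keep the configurations whose accuracy drop relative to the float baseline stays within a tolerance, then pick the fastest one. This is a minimal sketch of that selection logic; the configuration names and all numbers below are hypothetical placeholders, not measurements from the paper.

```python
# Sketch: choose the execution configuration with the best
# accuracy/latency trade-off. All values are hypothetical.

def best_config(configs, baseline_acc, max_drop=0.01):
    """Return the fastest config whose absolute accuracy drop
    versus the float baseline does not exceed max_drop."""
    eligible = [c for c in configs if baseline_acc - c["acc"] <= max_drop]
    if not eligible:
        return None
    return min(eligible, key=lambda c: c["latency_ms"])

# Hypothetical measurements for one model across configurations.
configs = [
    {"name": "fp32-cpu", "acc": 0.760, "latency_ms": 90.0},
    {"name": "fp16-gpu", "acc": 0.758, "latency_ms": 35.0},
    {"name": "int8-npu", "acc": 0.751, "latency_ms": 12.0},
]

choice = best_config(configs, baseline_acc=0.760, max_drop=0.01)
print(choice["name"])  # int8-npu: fastest config within the accuracy tolerance
```

In this toy setting, the INT8 model on the NPU loses 0.009 accuracy (within the 0.01 tolerance) while cutting latency from 90 ms to 12 ms, so it is selected; a stricter tolerance would fall back to a faster-but-less-aggressive configuration.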