The pervasive integration of Artificial Intelligence (AI) models into contemporary mobile computing is evident across numerous use cases, from virtual assistants to advanced image processing. Delivering a responsive mobile user experience requires low-latency inference from deployed AI models, which raises challenges ranging from choosing execution strategies that satisfy real-time constraints to exploiting heterogeneous hardware architectures. In this paper, we investigate optimal execution configurations for AI models on Android, focusing on two critical tasks: object detection (YOLO family) and image classification (ResNet). The configurations we evaluate combine various model quantization schemes with on-device accelerators, specifically the GPU and NPU. Our core objective is to empirically determine the combination that achieves the best trade-off between minimal accuracy degradation and maximal inference speed-up.
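The trade-off selection described above can be sketched as a simple procedure: given measured accuracy and latency for each (quantization scheme, accelerator) configuration, keep the configurations whose accuracy drop relative to the float baseline stays within a tolerance, then pick the fastest one. This is a minimal sketch of that selection logic; the configuration names and all numbers below are hypothetical placeholders, not measurements from the paper.

```python
# Sketch: choose the execution configuration with the best
# accuracy/latency trade-off. All values are hypothetical.

def best_config(configs, baseline_acc, max_drop=0.01):
    """Return the fastest config whose absolute accuracy drop
    versus the float baseline does not exceed max_drop."""
    eligible = [c for c in configs if baseline_acc - c["acc"] <= max_drop]
    if not eligible:
        return None
    return min(eligible, key=lambda c: c["latency_ms"])

# Hypothetical measurements for one model across configurations.
configs = [
    {"name": "fp32-cpu", "acc": 0.760, "latency_ms": 90.0},
    {"name": "fp16-gpu", "acc": 0.758, "latency_ms": 35.0},
    {"name": "int8-npu", "acc": 0.751, "latency_ms": 12.0},
]

choice = best_config(configs, baseline_acc=0.760, max_drop=0.01)
print(choice["name"])  # int8-npu: fastest config within the accuracy tolerance
```

In this toy setting, the INT8 model on the NPU loses 0.009 accuracy (within the 0.01 tolerance) while cutting latency from 90 ms to 12 ms, so it is selected; a stricter tolerance would fall back to a faster-but-less-aggressive configuration.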