Mobile-centric AI applications place high demands on the resource efficiency of model inference. Input filtering is a promising approach to eliminate redundancy and thereby reduce the cost of inference. Previous efforts have tailored effective solutions for many applications, but left two essential questions unanswered: (1) the theoretical filterability of an inference workload, which would guide the application of input filtering techniques and avoid trial-and-error costs for resource-constrained mobile applications; and (2) the robust discriminability of feature embeddings, which would allow input filtering to be widely effective across diverse inference tasks and input content. To answer them, we first formalize the input filtering problem and theoretically compare the hypothesis complexity of inference models and input filters to understand the optimization potential. We then propose the first end-to-end learnable input filtering framework, which covers most state-of-the-art methods and surpasses them in feature embeddings with robust discriminability. We design and implement InFi, which supports six input modalities and multiple mobile-centric deployments. Comprehensive evaluations confirm our theoretical results and show that InFi outperforms strong baselines in applicability, accuracy, and efficiency. InFi achieves 8.5x throughput and saves 95% bandwidth, while keeping over 90% accuracy, for a video analytics application on mobile platforms.
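To make the input-filtering idea concrete, the following is a minimal illustrative sketch of the general pipeline described above: a cheap learned filter screens each input, and only inputs it predicts as useful are forwarded to the expensive inference model. All function names and the threshold here are hypothetical placeholders, not InFi's actual API.

```python
def cheap_embedding(x):
    # Stand-in for a lightweight feature embedding (e.g., a tiny network).
    return sum(x) / len(x)

def input_filter(x, threshold=0.5):
    # Predict whether running full inference on x would yield a useful result.
    return cheap_embedding(x) >= threshold

def expensive_inference(x):
    # Stand-in for the costly full model (e.g., an object detector).
    return [v * 2 for v in x]

def pipeline(inputs, threshold=0.5):
    # Run the filter first; skip full inference on filtered-out inputs.
    results, skipped = [], 0
    for x in inputs:
        if input_filter(x, threshold):
            results.append(expensive_inference(x))
        else:
            skipped += 1  # redundant input: drop or reuse a cached result
    return results, skipped

results, skipped = pipeline([[0.9, 0.8], [0.1, 0.2], [0.7, 0.6]])
# Two inputs pass the filter; one is skipped, saving one full inference.
```

The efficiency gain comes from the filter being far cheaper than the full model, so skipped inputs cost almost nothing; the accuracy trade-off depends on how discriminative the filter's embedding is, which is precisely the second question the abstract raises.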