Aiming to achieve artificial general intelligence (AGI) for the Metaverse, pretrained foundation models (PFMs), e.g., generative pretrained transformers (GPTs), can effectively provide various AI services, such as autonomous driving, digital twins, and AI-generated content (AIGC) for extended reality. With the advantages of low latency and privacy preservation, edge intelligence is a viable solution for serving PFMs to mobile AI services, i.e., caching and executing PFMs on edge servers with limited computing resources and GPU memory. However, PFMs typically consist of billions of parameters, making them computation- and memory-intensive for edge servers to load and execute. In this article, we investigate edge PFM serving problems for mobile AIGC services in the Metaverse. First, we introduce the fundamentals of PFMs and discuss their characteristic fine-tuning and inference methods in edge intelligence. Then, we propose a novel framework for joint model caching and inference that manages models and allocates resources to satisfy users' requests efficiently. Furthermore, considering the in-context learning ability of PFMs, we propose a new metric, the Age of Context (AoC), to evaluate the freshness and relevance of the examples in demonstrations with respect to the executing tasks. Finally, we propose a least context algorithm for managing cached models at edge servers by balancing the tradeoff among latency, energy consumption, and accuracy.