Aiming to achieve artificial general intelligence (AGI) for the Metaverse, pretrained foundation models (PFMs), e.g., generative pretrained transformers (GPTs), can effectively provide various AI services, such as autonomous driving, digital twins, and AI-generated content (AIGC) for extended reality. With the advantages of low latency and privacy preservation, edge intelligence is a viable solution for serving PFMs to mobile AI services, i.e., caching and executing PFMs on edge servers with limited computing resources and GPU memory. However, PFMs typically consist of billions of parameters, making them computation- and memory-intensive for edge servers to load and execute. In this article, we investigate edge PFM serving problems for mobile AIGC services in the Metaverse. First, we introduce the fundamentals of PFMs and discuss their characteristic fine-tuning and inference methods in edge intelligence. Then, we propose a novel framework for joint model caching and inference that manages models and allocates resources to satisfy users' requests efficiently. Furthermore, considering the in-context learning ability of PFMs, we propose a new metric, the Age of Context (AoC), to evaluate the freshness and relevance of the examples in demonstrations with respect to the executing tasks. Finally, we propose a least context algorithm for managing cached models at edge servers by balancing the tradeoff among latency, energy consumption, and accuracy.