In this dissertation, we propose a coordinated memory and computing methodology that exploits the characteristics and capabilities of GPU-based heterogeneous systems to optimize applications' performance and privacy. Specifically, 1) we propose a task-aware, dynamic memory management mechanism that co-optimizes applications' latency and memory footprint, especially in multitasking scenarios; 2) we propose a latency-aware memory management framework that analyzes application characteristics and hardware features to reduce applications' initialization latency and response time; 3) we develop a new model extraction attack that exploits the vulnerability of the GPU unified memory system to accurately steal private DNN models; and 4) we propose a CPU/GPU Co-Encryption mechanism that defends against timing-correlation attacks on integrated CPU/GPU platforms, providing a secure execution environment for edge applications. This dissertation aims to develop a high-performance, secure memory system and architecture for GPU heterogeneous platforms so that emerging AI-enabled applications can be deployed efficiently and safely.