Accelerating Encrypted Computing on Intel GPUs - Zhuanzhi Papers


Performer · Optimizer · Intel · GPU · Homomorphic Encryption

September 29, 2021

Accelerating Encrypted Computing on Intel GPUs


Yujia Zhai, Mohannad Ibrahim, Yiqin Qiu, Fabian Boemer, Zizhong Chen, Alexey Titov, Alexander Lyashevsky

Homomorphic Encryption (HE) is an emerging encryption scheme that allows computations to be performed directly on encrypted messages. This property enables promising applications such as privacy-preserving deep learning and cloud computing. Prior work has enabled practical privacy-preserving applications through architecture-aware optimizations on CPUs, GPUs and FPGAs. However, there is no systematic optimization of the whole HE pipeline on Intel GPUs. In this paper, we present the first SYCL-based GPU backend for Microsoft SEAL APIs. We perform optimizations at the instruction level, algorithmic level and application level to accelerate our HE library, which is based on the Cheon, Kim, Kim and Song (CKKS) scheme, on Intel GPUs. The performance is validated on two of the latest Intel GPUs. Experimental results show that our staged optimizations, together with low-level optimizations and kernel fusion, accelerate the Number Theoretic Transform (NTT), a key algorithm for HE, by up to 9.93X compared with the naive GPU baseline. Roofline analysis confirms that our optimized NTT reaches 79.8% and 85.7% of the peak performance on the two GPU devices. Through the highly optimized NTT and assembly-level optimization, we obtain 2.32X - 3.05X acceleration for HE evaluation routines. In addition, our systematic optimizations taken together improve the performance of an encrypted element-wise polynomial matrix multiplication application by up to 3.10X.
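For readers unfamiliar with the NTT named in the abstract, the following is a minimal Python sketch of the idea: a radix-2 transform over a prime field, used to multiply two polynomials via pointwise products. It is not the paper's SYCL implementation. The function names, the toy modulus 17 and the root 2 are assumptions chosen for illustration. CKKS libraries such as SEAL use the negacyclic variant (arithmetic modulo X^N + 1) with much larger moduli, whereas this sketch uses the simpler cyclic form.

```python
def ntt(a, root, mod):
    """Recursive radix-2 Cooley-Tukey NTT.

    len(a) must be a power of two and `root` a primitive len(a)-th
    root of unity modulo the prime `mod`.
    """
    n = len(a)
    if n == 1:
        return list(a)
    # The square of a primitive n-th root is a primitive (n/2)-th root.
    sub_root = root * root % mod
    even = ntt(a[0::2], sub_root, mod)
    odd = ntt(a[1::2], sub_root, mod)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % mod
        out[k] = (even[k] + t) % mod           # butterfly: upper half
        out[k + n // 2] = (even[k] - t) % mod  # butterfly: lower half
        w = w * root % mod
    return out


def intt(a, root, mod):
    """Inverse NTT: forward transform with root^-1, then scale by n^-1."""
    n = len(a)
    inv_root = pow(root, mod - 2, mod)  # Fermat inverse (mod is prime)
    inv_n = pow(n, mod - 2, mod)
    return [x * inv_n % mod for x in ntt(a, inv_root, mod)]


def cyclic_multiply(a, b, root, mod):
    """Multiply polynomials modulo (X^n - 1) via pointwise NTT products."""
    fa, fb = ntt(a, root, mod), ntt(b, root, mod)
    return intt([x * y % mod for x, y in zip(fa, fb)], root, mod)


def naive_cyclic_multiply(a, b, mod):
    """O(n^2) reference used only to check the NTT-based result."""
    n = len(a)
    out = [0] * n
    for i in range(n):
        for j in range(n):
            out[(i + j) % n] = (out[(i + j) % n] + a[i] * b[j]) % mod
    return out


# Toy parameters (illustrative assumptions): 2 has order 8 modulo 17.
MOD, ROOT = 17, 2
a = [1, 2, 3, 4, 0, 0, 0, 0]
b = [5, 6, 7, 0, 0, 0, 0, 0]
product = cyclic_multiply(a, b, ROOT, MOD)
```

The transform replaces the quadratic-cost convolution with O(n log n) butterflies plus a pointwise product, which is why NTT throughput tends to dominate the cost of HE evaluation routines.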

