Principal Component Analysis (PCA) is a pivotal technique in machine learning and data analysis. In this study, we present a novel approach to privacy-preserving PCA using an approximate-arithmetic homomorphic encryption scheme. Our method builds on the PowerMethod, an iterative PCA routine that takes the covariance matrix as input and produces an approximate eigenvector corresponding to the dataset's first principal component. It surpasses previous approaches (e.g., Panda, CSCML 2021) in efficiency, accuracy, and scalability. To achieve this efficiency and accuracy, we implemented the following optimizations: (i) we optimized a homomorphic matrix multiplication technique (Jiang et al., SIGSAC 2018) that plays a crucial role in computing the covariance matrix; (ii) we devised an efficient homomorphic circuit for computing the covariance matrix; (iii) we designed a novel and efficient homomorphic circuit for the PowerMethod that incorporates a systematic strategy for homomorphic vector normalization, enhancing both its accuracy and practicality. Our matrix multiplication optimization reduces the minimum rotation-key space required for a $128\times 128$ homomorphic matrix multiplication by up to 64\%, enabling more extensive parallel computation of multiple matrix multiplication instances. Our homomorphic covariance computation method computes the covariance matrix of the MNIST dataset ($60000\times 256$) in 51 minutes. Our privacy-preserving PCA scheme, based on the new homomorphic PowerMethod circuit, computes the top 8 principal components of datasets such as MNIST and Fashion-MNIST in approximately 1 hour, achieving an $r^2$ accuracy of 0.7 to 0.9, an average speedup of over $4\times$, and higher accuracy than previous approaches.
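For readers unfamiliar with the underlying routine, the following is a minimal plaintext sketch of the pipeline the abstract describes: form the covariance matrix, then run power iteration with per-step normalization to approximate the first principal component. All names here are illustrative; the paper's contribution is evaluating these same steps as homomorphic circuits under encryption, where the normalization step in particular is nontrivial.

```python
import numpy as np

def covariance(X):
    """Covariance matrix of data X (n samples x d features)."""
    Xc = X - X.mean(axis=0)           # center each feature
    return Xc.T @ Xc / X.shape[0]     # d x d covariance matrix

def power_method(C, iters=50, seed=0):
    """Approximate the top eigenvector of C by power iteration."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(C.shape[0])
    for _ in range(iters):
        v = C @ v
        # In plaintext this division is trivial; under CKKS, 1/||v||
        # must itself be approximated homomorphically, which is what
        # the paper's normalization strategy addresses.
        v /= np.linalg.norm(v)
    return v  # approximate first principal component

if __name__ == "__main__":
    # Synthetic data with one planted dominant direction u.
    rng = np.random.default_rng(1)
    u = np.ones(16) / 4.0             # unit vector
    X = rng.standard_normal((1000, 1)) * 3.0 @ u[None, :] \
        + 0.1 * rng.standard_normal((1000, 16))
    v = power_method(covariance(X))
    print(abs(v @ u))                 # alignment with u, close to 1
```

Subsequent principal components (the paper computes the top 8) can be obtained the same way after deflating C by the components already found.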