Scientific collaborations benefit from collaborative learning over distributed sources, but such learning remains difficult to achieve when the data are sensitive. In recent years, privacy-preserving techniques have been widely studied for analyzing data distributed across different agencies while protecting sensitive information. Secure multiparty computation, in particular, offers a high level of privacy protection but incurs an intensive computation cost. Other security techniques sacrifice part of the data utility to reduce disclosure risk. A major challenge is to balance data utility and disclosure risk while maintaining high computational efficiency. In this paper, the matrix masking technique is applied to encrypt data so that the resulting secure schemes are protected against malicious adversaries while achieving local differential privacy. The proposed schemes are designed for linear models and can be implemented for both vertically and horizontally partitioned data. Moreover, cross-validation is studied to prevent overfitting and to select optimal parameters without additional communication cost. Simulation results demonstrate the efficiency of the proposed schemes in analyzing datasets with millions of records as well as high-dimensional data (n ≪ p).
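The abstract does not spell out the masking scheme itself. As a rough illustration of why matrix masking can leave a linear model's fit unchanged, the pure-Python sketch below assumes the simplest variant: in a horizontal partition, each party left-multiplies its own rows (X_k, y_k) by a private random orthogonal matrix A_k. Because A_k^T A_k = I, the masked data yield the same X^T X and X^T y, so ordinary least squares on the stacked masked data recovers the pooled estimate. The helper names (`random_orthogonal`, `mask`, `ols`) and the simulated data are hypothetical; this is not the paper's scheme, and on its own it provides neither protection against malicious adversaries nor local differential privacy.

```python
import random


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def random_orthogonal(n, rng):
    """Random n x n orthogonal matrix via (modified) Gram-Schmidt on Gaussian vectors."""
    basis = []
    while len(basis) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for q in basis:  # remove components along previously accepted rows
            d = sum(a * b for a, b in zip(v, q))
            v = [a - d * b for a, b in zip(v, q)]
        norm = sum(a * a for a in v) ** 0.5
        if norm > 1e-10:  # skip (near-)dependent draws
            basis.append([a / norm for a in v])
    return basis  # square with orthonormal rows, so A^T A = A A^T = I


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(x, y):
    """Least-squares coefficients from the normal equations X^T X b = X^T y."""
    xt = transpose(x)
    xty = [row[0] for row in matmul(xt, [[v] for v in y])]
    return solve(matmul(xt, x), xty)


def mask(x, y, rng):
    """Hypothetical party-side masking: left-multiply (X_k, y_k) by a private orthogonal A_k."""
    a = random_orthogonal(len(x), rng)
    return matmul(a, x), [row[0] for row in matmul(a, [[v] for v in y])]


rng = random.Random(0)
beta_true = [1.0, -2.0, 0.5]
parties = []
for n_k in (5, 7):  # two parties holding disjoint rows (horizontal partition)
    xk = [[1.0] + [rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(n_k)]
    yk = [sum(b * v for b, v in zip(beta_true, row)) + rng.gauss(0.0, 0.1) for row in xk]
    parties.append((xk, yk))

pooled_x = [row for xk, _ in parties for row in xk]
pooled_y = [v for _, yk in parties for v in yk]

masked_x, masked_y = [], []
for xk, yk in parties:
    mx, my = mask(xk, yk, rng)  # each party applies its own mask before sharing
    masked_x += mx
    masked_y += my

beta_plain = ols(pooled_x, pooled_y)    # fit on raw pooled data
beta_masked = ols(masked_x, masked_y)   # fit on masked data only
```

Pure Python is used only so the sketch runs without dependencies; a real implementation would rely on a linear-algebra library. Note that orthogonal left-masking hides individual rows but still reveals the aggregate X^T X.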