We consider the problem of maintaining sparsity in private distributed storage of confidential machine learning data. In many applications, e.g., face recognition, the data used in machine learning algorithms is represented by sparse matrices which can be stored and processed efficiently. However, mechanisms maintaining perfect information-theoretic privacy require encoding the sparse matrices into randomized dense matrices. It has been shown that, under some restrictions on the storage nodes, sparsity can be maintained at the expense of relaxing the perfect information-theoretic privacy requirement, i.e., allowing some information leakage. In this work, we lift the restrictions imposed on the storage nodes and show that there exists a trade-off between sparsity and the achievable privacy guarantees. We focus on the setting of non-colluding nodes and construct a coding scheme that encodes the sparse input matrices into matrices with the desired sparsity level while limiting the information leakage.