Various privacy-preserving frameworks that protect the individual's privacy during data analysis have been developed in recent years. However, the available model classes, such as simple statistics or generalized linear models, lack the flexibility required to approximate the underlying data-generating process well in practice. In this paper, we propose an algorithm for distributed, privacy-preserving, and lossless estimation of generalized additive mixed models (GAMMs) using component-wise gradient boosting (CWB). Using CWB allows us to reframe GAMM estimation as a distributed fitting of base learners under the $L_2$ loss. To account for heterogeneity across data sites, we propose a distributed version of the row-wise tensor product that allows the computation of site-specific (smooth) effects. Our adaptation of CWB preserves all the important properties of the original algorithm, such as unbiased feature selection and the ability to fit models in high-dimensional feature spaces, and yields model estimates equivalent to those of CWB on pooled data. In addition to deriving the equivalence of both algorithms, we showcase the efficacy of our algorithm on a distributed heart disease data set and compare it with state-of-the-art methods.
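To make the "lossless" claim concrete, the following is a minimal sketch of why an $L_2$ base learner can be fitted from per-site aggregates and still match the pooled fit. It is an illustration under stated assumptions, not the algorithm from the paper: it assumes each site shares only $X_k^\top X_k$ and $X_k^\top r_k$ (with $r_k$ the current pseudo-residuals), it says nothing about the privacy mechanism itself, and all names (`site_statistics`, `fit_from_aggregates`, `row_wise_tensor_product`) and the simulated data are hypothetical.

```python
import numpy as np


def site_statistics(X, r):
    """Computed locally at one site; only these aggregates would leave the site (assumption)."""
    return X.T @ X, X.T @ r


def fit_from_aggregates(stats, penalty=0.0):
    """Host side: sum the per-site aggregates and solve the (penalized) normal equations."""
    XtX = sum(s[0] for s in stats)
    Xtr = sum(s[1] for s in stats)
    return np.linalg.solve(XtX + penalty * np.eye(XtX.shape[0]), Xtr)


def row_wise_tensor_product(A, B):
    """Row i of the result is the Kronecker product of row i of A with row i of B."""
    return np.einsum("ij,ik->ijk", A, B).reshape(A.shape[0], -1)


rng = np.random.default_rng(0)

# Three hypothetical sites with different sample sizes (simulated data).
sites = []
for n in (40, 55, 30):
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    r = 1.5 * X[:, 1] + rng.normal(scale=0.1, size=n)
    sites.append((X, r))

# Distributed fit: uses only the per-site aggregates.
theta_distributed = fit_from_aggregates([site_statistics(X, r) for X, r in sites])

# Pooled fit: what one would get if all rows could be stacked in one place.
X_pooled = np.vstack([X for X, _ in sites])
r_pooled = np.concatenate([r for _, r in sites])
theta_pooled = np.linalg.lstsq(X_pooled, r_pooled, rcond=None)[0]

assert np.allclose(theta_distributed, theta_pooled)

# Site-specific effects: one-hot site indicator combined row-wise with the design matrix.
site_ids = np.concatenate([np.full(len(r), k) for k, (_, r) in enumerate(sites)])
Z_site = row_wise_tensor_product(np.eye(len(sites))[site_ids], X_pooled)
```

The equality holds because $X^\top X = \sum_k X_k^\top X_k$ and $X^\top r = \sum_k X_k^\top r_k$ when the pooled design matrix is the row-wise stack of the site matrices. In the last step, each row of `Z_site` is non-zero only in the column block of its own site, so every site gets its own coefficients for the same features.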