Federated Learning is by nature susceptible to low-quality, corrupted, or even malicious data that can severely degrade the quality of the learned model. Traditional techniques for data valuation cannot be applied, as the data is never revealed. We present a novel technique for filtering and scoring data based on a practical influence approximation (`lazy' influence) that can be implemented in a privacy-preserving manner. Each participant uses their own data to evaluate the influence of another participant's batch, and reports an obfuscated score to the center using differential privacy. Our technique allows for highly effective filtering of corrupted data in a variety of applications. Importantly, we show that most of the corrupted data can be filtered out (recall of $>90\%$, and even up to $100\%$), even under strong privacy guarantees ($\varepsilon \leq 1$).
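The abstract specifies that scores are obfuscated with differential privacy but does not name the concrete mechanism. The following is a minimal, hypothetical sketch of how such a protocol could look, assuming each evaluator casts a binary influence vote and obfuscates it with the classic randomized-response mechanism (which satisfies $\varepsilon$-local differential privacy); all function names are illustrative, not taken from the paper.

```python
import math
import random

def lazy_influence_vote(val_loss_before: float, val_loss_after: float) -> bool:
    """Hypothetical proxy for a lazy-influence vote: True if the evaluated
    batch improved the evaluator's validation loss."""
    return val_loss_after < val_loss_before

def randomized_response(vote: bool, epsilon: float) -> bool:
    """Obfuscate a binary vote with classic randomized response: keep the
    true vote with probability e^eps / (1 + e^eps), flip it otherwise.
    This mechanism satisfies epsilon-local differential privacy."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return vote if random.random() < p_truth else not vote

def debiased_positive_rate(reports: list[bool], epsilon: float) -> float:
    """Center-side unbiased estimate of the fraction of genuine positive
    votes, inverting the randomized-response flipping probability."""
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)

# Example: 100 evaluators under strong privacy (epsilon = 1) assess a
# corrupted batch, so only ~10% of the honest votes are positive.
random.seed(0)
true_votes = [random.random() < 0.1 for _ in range(100)]
reports = [randomized_response(v, epsilon=1.0) for v in true_votes]
print(debiased_positive_rate(reports, epsilon=1.0))  # close to 0.1 after debiasing
```

Even under a strong privacy budget ($\varepsilon = 1$), aggregating many obfuscated votes lets the center recover a usable estimate of a batch's score, which is consistent with the filtering results the abstract reports.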