Private multi-winner voting is the task of revealing $k$-hot binary vectors while satisfying a bounded differential privacy (DP) guarantee. This task has been understudied in the machine learning literature despite its prevalence in many domains, such as healthcare. We propose three new DP multi-winner mechanisms: Binary, $\tau$, and Powerset voting. Binary voting operates independently per label through composition. $\tau$ voting bounds votes optimally in their $\ell_2$ norm for tight data-independent guarantees. Powerset voting operates over the entire binary vector by viewing the possible outcomes as a power set. Our theoretical and empirical analysis shows that Binary voting can be a competitive mechanism on many tasks unless there are strong correlations between labels, in which case Powerset voting outperforms it. We use our mechanisms to enable privacy-preserving multi-label learning in the central setting by extending the canonical single-label technique, PATE. We find that our techniques outperform current state-of-the-art approaches on large, real-world healthcare data and standard multi-label benchmarks. We further enable multi-label confidential and private collaborative (CaPC) learning and show that model performance can be significantly improved in the multi-site setting.
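To make the three mechanisms more concrete, the following is a minimal sketch under stated assumptions, not the implementation evaluated in this work. The function names (`binary_voting`, `clip_l2`, `powerset_voting`), the use of Gaussian noise, and the noise scale `sigma` are illustrative choices; the abstract does not specify the noise distribution, the decision rule applied after $\ell_2$ clipping, or the privacy accounting, so none of those are reproduced here.

```python
import itertools
import math
import random


def binary_voting(votes, sigma, rng):
    """Noisy majority vote taken independently for each label.

    votes: list of 0/1 lists of equal length, one per voter.
    sigma: standard deviation of the (assumed) Gaussian noise.
    """
    n_labels = len(votes[0])
    result = []
    for j in range(n_labels):
        positive = sum(v[j] for v in votes)
        negative = len(votes) - positive
        # Perturb both counts, then release only the noisy winner.
        noisy_pos = positive + rng.gauss(0.0, sigma)
        noisy_neg = negative + rng.gauss(0.0, sigma)
        result.append(1 if noisy_pos > noisy_neg else 0)
    return result


def clip_l2(vote, tau):
    """Rescale one vote vector so that its l2 norm is at most tau."""
    norm = math.sqrt(sum(x * x for x in vote))
    if norm <= tau:
        return [float(x) for x in vote]
    return [x * tau / norm for x in vote]


def powerset_voting(votes, sigma, rng):
    """Noisy argmax over all 2^k possible binary outcome vectors."""
    n_labels = len(votes[0])
    # Enumerate every outcome so that zero-count outcomes also get noise.
    counts = {o: 0 for o in itertools.product((0, 1), repeat=n_labels)}
    for v in votes:
        counts[tuple(v)] += 1
    noisy = {o: c + rng.gauss(0.0, sigma) for o, c in counts.items()}
    return list(max(noisy, key=noisy.get))


if __name__ == "__main__":
    rng = random.Random(0)
    votes = [[1, 0], [1, 0], [0, 1]]
    print(binary_voting(votes, sigma=1.0, rng=rng))
    print(powerset_voting(votes, sigma=1.0, rng=rng))
    print(clip_l2([1, 1, 1, 1], tau=1.0))
```

Note that `powerset_voting` enumerates all $2^k$ outcomes, so this sketch is only practical for a small number of labels. In `clip_l2`, a voter who marks many labels positive has each coordinate scaled down, which is what bounds the influence of any single vote vector.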