We study private and robust multi-armed bandits (MABs), where the agent receives heavy-tailed rewards contaminated under Huber's contamination model and must simultaneously ensure differential privacy. We first establish a minimax lower bound for this problem, characterizing the information-theoretic limit of regret with respect to the privacy budget, the contamination level and the heavy-tailedness of the rewards. We then propose a meta-algorithm built on a private and robust mean estimation sub-routine, \texttt{PRM}, which essentially relies only on reward truncation and the Laplace mechanism. For two different heavy-tailed settings, we give specific \texttt{PRM} schemes that enable us to achieve nearly optimal regret. As a by-product of our main results, we also give the first minimax lower bound for private heavy-tailed MABs (i.e., without contamination). Moreover, our two proposed truncation-based \texttt{PRM} schemes achieve the optimal trade-off between estimation accuracy, privacy and robustness. Finally, we support our theoretical results with experimental studies.
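To make the idea of combining reward truncation with the Laplace mechanism concrete, the following is a minimal sketch of a private truncated mean estimator. It is an illustration under stated assumptions, not the \texttt{PRM} schemes of the paper: the function name `private_truncated_mean`, the parameters `threshold` and `epsilon`, and the choice to clip rewards to `[-threshold, threshold]` (rather than, say, zeroing out large rewards) are all hypothetical, and the threshold is taken as given rather than tuned to the contamination level or tail behavior.

```python
import random


def private_truncated_mean(rewards, threshold, epsilon, rng=None):
    """Sketch: clip rewards, average them, add Laplace noise.

    Assumptions (not taken from the paper): rewards are clipped to
    [-threshold, threshold], and privacy is with respect to replacing
    a single reward in the list.
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    if threshold <= 0 or epsilon <= 0:
        raise ValueError("threshold and epsilon must be positive")
    rng = rng or random.Random()
    n = len(rewards)

    # Truncation step: bound the influence of heavy tails and outliers.
    clipped = [max(-threshold, min(threshold, x)) for x in rewards]
    mean = sum(clipped) / n

    # Replacing one reward moves the clipped mean by at most 2 * threshold / n,
    # so Laplace noise with this scale gives epsilon-differential privacy.
    scale = 2.0 * threshold / (n * epsilon)

    # The difference of two independent Exp(1) draws is a standard Laplace draw.
    noise = scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
    return mean + noise


if __name__ == "__main__":
    sample = [0.5, 1.2, -0.3, 250.0, 0.9]  # one extreme reward
    print(private_truncated_mean(sample, threshold=5.0, epsilon=1.0,
                                 rng=random.Random(0)))
```

A smaller `threshold` reduces both the noise scale and the influence of outliers but increases the truncation bias, which is the kind of accuracy, privacy and robustness trade-off the abstract refers to.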