$k\texttt{-experts}$ -- Online Policies and Fundamental Limits

This paper introduces and studies the $k\texttt{-experts}$ problem -- a generalization of the classic Prediction with Expert Advice (i.e., the $\texttt{Experts}$) problem. Unlike the $\texttt{Experts}$ problem, where the learner chooses exactly one expert, here the learner selects a subset of $k$ experts from a pool of $N$ experts at each round. The reward obtained by the learner at any round depends on the rewards of the selected experts. The $k\texttt{-experts}$ problem arises in many practical settings, including online ad placements, personalized news recommendations, and paging. Our primary goal is to design an online learning policy with small regret. To this end, we propose $\texttt{SAGE}$ ($\textbf{Sa}$mpled Hed$\textbf{ge}$) -- a framework for designing efficient online learning policies by leveraging statistical sampling techniques. We show that, for many related problems, $\texttt{SAGE}$ improves upon the state-of-the-art bounds for regret and computational complexity. Furthermore, going beyond the notion of regret, we characterize the mistake bounds achievable by online learning policies for a class of stable loss functions. We conclude the paper by establishing a tight regret lower bound for a variant of the $k\texttt{-experts}$ problem and carrying out experiments with standard datasets.
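The abstract does not spell out the policy, so the following is only an illustrative sketch of the problem setup, not the paper's $\texttt{SAGE}$ algorithm. It assumes a sum-reward variant (the learner's reward is the sum of the selected experts' rewards), keeps Hedge-style exponential weights over individual experts, caps them into inclusion probabilities that sum to $k$, and draws a $k$-subset by systematic sampling. The function names, the capping rule, and the parameter `eta` are all assumptions introduced here for illustration.

```python
import numpy as np

def capped_inclusion_probs(weights, k):
    """Scale positive weights to p with sum(p) == k and every p_i <= 1 (assumed rule)."""
    w = np.asarray(weights, dtype=float)
    p = np.zeros_like(w)
    capped = np.zeros(len(w), dtype=bool)
    while True:
        free = ~capped
        # Spread the remaining probability mass over the uncapped experts.
        scale = (k - capped.sum()) / w[free].sum()
        over = free & (scale * w > 1.0)
        if not over.any():
            p[free] = scale * w[free]
            p[capped] = 1.0
            return p
        capped |= over  # experts that would exceed 1 are always included

def systematic_sample(p, rng):
    """Draw round(sum(p)) distinct indices so that P(i is chosen) = p[i]."""
    k = int(round(p.sum()))
    edges = np.concatenate(([0.0], np.cumsum(p)))
    points = rng.uniform() + np.arange(k)  # U, U+1, ..., U+k-1
    idx = np.searchsorted(edges, points, side="right") - 1
    return np.minimum(idx, len(p) - 1)  # guard against floating-point overshoot

def sampled_hedge_sketch(rewards, k, eta, seed=0):
    """Run the sketch on a (T, N) reward matrix; return (total reward, regret)."""
    rng = np.random.default_rng(seed)
    T, N = rewards.shape
    cum = np.zeros(N)
    total = 0.0
    for t in range(T):
        # Exponential weights; the exponent is clipped so no weight underflows to 0.
        w = np.exp(np.maximum(eta * (cum - cum.max()), -700.0))
        chosen = systematic_sample(capped_inclusion_probs(w, k), rng)
        total += rewards[t, chosen].sum()
        cum += rewards[t]  # full-information feedback: all rewards are revealed
    best_fixed = np.sort(rewards.sum(axis=0))[-k:].sum()  # best fixed k-subset in hindsight
    return total, best_fixed - total
```

Under these assumptions, calling `sampled_hedge_sketch(R, k=3, eta=0.1)` on a `(T, N)` array `R` of rewards in $[0, 1]$ returns the learner's cumulative reward together with its regret against the best fixed set of $k$ experts in hindsight.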
