We introduce the $\texttt{$k$-experts}$ problem - a generalization of the classic Prediction with Experts' Advice framework. Unlike the classic version, where the learner selects exactly one expert from a pool of $N$ experts at each round, in this problem the learner selects a subset of $k$ experts at each round $(1\leq k\leq N)$. The reward obtained by the learner at each round is assumed to be a function of the $k$ selected experts. The primary objective is to design an online learning policy with small regret. In this pursuit, we propose $\texttt{SAGE}$ ($\textbf{Sa}$mpled Hed$\textbf{ge}$) - a framework for designing efficient online learning policies by leveraging statistical sampling techniques. For a wide class of reward functions, we show that $\texttt{SAGE}$ either achieves the first sublinear regret guarantee or improves upon the existing ones. Furthermore, going beyond the notion of regret, we fully characterize the mistake bounds achievable by online learning policies for stable loss functions. We conclude the paper by establishing a tight regret lower bound for a variant of the $\texttt{$k$-experts}$ problem and by carrying out experiments with standard datasets.
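For concreteness, the regret objective referred to above can be formalized in the standard static-benchmark way (a sketch of one natural definition; the paper's exact benchmark and reward/loss convention may differ):
\[
R_T \;=\; \max_{\substack{S \subseteq [N] \\ |S| = k}} \sum_{t=1}^{T} r_t(S) \;-\; \mathbb{E}\!\left[\sum_{t=1}^{T} r_t(S_t)\right],
\]
where $r_t(\cdot)$ denotes the (possibly adversarially chosen) reward function at round $t$, $S_t$ is the set of $k$ experts selected by the learner, and the expectation is taken over the learner's internal randomness.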