We revisit the classic problem of optimal subset selection in the online learning setup. Assume that the set $[N]$ consists of $N$ distinct elements. On the $t$th round, an adversary chooses a monotone reward function $f_t: 2^{[N]} \to \mathbb{R}_+$ that assigns a non-negative reward to each subset of $[N]$. An online policy selects (perhaps randomly) a subset $S_t \subseteq [N]$ consisting of $k$ elements before the reward function $f_t$ for the $t$th round is revealed to the learner. As a consequence of its choice, the policy receives a reward of $f_t(S_t)$ on the $t$th round. Our goal is to design an online sequential subset selection policy that maximizes the expected cumulative reward over a given time horizon. To this end, we propose an online learning policy called SCore (Subset Selection with Core) that solves the problem for a large class of reward functions. The SCore policy is based on a new polyhedral characterization of the reward functions called $\alpha$-Core, a generalization of the Core from the cooperative game theory literature. We establish a learning guarantee for the SCore policy in terms of a new performance metric called $\alpha$-augmented regret. In this metric, the performance of the online policy is compared with an unrestricted offline benchmark that can select all $N$ elements at every round. We show that a large class of reward functions, including submodular functions, can be efficiently optimized with the SCore policy. We also extend the proposed policy to the optimistic learning setup, where the learner has access to additional untrusted hints regarding the reward functions. We conclude with a list of open problems.
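The interaction protocol described above can be sketched in a few lines of Python. This is only an illustration of the problem setup, not the SCore policy: the learner here is a placeholder that picks $k$ elements uniformly at random, and the coverage-style reward and all function names (`make_coverage_reward`, `uniform_random_policy`, `play`, `full_set_benchmark`) are assumptions introduced for the example.

```python
import random
from typing import Callable, FrozenSet, List, Set

RewardFn = Callable[[FrozenSet[int]], float]


def make_coverage_reward(covers: List[Set[int]]) -> RewardFn:
    """Hypothetical monotone reward: number of items covered by the subset.

    covers[i] is the set of items covered by element i. Adding elements
    can never shrink the union, so the reward is monotone and non-negative.
    """
    def f(subset: FrozenSet[int]) -> float:
        covered: Set[int] = set()
        for i in subset:
            covered |= covers[i]
        return float(len(covered))
    return f


def uniform_random_policy(n: int, k: int, rng: random.Random) -> FrozenSet[int]:
    """Placeholder learner (NOT SCore): k elements chosen uniformly at random."""
    return frozenset(rng.sample(range(n), k))


def play(n: int, k: int, reward_fns: List[RewardFn], rng: random.Random) -> float:
    """Run the online protocol and return the cumulative reward."""
    total = 0.0
    for f_t in reward_fns:
        # The subset S_t is committed before f_t is revealed to the learner.
        s_t = uniform_random_policy(n, k, rng)
        total += f_t(s_t)
    return total


def full_set_benchmark(n: int, reward_fns: List[RewardFn]) -> float:
    """Cumulative reward of the unrestricted benchmark that selects all N elements."""
    everything = frozenset(range(n))
    return sum(f_t(everything) for f_t in reward_fns)


if __name__ == "__main__":
    covers = [{0}, {0, 1}, {2}, set()]
    fns = [make_coverage_reward(covers) for _ in range(5)]
    print(play(4, 2, fns, random.Random(0)), full_set_benchmark(4, fns))
```

Because each reward function is monotone, the cumulative reward of any $k$-subset policy is at most that of the full-set benchmark, which is the comparator that the $\alpha$-augmented regret is stated against.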