We introduce the concept of history-restricted no-regret online learning algorithms. An online learning algorithm $\mathcal{A}$ is $M$-history-restricted if its output at time $t$ can be written as a function of the rewards observed in the previous $M$ rounds. This class of algorithms is natural to consider from several perspectives: such algorithms may be better models of human agents, and they do not store long-term information (thereby ensuring ``the right to be forgotten''). We first demonstrate that a natural approach to constructing history-restricted algorithms from mean-based no-regret learning algorithms (e.g., running Hedge over the last $M$ rounds) fails, and that such algorithms incur linear regret. We then construct a history-restricted algorithm that achieves a per-round regret of $\Theta(1/\sqrt{M})$, which we complement with a tight lower bound. Finally, we empirically explore distributions for which history-restricted online learners perform favorably compared to other no-regret algorithms.
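For concreteness, here is a minimal sketch (in Python/NumPy, not taken from the paper) of the naive construction mentioned above: running Hedge on only the last $M$ observed reward vectors. The function name, the learning-rate choice, and the two-action example are illustrative assumptions; as stated above, this style of construction can incur linear regret.

```python
import numpy as np

def hedge_last_m(reward_history, M, n_actions=2, eta=None):
    """Naive history-restricted strategy: run Hedge using only the
    last M rounds of rewards (illustrative sketch, not the paper's algorithm).

    reward_history: list of length-n_actions reward vectors seen so far.
    Returns a probability distribution over actions for the next round.
    """
    window = reward_history[-M:]  # forget everything older than M rounds
    if eta is None:
        # standard Hedge-style step size over the window length (assumed choice)
        eta = np.sqrt(np.log(n_actions) / max(len(window), 1))
    cum = np.zeros(n_actions)
    for r in window:
        cum += np.asarray(r, dtype=float)  # cumulative reward within the window
    weights = np.exp(eta * (cum - cum.max()))  # subtract max for numerical stability
    return weights / weights.sum()

# Example: two actions, rewards alternating between which action is good.
history = [[1.0, 0.0], [0.0, 1.0]] * 10
print(hedge_last_m(history, M=5))
```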