用于在线学习的近最佳内存- Regret 权衡</s> (Near Optimal Memory-Regret Tradeoff for Online Learning)

In the experts problem, on each of $T$ days, an agent needs to follow the advice of one of $n$ ``experts''. After each day, the loss associated with each expert's advice is revealed. A fundamental result in learning theory says that the agent can achieve vanishing regret, i.e. their cumulative loss is within $o(T)$ of the cumulative loss of the best-in-hindsight expert. Can the agent perform well without sufficient space to remember all the experts? We extend a nascent line of research on this question in two directions: $\bullet$ We give a new algorithm against the oblivious adversary, improving over the memory-regret tradeoff obtained by [PZ23], and nearly matching the lower bound of [SWXZ22]. $\bullet$ We also consider an adaptive adversary who can observe past experts chosen by the agent. In this setting we give both a new algorithm and a novel lower bound, proving that roughly $\sqrt{n}$ memory is both necessary and sufficient for obtaining $o(T)$ regret.

翻译：在专家问题中,在每一天,代理人都需要听从“专家”1美元的建议。每天之后,每个专家建议的损失都会暴露出来。学习理论的一个基本结果是,代理人可以取得消失的遗憾,即其累积损失在最佳视线专家累积损失的$(T)美元之内。代理人能否在没有足够的空间记住所有专家的情况下顺利地工作?我们从两个方面扩展了对这一问题的新生研究线:我们给一个模糊的对手一种新的算法,改进了[PZ23]获得的记忆-记录交换,几乎与[SWX22] 的较低约束相匹配。我们还认为一个适应性对手能够观察代理人所挑选的过去专家。在这个背景下,我们既提供了一种新的算法,又提供了新的较低约束,证明大约为$(sqrt)的记忆既有必要,也足以获得美元(T)的遗憾。</s>