On the Complexity of Representation Learning in Contextual Linear Bandits

In contextual linear bandits, the reward function is assumed to be a linear combination of an unknown reward vector and a given embedding of context-arm pairs. In practice, the embedding is often learned at the same time as the reward vector, leading to an online representation learning problem. Existing approaches to representation learning in contextual bandits are either very generic (e.g., model-selection techniques or algorithms for learning with arbitrary function classes) or specialized to particular structures (e.g., nested features or representations with certain spectral properties). As a result, the understanding of the cost of representation learning in contextual linear bandits is still limited. In this paper, we take a systematic approach to the problem and provide a comprehensive study through an instance-dependent perspective. We show that representation learning is fundamentally more complex than linear bandits (i.e., learning with a given representation). In particular, learning with a given set of representations is never simpler than learning with the worst realizable representation in the set, while we show cases where it can be arbitrarily harder. We complement this result with an extensive discussion of how it relates to the existing literature, and we illustrate positive instances where representation learning is as complex as learning with a fixed representation and where sub-logarithmic regret is achievable.
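To make the setting concrete, the following is a minimal sketch of the linear reward model and of what it means for a candidate representation to be realizable. Everything in it is an illustrative assumption rather than material from the paper: the toy contexts and arms, the reward table `MU`, the two candidate embeddings `PHI_1` and `PHI_2`, their parameter vectors, and the helper names `dot`, `realizes`, and `sample_reward`.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists of floats."""
    return sum(ui * vi for ui, vi in zip(u, v))


# Hypothetical toy problem: expected reward mu(x, a) for 2 contexts and 2 arms.
MU = {("x0", "a0"): 1.0, ("x0", "a1"): 0.0,
      ("x1", "a0"): 0.5, ("x1", "a1"): 1.5}

# Candidate representation 1 (dimension 2) and a matching reward vector.
PHI_1 = {("x0", "a0"): [1.0, 0.0], ("x0", "a1"): [0.0, 0.0],
         ("x1", "a0"): [0.5, 0.0], ("x1", "a1"): [0.5, 1.0]}
THETA_1 = [1.0, 1.0]

# Candidate representation 2 (dimension 1) and a matching reward vector.
PHI_2 = {("x0", "a0"): [2.0], ("x0", "a1"): [0.0],
         ("x1", "a0"): [1.0], ("x1", "a1"): [3.0]}
THETA_2 = [0.5]


def realizes(phi, theta, mu=MU, tol=1e-9):
    """True if <phi(x, a), theta> equals mu(x, a) for every context-arm pair."""
    return all(abs(dot(phi[key], theta) - mu[key]) <= tol for key in mu)


def sample_reward(phi, theta, x, a, noise_std=0.1, rng=random):
    """Noisy reward: linear expected reward plus zero-mean Gaussian noise."""
    return dot(phi[(x, a)], theta) + rng.gauss(0.0, noise_std)


if __name__ == "__main__":
    # Both embeddings represent the same reward function with different features.
    print(realizes(PHI_1, THETA_1), realizes(PHI_2, THETA_2))
    # The same embedding with a wrong reward vector does not reproduce MU.
    print(realizes(PHI_2, [1.0]))
```

In this sketch a learner facing the set {`PHI_1`, `PHI_2`} knows neither which embedding to use nor the associated reward vector, which is the representation learning problem the abstract refers to. Note that `realizes` only checks one given reward vector; realizability of a representation in the usual sense means that some such vector exists.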
