多重制约下的亚模块强盗问题 (Submodular Bandit Problem Under Multiple Constraints)

The linear submodular bandit problem was proposed to simultaneously address diversified retrieval and online learning in a recommender system. If there is no uncertainty, this problem is equivalent to a submodular maximization problem under a cardinality constraint. However, in some situations, recommendation lists should satisfy additional constraints such as budget constraints, other than a cardinality constraint. Thus, motivated by diversified retrieval considering budget constraints, we introduce a submodular bandit problem under the intersection of $l$ knapsacks and a $k$-system constraint. Here $k$-system constraints form a very general class of constraints including cardinality constraints and the intersection of $k$ matroid constraints. To solve this problem, we propose a non-greedy algorithm that adaptively focuses on a standard or modified upper-confidence bound. We provide a high-probability upper bound of an approximation regret, where the approximation ratio matches that of a fast offline algorithm. Moreover, we perform experiments under various combinations of constraints using a synthetic and two real-world datasets and demonstrate that our proposed methods outperform the existing baselines.

翻译：线性亚模块土匪问题被提议在推荐人系统中同时解决多样化的检索和在线学习问题。如果没有不确定性, 这个问题相当于在核心限制下的一个亚模块最大化问题。但是, 在某些情况下, 建议清单应该满足预算限制等额外限制, 而不是基本限制。因此, 以多样化的检索为动机, 考虑到预算限制, 我们引入了一个在美元Knapsacks和美元系统限制的交叉下的一个子模块型土匪问题。这里, $k$- 系统限制形成了一个非常普遍的制约类别, 包括基本限制和美元成型机器人限制的交叉点。为了解决这个问题, 我们提出了一种适应性非模块化的算法, 以标准或修改的上信任约束点为主点。我们提供了一种高度概率的近似差, 其近似率与快速离线算法相匹配。此外, 我们使用合成的和两个真实世界的数据集在各种制约组合下进行实验, 并证明我们提出的方法超过了现有的基线。