Submodular optimization with bandit feedback has recently been studied in a variety of contexts. In a number of real-world applications such as diversified recommender systems and data summarization, the submodular function exhibits additional linear structure. We consider developing approximation algorithms for the maximization of a submodular objective function $f:2^U\to\mathbb{R}_{\geq 0}$, where $f=\sum_{i=1}^dw_iF_{i}$. It is assumed that we have value oracle access to the functions $F_i$, but the coefficients $w_i$ are unknown, and $f$ can only be accessed via noisy queries. We develop algorithms for this setting inspired by adaptive allocation algorithms in the best-arm identification for linear bandit, with approximation guarantees arbitrarily close to the setting where we have value oracle access to $f$. Finally, we empirically demonstrate that our algorithms make vast improvements in terms of sample efficiency compared to algorithms that do not exploit the linear structure of $f$ on instances of move recommendation.
翻译:暂无翻译