With the rise of online applications, recommender systems (RSs) often encounter constraints in balancing exploration and exploitation. Such constraints arise when exploration is carried out by agents whose individual utility should be balanced with overall welfare. Recent work suggests that recommendations should be individually rational. Specifically, if agents have a default arm they would use, relying on the RS should yield each agent at least the reward of the default arm, conditioned on the knowledge available to the RS. Under this individual rationality constraint, striking a balance between exploration and exploitation becomes a complex planning problem. We assume a stochastic order of the rewards (e.g., Bernoulli, unit-variance Gaussian, etc.), and derive an approximately optimal algorithm. Our technique is based on an auxiliary Goal Markov Decision Process problem that is of independent interest. Additionally, we present an incentive-compatible version of our algorithm.
翻译:暂无翻译