For domains in which a recommender provides repeated content suggestions, agent preferences may evolve over time as a function of prior recommendations, and algorithms must take this into account for long-run optimization. Recently, Agarwal and Brown (2022) introduced a model for studying recommendations when agents' preferences are adaptive, and gave a series of results for the case when agent preferences depend {\it uniformly} on their history of past selections. Here, the recommender shows a $k$-item menu (out of $n$) to the agent at each round; the agent selects one of the $k$ items via their history-dependent {\it preference model}, yielding a per-item adversarial reward for the recommender. We expand this setting to {\it non-uniform} preferences, and give a series of results for {\it $\gamma$-discounted} histories. For this problem, the feasible regret benchmarks depend drastically on the discount factor $\gamma$ and the structure of the agent's preference model. In the ``large $\gamma$'' regime, we show that the previously considered benchmark, the ``EIRD set'', is attainable for any {\it smooth} model, relaxing the ``local learnability'' requirement from the uniform memory case. We introduce ``pseudo-increasing'' preference models, for which we give an algorithm which can compete against any item distribution with small uniform noise (the ``smoothed simplex''). We show NP-hardness results for larger regret benchmarks in each case. We give another algorithm for pseudo-increasing models (under a restriction on the adversarial nature of the reward functions), which works for any $\gamma$ and is faster when $\gamma$ is sufficiently small, and we show a super-polynomial regret lower bound with respect to EIRD for general models in the ``small $\gamma$'' regime. We conclude with a pair of algorithms for the memoryless case.
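To make the $\gamma$-discounted setting concrete, one natural formalization (an illustrative sketch in our own notation; the model's exact definitions may differ) summarizes the agent's history as an exponentially weighted distribution $v_t$ over past selections, updated after each round and used to score the current menu:
\[
v_{t+1} \;=\; \gamma\, v_t + (1 - \gamma)\, e_{i_t},
\qquad
\Pr\left[\, i_t = i \mid M_t, v_t \,\right] \;=\; \frac{f_i(v_t)}{\sum_{j \in M_t} f_j(v_t)} \quad \text{for } i \in M_t,
\]
where $M_t$ is the $k$-item menu shown at round $t$, $e_{i_t}$ is the indicator vector of the item the agent selects, and each $f_i$ is a score function encoding the agent's history-dependent preferences. Under this weighting, the influence of a selection made at round $s$ decays as $\gamma^{t-s}$, so larger $\gamma$ corresponds to longer effective memory while smaller $\gamma$ concentrates weight on recent selections.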