Web applications where users are presented with a limited selection of items have long employed ranking models to put the most relevant results first. Any feedback received from users is typically assumed to reflect a relative judgement on the utility of items, e.g. a user clicking on an item only implies it is better than items not clicked in the same ranked list. Hence, the objectives optimized in Learning-to-Rank (LTR) tend to be pairwise or listwise. Yet, by only viewing feedback as relative, we neglect the user's absolute feedback on the list's overall quality, e.g. when no items in the selection are clicked. We thus reconsider the standard LTR paradigm and argue the benefits of learning from this listwide signal. To this end, we propose the RankFormer as an architecture that, with a Transformer at its core, can jointly optimize a novel listwide assessment objective and a traditional listwise LTR objective. We simulate implicit feedback on public datasets and observe that the RankFormer succeeds in benefitting from listwide signals. Additionally, we conduct experiments in e-commerce on Amazon Search data and find the RankFormer to be superior to all baselines offline. An online experiment shows that knowledge distillation can be used to find immediate practical use for the RankFormer.
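To make the idea of jointly optimizing a listwise and a listwide objective concrete, the following is a minimal sketch and not the RankFormer's actual formulation. It assumes a softmax cross-entropy over clicked items as the listwise term, a binary "did the list receive any click" target as the listwide term, and a hypothetical weight `alpha` between the two. The per-item `scores` and the single `list_score` stand in for the outputs of whatever model produces them.

```python
import math

def listwise_softmax_loss(scores, clicks):
    """Softmax cross-entropy over one ranked list (assumed listwise term)."""
    total = sum(clicks)
    if total == 0:
        # With no clicks, relative feedback carries no preference signal.
        return 0.0
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return -sum(c / total * (s - log_z) for s, c in zip(scores, clicks))

def listwide_loss(list_score, clicks):
    """Binary cross-entropy on whether the list got any click (assumed listwide term)."""
    y = 1.0 if any(clicks) else 0.0
    x = list_score
    # Numerically stable binary cross-entropy on a logit x.
    return max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))

def joint_loss(scores, list_score, clicks, alpha=1.0):
    """Sum of both terms; alpha is a hypothetical trade-off weight."""
    return listwise_softmax_loss(scores, clicks) + alpha * listwide_loss(list_score, clicks)
```

In this sketch a list without clicks contributes nothing to the listwise term but still contributes to the listwide term, which is the kind of absolute signal the abstract argues is otherwise neglected.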