Top-N recommendation aims to recommend each consumer a small set of N items from a large collection of items, and its accuracy is one of the most common indexes to evaluate the performance of a recommendation system. While a large number of algorithms are proposed to push the Top-N accuracy by learning the user preference from their history purchase data, a predictability question is naturally raised - whether there is an upper limit of such Top-N accuracy. This work investigates such predictability by studying the degree of regularity from a specific set of user behavior data. Quantifying the predictability of Top-N recommendations requires simultaneously quantifying the limits on the accuracy of the N behaviors with the highest probability. This greatly increases the difficulty of the problem. To achieve this, we firstly excavate the associations among N behaviors with the highest probability and describe the user behavior distribution based on the information theory. Then, we adopt the Fano inequality to scale and obtain the Top-N predictability. Extensive experiments are conducted on the real-world data where significant improvements are observed compared to the state-of-the-art methods. We have not only completed the predictability calculation for N targets but also obtained predictability that is much closer to the true value than existing methods. We expect our results to assist these research areas where the quantitative requirement of Top-N predictability is required.
翻译:Top-N 推荐旨在从大量物品中向每个消费者推荐一小组 N 个物品,其准确性是评估推荐系统性能最常用的指标之一。尽管有大量算法被提出来通过学习用户从历史购买数据中获取偏好来提高 Top-N 准确性,但一个可预测性问题自然而然地被提出——是否存在这样一个上限,即 Top-N 准确性的预测能力。本研究通过研究一组特定用户行为数据的规律程度来调查这种可预测性。量化 Top-N 推荐的可预测性需要同时量化 N 个行为的准确性的上限,在很大程度上增加了问题的难度。为了达到这个目的,我们首先挖掘了 N 个最高概率行为之间的关联,并根据信息理论描述了用户行为分布。然后,我们采用 Fano 不等式进行缩放,获得 Top-N 可预测性。在真实世界数据上进行了广泛的实验,与现有方法相比,观察到了显著的改进。我们不仅完成了 N 目标的可预测性计算,而且获得了比现有方法更接近真值的可预测性。我们希望我们的结果能够协助这些需要 Top-N 可预测性量化要求的研究领域。