On Preference Learning Based on Sequential Bayesian Optimization with Pairwise Comparison

User preference learning is generally a hard problem. Individual preferences are typically unknown even to the users themselves, while the space of choices is infinite. Here we study user preference learning from an information-theoretic perspective. We model preference learning as a system with two interacting sub-systems, one representing a user with his/her preferences and the other representing an agent that has to learn these preferences. The user's behaviour is modeled by a parametric preference function. To learn the preferences efficiently and reduce the search space quickly, we propose an agent that interacts with the user to collect the most informative data for learning. The agent presents two proposals to the user for evaluation, and the user rates them based on his/her preference function. We show that the optimum agent strategy for data collection and preference learning results from maximin optimization of the normalized weighted Kullback-Leibler (KL) divergence between the true and the agent-assigned predictive distributions of the user's response. The resulting value of the KL divergence, which we also call the remaining system uncertainty (RSU), provides an efficient performance metric in the absence of ground truth. This metric characterises how well the agent can predict the user and, thus, the quality of the underlying learned user (preference) model. Our proposed agent comprises sequential mechanisms for user model inference and proposal generation. To infer the user model (preference function), the agent uses approximate Bayesian inference. The data collection strategy is to generate the proposals whose responses do the most to resolve the uncertainty in predicting the user's responses. The efficiency of our approach is validated by numerical simulations.
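The interaction loop described in the abstract (present two proposals, observe the user's choice, update the user model, pick the next most informative pair) can be illustrated with a small simulation. The sketch below is not the authors' algorithm: it does not implement the maximin normalized weighted KL criterion or the RSU metric. As stated assumptions, it uses a one-dimensional quadratic utility with a single unknown parameter `theta`, a Bradley-Terry-style logistic response model, an exact grid posterior in place of approximate Bayesian inference, and expected information gain as the pair-selection rule. All function names and parameter values are hypothetical.

```python
import numpy as np

def utility(x, theta):
    """Hypothetical user utility: highest when item x equals the preferred point theta."""
    return -(x - theta) ** 2

def p_prefers_first(a, b, theta, scale=0.05):
    """Assumed logistic (Bradley-Terry-style) probability that the user picks a over b."""
    return 1.0 / (1.0 + np.exp(-(utility(a, theta) - utility(b, theta)) / scale))

def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli variable with success probability p."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def expected_information_gain(a, b, thetas, posterior):
    """Mutual information between the user's answer to (a, b) and theta."""
    p = p_prefers_first(a, b, thetas)
    predictive = np.sum(posterior * p)  # agent's predictive probability of "a preferred"
    return binary_entropy(predictive) - np.sum(posterior * binary_entropy(p))

def learn_preferences(true_theta, n_rounds=30, seed=0):
    """Simulate the agent-user loop and return the grid and the final posterior."""
    rng = np.random.default_rng(seed)
    thetas = np.linspace(0.0, 1.0, 101)                      # grid over the unknown parameter
    posterior = np.full(thetas.size, 1.0 / thetas.size)      # uniform prior
    candidates = np.linspace(0.0, 1.0, 21)                   # items the agent may propose
    pairs = [(a, b) for i, a in enumerate(candidates) for b in candidates[i + 1:]]
    for _ in range(n_rounds):
        # Proposal generation: choose the pair whose answer is expected to be most informative.
        gains = [expected_information_gain(a, b, thetas, posterior) for a, b in pairs]
        a, b = pairs[int(np.argmax(gains))]
        # Simulated user response drawn from the assumed response model.
        prefers_a = rng.random() < p_prefers_first(a, b, true_theta)
        # User model inference: Bayes update on the grid.
        likelihood = p_prefers_first(a, b, thetas)
        posterior = posterior * (likelihood if prefers_a else 1.0 - likelihood)
        posterior /= posterior.sum()
    return thetas, posterior

thetas, posterior = learn_preferences(true_theta=0.3)
estimate = float(np.sum(thetas * posterior))  # posterior-mean estimate of theta
print(f"posterior mean of theta: {estimate:.3f}")
```

In this toy setting the agent can track its own uncertainty without access to the true `theta`, for example through the entropy of `posterior`, which loosely mirrors the role the abstract assigns to a ground-truth-free performance metric.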
