We study the problem of eliciting the preferences of a decision-maker through a moderate number of pairwise comparison queries in order to make them a high-quality recommendation for a specific problem. We are motivated by applications in high-stakes domains, such as choosing a policy for allocating scarce resources to satisfy basic needs (e.g., kidneys for transplantation or housing for those experiencing homelessness), where a consequential recommendation must be made from the (partially) elicited preferences. We model uncertainty in the preferences as being set-based and investigate two settings: a) an offline elicitation setting, where all queries are made at once, and b) an online elicitation setting, where queries are selected sequentially over time in an adaptive fashion. We propose robust optimization formulations of these problems that integrate the preference elicitation and recommendation phases with the aim of either maximizing worst-case utility or minimizing worst-case regret, and we study their complexity. For the offline case, where active preference elicitation takes the form of a two-and-a-half-stage robust optimization problem with decision-dependent information discovery, we provide an equivalent reformulation as a mixed-binary linear program, which we solve via column-and-constraint generation. For the online setting, where active preference learning takes the form of a multi-stage robust optimization problem with decision-dependent information discovery, we propose a conservative solution approach. Numerical studies on synthetic data demonstrate that our methods outperform state-of-the-art approaches from the literature in terms of worst-case rank, regret, and utility. We showcase how our methodology can be used to assist a homeless services agency in choosing a policy for allocating scarce housing resources of different types to people experiencing homelessness.