Multi-objective controller synthesis concerns the problem of computing an optimal controller subject to multiple (possibly conflicting) objectives. The relative importance of these objectives is often specified by human decision-makers; however, human preferences carry inherent uncertainty (e.g., arising from different preference elicitation methods). In this paper, we formalize the notion of uncertain human preferences and present a novel approach that accounts for them in multi-objective controller synthesis for Markov decision processes (MDPs). Our approach is based on mixed-integer linear programming (MILP) and synthesizes a sound, optimally permissive multi-strategy with respect to a multi-objective property and an uncertain set of human preferences. Experimental results on a range of large case studies show that our MILP-based approach is feasible and scales to synthesize sound, optimally permissive multi-strategies across varying MDP model sizes and levels of preference uncertainty. An online user study further demonstrates the quality and benefits of the synthesized (multi-)strategies.