Current state-of-the-art methods in multi-objective optimization either assume a given utility function, learn a utility function interactively, or try to determine the complete Pareto front, which requires eliciting the preferred result afterwards. However, result elicitation in real-world problems is often based on implicit and explicit expert knowledge, which makes it difficult to define a utility function, while interactive learning and post hoc elicitation require repeated and expensive expert involvement. To mitigate this, we learn a utility function offline from expert knowledge by means of preference learning. In contrast to other works, we use not only (pairwise) result preferences but also coarse information about the utility function space. This improves the utility function estimate, especially when very few results are available. Additionally, we model the uncertainties that arise in the utility function learning task and propagate them through the whole optimization chain. Our method eliminates the need for repeated expert involvement while still leading to high-quality results. We show the sample efficiency and quality gains of the proposed method in four domains, especially in cases where the surrogate utility function cannot exactly capture the true expert utility function. We also show that considering the induced uncertainties is important for obtaining good results, and we analyze the effect of biased samples, a common problem in real-world domains.
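To make the idea of offline utility learning from pairwise preferences concrete, the following is a minimal sketch and not the model used in this work. It assumes a linear utility u(y) = w · y over objective vectors that are to be maximized, a Bradley-Terry (logistic) preference likelihood, a Gaussian prior on w as a hypothetical stand-in for coarse knowledge about the utility function space, and a Laplace approximation for the uncertainty. All names and data (`fit_linear_utility`, `prior_mean`, the toy preference pairs) are illustrative assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_linear_utility(winners, losers, prior_mean, prior_var=1.0,
                       lr=0.1, steps=2000):
    """MAP estimate of w for u(y) = w @ y, plus a Laplace covariance.

    Assumed likelihood (Bradley-Terry): P(a preferred to b) = sigmoid(u(a) - u(b)).
    Assumed prior: w ~ N(prior_mean, prior_var * I).
    """
    diffs = winners - losers                 # shape (n_pairs, n_objectives)
    w = np.array(prior_mean, dtype=float)    # start at the prior mean
    for _ in range(steps):
        p = sigmoid(diffs @ w)               # P(observed preference | w)
        # Gradient of the log-posterior: likelihood term plus prior term.
        grad = diffs.T @ (1.0 - p) - (w - prior_mean) / prior_var
        w += lr * grad / len(diffs)          # plain gradient ascent
    # Laplace approximation: covariance = inverse Hessian of the
    # negative log-posterior, evaluated at the fitted w.
    p = sigmoid(diffs @ w)
    hessian = (diffs * (p * (1.0 - p))[:, None]).T @ diffs
    hessian += np.eye(len(w)) / prior_var
    return w, np.linalg.inv(hessian)


# Toy expert preferences: in every pair the first vector was preferred.
winners = np.array([[0.9, 0.1], [0.8, 0.3], [0.7, 0.2], [0.6, 0.5]])
losers = np.array([[0.1, 0.9], [0.3, 0.8], [0.2, 0.6], [0.5, 0.7]])
prior_mean = np.array([0.5, 0.5])            # hypothetical coarse prior

w_map, w_cov = fit_linear_utility(winners, losers, prior_mean)

# Propagate the uncertainty: sample utility functions and score
# candidate solutions under each sample.
rng = np.random.default_rng(0)
w_samples = rng.multivariate_normal(w_map, w_cov, size=1000)
candidates = np.array([[0.9, 0.2], [0.5, 0.5], [0.2, 0.9]])
utilities = candidates @ w_samples.T         # shape (n_candidates, n_samples)
mean_u = utilities.mean(axis=1)
std_u = utilities.std(axis=1)

# One possible uncertainty-aware choice: maximize a lower confidence bound.
best = int(np.argmax(mean_u - std_u))
print("weights:", w_map, "chosen candidate:", candidates[best])
```

Selecting by a lower confidence bound rather than by the mean utility alone is only one way to let the learned uncertainty influence the final choice; a tighter prior (smaller `prior_var`) makes the coarse knowledge dominate when very few preference pairs are available.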