Resource scheduling and coordination is an NP-hard optimization problem that requires efficiently allocating agents to a set of tasks subject to upper- and lower-bound temporal and resource constraints. Because resource coordination in hospitals and factories is large-scale and dynamic, human domain experts manually plan and adjust schedules on the fly. To perform this job, domain experts leverage heterogeneous strategies and rules of thumb honed over years of apprenticeship. What is critically needed is the ability to extract this domain knowledge in a heterogeneous and interpretable apprenticeship learning framework, so that it can scale beyond the power of a single human expert, a necessity in safety-critical domains. We propose a personalized and interpretable apprenticeship scheduling algorithm that infers an interpretable representation of all human task demonstrators by extracting decision-making criteria via an inferred, personalized embedding that is non-parametric in the number of demonstrator types. We achieve near-perfect learning-from-demonstration (LfD) accuracy in synthetic domains and 88.22\% accuracy on a planning domain with real-world data, outperforming baselines. Finally, our user study showed that our methodology produces more interpretable and easier-to-use models than neural networks ($p < 0.05$).