Safe interaction between vehicles requires the ability to choose actions that reveal the preferences of the other vehicles. Since exploratory actions often do not directly contribute to a vehicle's objective, an interactive vehicle must also be able to identify when it is appropriate to perform them. In this work, we demonstrate how Active Learning methods can be used to incentivise an autonomous vehicle (AV) to choose actions that reveal information about the altruistic inclinations of another vehicle. We identify a property, Information Sufficiency, that a reward function should have in order to keep exploration from unnecessarily interfering with the pursuit of an objective. We empirically demonstrate that reward functions that do not have Information Sufficiency are prone to inadequate exploration, which can result in sub-optimal behaviour. We propose a reward definition that has Information Sufficiency, and show that it enables an AV to choose exploratory actions to estimate altruistic tendency, whilst also compensating for the possibility of conflicting beliefs between vehicles.