Recent studies have demonstrated the vulnerability of control policies learned through deep reinforcement learning to adversarial attacks, raising concerns about the application of such models to risk-sensitive tasks such as autonomous driving. Threat models for these demonstrations are limited to (1) targeted attacks through real-time manipulation of the agent's observation, and (2) untargeted attacks through manipulation of the physical environment. The former assumes full access to the agent's states/observations at all times, while the latter has no control over attack outcomes. This paper investigates the feasibility of targeted attacks through visually learned patterns placed on physical objects in the environment, a threat model that combines the practicality of physical attacks with the effectiveness of targeted ones. Through analysis, we demonstrate that a pre-trained policy can be hijacked within a time window, e.g., made to perform an unintended self-parking maneuver, when an adversarial object is present. To enable the attack, we assume that the dynamics of both the environment and the agent can be learned by the attacker. Lastly, we empirically show the effectiveness of the proposed attack in different driving scenarios, perform a location robustness test, and study the tradeoff between the attack strength and its effectiveness.