Monocular Depth Estimation (MDE) is a core perception module in autonomous driving systems, yet it remains highly susceptible to adversarial attacks. Depth estimation errors can propagate into downstream decision-making and compromise traffic safety. Existing physical attacks rely primarily on texture-based patches, which impose strict placement constraints and look unrealistic, limiting their effectiveness in complex driving environments. To overcome these limitations, this work introduces a training-free generative adversarial attack framework that produces naturalistic, scene-consistent adversarial objects through a diffusion-based conditional generation process. The framework combines a Salient Region Selection module, which identifies the regions most influential to MDE, with a Jacobian Vector Product Guidance mechanism, which steers adversarial gradients toward update directions supported by the pre-trained diffusion model. This formulation yields physically plausible adversarial objects that induce substantial depth shifts. Extensive digital and physical experiments demonstrate that our method significantly outperforms existing attacks in effectiveness, stealthiness, and physical deployability, underscoring its practical relevance to autonomous driving safety assessment.
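For readers unfamiliar with the primitive behind Jacobian Vector Product Guidance, the following is a minimal sketch of a Jacobian-vector product only, not the formulation used in this work. It assumes a toy elementwise map `toy_denoiser` as a hypothetical stand-in for a pre-trained model and a hypothetical adversarial gradient `g`, and it approximates the product J(x)·g with a central finite difference rather than the automatic differentiation a real system would likely use. All names here are illustrative assumptions.

```python
import math

def toy_denoiser(x):
    """Hypothetical stand-in for a pre-trained model: a smooth map R^n -> R^n."""
    return [math.tanh(xi) for xi in x]

def jvp_fd(f, x, v, eps=1e-5):
    """Approximate J_f(x) @ v by a central difference, without building J."""
    x_plus = [xi + eps * vi for xi, vi in zip(x, v)]
    x_minus = [xi - eps * vi for xi, vi in zip(x, v)]
    f_plus, f_minus = f(x_plus), f(x_minus)
    return [(p - m) / (2 * eps) for p, m in zip(f_plus, f_minus)]

# Hypothetical input point and adversarial gradient (illustrative values only).
x = [0.0, 1.0, -2.0]
g = [1.0, 1.0, 1.0]

# Pushing g through the model's Jacobian rescales each component by the
# model's local sensitivity; here that factor is 1 - tanh(x_i)^2.
guided = jvp_fd(toy_denoiser, x, g)
```

In this toy setting, the components of `g` at inputs where the map is nearly flat are strongly damped, which gives some intuition for how a Jacobian-vector product can favor directions to which a model responds. How the actual guidance term is constructed is not specified in the abstract.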