Human-Object Interaction (HOI) detection is crucial for robots that assist humans, because it enables context-aware support. However, models trained on clean datasets degrade under real-world conditions, where unforeseen corruptions lead to inaccurate predictions. Despite recent advances, current models still struggle with environmental variability, occlusions, and noise. To address this, we introduce RoHOI, the first robustness benchmark for HOI detection, which evaluates model resilience under diverse challenges. RoHOI includes 20 corruption types based on the HICO-DET and V-COCO datasets, together with a new robustness-focused metric. We systematically analyze existing HOI detection models and find significant performance drops under corruption. To improve robustness, we propose a Semantic-Aware Masking-based Progressive Learning (SAMPL) strategy, which guides optimization using both holistic and partial cues and dynamically adjusts that optimization to enhance robust feature learning. Extensive experiments show that our approach outperforms state-of-the-art methods, setting a new standard for robust HOI detection. Benchmarks, datasets, and code are available at https://github.com/KratosWen/RoHOI.