Adversarial robustness evaluates the worst-case performance of a machine learning model to ensure its safety and reliability. This study is the first to investigate the robustness of visually grounded dialog models against textual attacks. These attacks represent a worst-case scenario in which a word in the input question is replaced with a synonym, causing a previously correct model to return a wrong answer. Using this scenario, we first aim to understand how the multimodal input components contribute to model robustness. Our results show that models which encode dialog history are more robust, and that when the attack targets the history, model predictions become more uncertain. This contrasts with prior work, which found dialog history to be negligible for model performance on this task. We also evaluate how to generate adversarial test examples that successfully fool the model yet remain undetected by the user/software designer. We find that both the textual and the visual context are important for generating plausible worst-case scenarios.
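To make the attack scenario concrete, the following is a minimal sketch of a single-word synonym-substitution attack, not the paper's actual method. It assumes a hypothetical model interface `model.answer(image, history, question)` returning the predicted answer, and draws synonym candidates from WordNet via NLTK.

```python
# Sketch of a synonym-substitution attack on a visually grounded dialog model.
# Assumptions: `model.answer(image, history, question)` is a hypothetical API;
# synonyms come from WordNet (requires: nltk.download("wordnet")).
from itertools import chain

from nltk.corpus import wordnet


def synonyms(word):
    """Collect WordNet lemmas for `word`, excluding the word itself."""
    lemmas = chain.from_iterable(s.lemma_names() for s in wordnet.synsets(word))
    return {l.replace("_", " ") for l in lemmas if l.lower() != word.lower()}


def attack_question(model, image, history, question, gold_answer):
    """Try single-word synonym swaps until the model's answer flips.

    Returns the first adversarial question found, or None if the model
    stays correct under every candidate substitution.
    """
    tokens = question.split()
    for i, word in enumerate(tokens):
        for candidate in synonyms(word):
            perturbed = " ".join(tokens[:i] + [candidate] + tokens[i + 1:])
            if model.answer(image, history, perturbed) != gold_answer:
                return perturbed  # previously correct model now fails
    return None
```

A plausibility-aware variant would additionally filter candidates by how well they fit the textual and visual context, in line with the finding that both matter for generating undetectable worst-case examples.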