With the increasing influence of social media, online misinformation has grown into a societal issue. Our work is motivated by the threat posed by cheapfakes, where an unaltered image is paired with a news caption that places it in a new but false context. The main challenge in detecting such out-of-context multimedia is the lack of large-scale datasets. Several detection methods use randomly selected captions to generate out-of-context training inputs; however, these randomly matched captions are not truly representative of out-of-context scenarios because of inconsistencies between the image description and the matched caption. We address these limitations by introducing the novel task of out-of-context caption generation. We propose a new method that generates a realistic out-of-context caption given visual and textual context, and we demonstrate that the semantics of the generated captions can be controlled through the textual context. Evaluated against several baselines, our method improves over the image captioning baseline by 6.2% BLEU-4, 2.96% CIDEr, 11.5% ROUGE, and 7.3% METEOR.