The classification of AI-manipulated content, that is, distinguishing among different types of manipulation, is receiving increasing attention. Most methods developed so far fail in the open-set scenario, i.e., when the algorithm used for the manipulation is not represented in the training set. In this paper, we focus on the classification of synthetic face generation and manipulation in open-set scenarios, and propose a classification method with a rejection option. The proposed method combines Vision Transformers (ViT) with a hybrid approach for simultaneous classification and localization. The ViT module exploits feature map correlation, while a localization branch serves as an attention mechanism that forces the model to learn per-class discriminative features associated with the forgery when the manipulation is applied locally in the image. Rejection is performed by considering several strategies and analyzing the model's output layers. The effectiveness of the proposed method is assessed on two tasks: classification of facial attribute editing and GAN attribution.
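The abstract does not say which rejection strategies are used, so the following is only a minimal sketch of two common, generic ways to add a rejection option on top of a closed-set classifier's output layer: thresholding the maximum softmax probability, or thresholding the maximum raw logit. It is not the authors' method. The function names, the `REJECT` label, and the threshold values are hypothetical; in practice a threshold would be tuned on validation data.

```python
import numpy as np

REJECT = -1  # hypothetical label returned for samples flagged as unknown (open-set)


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by each row's max for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def classify_with_rejection(logits, threshold=0.5, score="max_softmax"):
    """Predict the argmax class per sample, or REJECT when its confidence score
    falls below `threshold`.

    score="max_softmax": confidence is the largest softmax probability.
    score="max_logit":   confidence is the largest raw logit (pre-softmax).
    Both are generic open-set baselines, assumed here for illustration only.
    """
    logits = np.asarray(logits, dtype=float)
    preds = logits.argmax(axis=1)  # argmax is the same for logits and softmax
    if score == "max_softmax":
        confidence = softmax(logits).max(axis=1)
    elif score == "max_logit":
        confidence = logits.max(axis=1)
    else:
        raise ValueError(f"unknown score: {score}")
    # Keep the closed-set prediction only when the model is confident enough.
    return np.where(confidence >= threshold, preds, REJECT)


if __name__ == "__main__":
    # Two samples, three known classes: one confident, one ambiguous.
    logits = np.array([[4.0, 0.5, 0.1], [1.0, 1.1, 0.9]])
    print(classify_with_rejection(logits, threshold=0.5))                    # [ 0 -1]
    print(classify_with_rejection(logits, threshold=2.0, score="max_logit")) # [ 0 -1]
```

The max-softmax score is bounded in [0, 1], which makes a threshold easy to interpret, whereas the max-logit score keeps magnitude information that softmax normalization discards; this is one reason open-set methods often compare several scores computed from different output layers.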