We present a novel framework, Spatial Pyramid Attention Network (SPAN), for detection and localization of multiple types of image manipulations. The proposed architecture efficiently and effectively models the relationship between image patches at multiple scales by constructing a pyramid of local self-attention blocks. The design includes a novel position projection to encode the spatial positions of the patches. SPAN is trained on a generic, synthetic dataset but can also be fine-tuned for specific datasets. The proposed method shows significant gains in performance on standard datasets over previous state-of-the-art methods.
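To make the idea of a pyramid of local self-attention blocks concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a 3x3 neighborhood per block, dilation rates of 1, 3, and 9, scaled dot-product attention with a residual connection, and a separate key/value matrix per neighbor offset as a simple stand-in for the position projection. All names (`local_self_attention`, `pyramid`, `w_q`, `w_k`, `w_v`) are hypothetical.

```python
import numpy as np

def local_self_attention(x, w_q, w_k, w_v, dilation):
    """One local self-attention block over a 3x3 dilated neighborhood.

    x:   (H, W, D) feature map, one D-dimensional vector per patch.
    w_q: (D, D) query projection.
    w_k, w_v: (9, D, D) key/value projections, one matrix per neighbor
        offset (assumed stand-in for a position projection).
    """
    h, w, d = x.shape
    pad = dilation
    # Zero-pad so that every patch has all nine neighbors.
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    q = x @ w_q  # (H, W, D)

    offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    keys, values = [], []
    for k, (dy, dx) in enumerate(offsets):
        y0 = pad + dy * dilation
        x0 = pad + dx * dilation
        neighbor = xp[y0:y0 + h, x0:x0 + w]  # (H, W, D), shifted view
        keys.append(neighbor @ w_k[k])
        values.append(neighbor @ w_v[k])
    keys = np.stack(keys, axis=2)      # (H, W, 9, D)
    values = np.stack(values, axis=2)  # (H, W, 9, D)

    # Scaled dot-product attention of each patch over its nine neighbors.
    scores = np.einsum("hwd,hwkd->hwk", q, keys) / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    attended = np.einsum("hwk,hwkd->hwd", weights, values)
    return x + attended  # residual connection (an assumption)

def pyramid(x, params, dilations=(1, 3, 9)):
    """Stack blocks with growing dilation so later blocks see wider context."""
    for (w_q, w_k, w_v), dilation in zip(params, dilations):
        x = local_self_attention(x, w_q, w_k, w_v, dilation)
    return x
```

Each block only compares a patch with nine neighbors, so the cost grows linearly with the number of patches, while the increasing dilation lets the stacked blocks relate patches at progressively larger distances.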