Convolutional neural networks have made significant progress in edge detection by progressively exploring context and semantic features. However, local details are gradually suppressed as receptive fields enlarge. Recently, vision transformers have shown excellent capability in capturing long-range dependencies. Inspired by this, we propose a novel transformer-based edge detector, \emph{Edge Detection TransformER (EDTER)}, which extracts clear and crisp object boundaries and meaningful edges by exploiting full-image context information and detailed local cues simultaneously. EDTER works in two stages. In Stage I, a global transformer encoder captures long-range global context on coarse-grained image patches. In Stage II, a local transformer encoder works on fine-grained patches to extract short-range local cues. Each transformer encoder is followed by an elaborately designed Bi-directional Multi-Level Aggregation decoder that produces high-resolution features. Finally, the global context and local cues are combined by a Feature Fusion Module and fed into a decision head for edge prediction. Extensive experiments on BSDS500, NYUDv2, and Multicue demonstrate the superiority of EDTER over state-of-the-art methods.
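As a rough illustration of why a coarse-grained global stage and a fine-grained local stage differ in cost, the following Python sketch tokenizes a dummy image at two patch granularities and counts the token pairs full self-attention would compare. The 320x320 input, the 16x16 coarse and 8x8 fine patch sizes, the 2x2 window grid used to keep the fine stage local, and all function names (`patchify`, `split_windows`, `attention_pairs`, `token_stats`) are assumptions chosen for illustration; the abstract specifies none of them, and this is a sketch, not the EDTER implementation.

```python
from typing import Dict, List

Image = List[List[float]]  # single-channel image as a list of rows


def patchify(image: Image, patch: int) -> List[List[float]]:
    """Cut an H x W image into non-overlapping patch x patch tiles and
    flatten each tile into one token vector (row-major order)."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by the patch size")
    return [
        [image[r][c] for r in range(top, top + patch) for c in range(left, left + patch)]
        for top in range(0, h, patch)
        for left in range(0, w, patch)
    ]


def split_windows(image: Image, n: int) -> List[Image]:
    """Hypothetical helper: split the image into an n x n grid of
    non-overlapping windows so attention can be restricted to each window."""
    h, w = len(image), len(image[0])
    if h % n or w % n:
        raise ValueError("image size must be divisible by the window grid")
    wh, ww = h // n, w // n
    return [
        [row[j * ww:(j + 1) * ww] for row in image[i * wh:(i + 1) * wh]]
        for i in range(n)
        for j in range(n)
    ]


def attention_pairs(num_tokens: int) -> int:
    """Full self-attention compares every token with every token."""
    return num_tokens * num_tokens


def token_stats(size: int = 320, coarse: int = 16, fine: int = 8, grid: int = 2) -> Dict[str, int]:
    image = [[0.0] * size for _ in range(size)]              # dummy image (assumed size)
    global_tokens = patchify(image, coarse)                   # coarse tokens, global stage
    local_tokens = [patchify(win, fine) for win in split_windows(image, grid)]  # fine tokens per window
    fine_global = len(patchify(image, fine))                  # fine tokens if attention were global
    return {
        "stage1_tokens": len(global_tokens),
        "stage1_pairs": attention_pairs(len(global_tokens)),
        "stage2_tokens_per_window": len(local_tokens[0]),
        "stage2_pairs": sum(attention_pairs(len(t)) for t in local_tokens),
        "fine_global_pairs": attention_pairs(fine_global),
    }


if __name__ == "__main__":
    print(token_stats())
```

With these assumed defaults, the coarse stage sees 400 tokens (160,000 attention pairs), whereas fine 8x8 patches over the whole image would give 1,600 tokens and 2,560,000 pairs; restricting attention to four windows of 400 tokens each reduces this to 640,000. The quadratic growth of self-attention with token count is the pressure this sketch is meant to make concrete.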