DepthFormer: Multimodal Positional Encodings and Cross-Input Attention for Transformer-Based Segmentation Networks

Transformer · Positional Encoding · Segmentation · Depth Information · Modality

March 27, 2023

Francesco Barbato, Giulia Rizzoli, Pietro Zanuttigh

From arXiv. Accepted at ICASSP 2023.

Most approaches for semantic segmentation use only information from color cameras to parse the scene, yet recent advances show that depth data can further improve performance. In this work, we focus on transformer-based deep learning architectures, which have achieved state-of-the-art performance on the segmentation task, and we propose to exploit depth information by embedding it in the positional encoding. In this way, we extend the network to multimodal data without adding any parameters, in a natural way that leverages the strength of the transformers' self-attention modules. We also investigate performing cross-modality operations inside the attention module by swapping the key inputs between the depth and color branches. Our approach consistently improves performance on the Cityscapes benchmark.
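The abstract only outlines the two mechanisms, so the following is a toy, dependency-free sketch of one way they could look, not the authors' implementation. Several choices are our own assumptions. Each patch's depth is mapped to a fixed sinusoidal vector, borrowing the sine/cosine encoding of the original Transformer, and that vector is added to the patch token. "Cross-input attention" is read as each branch keeping its own queries and values while taking its keys from the other branch. Q/K/V projections, multiple heads and the segmentation backbone are omitted. All function names (`sinusoidal_encoding`, `add_depth_encoding`, `cross_input_attention`) are hypothetical.

```python
import math


def sinusoidal_encoding(value, dim, max_period=10000.0):
    """Map a scalar (here: a per-patch depth value, an assumption) to a fixed
    dim-dimensional vector with the sine/cosine scheme of the original
    Transformer positional encoding. Nothing in this function is learned."""
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    enc = []
    for i in range(dim // 2):
        freq = 1.0 / (max_period ** (2 * i / dim))
        enc.append(math.sin(value * freq))
        enc.append(math.cos(value * freq))
    return enc


def add_depth_encoding(tokens, depths):
    """Add the depth-derived encoding to each patch token (one depth per token).
    Additive injection is a hypothetical choice made for this sketch."""
    dim = len(tokens[0])
    return [
        [t + e for t, e in zip(tok, sinusoidal_encoding(d, dim))]
        for tok, d in zip(tokens, depths)
    ]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Single-head scaled dot-product attention on lists of row vectors."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        w = softmax(scores)
        out.append([sum(w[j] * v[j][c] for j in range(len(v))) for c in range(len(v[0]))])
    return out


def cross_input_attention(color, depth):
    """Swap the key inputs between branches: each branch keeps its own queries
    and values but scores them against the other branch's keys.
    Q/K/V projections are the identity here for brevity (assumption)."""
    color_out = attention(color, depth, color)  # Q, V from color; K from depth
    depth_out = attention(depth, color, depth)  # Q, V from depth; K from color
    return color_out, depth_out


if __name__ == "__main__":
    # Hypothetical toy input: 3 patch tokens of width 4, one depth value per patch.
    color_tokens = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.1, 0.0, 0.2], [0.3, 0.3, 0.3, 0.3]]
    patch_depths = [1.5, 4.0, 12.0]
    color_in = add_depth_encoding(color_tokens, patch_depths)
    depth_in = [sinusoidal_encoding(x, 4) for x in patch_depths]
    color_out, depth_out = cross_input_attention(color_in, depth_in)
    print(len(color_out), len(color_out[0]))  # 3 4
```

Because the depth encoding here is a fixed function, it adds no trainable parameters, which mirrors the abstract's claim at the level of this toy. How the paper scales or normalizes depth before encoding it is not stated in the abstract and is not reproduced here.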
