3D 语义分割区走向更深和更好的多视图特征融合 (Towards Deeper and Better Multi-view Feature Fusion for 3D Semantic Segmentation) - 专知论文

会员服务 ·

0

Better · 3D · Networking · Projection · Performer ·

2022 年 12 月 13 日

Towards Deeper and Better Multi-view Feature Fusion for 3D Semantic Segmentation

翻译：3D 语义分割区走向更深和更好的多视图特征融合

Chaolong Yang,Yuyao Yan,Weiguang Zhao,Jianan Ye,Xi Yang,Amir Hussain,Kaizhu Huang

3D point clouds are rich in geometric structure information, while 2D images contain important and continuous texture information. Combining 2D information to achieve better 3D semantic segmentation has become mainstream in 3D scene understanding. Albeit the success, it still remains elusive how to fuse and process the cross-dimensional features from these two distinct spaces. Existing state-of-the-art usually exploit bidirectional projection methods to align the cross-dimensional features and realize both 2D & 3D semantic segmentation tasks. However, to enable bidirectional mapping, this framework often requires a symmetrical 2D-3D network structure, thus limiting the network's flexibility. Meanwhile, such dual-task settings may distract the network easily and lead to over-fitting in the 3D segmentation task. As limited by the network's inflexibility, fused features can only pass through a decoder network, which affects model performance due to insufficient depth. To alleviate these drawbacks, in this paper, we argue that despite its simplicity, projecting unidirectionally multi-view 2D deep semantic features into the 3D space aligned with 3D deep semantic features could lead to better feature fusion. On the one hand, the unidirectional projection enforces our model focused more on the core task, i.e., 3D segmentation; on the other hand, unlocking the bidirectional to unidirectional projection enables a deeper cross-domain semantic alignment and enjoys the flexibility to fuse better and complicated features from very different spaces. In joint 2D-3D approaches, our proposed method achieves superior performance on the ScanNetv2 benchmark for 3D semantic segmentation.

翻译：3D点云层在几何结构信息方面丰富, 而 2D 图像包含重要且连续的纹理信息。将 2D 信息合并, 以更好地实现 3D 语义分解, 已经成为 3D 场景理解的主流。尽管成功, 但仍难以整合和处理这两个不同空间的跨维特性。现有最先进的双向投影方法通常会利用双向投影方法来匹配跨维空间特性, 并实现 2D 和 3D 语义分解任务。然而, 为了能够进行双向直流的直线直线直线直线绘图, 这个框架往往需要一个对称 2D-3 D 网络结构, 从而限制网络的灵活性。与此同时, 这种双轨设置可能会轻易地分散网络, 并导致三D 分解特性的超。由于网络的不灵活性能, 连线性能只能通过一个解码网络, 影响模型的性能。在本文中, 为了减轻这些反向的图理学, 我们认为, 直向双向双向多D 直径直路路路路路路路路路路的将 3, 将一个更好的直路路路路路路段。在3D 将3 直路基直路段将3 直路基直路行将3D 3 直路行将3 直路行将3 。

0

相关内容

Better

计算机科学课程与视频课件合集，Computer Science courses with video lectures

计算机科学课程与视频课件合集，Computer Science courses with video lectures

专知会员服务

37+阅读 · 2022年1月24日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Latest News & Announcements of the Industry Talk1

【ICIG2021】Latest News & Announcements of the Industry Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年7月28日

Multi-Task Learning的几篇综述文章

Multi-Task Learning的几篇综述文章

深度学习自然语言处理

15+阅读 · 2020年6月15日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

GTAT4和Myocardin相互作用调控心肌肥厚

国家自然科学基金

0+阅读 · 2014年12月31日

基于立体视觉的结构大变形全过程非接触动态测量方法

国家自然科学基金

0+阅读 · 2014年12月31日

基于深度信息和深度学习的车载视觉行人检测方法研究

国家自然科学基金

4+阅读 · 2013年12月31日

基于SURE/PURE准则的图像盲反卷积算法研究

国家自然科学基金

3+阅读 · 2013年12月31日

三维流形Heegaard分解稳定化问题的研究

国家自然科学基金

0+阅读 · 2012年12月31日

稀疏化、结构化和判别性约束的多视角行为识别方法研究

国家自然科学基金

1+阅读 · 2012年12月31日

增殖抑制基因（HSG）协同顺铂促进人肺腺癌细胞线粒体途径凋亡的机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

三维流形上的Heegaard分解及其在纽结理论中应用

国家自然科学基金

0+阅读 · 2011年12月31日

ATP和ROS在BCL-2基因抑癌活性中的作用机制

国家自然科学基金

0+阅读 · 2011年12月31日

新型中红外激光晶体Er3＋:CaReAlO4(Re=Y,Gd)的研究

国家自然科学基金

0+阅读 · 2009年12月31日

Painting 3D Nature in 2D: View Synthesis of Natural Scenes from a Single Semantic Mask

Arxiv

0+阅读 · 2023年2月14日

GLFF: Global and Local Feature Fusion for Face Forgery Detection

Arxiv

0+阅读 · 2023年2月13日

A Deep Learning-based Global and Segmentation-based Semantic Feature Fusion Approach for Indoor Scene Classification

Arxiv

0+阅读 · 2023年2月13日

Learning Aligned Cross-modal Representations for Referring Image Segmentation

Arxiv

0+阅读 · 2023年2月10日

Sparse4D: Multi-view 3D Object Detection with Sparse Spatial-Temporal Fusion

Arxiv

0+阅读 · 2023年2月10日

The Folded Pneumatic Artificial Muscle (foldPAM): Towards Programmability and Control via End Geometry

Arxiv

0+阅读 · 2023年2月9日

Multi-Task Learning for Visual Scene Understanding

Arxiv

29+阅读 · 2022年3月28日

MVFNet: Multi-View Fusion Network for Efficient Video Recognition

Arxiv

13+阅读 · 2021年1月5日

Learning to Count Objects in Natural Images for Visual Question Answering

Arxiv

12+阅读 · 2018年2月15日

Order-Free RNN with Visual Attention for Multi-Label Classification

Arxiv

16+阅读 · 2017年12月20日

VIP会员

文章信息

相关主题

相关VIP内容

计算机科学课程与视频课件合集，Computer Science courses with video lectures

计算机科学课程与视频课件合集，Computer Science courses with video lectures

专知会员服务

37+阅读 · 2022年1月24日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【博士论文】在低维和高维空间中分析、建模和转换潜在表征

从无人机到数据：揭示边缘计算作为新作战域

可解释人工智能的基础

大规模视觉模型中的基于提示的适应：综述

相关资讯

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Latest News & Announcements of the Industry Talk1

【ICIG2021】Latest News & Announcements of the Industry Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年7月28日

Multi-Task Learning的几篇综述文章

Multi-Task Learning的几篇综述文章

深度学习自然语言处理

15+阅读 · 2020年6月15日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

相关论文

Painting 3D Nature in 2D: View Synthesis of Natural Scenes from a Single Semantic Mask

Arxiv

0+阅读 · 2023年2月14日

GLFF: Global and Local Feature Fusion for Face Forgery Detection

Arxiv

0+阅读 · 2023年2月13日

A Deep Learning-based Global and Segmentation-based Semantic Feature Fusion Approach for Indoor Scene Classification

Arxiv

0+阅读 · 2023年2月13日

Learning Aligned Cross-modal Representations for Referring Image Segmentation

Arxiv

0+阅读 · 2023年2月10日

Sparse4D: Multi-view 3D Object Detection with Sparse Spatial-Temporal Fusion

Arxiv

0+阅读 · 2023年2月10日

The Folded Pneumatic Artificial Muscle (foldPAM): Towards Programmability and Control via End Geometry

Arxiv

0+阅读 · 2023年2月9日

Multi-Task Learning for Visual Scene Understanding

Arxiv

29+阅读 · 2022年3月28日

MVFNet: Multi-View Fusion Network for Efficient Video Recognition

Arxiv

13+阅读 · 2021年1月5日

Learning to Count Objects in Natural Images for Visual Question Answering

Arxiv

12+阅读 · 2018年2月15日

Order-Free RNN with Visual Attention for Multi-Label Classification

Arxiv

16+阅读 · 2017年12月20日

相关基金

GTAT4和Myocardin相互作用调控心肌肥厚

国家自然科学基金

0+阅读 · 2014年12月31日

基于立体视觉的结构大变形全过程非接触动态测量方法

国家自然科学基金

0+阅读 · 2014年12月31日

基于深度信息和深度学习的车载视觉行人检测方法研究

国家自然科学基金

4+阅读 · 2013年12月31日

基于SURE/PURE准则的图像盲反卷积算法研究

国家自然科学基金

3+阅读 · 2013年12月31日

三维流形Heegaard分解稳定化问题的研究

国家自然科学基金

0+阅读 · 2012年12月31日

稀疏化、结构化和判别性约束的多视角行为识别方法研究

国家自然科学基金

1+阅读 · 2012年12月31日

增殖抑制基因（HSG）协同顺铂促进人肺腺癌细胞线粒体途径凋亡的机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

三维流形上的Heegaard分解及其在纽结理论中应用

国家自然科学基金

0+阅读 · 2011年12月31日

ATP和ROS在BCL-2基因抑癌活性中的作用机制

国家自然科学基金

0+阅读 · 2011年12月31日

新型中红外激光晶体Er3＋:CaReAlO4(Re=Y,Gd)的研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员