DAMO-StreamNet: Optimizing Streaming Perception in Autonomous Driving - Zhuanzhi Papers

Channels · Temporal perception · Semantic features · State prediction · Semantic space

March 30, 2023

DAMO-StreamNet: Optimizing Streaming Perception in Autonomous Driving


Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Wangmeng Xiang, Binghui Chen, Bin Luo, Yifeng Geng, Xuansong Xie

From arXiv; the source code is at https://shorturl.at/BJPZ6

Real-time perception, or streaming perception, is a crucial aspect of autonomous driving that has yet to be thoroughly explored in existing research. To address this gap, we present DAMO-StreamNet, an optimized framework that combines recent advances from the YOLO series with a comprehensive analysis of spatial and temporal perception mechanisms, delivering a cutting-edge solution. The key innovations of DAMO-StreamNet are: (1) A robust neck structure incorporating deformable convolution, enhancing the receptive field and feature alignment capabilities. (2) A dual-branch structure that integrates short-path semantic features and long-path temporal features, improving motion state prediction accuracy. (3) Logits-level distillation for efficient optimization, aligning the logits of teacher and student networks in semantic space. (4) A real-time forecasting mechanism that updates support frame features with the current frame, ensuring seamless streaming perception during inference. Our experiments demonstrate that DAMO-StreamNet surpasses existing state-of-the-art methods, achieving 37.8% (normal size (600, 960)) and 43.3% (large size (1200, 1920)) sAP without using extra data. This work not only sets a new benchmark for real-time perception but also provides valuable insights for future research. Additionally, DAMO-StreamNet can be applied to various autonomous systems, such as drones and robots, paving the way for real-time perception.
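Innovation (3) in the abstract, logits-level distillation, can be illustrated with a small sketch. The abstract does not state which loss is used to align teacher and student logits, so the two functions below are assumptions showing common choices rather than the authors' implementation: a mean squared error on raw logits, and a Hinton-style KL divergence on temperature-softened distributions. All function names and the default temperature are hypothetical.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable log-softmax of one logit vector."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_total = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_total for x in scaled]


def logit_mse(student: Sequence[float], teacher: Sequence[float]) -> float:
    """Assumed option A: mean squared error between raw logits."""
    if len(student) != len(teacher):
        raise ValueError("student and teacher logits must have equal length")
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


def softened_kl(
    student: Sequence[float],
    teacher: Sequence[float],
    temperature: float = 2.0,  # hypothetical default, not from the paper
) -> float:
    """Assumed option B: KL(teacher || student) on softened distributions."""
    if len(student) != len(teacher):
        raise ValueError("student and teacher logits must have equal length")
    log_p = log_softmax(teacher, temperature)
    log_q = log_softmax(student, temperature)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


if __name__ == "__main__":
    teacher_logits = [2.0, 0.5, -1.0]
    student_logits = [1.5, 0.7, -0.8]
    print("MSE on logits:", logit_mse(student_logits, teacher_logits))
    print("Softened KL:  ", softened_kl(student_logits, teacher_logits))
```

Both losses are zero when the student reproduces the teacher's logits exactly and grow as the two diverge. In a detector such a term would typically be added to the usual detection loss with a weighting factor, which is again an assumption here.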

