Object-oriented maps are important for scene understanding, since they jointly capture geometry and semantics and allow individual instantiation of and meaningful reasoning about objects. We introduce FroDO, a method for accurate 3D reconstruction of object instances from RGB video that infers object location, pose and shape in a coarse-to-fine manner. Key to FroDO is embedding object shapes in a novel learned space that allows seamless switching between sparse point-cloud and dense DeepSDF decoding. Given an input sequence of localized RGB frames, FroDO first aggregates 2D detections to instantiate a category-aware 3D bounding box per object instance. A shape code is regressed using an encoder network before shape and pose are further optimized using sparse and dense shape representations. The optimization uses multi-view geometric, photometric and silhouette losses. We evaluate on real-world datasets, including Pix3D, Redwood-OS and ScanNet, for single-view, multi-view and multi-object reconstruction.
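
As a rough illustration of the dense refinement step, the sketch below optimizes a latent shape code and an object pose against a geometric residual through a DeepSDF-style decoder. It is a minimal sketch under stated assumptions: decoder stands in for a pretrained DeepSDF-style network, the pose is reduced to a translation for brevity, and the photometric and silhouette terms of the full objective are omitted.

    import torch

    def refine_shape_and_pose(decoder, z0, surface_pts, n_iters=200):
        # decoder: assumed pretrained DeepSDF-style network, (code, xyz) -> SDF.
        # z0: shape code regressed by the encoder; surface_pts: (N, 3) sparse
        # multi-view surface points. All names here are illustrative.
        z = z0.clone().requires_grad_(True)
        t = torch.zeros(3, requires_grad=True)   # translation-only pose, for brevity
        opt = torch.optim.Adam([z, t], lr=1e-2)
        for _ in range(n_iters):
            opt.zero_grad()
            sdf = decoder(z.expand(len(surface_pts), -1), surface_pts + t)
            # Geometric term: observed surface points should lie on the zero
            # level set of the decoded SDF; the code-norm prior follows DeepSDF.
            loss = sdf.abs().mean() + 1e-4 * z.norm()
            loss.backward()
            opt.step()
        return z.detach(), t.detach()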

Related content

CVPR is the premier annual computer vision event comprising the main conference and several co-located workshops and short courses. With its high quality and low cost, it provides an exceptional value for students, academics and industry researchers. CVPR 2020 will take place at The Washington State Convention Center in Seattle, WA, from June 16 to June 20, 2020. http://cvpr2020.thecvf.com/

End-to-End Object Detection with Transformers

Paper: https://arxiv.org/abs/2005.12872

Code: https://github.com/facebookresearch/detr

This paper has been submitted to ECCV 2020. Author team: Facebook AI Research. FAIR proposes DETR, end-to-end object detection with Transformers: it has no NMS post-processing step, is genuinely anchor-free, and directly matches and even surpasses Faster R-CNN. The code has just been open-sourced!

Note: within 24 hours of the release, the repository had already reached 700+ stars!

Introduction

This paper presents a new method that views object detection as a direct set prediction problem. The approach streamlines the detection pipeline, effectively removing the need for many hand-designed components, such as non-maximum suppression (NMS) or anchor generation, that explicitly encode our prior knowledge about the task.

The main ingredients of the new framework, called DEtection TRansformer (DETR), are a set-based global loss that forces unique predictions via bipartite matching, and a transformer encoder-decoder architecture. Given a fixed small set of learned object queries, DETR reasons about the relations between the objects and the global image context, and directly outputs the final set of predictions in parallel.
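
The bipartite matching step can be illustrated with the Hungarian algorithm. The sketch below is a simplified illustration, not DETR's actual matcher: the cost keeps only a class-probability term and an L1 box term, and the generalized IoU term and the paper's exact weights are omitted.

    import torch
    from scipy.optimize import linear_sum_assignment

    def hungarian_match(pred_logits, pred_boxes, tgt_labels, tgt_boxes):
        # pred_logits: (num_queries, num_classes + 1); pred_boxes: (num_queries, 4)
        # tgt_labels: (num_targets,); tgt_boxes: (num_targets, 4)
        prob = pred_logits.softmax(-1)
        cost_class = -prob[:, tgt_labels]                    # reward high class probability
        cost_bbox = torch.cdist(pred_boxes, tgt_boxes, p=1)  # L1 distance between boxes
        cost = cost_class + 5.0 * cost_bbox                  # weight is illustrative
        rows, cols = linear_sum_assignment(cost.detach().cpu().numpy())
        return rows, cols  # query rows[i] is matched to ground-truth object cols[i]

Each query left unmatched is supervised towards the "no object" class, which is what forces the predictions to be unique.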

Unlike many other modern detectors, the new model is conceptually simple and does not require a specialized library. On the challenging COCO object detection dataset, DETR matches the accuracy and runtime performance of the well-established and highly optimized Faster R-CNN baseline. Moreover, DETR can easily be transferred to other tasks such as panoptic segmentation.

The Detection Transformer (DETR, see Figure 1) predicts all objects at once, and is trained end-to-end with a set loss function that performs bipartite matching between predicted and ground-truth objects. DETR simplifies the detection pipeline by dropping the multiple hand-designed post-processing components that encode prior knowledge, such as NMS. Unlike most existing detection methods, DETR does not require any customized layers, and can therefore be reproduced easily in any framework that provides standard CNN and transformer classes.
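
To make the "no specialized library" point concrete, here is a minimal PyTorch sketch of a DETR-style model built from a standard torchvision backbone and nn.Transformer. It is an assumption-laden illustration, not the released implementation: positional encodings, the MLP box head and the auxiliary decoding losses are all omitted.

    import torch
    from torch import nn
    from torchvision.models import resnet50

    class MinimalDETR(nn.Module):
        def __init__(self, num_classes, num_queries=100, d=256):
            super().__init__()
            backbone = resnet50()
            self.backbone = nn.Sequential(*list(backbone.children())[:-2])  # keep conv features
            self.proj = nn.Conv2d(2048, d, 1)          # reduce channels for the transformer
            self.transformer = nn.Transformer(d, nhead=8, num_encoder_layers=6,
                                              num_decoder_layers=6)
            self.queries = nn.Parameter(torch.rand(num_queries, d))  # learned object queries
            self.class_head = nn.Linear(d, num_classes + 1)          # +1 for "no object"
            self.box_head = nn.Linear(d, 4)

        def forward(self, x):
            feats = self.proj(self.backbone(x))        # (B, d, H, W)
            src = feats.flatten(2).permute(2, 0, 1)    # (HW, B, d); positional encoding omitted
            tgt = self.queries.unsqueeze(1).repeat(1, x.size(0), 1)
            hs = self.transformer(src, tgt)            # all queries decoded in parallel
            return self.class_head(hs), self.box_head(hs).sigmoid()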

The two defining features of DETR are the bipartite matching loss and the use of transformers with (non-autoregressive) parallel decoding.

Reference: https://mp.weixin.qq.com/s/b5Ont9vHPeCPnAjuDGv5Bg

The task of detecting 3D objects in point clouds plays a pivotal role in many real-world applications. However, 3D object detection performance lags behind that of 2D object detection due to the lack of powerful 3D feature extraction methods. To address this issue, we propose to build a 3D backbone network that learns rich 3D feature maps by using sparse 3D CNN operations for 3D object detection in point clouds. The 3D backbone network can inherently learn 3D features from almost raw data, without compressing the point cloud into multiple 2D images, and generates rich feature maps for object detection. The sparse 3D CNN takes full advantage of the sparsity of the 3D point cloud to accelerate computation and save memory, which makes the 3D backbone network feasible. Experiments conducted on the KITTI benchmark show that the proposed method achieves state-of-the-art performance for 3D object detection.
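
In practice the sparse 3D convolutions would come from a library such as spconv or MinkowskiEngine; the plain-PyTorch sketch below only shows the preceding step of turning a raw point cloud into the sparse voxel inputs (occupied coordinates plus per-voxel features) that such a backbone consumes. Names and the voxel size are illustrative.

    import torch

    def voxelize(points, voxel_size=0.1):
        # points: (N, 3) raw point cloud. Returns the unique occupied voxel
        # coordinates and per-voxel mean features, the usual inputs to a
        # sparse 3D CNN.
        coords = torch.floor(points / voxel_size).long()   # integer voxel indices
        uniq, inv = torch.unique(coords, dim=0, return_inverse=True)
        feats = torch.zeros(len(uniq), 3).index_add_(0, inv, points)
        counts = torch.zeros(len(uniq)).index_add_(0, inv, torch.ones(len(points)))
        return uniq, feats / counts.unsqueeze(1)           # occupied voxels only

    pts = torch.rand(10000, 3) * 40.0       # synthetic scene
    coords, feats = voxelize(pts)
    print(coords.shape, feats.shape)        # far fewer voxels than a dense grid

Because only occupied voxels are stored, memory and computation scale with the number of points rather than with a dense D x H x W grid.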

This paper aims at developing a faster and more accurate solution to the amodal 3D object detection problem for indoor scenes. It is achieved through a novel neural network that takes a pair of RGB-D images as input and delivers oriented 3D bounding boxes as output. The network, named 3D-SSD, is composed of two parts: hierarchical feature fusion and multi-layer prediction. The hierarchical feature fusion combines appearance and geometric features from RGB-D images, while the multi-layer prediction utilizes multi-scale features for object detection. As a result, the network can exploit 2.5D representations in a synergetic way to improve accuracy and efficiency. The issue of varying object sizes is addressed by attaching a set of 3D anchor boxes of different sizes to every location of the prediction layers. In the final stage, category scores are generated for the 3D anchor boxes together with adjusted positions, sizes and orientations, and the final detections are obtained using non-maximum suppression. In the training phase, positive samples are identified with the aid of 2D ground truth to avoid noisy depth estimates from raw data, which leads to a better-converged model. Experiments on the challenging SUN RGB-D dataset show that our algorithm outperforms the state-of-the-art Deep Sliding Shape by 10.2% mAP while being 88x faster. Further experiments suggest that our approach achieves comparable accuracy and is 386x faster than the state-of-the-art method on the NYUv2 dataset, even with a smaller input image size.
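
The anchor mechanism can be sketched as follows: a set of 3D anchor boxes with varying sizes is tiled over every location of a prediction layer. This is a hypothetical illustration; the stride and anchor sizes are placeholders rather than the values used by 3D-SSD, and orientation is left out.

    import torch

    def make_3d_anchors(feat_shape, stride, sizes):
        # feat_shape: (D, H, W) of a prediction layer; stride: metres per cell;
        # sizes: list of (w, h, l) anchor dimensions. All values illustrative.
        d, h, w = feat_shape
        zs, ys, xs = torch.meshgrid(torch.arange(d), torch.arange(h),
                                    torch.arange(w), indexing="ij")
        centers = torch.stack([xs, ys, zs], dim=-1).float() * stride  # cell centers
        anchors = []
        for size in sizes:
            dims = torch.tensor(size).expand_as(centers)
            anchors.append(torch.cat([centers, dims], dim=-1))        # (D, H, W, 6)
        return torch.stack(anchors, dim=3).reshape(-1, 6)             # one set per cell

    anchors = make_3d_anchors((8, 32, 32), stride=0.4,
                              sizes=[(0.6, 0.8, 0.6), (1.2, 1.0, 1.2)])
    print(anchors.shape)  # (8*32*32*2, 6) boxes as (x, y, z, w, h, l)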
