VALSE Webinar 22-06 (No. 272 overall), "I Will Find You: 2D/3D Object Detection and Localization"

March 17, 2022, VALSE

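Time	March 23, 2022 (Wednesday), 20:00 (Beijing time)
Topic	I Will Find You: 2D/3D Object Detection and Localization
Hosts	Wanli Ouyang (University of Sydney), Xiaowei Zhou (Zhejiang University)
Livestream	https://live.bilibili.com/22300737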

Speaker: Tong He (Shanghai AI Laboratory)

Talk title: 3D instance segmentation with dynamic convolution

Speaker: Ting Chen (Google Brain)

Talk title: Pix2seq: A Language Modeling Framework for Object Detection

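Panelists:

Tong He (Shanghai AI Laboratory), Ting Chen (Google Brain), Jifeng Dai (SenseTime), He Wang (Peking University), Zhaoxiang Zhang (Institute of Automation, Chinese Academy of Sciences)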

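Panel topics:

1. What new inspiration can research advances in natural language processing bring to computer vision?

2. What are the applications and research prospects of Transformers in object detection and localization?

3. What new directions are there in 2D/3D visual understanding?

4. How can the supervision required for detection be reduced?

5. How should the long-tailed distribution in detection be handled?

6. Which open difficulties in 2D/3D visual localization deserve further research?

*You are welcome to leave topic-related questions in the comments below. The hosts and panelists will select several of the most popular ones to add to the panel topics!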

Speaker: Tong He (Shanghai AI Laboratory)

Time: March 23, 2022 (Wednesday), 20:00 (Beijing time)

Talk title: 3D instance segmentation with dynamic convolution

Speaker bio:

Tong was a Research Fellow at the Australian Institute for Machine Learning (AIML), the University of Adelaide, working with Prof. Chunhua Shen and Prof. Anton van den Hengel. He received his PhD in computer science from the University of Adelaide, supervised by Prof. Chunhua Shen. He was a visiting student at MMLAB of the Chinese University of Hong Kong at Shenzhen under the supervision of Prof. Yu Qiao and Dr. Weilin Huang. His research interests lie in computer vision and machine learning, especially in enabling machines to see and understand the environment. Specific research topics include 2D/3D object detection and instance segmentation.

Homepage:

https://tonghe90.github.io

Abstract:

Previous top-performing approaches for point cloud instance segmentation involve a bottom-up strategy, which often includes inefficient operations or complex pipelines, such as grouping over-segmented components, introducing additional refinement steps, or designing complicated loss functions. The inevitable variation in instance scales can make bottom-up methods particularly sensitive to hyper-parameter values. To this end, we propose instead a dynamic, proposal-free, data-driven approach that generates the appropriate convolution kernels to apply in response to the nature of the instances. The proposed method achieves promising results on both ScanNetV2 and S3DIS, and this performance is robust to the particular hyper-parameter values chosen.
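
As a rough illustration of the idea of instance-conditioned kernels, the following minimal Python sketch generates the parameters of a per-point linear layer (a 1x1 convolution) from an instance embedding and applies them to shared point features to score each point. It is not the speaker's implementation: the function names, the linear kernel generator, the feature and embedding sizes, and all numbers are assumptions chosen for illustration.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def generate_kernel(instance_embedding: Vector,
                    generator_weights: List[Vector]) -> Vector:
    """Hypothetical kernel generator: a linear map from an instance
    embedding to C filter weights followed by one bias term."""
    return [dot(row, instance_embedding) for row in generator_weights]


def dynamic_conv_mask(point_features: List[Vector], kernel: Vector) -> List[float]:
    """Apply the instance-specific 1x1 convolution to every point and
    squash the result with a sigmoid to get a per-point mask score."""
    *weights, bias = kernel
    return [1.0 / (1.0 + math.exp(-(dot(weights, f) + bias)))
            for f in point_features]


# Toy data (assumed): two points with 2-D features, two instance embeddings.
point_features = [[1.0, 0.0], [0.0, 1.0]]
generator_weights = [[4.0, 0.0], [0.0, 4.0], [-2.0, -2.0]]  # rows: w0, w1, bias

for embedding in ([1.0, 0.0], [0.0, 1.0]):
    kernel = generate_kernel(embedding, generator_weights)
    print(embedding, dynamic_conv_mask(point_features, kernel))
```

The point of the sketch is that the same point features yield a different mask for each instance because the filter itself changes, rather than a fixed filter being followed by a grouping step.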

Speaker: Ting Chen (Google Brain)

Time: March 23, 2022 (Wednesday), 20:30 (Beijing time)

Talk title: Pix2seq: A Language Modeling Framework for Object Detection

Speaker bio:

Ting Chen is a research scientist on the Google Brain team. He joined Google after obtaining his PhD from the University of California, Los Angeles. His main research interests include self-supervised representation learning and general learning principles for various data.

Homepage:

https://research.google/people/106719/

Abstract:

We present Pix2Seq, a simple and generic framework for object detection. Unlike existing approaches that explicitly integrate prior knowledge about the task, we simply cast object detection as a language modeling task conditioned on the observed pixel inputs. Object descriptions (e.g., bounding boxes and class labels) are expressed as sequences of discrete tokens, and we train a neural net to perceive the image and generate the desired sequence. Our approach is based mainly on the intuition that if a neural net knows where and what the objects are, we just need to teach it how to read them out. Beyond the use of task-specific data augmentations, our approach makes minimal assumptions about the task, yet it achieves competitive results on the challenging COCO dataset, compared to highly specialized and well-optimized detection algorithms.
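
The phrase "sequences of discrete tokens" can be made concrete with a small sketch. The Python below quantizes box coordinates into integer bins, appends one class token per object, and ends the sequence with an end-of-sequence token. The bin count, the coordinate order (ymin, xmin, ymax, xmax), the vocabulary layout, the class count, and all function names are assumptions for illustration, not details taken from the abstract.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed order: ymin, xmin, ymax, xmax

NUM_BINS = 1000    # assumed number of coordinate bins
NUM_CLASSES = 80   # assumed number of object classes


def quantize(value: float, size: float, num_bins: int = NUM_BINS) -> int:
    """Map a coordinate in [0, size] to an integer bin in [0, num_bins - 1]."""
    frac = min(max(value / size, 0.0), 1.0)
    return min(int(frac * num_bins), num_bins - 1)


def box_to_tokens(box: Box, class_id: int, width: float, height: float,
                  num_bins: int = NUM_BINS) -> List[int]:
    """Turn one object into five tokens: four coordinates and one class."""
    ymin, xmin, ymax, xmax = box
    return [
        quantize(ymin, height, num_bins),
        quantize(xmin, width, num_bins),
        quantize(ymax, height, num_bins),
        quantize(xmax, width, num_bins),
        num_bins + class_id,  # class tokens follow the coordinate tokens
    ]


def objects_to_sequence(objects: Sequence[Tuple[Box, int]],
                        width: float, height: float,
                        num_bins: int = NUM_BINS,
                        num_classes: int = NUM_CLASSES) -> List[int]:
    """Concatenate per-object tokens and append an end-of-sequence token."""
    eos_token = num_bins + num_classes
    sequence: List[int] = []
    for box, class_id in objects:
        sequence.extend(box_to_tokens(box, class_id, width, height, num_bins))
    sequence.append(eos_token)
    return sequence


# One object of (assumed) class 3 in a 640x480 image.
print(objects_to_sequence([((10.0, 20.0, 110.0, 220.0), 3)], 640.0, 480.0))
```

A sequence model trained on such targets predicts one token at a time, so the same decoding loop used for text can emit boxes and labels.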

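Panelist: Jifeng Dai (SenseTime)

Panelist bio:

Dr. Jifeng Dai is currently an Executive Research Director at SenseTime Research. He received his bachelor's and PhD degrees from the Department of Automation, Tsinghua University, in 2009 and 2014, respectively. From 2012 to 2013 he visited the University of California, Los Angeles. From 2014 to 2019 he worked in the vision group at Microsoft Research Asia (MSRA), where he served as Principal Researcher and Research Manager. Since 2019 he has been with SenseTime Research, where he heads the general intelligence department as Executive Research Director. His research interests are generic object recognition algorithms in computer vision and cross-modal general perception algorithms. He has published more than 30 papers in top conferences and journals in the field, which have received more than 16,000 citations according to Google Scholar. He won first place in the COCO object recognition challenge for two consecutive years. He is an editorial board member of IJCV, an area chair of CVPR 2021 and ECCV 2020, the publicity chair of ICCV 2019, a senior PC member of AAAI 2018, and a Young Scientist at the Beijing Academy of Artificial Intelligence.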

Homepage:

https://jifengdai.org

Panelist: He Wang (Peking University)

Panelist bio:

Dr. He Wang is a tenure-track assistant professor in the Center on Frontiers of Computing Studies (CFCS) at Peking University, where he leads the Embodied Perception and InteraCtion Lab (EPIC). His research interests span 3D vision, robotics, and machine learning, with a special focus on embodied AI. His research objective is to endow robots working in complex real-world scenes with generalizable 3D vision and interaction policies. Prior to joining Peking University, he received his Ph.D. degree from Stanford University in 2021 under the supervision of Prof. Leonidas J. Guibas and his Bachelor's degree from Tsinghua University in 2014. He has published more than 20 papers in top vision and learning conferences (CVPR/ICCV/ECCV/NeurIPS), and his work won a Eurographics 2019 best paper honorable mention. He serves as an area chair for CVPR 2022 and WACV 2022.

Homepage:

https://hughw19.github.io

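Panelist: Zhaoxiang Zhang (Institute of Automation, Chinese Academy of Sciences)

Panelist bio:

Zhaoxiang Zhang, PhD, is a research professor and doctoral supervisor at the Institute of Automation, Chinese Academy of Sciences, a Distinguished Professor of the Ministry of Education's Changjiang Scholars Program, and a top young talent of the National Ten Thousand Talents Program. His main research directions include brain-inspired neural network modeling, visual cognitive learning, and scene perception and understanding in open environments. Over the past five years he has published more than 100 papers in top conferences and journals in the field and been granted more than 20 patents. He has undertaken a series of national research projects and industry collaborations, including a Key Program of the National Natural Science Foundation of China and National Key R&D Program projects. He is an IEEE Senior Member, a Distinguished Member of the China Computer Federation (CCF), a Distinguished Member of the Chinese Association for Artificial Intelligence (CAAI), and Deputy Secretary-General of CAAI. He serves on the editorial boards of well-known journals including IEEE T-CSVT and Pattern Recognition, and is an area chair for well-known international conferences including CVPR, ICCV, AAAI, IJCAI, ACM MM, ICPR, and ACCV.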

Homepage:

https://zhaoxiangzhang.net/chinese/

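Talk host: Wanli Ouyang (University of Sydney)

Host bio:

Wanli Ouyang is an associate professor at the University of Sydney. He received his PhD from the Chinese University of Hong Kong in 2011. His research areas include computer vision, pattern recognition, and deep learning, with a focus on deep learning architecture design, object detection and tracking, and AI for Science. He and his team have won first place in object detection on ImageNet and COCO. He has been named an ICCV best reviewer, serves on the editorial boards of IJCV and Pattern Recognition, has been a guest editor for TPAMI, is an IEEE Senior Member, was a demo chair of ICCV 2019, and has served as an area chair for CVPR, ICCV, and AAAI. He was listed among the top 100 scholars in computer vision in the 2022 AI 2000 list of the world's 2000 most influential scholars in artificial intelligence. He received the University of Sydney's Vice-Chancellor's award for outstanding research. He reviews for journals and conferences including TPAMI, IJCV, TOG, TIP, CVPR, ICCV, and SIGGRAPH.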

Homepage:

https://wlouyang.github.io

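Panel host: Xiaowei Zhou (Zhejiang University)

Host bio:

Xiaowei Zhou is a "Hundred Talents Program" researcher and doctoral supervisor at Zhejiang University. He received his bachelor's degree from Zhejiang University in 2008 and his PhD from the Hong Kong University of Science and Technology in 2013, and was a postdoctoral researcher at the GRASP robotics laboratory of the University of Pennsylvania from 2014 to 2017. In 2017 he was selected for a national-level young talent program and joined Zhejiang University. His research focuses on 3D vision, graphics, and their applications in mixed reality, robotics, and related areas. His work has been selected several times as a best paper candidate (<0.5%) at the top computer vision conference CVPR and has been covered by well-known media such as MIT Technology Review. He has received the first prize of the Lu Zengyong CAD&CG High-Tech Award and the CAD&CG Graphics Open Source Contribution Award of the China Computer Federation. He serves on the editorial board of the top computer vision journal IJCV and as an area chair for the top conferences CVPR 2021 and ICCV 2021.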

Homepage:

https://xzhou.me

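Special thanks to the main organizers of this Webinar:

Organizing AC: Wanli Ouyang (University of Sydney)

Co-organizing AC: Xiaowei Zhou (Zhejiang University)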

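How to participate

1. VALSE's weekly Webinar is streamed on the Bilibili live platform. Search for VALSE_Webinar on Bilibili to follow us!

Livestream:

https://live.bilibili.com/22300737

Past videos:

https://space.bilibili.com/562085182/

2. VALSE Webinars are usually held on Wednesdays at 20:00, but the time is occasionally adjusted slightly to accommodate speakers' time zones. To make it easier to join, please follow the VALSE WeChat official account (valse_wechat) or join the VALSE QQ R group (group number: 137634472).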