11 Megvii Research papers accepted to ICCV 2019 (object detection / Re-ID / text detection / model pruning, and more)

July 27, 2019 · CVer



CVer will report the ICCV 2019 accepted papers as soon as they appear! There is a bonus at the end~


1、Objects365: A Large-scale, High-quality Dataset for Object Detection

 

Dataset: http://www.objects365.org/overview.html


In this paper, we introduce a new large-scale object detection dataset, Objects365, which has 365 object categories over 600K images. More than 10 million high-quality bounding boxes are manually labeled through a three-step, carefully designed annotation pipeline. It is the largest object detection dataset (with full annotation) so far and establishes a more challenging benchmark for the community. Objects365 can serve as a better feature learning dataset for localization-sensitive tasks like object detection and semantic segmentation. The Objects365 pre-trained models significantly outperform ImageNet pre-trained models: 5.6 (42 vs. 36.4) / 2.7 (42 vs. 39.3) points higher when training 90K/540K iterations on the COCO benchmark. Meanwhile, the finetuning time can be greatly reduced (up to 10 times) when reaching the same accuracy. The better generalization ability of Objects365 has also been verified on CityPersons, VOC segmentation, and ADE tasks. We will release the dataset as well as all the pre-trained models.

 

2、ThunderNet: Towards Real-time Generic Object Detection


Paper: https://arxiv.org/abs/1903.11752

 

Real-time generic object detection on mobile platforms is a crucial but challenging computer vision task. However, previous CNN-based detectors suffer from enormous computational cost, which hinders them from real-time inference in computation-constrained scenarios. In this paper, we investigate the effectiveness of two-stage detectors in real-time generic detection and propose a lightweight two-stage detector named ThunderNet. In the backbone part, we analyze the drawbacks of previous lightweight backbones and present a lightweight backbone designed for object detection. In the detection part, we exploit an extremely efficient RPN and detection head design. To generate more discriminative feature representations, we design two efficient architecture blocks: the Context Enhancement Module and the Spatial Attention Module. At last, we investigate the balance between the input resolution, the backbone, and the detection head. Compared with lightweight one-stage detectors, ThunderNet achieves superior performance with only 40% of the computational cost on the PASCAL VOC and COCO benchmarks. Without bells and whistles, our model runs at 24.1 fps on an ARM-based device. To the best of our knowledge, this is the first real-time detector reported on ARM platforms. Code will be released for paper reproduction.
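The Spatial Attention Module described above reweights the detection feature map using the RPN's foreground evidence. Below is a minimal sketch of that idea; the function name, the nested-list feature layout, and the single sigmoid gate are illustrative assumptions, not the paper's exact design:

```python
import math

def spatial_attention(backbone_feat, rpn_map):
    """Sketch of a spatial-attention reweighting (hypothetical layout):
    backbone_feat is a C x H x W nested list, rpn_map an H x W score map.
    The RPN scores are squashed to a per-location gate in (0, 1) that
    scales every channel of the detection feature at that location."""
    gate = [[1.0 / (1.0 + math.exp(-s)) for s in row] for row in rpn_map]
    return [[[v * gate[i][j] for j, v in enumerate(row)]
             for i, row in enumerate(ch)] for ch in backbone_feat]
```

With zero RPN scores the gate is 0.5 everywhere, so the feature map is simply halved; strongly positive scores pass features through almost unchanged, while negative scores suppress them.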

 

3、Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network

 

Paper: not yet available


Scene text detection, an important step of scene text reading systems, has witnessed rapid development with convolutional neural networks. Nonetheless, two main challenges still exist and hamper its deployment to real-world applications. The first problem is the trade-off between speed and accuracy. The second one is to model arbitrary-shaped text instances. Recently, some methods have been proposed to tackle arbitrary-shaped text detection, but they rarely take the running time of the entire pipeline into consideration, which may fall short in actual production environments. In this paper, we propose an efficient and accurate arbitrary-shaped text detector, termed Pixel Aggregation Network (PAN), which is equipped with a low computational-cost segmentation head and a learnable post-processing method. More specifically, the segmentation head is made up of a Feature Pyramid Enhancement Module (FPEM) and a Feature Fusion Module (FFM). FPEM is a cascadable U-shaped module, which can introduce multi-level information to guide better segmentation. FFM can gather the features given by the FPEMs of different depths into a final feature for segmentation. The learnable post-processing is implemented by Pixel Aggregation (PA), which can precisely aggregate text pixels by predicted similarity vectors. Experiments on several standard benchmarks validate the superiority of the proposed PAN. It is worth noting that our method can achieve a competitive F-measure of 79.9% at 84.2 FPS on CTW1500. To our knowledge, PAN is the first method that can detect arbitrary-shaped text instances accurately in real time.
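The Pixel Aggregation step can be pictured as a nearest-kernel assignment in the learned similarity space. Below is a minimal sketch under that reading; the per-kernel mean vectors, the Euclidean distance, and the `threshold` parameter are assumptions for illustration, not the paper's exact clustering rule:

```python
def pixel_aggregation(pixel_vecs, kernel_vecs, threshold):
    """Sketch: each text pixel carries a predicted similarity vector
    and is merged into the text kernel whose (mean) similarity vector
    is closest, provided the distance is within `threshold`.
    Returns one kernel index per pixel, or -1 if left unassigned."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    labels = []
    for v in pixel_vecs:
        d, k = min((dist(v, kv), k) for k, kv in enumerate(kernel_vecs))
        labels.append(k if d <= threshold else -1)
    return labels
```

The threshold is what keeps pixels of adjacent text instances apart: a pixel whose similarity vector sits between two kernels is discarded rather than merged into the wrong instance.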

 

4、Semi-supervised Skin Detection by Network with Mutual Guidance

 

Paper: not yet available


In this paper we present a new data-driven method for robust skin detection from a single human portrait image. Unlike previous methods, we incorporate the human body as a weak semantic guidance into this task, considering that acquiring large-scale human-labeled skin data is commonly expensive and time-consuming. To be specific, we propose a dual-task neural network for joint detection of skin and body via a semi-supervised learning strategy. The dual-task network contains a shared encoder but two decoders for skin and body separately. For each decoder, its output also serves as a guidance for its counterpart, making both decoders mutually guided. Extensive experiments were conducted to demonstrate the effectiveness of our network with mutual guidance, and experimental results show our network outperforms the state of the art in skin detection.

 

5、Semi-Supervised Video Salient Object Detection Using Pseudo-Labels

 

Paper: not yet available


Deep learning based video salient object detection has recently achieved great success, with its performance significantly outperforming any unsupervised method. However, existing data-driven approaches heavily rely on a large quantity of pixel-wise annotated video frames to deliver such promising results. In this paper, we address the semi-supervised video salient object detection task using pseudo-labels. Specifically, we present an effective video saliency detector that consists of a spatial refinement network and a spatiotemporal module. Based on the same refinement network and motion information in terms of optical flow, we further propose a novel method to generate pixel-level pseudo-labels from sparsely annotated frames. By utilizing the generated pseudo-labels together with a part of the manual annotations, our video saliency detector learns spatial and temporal cues for both contrast inference and coherence enhancement, thus producing accurate saliency maps. Experimental results demonstrate that our proposed semi-supervised method even greatly outperforms all the state-of-the-art fully-supervised methods across three public benchmarks: VOS, DAVIS, and FBMS.
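The pseudo-label idea above can be pictured as warping a sparsely annotated frame's labels along optical flow onto unlabeled frames. Below is a minimal sketch with integer flow offsets; the dict-based layout and the helper name are hypothetical, and the paper's generator additionally involves the refinement network itself:

```python
def propagate_labels(labels, flow, width, height):
    """Sketch of flow-based pseudo-labelling: each labeled pixel's
    saliency value (0/1) follows its motion vector to the next frame.
    labels: {(x, y): 0 or 1}; flow: {(x, y): (dx, dy)} integer offsets.
    Pixels that move out of the frame are dropped."""
    pseudo = {}
    for (x, y), lab in labels.items():
        dx, dy = flow.get((x, y), (0, 0))   # static where flow is absent
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            pseudo[(nx, ny)] = lab
    return pseudo
```

The warped labels are then treated like (noisy) ground truth for the unannotated frames, which is what lets training use far fewer manual annotations.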

 

6、Disentangled Image Matting

 

Paper: not yet available


Most previous image matting methods require a roughly-specified trimap as input, and estimate fractional alpha values for all pixels in the unknown region of the trimap. In this paper, we argue that directly estimating the alpha matte from a coarse trimap is a major limitation of previous methods, as this practice tries to address two difficult and inherently different problems at the same time: identifying true blending pixels inside the trimap region, and estimating accurate alpha values for them. We propose AdaMatting, a new end-to-end matting framework that disentangles this problem into two sub-tasks: trimap adaptation and alpha estimation. Trimap adaptation is a pixel-wise classification problem that infers the global structure of the input image by identifying definite foreground, background, and semi-transparent image regions. Alpha estimation is a regression problem that calculates the opacity value of each blended pixel. Our method separately handles these two sub-tasks within a single deep convolutional neural network (CNN). Extensive experiments show that AdaMatting can produce high-quality results even with low-quality input trimaps. Our method refreshes the state-of-the-art performance on the Adobe Composition-1k dataset both qualitatively and quantitatively. It is also the current best-performing method on the alphamatting.com online evaluation for all commonly-used metrics.
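The disentanglement above can be illustrated by how the two sub-task outputs combine into the final matte: the adapted trimap classifies each pixel, and the regressed alpha is consulted only for blended pixels. A minimal sketch (the function name and the 0/1/2 trimap encoding are assumptions for illustration):

```python
def compose_alpha(trimap_pred, alpha_pred):
    """Sketch: combine the two AdaMatting sub-task outputs per pixel.
    trimap_pred: 0 = definite background, 2 = definite foreground,
    1 = blended/unknown. alpha_pred: regressed opacity per pixel,
    used only where the trimap says the pixel is blended."""
    out = []
    for t, a in zip(trimap_pred, alpha_pred):
        if t == 0:
            out.append(0.0)                       # background: fully transparent
        elif t == 2:
            out.append(1.0)                       # foreground: fully opaque
        else:
            out.append(min(max(a, 0.0), 1.0))     # blended: clamped regression
    return out
```

This makes the division of labor concrete: the classifier decides *where* fractional alphas are needed, so the regressor only has to be accurate on that (much smaller) set of pixels.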

 

7、Re-ID Driven Localization Refinement for Person Search

 

Paper: not yet available


Person search aims at localizing and identifying a query person from a gallery of uncropped scene images. Different from person re-identification (re-ID), its performance also depends on the localization accuracy of a pedestrian detector. The state-of-the-art methods train the detector individually, and the detected bounding boxes may be suboptimal for the following re-ID task. To alleviate this issue, we propose a re-ID driven localization refinement framework that provides refined detection boxes for person search. Specifically, we develop a differentiable ROI transform layer to effectively transform the bounding boxes from the original images. Thus, the box coordinates can be driven by the re-ID training rather than the original detection task. With the joint supervision, the detector can generate more reliable bounding boxes, which further favor the person re-ID task. Extensive experimental results on the widely used benchmarks demonstrate that our proposed method performs favorably against the state-of-the-art person search methods.
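The role of the ROI transform layer can be illustrated with a plain crop; the actual layer uses bilinear sampling, which is what makes it differentiable so the re-ID loss can back-propagate into the box coordinates. A hypothetical, non-differentiable stand-in:

```python
def roi_crop(image, box):
    """Non-differentiable stand-in for the paper's ROI transform layer:
    extract the region (x0, y0, x1, y1) from an H x W image stored as
    nested lists. The real layer replaces integer slicing with bilinear
    sampling at fractional coordinates, so d(crop)/d(box) exists and
    the re-ID loss can refine the box itself."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]
```

The key design point is not the crop itself but its differentiability: with integer slicing the box coordinates receive no gradient, whereas bilinear sampling lets the re-ID objective nudge boxes toward crops that identify people better.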

 

8、Vehicle Re-identification with Viewpoint-aware Metric Learning

 

Paper: not yet available


This paper considers the vehicle re-identification (re-ID) problem. The extreme viewpoint variation (up to 180 degrees) poses great challenges for existing approaches. Inspired by the human recognition process, we propose a novel viewpoint-aware metric learning approach. It learns two metrics, for similar viewpoints and different viewpoints, in two feature spaces respectively, giving rise to the viewpoint-aware network (VANet). During training, two types of constraints are applied jointly. During inference, the viewpoint is first estimated and the corresponding metric is used. Experimental results confirm that VANet significantly improves re-ID accuracy, especially when the pair is observed from different viewpoints. Our method establishes the new state of the art on two benchmarks.
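The inference rule above (estimate the viewpoint relation, then apply the metric trained for that case) can be sketched as follows; the dict field names and the squared Euclidean distance are illustrative assumptions:

```python
def vanet_distance(probe, gallery):
    """Sketch of VANet inference. Each image record carries a coarse
    viewpoint label ('view') and one embedding per branch: 'emb_s'
    from the similar-viewpoint metric space, 'emb_d' from the
    different-viewpoint one (hypothetical field names). The viewpoint
    comparison selects which space the distance is computed in."""
    branch = 'emb_s' if probe['view'] == gallery['view'] else 'emb_d'
    return sum((x - y) ** 2 for x, y in zip(probe[branch], gallery[branch]))
```

Keeping the two cases in separate spaces is the point: a front-front pair and a front-rear pair of the same vehicle look very different, so a single metric would have to compromise between the two regimes.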

 

9、MetaPruning: Meta Learning for Automatic Neural Network Channel Pruning


Paper: https://arxiv.org/abs/1903.10258


In this paper, we propose a novel meta-learning approach for automatic channel pruning of very deep neural networks. We first train a PruningNet, a kind of meta network, which is able to generate weight parameters for any pruned structure given the target network. We use a simple stochastic structure sampling method for training the PruningNet. Then, we apply an evolutionary procedure to search for good-performing pruned networks. The search is highly efficient because the weights are directly generated by the trained PruningNet and we do not need any finetuning. With a single PruningNet trained for the target network, we can search for various pruned networks under different constraints with little human participation. We have demonstrated competitive performances on MobileNet V1/V2 networks, up to 9.0/9.9 higher ImageNet accuracy than V1/V2. Compared to previous state-of-the-art AutoML-based pruning methods, such as AMC and NetAdapt, we achieve higher or comparable accuracy under various conditions.
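The evolutionary search over pruned structures can be sketched as a simple mutate-and-select loop over per-layer channel counts. In the paper, each candidate is scored by evaluating the network with PruningNet-generated weights (no finetuning); here `eval_fn` is a stand-in for that scorer, and the population size, mutation scheme, and iteration count are illustrative choices:

```python
import random

def evolutionary_search(eval_fn, n_layers, max_channels, pop=20, iters=30, seed=0):
    """Sketch of an evolutionary channel search. Each individual is a
    per-layer channel-count vector; eval_fn scores it (in MetaPruning,
    validation accuracy using weights emitted by the PruningNet).
    Constraint handling (FLOPs/latency budgets) is omitted for brevity."""
    rng = random.Random(seed)
    population = [[rng.randint(1, max_channels) for _ in range(n_layers)]
                  for _ in range(pop)]
    for _ in range(iters):
        population.sort(key=eval_fn, reverse=True)
        parents = population[: pop // 2]          # keep the fittest half
        children = []
        for p in parents:
            child = p[:]
            i = rng.randrange(n_layers)           # mutate one layer's width
            child[i] = rng.randint(1, max_channels)
            children.append(child)
        population = parents + children
    return max(population, key=eval_fn)
```

Because the scorer only runs inference with generated weights, thousands of candidates can be evaluated at a tiny fraction of the cost of training each pruned network from scratch.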

 

10、Symmetry-constrained Rectification Network for Scene Text Recognition


Paper: not yet available

 

Reading text in the wild is a very challenging task due to the diversity of text instances and the complexity of natural scenes. Recently, the community has paid increasing attention to the problem of recognizing text instances of irregular shapes. One intuitive and effective solution to this problem is to rectify irregular text to a canonical form before recognition. However, these methods might struggle when dealing with highly curved or distorted text instances. To tackle this issue, in this paper we propose a Symmetry-constrained Rectification Network (ScRN) based on local attributes of text instances, such as center line, scale, and orientation. Such constraints, together with an accurate description of text shape, enable ScRN to generate better rectification results than existing methods, thus leading to higher recognition accuracy. Our method achieves state-of-the-art performance on text of both regular and irregular shapes. Specifically, the system outperforms existing algorithms by a large margin on datasets that contain a considerable proportion of irregular text instances, e.g., ICDAR 2015, SVT-Perspective, and CUTE80.

 

11、Learning to Paint with Model-based Deep Reinforcement Learning


Paper: https://arxiv.org/abs/1903.04411

 

We show how to teach machines to paint like human painters, who can use a few strokes to create fantastic paintings. By combining a neural renderer with model-based Deep Reinforcement Learning (DRL), our agent can decompose texture-rich images into strokes and make long-term plans. For each stroke, the agent directly determines the position and color of the stroke. Excellent visual effects can be achieved using hundreds of strokes. The training process does not require experience of human painting or stroke tracking data.

 

 

CVer: Call for ICCV 2019 Paper Publicity Submissions


The ICCV 2019 acceptance results are now out; congratulations to everyone! If you would like your ICCV 2019 paper to reach more CVer readers, gain more citations, and earn more stars for your open-source project, you are welcome to submit it to CVer for publicity. AI-related companies are also welcome to discuss promotional cooperation. For submission details, please contact the CVer assistant below. In addition, once papers or open-source code for the 11 Megvii papers above are released, CVer will report them immediately.


Note: this group is limited to the author teams of ICCV 2019 accepted papers. Other CVer readers are welcome to join the general academic discussion groups; see the end of other articles for details.


CVer: ICCV 2019 Publicity Group


Scan the code to add the CVer assistant and apply to join the CVer-ICCV2019 publicity group. Be sure to include a note in the format: ICCV + research area + location + school/company + nickname (e.g., ICCV + object detection + Shanghai + SJTU + Kaka).

