Currently available methods for extracting saliency maps identify the parts of the input that are most important to a specific, fixed classifier. We show that this strong dependence on a given classifier hinders their performance. To address this problem, we propose classifier-agnostic saliency map extraction, which finds all parts of the image that any classifier could use, not just the one given in advance. We observe that the proposed approach extracts higher quality saliency maps than prior work while being conceptually simple and easy to implement. The method sets a new state of the art for the localization task on ImageNet, outperforming all existing weakly-supervised localization techniques despite not using ground truth labels at inference time. The code reproducing the results is available at https://github.com/kondiz/casme . The final version of this manuscript is published in Computer Vision and Image Understanding and is available online at https://doi.org/10.1016/j.cviu.2020.102969 .
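
The core idea lends itself to a compact sketch: instead of explaining a single fixed classifier, the saliency network (masker) is scored against a pool of classifier snapshots, so the mask must cover evidence that any of them could exploit. Below is a minimal PyTorch-style sketch of such an objective; the `masker` module, the classifier pool, and the exact loss terms are illustrative assumptions, not the repository's actual code.

```python
import torch
import torch.nn.functional as F

def classifier_agnostic_masker_loss(masker, classifiers, images, labels):
    """Score a saliency masker against many classifiers, not just one.

    A good mask removes the evidence: after masking out the salient
    region, every classifier snapshot in the pool should struggle to
    predict the correct label. (Illustrative sketch only; see
    https://github.com/kondiz/casme for the actual objective.)
    """
    mask = masker(images)                  # (N, 1, H, W), values in [0, 1]
    masked_out = images * (1.0 - mask)     # image with salient pixels removed
    losses = []
    for clf in classifiers:                # pool of classifier snapshots
        logits = clf(masked_out)
        # the masker *maximizes* classification loss on the complement
        losses.append(-F.cross_entropy(logits, labels))
    # average over classifiers, plus an area penalty to keep masks small
    return torch.stack(losses).mean() + mask.mean()
```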

Related content

The central focus of Computer Vision and Image Understanding (CVIU) is the computer analysis of pictorial information. The journal publishes papers covering all aspects of image analysis, from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from prevailing views. Official website: http://dblp.uni-trier.de/db/journals/cviu/

This paper reviews the AIM 2020 challenge on efficient single image super-resolution, with a focus on the proposed solutions and results. The challenge task was to super-resolve an input image by a magnification factor of ×4 based on a set of prior examples of low-resolution and corresponding high-resolution images. The goal is to devise a network that reduces one or several aspects such as runtime, parameter count, FLOPs, activations, and memory consumption while at least maintaining the PSNR of MSRResNet. The track had 150 registered participants, and 25 teams submitted final results, which together gauge the state of the art in efficient single image super-resolution.
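
Since ranking hinges on at least maintaining the PSNR of MSRResNet while cutting cost, a reference implementation of PSNR is worth spelling out; the following is the standard definition, not code from the challenge.

```python
import numpy as np

def psnr(reference, reconstruction, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two images (higher is better)."""
    mse = np.mean((reference.astype(np.float64)
                   - reconstruction.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")                # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)
```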

This paper reviews the video extreme super-resolution challenge associated with the AIM 2020 workshop at ECCV 2020. Common scaling factors for learned video super-resolution (VSR) do not go beyond a factor of 4. Missing information can be restored well in this regime, especially in HR videos, where the high-frequency content mostly consists of texture details. The task in this challenge is to upscale videos by an extreme factor of 16, which results in more serious degradations that also affect the structural integrity of the videos. A single pixel in the low-resolution (LR) domain corresponds to 256 pixels in the high-resolution (HR) domain. Due to this massive information loss, it is hard to accurately restore the missing information. Track 1 is set up to gauge the state of the art for such a demanding task, where fidelity to the ground truth is measured by PSNR and SSIM. Perceptually higher quality can be achieved at the expense of fidelity by generating plausible high-frequency content. Track 2 therefore aims at generating visually pleasing results, which are ranked according to human perception, evaluated by a user study. In contrast to single image super-resolution (SISR), VSR can benefit from additional information in the temporal domain. However, this also imposes an additional requirement, as the generated frames need to be consistent over time.

Polarimetric synthetic aperture radar (PolSAR) image classification has been investigated vigorously in various remote sensing applications. However, it remains a challenging task. One significant barrier lies in the speckle effect embedded in the PolSAR imaging process, which greatly degrades image quality and further complicates classification. To this end, we present a novel PolSAR image classification method that removes speckle noise via low-rank (LR) feature extraction and enforces smoothness priors via a Markov random field (MRF). Specifically, we employ mixture-of-Gaussians-based robust LR matrix factorization to simultaneously extract discriminative features and remove complex noise. Then, a classification map is obtained by applying a convolutional neural network with data augmentation to the extracted features, where local consistency is implicitly involved and the insufficient-label issue is alleviated. Finally, we refine the classification map with the MRF to enforce contextual smoothness. We conduct experiments on two benchmark PolSAR datasets. Experimental results indicate that the proposed method achieves promising classification performance and preferable spatial consistency.
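
As a rough illustration of the low-rank step, the sketch below substitutes a plain truncated SVD for the paper's mixture-of-Gaussians robust factorization: projecting a pixels-by-features matrix onto its top singular components discards the small, noise-like components that speckle contributes.

```python
import numpy as np

def low_rank_denoise(features, rank):
    """Best rank-r approximation of a feature matrix via truncated SVD.

    A simplified stand-in for the paper's mixture-of-Gaussians robust
    low-rank factorization: small singular components, which absorb
    much of the speckle-like noise, are zeroed out.
    """
    u, s, vt = np.linalg.svd(features, full_matrices=False)
    s[rank:] = 0.0                          # keep only the top-r components
    return (u * s) @ vt

# e.g. a (pixels x polarimetric-features) matrix; synthetic data here
X = np.random.randn(1000, 30)
X_denoised = low_rank_denoise(X, rank=5)
```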

This work presents our contribution in the context of the 6th task of SemEval-2020: Extracting Definitions from Free Text in Textbooks (DeftEval). This competition consists of three subtasks with different levels of granularity: (1) classification of sentences as definitional or non-definitional, (2) labeling of definitional sentences, and (3) relation classification. We use various pretrained language models (i.e., BERT, XLNet, RoBERTa, SciBERT, and ALBERT) to solve each of the three subtasks of the competition. Specifically, for each language model variant, we experiment both with freezing its weights and with fine-tuning them. We also explore a multi-task architecture trained to jointly predict the outputs for the second and third subtasks. Our best-performing model evaluated on the DeftEval dataset obtains 32nd place on the first subtask and 37th place on the second subtask. The code is available for further research at https://github.com/avramandrei/DeftEval.
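
The frozen-versus-fine-tuned comparison boils down to toggling gradient flow through the pretrained encoder. A minimal PyTorch sketch, with the model and helper names as illustrative assumptions:

```python
import torch.nn as nn

def set_encoder_trainable(encoder: nn.Module, fine_tune: bool) -> None:
    """Freeze or unfreeze a pretrained encoder such as a BERT variant.

    With fine_tune=False only the task-specific head on top receives
    gradient updates, which is one of the two regimes compared for
    each language model.
    """
    for param in encoder.parameters():
        param.requires_grad = fine_tune

# usage with a HuggingFace-style model (names illustrative):
#   encoder = AutoModel.from_pretrained("bert-base-uncased")
#   set_encoder_trainable(encoder, fine_tune=False)   # frozen regime
```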

In this paper, we propose a span-based model combined with syntactic information for n-ary open information extraction. The advantage of a span-based model is that it can leverage span-level features, which is difficult in token-based BIO tagging methods. We also improve the previous bootstrap method for constructing the training corpus. Experiments show that our model outperforms previous open information extraction systems. Our code and data are publicly available at https://github.com/zhanjunlang/Span_OIE.
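
The practical difference from BIO tagging is that a span model enumerates and scores candidate spans directly, so whole-span features are available. A minimal sketch of the enumeration step (the scoring model itself is omitted):

```python
def enumerate_spans(tokens, max_width=8):
    """All candidate (start, end) token spans up to a width limit.

    A span-based extractor scores these candidates directly, giving it
    access to span-level features (length, boundary words, ...) that
    token-level BIO tagging cannot easily express.
    """
    spans = []
    for start in range(len(tokens)):
        for end in range(start, min(start + max_width, len(tokens))):
            spans.append((start, end))      # inclusive boundaries
    return spans

print(enumerate_spans(["Obama", "was", "born", "in", "Hawaii"], max_width=3))
```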

In recent years, many publications have shown that convolutional neural network based features can outperform engineered features. However, little effort has been made so far to extract local features efficiently for a whole image. In this paper, we present an approach to compute patch-based local feature descriptors efficiently for whole images at once, in the presence of pooling and striding layers. Our approach is generic and can be applied to nearly all existing network architectures. This includes networks for all local feature extraction tasks such as camera calibration, patch matching, optical flow estimation, and stereo matching. In addition, our approach can be applied to other patch-based approaches such as sliding-window object detection and recognition. We complete our paper with a speed benchmark of popular CNN-based feature extraction approaches applied to a whole image, with and without our speedup, and example code (for Torch) that shows how an arbitrary CNN architecture can be easily converted by our approach.
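
A common way to realize this kind of conversion is to trade striding for dilation, so the network evaluates its patch descriptor at every pixel in a single pass. The sketch below shows that general idea in PyTorch; it is an assumption-laden simplification (pooling layers and padding adjustments are ignored), not the paper's exact procedure:

```python
import torch.nn as nn

def densify(module: nn.Module, accumulated_stride: int = 1) -> int:
    """Convert striding into dilation so the output stays pixel-dense.

    Each convolution's stride is moved into the dilation of all later
    layers, a classic trick for dense per-pixel descriptors. Padding
    and pooling layers are ignored here for brevity.
    """
    for child in module.children():
        if isinstance(child, nn.Conv2d):
            child.dilation = (accumulated_stride, accumulated_stride)
            accumulated_stride *= child.stride[0]
            child.stride = (1, 1)
        else:
            accumulated_stride = densify(child, accumulated_stride)
    return accumulated_stride
```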

Discrete correlation filter (DCF) based trackers have shown considerable success in visual object tracking. These trackers often make use of low- to mid-level features such as histograms of oriented gradients (HOG) and mid-layer activations from convolutional neural networks (CNNs). We argue that including semantically higher-level information in the tracked features may provide further robustness in challenging cases such as viewpoint changes. Deep salient object detection is one example of such high-level features, as it makes use of semantic information to highlight the important regions of a given scene. In this work, we propose an improvement over DCF-based trackers that combines saliency-based filter responses with those of other features. The combination uses an adaptive weight on the saliency-based filter responses, selected automatically according to the temporal consistency of the visual saliency. We show that our method consistently improves a baseline DCF-based tracker, especially in challenging cases, and outperforms the state of the art. Our improved tracker operates at 9.3 fps, a small computational burden over the baseline, which operates at 11 fps.
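
The combination step can be pictured with a small sketch: the weight on the saliency-based response grows with how temporally consistent the saliency maps are across consecutive frames. The exact weighting rule below is an illustrative assumption, not the paper's formula:

```python
import numpy as np

def combine_responses(base_response, saliency_response,
                      saliency_prev, saliency_curr):
    """Blend two correlation-filter response maps with an adaptive weight.

    The saliency-based response is trusted more when the saliency maps
    of consecutive frames agree (measured here by normalized correlation;
    the paper's exact rule may differ).
    """
    a, b = saliency_prev.ravel(), saliency_curr.ravel()
    consistency = float(np.dot(a, b) /
                        (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    w = max(0.0, consistency)               # weight in [0, 1]
    return (1.0 - w) * base_response + w * saliency_response
```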

We report an evaluation of the effectiveness of existing knowledge base embedding models for relation prediction and for relation extraction on a wide range of benchmarks. We also describe a new benchmark, much larger and more complex than previous ones, which we introduce to help validate effectiveness on both tasks. The results demonstrate that knowledge base embedding models are generally effective for relation prediction but, with existing strategies, are unable to improve a state-of-the-art neural relation extraction model, which points to limitations of existing methods.
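
For concreteness, relation prediction with one representative embedding model (TransE) reduces to ranking relation vectors by how well they translate the head entity onto the tail. A minimal sketch with synthetic embeddings:

```python
import numpy as np

def predict_relation(head, tail, relation_matrix):
    """Rank candidate relations for (head, tail) with a TransE-style score.

    TransE models a fact (h, r, t) as h + r being close to t, so the
    best relation minimizes ||h + r - t||.
    """
    scores = np.linalg.norm(head + relation_matrix - tail, axis=1)
    return int(np.argmin(scores))           # index of the best relation

rng = np.random.default_rng(0)
h, t = rng.normal(size=50), rng.normal(size=50)
R = rng.normal(size=(10, 50))               # 10 candidate relation vectors
print(predict_relation(h, t, R))
```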

Partial person re-identification (re-id) is a challenging problem in which only partial observations (images) of persons are available for matching. However, few studies have offered a flexible solution for identifying an arbitrary patch of a person image. In this paper, we propose a fast and accurate matching method to address this problem. The proposed method leverages a Fully Convolutional Network (FCN) to generate certain-sized spatial feature maps such that pixel-level features are consistent. Hence, to match a pair of person images of different sizes, a novel method called Deep Spatial feature Reconstruction (DSR) is further developed to avoid explicit alignment. Specifically, DSR exploits the reconstruction error from popular dictionary learning models to calculate the similarity between different spatial feature maps. In this way, we expect the proposed FCN to decrease the similarity of coupled images from different persons and increase that of coupled images from the same person. Experimental results on two partial-person datasets demonstrate the efficiency and effectiveness of the proposed method in comparison with several state-of-the-art partial person re-id approaches. Additionally, it achieves competitive results on the benchmark person dataset Market1501, with a Rank-1 accuracy of 83.58%.
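
The reconstruction idea can be sketched compactly: treat the gallery image's spatial features as a dictionary and reconstruct the probe's features from it; a small residual indicates a match. The version below uses dense ridge regression as a simplification of the paper's dictionary-learning formulation:

```python
import numpy as np

def dsr_similarity(probe_feats, gallery_feats, lam=0.1):
    """Deep Spatial feature Reconstruction style matching (sketch).

    Columns of probe_feats (d x n) are reconstructed from the gallery
    dictionary (d x m) by ridge regression; the smaller the residual,
    the more similar the two images. The paper uses a sparse-coding
    variant; this dense least-squares version is a simplification.
    """
    D = gallery_feats
    gram = D.T @ D + lam * np.eye(D.shape[1])
    codes = np.linalg.solve(gram, D.T @ probe_feats)   # m x n coefficients
    residual = probe_feats - D @ codes
    return -np.linalg.norm(residual)                   # higher = more similar
```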

Most previous event extraction studies have relied heavily on features derived from annotated event mentions and thus cannot be applied to new event types without annotation effort. In this work, we take a fresh look at event extraction and model it as a grounding problem. We design a transferable neural architecture that maps event mentions and types jointly into a shared semantic space using structural and compositional neural networks, where the type of each event mention can be determined by the closest of all candidate types. By leveraging (1) available manual annotations for a small set of existing event types and (2) existing event ontologies, our framework applies to new event types without requiring additional annotation. Experiments on both existing event types (e.g., ACE, ERE) and new event types (e.g., FrameNet) demonstrate the effectiveness of our approach. Without any manual annotations for 23 new event types, our zero-shot framework achieves performance comparable to a state-of-the-art supervised model trained on the annotations of 500 event mentions.
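
Once mentions and types live in one semantic space, zero-shot typing is just a nearest-neighbor lookup. A minimal sketch (embedding shapes and names are illustrative assumptions):

```python
import numpy as np

def closest_event_type(mention_vec, type_vecs, type_names):
    """Zero-shot event typing: pick the nearest type in the shared space.

    Unseen types need no annotations of their own: the mention embedding
    is compared to every candidate type embedding by cosine similarity.
    """
    m = mention_vec / np.linalg.norm(mention_vec)
    T = type_vecs / np.linalg.norm(type_vecs, axis=1, keepdims=True)
    return type_names[int(np.argmax(T @ m))]
```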

Related papers

Kai Zhang, Martin Danelljan, Yawei Li, Radu Timofte, Jie Liu, Jie Tang, Gangshan Wu, Yu Zhu, Xiangyu He, Wenjie Xu, Chenghua Li, Cong Leng, Jian Cheng, Guangyang Wu, Wenyi Wang, Xiaohong Liu, Hengyuan Zhao, Xiangtao Kong, Jingwen He, Yu Qiao, Chao Dong, Xiaotong Luo, Liang Chen, Jiangtao Zhang, Maitreya Suin, Kuldeep Purohit, A. N. Rajagopalan, Xiaochuan Li, Zhiqiang Lang, Jiangtao Nie, Wei Wei, Lei Zhang, Abdul Muqeet, Jiwon Hwang, Subin Yang, JungHeum Kang, Sung-Ho Bae, Yongwoo Kim, Liang Chen, Jiangtao Zhang, Xiaotong Luo, Yanyun Qu, Geun-Woo Jeon, Jun-Ho Choi, Jun-Hyuk Kim, Jong-Seok Lee, Steven Marty, Eric Marty, Dongliang Xiong, Siang Chen, Lin Zha, Jiande Jiang, Xinbo Gao, Wen Lu, Haicheng Wang, Vineeth Bhaskara, Alex Levinshtein, Stavros Tsogkas, Allan Jepson, Xiangzhen Kong, Tongtong Zhao, Shanshan Zhao, Hrishikesh P S, Densen Puthussery, Jiji C V, Nan Nan, Shuai Liu, Jie Cai, Zibo Meng, Jiaming Ding, Chiu Man Ho, Xuehui Wang, Qiong Yan, Yuzhi Zhao, Long Chen, Jiangtao Zhang, Xiaotong Luo, Liang Chen, Yanyun Qu, Long Sun, Wenhao Wang, Zhenbing Liu, Rushi Lan, Rao Muhammad Umer, Christian Micheloni · 15 Sep 2020

Dario Fuoli, Zhiwu Huang, Shuhang Gu, Radu Timofte, Arnau Raventos, Aryan Esfandiari, Salah Karout, Xuan Xu, Xin Li, Xin Xiong, Jinge Wang, Pablo Navarrete Michelini, Wenhao Zhang, Dongyang Zhang, Hanwei Zhu, Dan Xia, Haoyu Chen, Jinjin Gu, Zhi Zhang, Tongtong Zhao, Shanshan Zhao, Kazutoshi Akita, Norimichi Ukita, Hrishikesh P S, Densen Puthussery, Jiji C V · 14 Sep 2020

Haixia Bi, Jing Yao, Zhiqiang Wei, Danfeng Hong, Jocelyn Chanussot · 13 Sep 2020

Andrei-Marius Avram, Dumitru-Clementin Cercel, Costin-Gabriel Chiru · 11 Sep 2020

Junlang Zhan, Hai Zhao · 1 Mar 2019

Christian Bailer, Tewodros Habtegebrial, Kiran Varanasi, Didier Stricker · 8 May 2018

Caglar Aytekin, Francesco Cricri, Emre Aksu · 8 Feb 2018

Lingxiao He, Jian Liang, Haiqing Li, Zhenan Sun · 3 Jan 2018

Lifu Huang, Heng Ji, Kyunghyun Cho, Clare R. Voss · 4 Jul 2017