新智元推荐
本文来源于公众号CVer和专知的整理
【新智元导读】计算机视觉最具影响力的学术会议之一的 IEEE CVPR 将于 2018 年 6 月 18 日 - 22 日在美国盐湖城召开举行。据 CVPR 官网显示,今年大会有超过 3300 篇论文投稿,其中录取 979 篇;相比去年 783 篇论文,今年增长了近 25%。本文将介绍 CVPR 2018 所有录用论文的标题, 包括每篇论文属于 oral, spotlight 还是 poster 的情况。
本文将介绍 CVPR 2018 所有录用论文的标题, 包括每篇论文属于 oral, spotlight 还是 poster 的情况。大家可以根据论文的标题去 google/baidu,即可以找到相关 pdf/github/homepage 链接。
Amusi 已经将 CVPR 2018 所有论文清单上传到 daily-paper-computer-vision 上,大家直接点击文末的 “阅读全文”,即可访问 daily-paper-computer-vision,下载 cvpr2018-paper-list.csv。
link:
https://github.com/amusi/daily-paper-computer-vision/blob/master/2018/cvpr2018-paper-list.csv
CVPR 2018概览
CVPR 是 IEEE Conference on Computer Vision and Pattern Recognition 的缩写,即 IEEE 国际计算机视觉与模式识别会议。该会议是由 IEEE 举办的计算机视觉和模式识别领域的顶级会议。
会议的主要内容是计算机视觉与模式识别技术。CVPR 是世界顶级的计算机视觉会议(三大顶会之一,另外两个是 ICCV 和 ECCV)。本会议每年都会有固定的研讨主题,而每一年都会有公司赞助该会议并获得在会场展示的机会。
CVPR 有着较为严苛的录用标准,会议整体的录取率通常不超过 30%,而口头报告的论文比例更是不高于 5%。而会议的组织方是一个循环的志愿群体,通常在某次会议召开的三年之前通过遴选产生。CVPR 的审稿一般是双盲的,也就是说会议的审稿与投稿方均不知道对方的信息。通常某一篇论文需要由三位审稿者进行审读。最后再由会议的领域主席 (area chair) 决定论文是否可被接收。
CVPR 2018
上面简单介绍了 CVPR ,其重要性不言而喻。而本文的重点,也是各位童鞋关注的焦点就在于 CVPR 2018。我们先看一组数据:979/3303 ~= 29.6%,该数据是指 CVPR 2018 论文的收录比。
之前在知乎和各个新闻平台上都看到了 CVPR 2018 list,但都是一组纯序号,既没有属性也没有论文标题。机(wu)智(nai)的童鞋也只能去 arXiv 上 follow 最新的 paper,如果能遇见带有 CVPR 2018 标志的 paper,相信内心还有点小激动呢。
Amusi 在对知识的不断追求中,发现了 CVPR 2018 所有收录论文的名单,既包含了序号,也包含了属性(oral、spotlight 或 poster)以及最最最重要的论文标题!
有了论文标题,真的就可以为所欲为~
打开 cvpr2018-paper-list.csv,按下 crtl + F,输入要查找的内容,如 Object Detection,然后你就可以看到一篇篇关于 Object Detection 的论文啦!
然后将需要阅读的论文标题复制到 google/baidu 搜索框中,比如《An Analysis of Scale Invariance in Object Detection - SNIP》
打开最上面的链接,一般就可以成功跳转至 arXiv 的论文下载界面
授人以鱼,不如授人以鱼。上述只是 Amusi 常用小技巧,真的关公面前舞大刀了,大家可以自由发挥~
温馨提示:CVPR 2018 大会将于 2018 年 6 月 18~22 日于美国犹他州的盐湖城(Salt Lake City)举办。
link: http://cvpr2018.thecvf.com/
Single-Shot Refinement Neural Network for Object Detection |
||
Video Captioning via Hierarchical Reinforcement Learning |
||
DensePose: Multi-Person Dense Human Pose Estimation In The Wild |
||
DensePose: Multi-Person Dense Human Pose Estimation In The Wild |
||
Frustum PointNets for 3D Object Detection from RGB-D Data |
||
Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge |
||
Rethinking the Faster R-CNN Architecture for Temporal Action Localization |
||
Shape from Shading through Shape Evolution |
||
Shape from Shading through Shape Evolution |
||
A High-Quality Denoising Dataset for Smartphone Cameras |
||
Improving Color Reproduction Accuracy in the Camera Imaging Pipeline |
||
End-to-End Dense Video Captioning with Masked Transformer |
||
End-to-End Dense Video Captioning with Masked Transformer |
||
pOSE: Pseudo Object Space Error for Initialization-Free Bundle Adjustment |
||
Learning to Segment Every Thing |
||
Density-aware Single Image De-raining using a Multi-stream Dense Network |
||
Densely Connected Pyramid Dehazing Network |
||
Embodied Question Answering |
||
TieNet: Text-Image Embedding Network for Common Thorax Disease Classification and Reporting in Chest X-rays |
||
TieNet: Text-Image Embedding Network for Common Thorax Disease Classification and Reporting in Chest X-rays |
||
Towards Open-Set Identity Preserving Face Synthesis |
||
Baseline Desensitizing In Translation Averaging |
||
Learning from the Deep: A Revised Underwater Image Formation Model |
||
Context Encoding for Semantic Segmentation |
||
Context Encoding for Semantic Segmentation |
||
Deep Texture Manifold for Ground Terrain Recognition |
||
DS*: Tighter Lifting-Free Convex Relaxations for Quadratic Matching Problems |
||
Sparse, Smart Contours to Represent and Edit Images |
||
Every Smile is Unique: Landmark-guided Diverse Smile Generation |
||
Generative Non-Rigid Shape Completion with Graph Convolutional Autoencoders |
||
Learning a Discriminative Prior for Blind Image Deblurring |
||
Attentional ShapeContextNet for Point Cloud Recognition |
||
Learning Superpixels with Segmentation-Aware Affinity Loss |
||
Real-World Repetition Estimation by Div, Grad and Curl |
||
Real-World Repetition Estimation by Div, Grad and Curl |
||
Recurrent Saliency Transformation Network: Incorporating Multi-Stage Visual Cues for Small Organ Segmentation |
||
MegaDepth: Learning Single-View Depth Prediction from Internet Photos |
||
Learning Intrinsic Image Decomposition from Watching the World |
||
Learning Intrinsic Image Decomposition from Watching the World |
||
Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering |
||
Human-centric Indoor Scene Synthesis Using Stochastic Grammar |
||
Learning by Asking Questions |
||
Instance Embedding Transfer to Unsupervised Video Object Segmentation |
||
Detect-and-Track: Efficient Pose Estimation in Videos |
||
Self-Supervised Adversarial Hashing Networks for Cross-Modal Retrieval |
||
Guided Proofreading of Automatic Segmentations for Connectomics |
||
Augmented Skeleton Space Transfer for Depth-based Hand Pose Estimation |
||
Augmented Skeleton Space Transfer for Depth-based Hand Pose Estimation |
||
Context-aware Synthesis for Video Frame Interpolation |
||
2D/3D Pose Estimation and Action Recognition using Multitask Deep Learning |
||
NAG: Network for Adversary Generation |
||
LiteFlowNet: A Lightweight Convolutional Neural Network for Optical Flow Estimation |
||
LiteFlowNet: A Lightweight Convolutional Neural Network for Optical Flow Estimation |
||
Avatar-Net: Multi-scale Zero-shot Style Transfer by Feature Decoration |
||
Multi-view Harmonized Bilinear Network for 3D Object Recognition |
||
Multi-view Harmonized Bilinear Network for 3D Object Recognition |
||
Tangent Convolutions for Dense Prediction in 3D |
||
Tangent Convolutions for Dense Prediction in 3D |
||
Semi-parametric Image Synthesis |
||
Semi-parametric Image Synthesis |
||
Interactive Image Segmentation with Latent Diversity |
||
3D Hand Pose Estimation: From Current Achievements to Future Goals |
||
3D Hand Pose Estimation: From Current Achievements to Future Goals |
||
W2F: A Weakly-Supervised to Fully-Supervised Framework for Object Detection |
||
BlockDrop: Dynamic Inference Paths in Residual Networks |
||
BlockDrop: Dynamic Inference Paths in Residual Networks |
||
MapNet: Geometry-Aware Learning of Maps for Camera Localization |
||
MapNet: Geometry-Aware Learning of Maps for Camera Localization |
||
BPGrad: Towards Global Optimality in Deep Learning via Branch and Pruning |
||
Salient Object Detection Driven by Fixation Prediction |
||
3D Object Detection with Latent Support Surfaces |
||
Practical Block-wise Neural Network Architecture Generation |
||
Practical Block-wise Neural Network Architecture Generation |
||
Glimpse Clouds: Human Activity Recognition from Unstructured Feature Points |
||
Are You Talking to Me? Reasoned Visual Dialog Generation through Adversarial Learning |
||
Are You Talking to Me? Reasoned Visual Dialog Generation through Adversarial Learning |
||
Visual Grounding via Accumulated Attention |
||
Supervision-by-Registration: An Unsupervised Approach to Improve the Precision of Facial Landmark Detectors |
||
ISTA-Net: Interpretable Optimization-Inspired Deep Network for Image Compressive Sensing |
||
Perturbative Neural Networks: Rethinking Convolution in CNNs |
||
Nonlinear 3D Face Morphable Model |
||
Nonlinear 3D Face Morphable Model |
||
Neural Baby Talk |
||
Neural Baby Talk |
||
Towards Pose Invariant Face Recognition in the Wild |
||
MoNet: Deep Motion Exploitation for Video Object Segmentation |
||
Exploring Disentangled Feature Representation Beyond Face Identification |
||
Towards Effective Low-bitwidth Convolutional Neural Networks |
||
Parallel Attention: A Unified Framework for Visual Object Discovery through Dialogs and Queries |
||
Learning Facial Action Units from Web Images with Scalable Weakly Supervised Clustering |
||
Few-Shot Image Recognition by Predicting Parameters from Activations |
||
Few-Shot Image Recognition by Predicting Parameters from Activations |
||
Single-Shot Object Detection with Enriched Semantics |
||
Unifying Identification and Context Learning for Person Recognition |
||
Separating Self-Expression and Visual Content in Hashtag Supervision |
||
Multi-Cue Correlation Filters for Robust Visual Tracking |
||
Beyond Trade-off: Accelerate FCN-based Face Detection with Higher Accuracy |
||
On the Robustness of Semantic Segmentation Models to Adversarial Attacks |
||
PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume |
||
PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume |
||
Illuminant Spectra-based Source Separation Using Flash Photography |
||
Illuminant Spectra-based Source Separation Using Flash Photography |
||
Tracking Multiple Objects Outside the Line of Sight using Speckle Imaging |
||
Tracking Multiple Objects Outside the Line of Sight using Speckle Imaging |
||
Improved Human Pose Estimation through Adversarial Data Augmentation |
||
Generative Adversarial Learning Towards Fast Weakly Supervised Detection |
||
Audio to Body Dynamics |
||
Audio to Body Dynamics |
||
The Unreasonable Effectiveness of Deep Features as a Perceptual Metric |
||
Frame-Recurrent Video Super-Resolution |
||
Deep Mutual Learning |
||
Real-world Anomaly Detection in Surveillance Videos |
||
Soccer on Your Tabletop |
||
Diversity Regularized Spatiotemporal Attention for Video-based Person Re-identification |
||
HashGAN: Deep Learning to Hash with Pair Conditional Wasserstein GAN |
||
Excitation Backprop for RNNs |
||
Dynamic-Structured Semantic Propagation Network |
||
Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation |
||
Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation |
||
SPLATNet: Sparse Lattice Networks for Point Cloud Processing |
||
SPLATNet: Sparse Lattice Networks for Point Cloud Processing |
||
Video Representation Learning Using Discriminative Pooling |
||
Attend and Interact: Higher-Order Object Interactions for Video Understanding |
||
Human Pose Estimation with Parsing Induced Learner |
||
4D Human Body Correspondences from Panoramic Depth Maps |
||
Recognizing Human Actions as Evolution of Pose Estimation Maps |
||
GraphBit: Bitwise Interaction Mining via Deep Reinforcement Learning |
||
Deep Adversarial Metric Learning |
||
Deep Adversarial Metric Learning |
||
Revisiting Video Saliency: A Large-scale Benchmark and a New Model |
||
Graph-Cut RANSAC |
||
Five-point Fundamental Matrix Estimation for Uncalibrated Cameras |
||
Hashing as Tie-Aware Learning to Rank |
||
Optimizing Local Feature Descriptors for Nearest Neighbor Matching |
||
Total Capture: A 3D Deformation Model for Tracking Faces, Hands, and Bodies |
||
Total Capture: A 3D Deformation Model for Tracking Faces, Hands, and Bodies |
||
Consensus Maximization for Semantic Region Correspondences |
||
Consensus Maximization for Semantic Region Correspondences |
||
ST-GAN: Spatial Transformer Generative Adversarial Networks for Image Compositing |
||
Motion-Guided Cascaded Refinement Network for Video Object Segmentation |
||
Zigzag Learning for Weakly Supervised Object Detection |
||
Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models |
||
Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models |
||
VITON: An Image-based Virtual Try-on Network |
||
VITON: An Image-based Virtual Try-on Network |
||
Cross-Domain Self-supervised Multi-task Feature Learning Using Synthetic Game Imagery |
||
LayoutNet: Reconstructing the 3D Room Layout from a Single RGB Image |
||
Thoracic Disease Identification and Localization with Limited Supervision |
||
Stochastic Downsampling for Cost-Adjustable Inference and Improved Regularization in Convolutional Networks |
||
Learning Pixel-level Semantic Affinity with Image-level Supervision for Weakly Supervised Semantic Segmentation |
||
Deep End-to-End Time-of-Flight Imaging |
||
Fast and Accurate Online Video Object Segmentation via Tracking Parts |
||
Fast and Accurate Online Video Object Segmentation via Tracking Parts |
||
Min-Entropy Latent Model for Weakly Supervised Object Detection |
||
Future Frame Prediction for Anomaly Detection A New Baseline |
||
Face Aging with Identity-Preserved Conditional Generative Adversarial Networks |
||
Learning to Compare: Relation Network for Few-Shot Learning |
||
Deep Layer Aggregation |
||
Deep Layer Aggregation |
||
Style Aggregated Network for Facial Landmark Detection |
||
M3: Multimodal Memory Modelling for Video Captioning |
||
M3: Multimodal Memory Modelling for Video Captioning |
||
Classification Driven Dynamic Image Enhancement |
||
Generative Image Inpainting with Contextual Attention |
||
Iterative Visual Reasoning Beyond Convolutions |
||
Iterative Visual Reasoning Beyond Convolutions |
||
Dual Attention Matching Network for Context-Aware Feature Sequence based Person Re-Identification |
||
Textbook Question Answering under Teacher Guidance with Memory Networks |
||
Textbook Question Answering under Teacher Guidance with Memory Networks |
||
Multi-Level Factorisation Net for Person Re-Identification |
||
Functional Map of the World |
||
Functional Map of the World |
||
A Two-Step Disentanglement Method |
||
Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization |
||
Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet? |
||
Left-Right Comparative Recurrent Model for Stereo Matching |
||
Left-Right Comparative Recurrent Model for Stereo Matching |
||
Analytic Expressions for Probabilistic Moments of PL-DNN with Gaussian Input |
||
Analytic Expressions for Probabilistic Moments of PL-DNN with Gaussian Input |
||
Zero-Shot Sketch-Image Hashing |
||
Zero-Shot Sketch-Image Hashing |
||
Interpretable Convolutional Neural Networks |
||
Interpretable Convolutional Neural Networks |
||
Reconstructing Thin Structures of Manifold Surfaces by Integrating Spatial Curves |
||
Enhancing the Spatial Resolution of Stereo Images using a Parallax Prior |
||
Anticipating Traffic Accidents with Adaptive Loss and Large-scale Incident DB |
||
Generating Synthetic X-ray Images of a Person from the Surface Geometry |
||
Generating Synthetic X-ray Images of a Person from the Surface Geometry |
||
Attentive Fashion Grammar Network for Fashion Landmark Detection and Clothing Category Classification |
||
Unsupervised CCA |
||
Discovering Point Lights with Intensity Distance Fields |
||
Universal Denoising Networks : A Novel CNN-based Network Architecture for Image Denoising |
||
Easy Identification from Better Constraints: Multi-Shot Person Re-Identification from Reference Constraints |
||
Recurrent Pixel Embedding for Instance Grouping |
||
Recurrent Pixel Embedding for Instance Grouping |
||
Recurrent Scene Parsing with Perspective Understanding in the Loop |
||
Learning to Hash by Discrepancy Minimization |
||
Fast End-to-End Trainable Guided Filter |
||
Disentangling Structure and Aesthetics for Content-aware Image Completion |
||
An Analysis of Scale Invariance in Object Detection - SNIP |
||
An Analysis of Scale Invariance in Object Detection - SNIP |
||
CSGNet: Neural Shape Parser for Constructive Solid Geometry |
||
Finding Tiny Faces in the Wild with Generative Adversarial Network |
||
Finding Tiny Faces in the Wild with Generative Adversarial Network |
||
SSNet: Scale Selection Network for Online 3D Action Prediction |
||
SSNet: Scale Selection Network for Online 3D Action Prediction |
||
Integrated facial landmark localization and super-resolution of real-world very low resolution faces in arbitrary poses with GANs |
||
Integrated facial landmark localization and super-resolution of real-world very low resolution faces in arbitrary poses with GANs |
||
The Best of Both Worlds: Combining CNNs and Geometric Constraints for Hierarchical Motion Segmentation |
||
In-Place Activated BatchNorm for Memory-Optimized Training of DNNs |
||
Wing Loss for Robust Facial Landmark Localisation with Convolutional Neural Networks |
||
Deep Cross-media Knowledge Transfer |
||
Deep Cross-media Knowledge Transfer |
||
Coupled End-to-end Transfer Learning with Generalized Fisher Information |
||
Knowledge Aided Consistency for Weakly Supervised Phrase Grounding |
||
Viewpoint-aware Attentive Multi-view Inference for Vehicle Re-identification |
||
MatNet: Modular Attention Network for Referring Expression Comprehension |
||
CBMV: A Coalesced Bidirectional Matching Volume for Disparity Estimation |
||
NISP: Pruning Networks using Neuron Importance Score Propagation |
||
NISP: Pruning Networks using Neuron Importance Score Propagation |
||
Who Let The Dogs Out? Modeling Dog Behavior From Visual Data |
||
Efficient Video Object Segmentation via Network Modulation |
||
Learning Deep Models for Face Anti-Spoofing: Binary or Auxiliary Supervision |
||
Feedback-prop: Convolutional Neural Network Inference under Partial Evidence |
||
A Memory Network Approach for Story-based Temporal Summarization of 360?Videos |
||
Improving Occlusion and Hard Negative Handling for Single-Stage Object Detectors |
||
UV-GAN: Adversarial Facial UV Map Completion for Pose-invariant Face Recognition |
||
Learning a Toolchain for Image Restoration |
||
Learning a Toolchain for Image Restoration |
||
Learning to Act Properly: Predicting and Explaining Affordances from Images |
||
Learning a Discriminative Feature Network for Semantic Segmentation |
||
Optimizing Video Object Detection via a Scale-Time Lattice |
||
ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices |
||
Cascaded Pyramid Network for Multi-Person Pose Estimation |
||
Seeing Temporal Modulation of Lights from Standard Cameras |
||
Point-wise Convolutional Neural Networks |
||
Fine-grained Video Captioning for Sports Narrative |
||
Fine-grained Video Captioning for Sports Narrative |
||
Dense 3D Regression for Hand Pose Estimation |
||
Missing Slice Recovery for Tensors Using a Low-rank Model in Embedded Space |
||
Learning Convolutional Networks for Content-weighted Image Compression |
||
Learning Attentions: Residual Attentional Siamese Network for High Performance Online Visual Tracking |
||
Deep Cost-Sensitive and Order-Preserving Feature Learning for Cross-Population Age Estimation |
||
First-Person Hand Action Benchmark with RGB-D Videos and 3D Hand Pose Annotations |
||
Hand PointNet: 3D Hand Pose Estimation using Point Sets |
||
Hand PointNet: 3D Hand Pose Estimation using Point Sets |
||
Recovering Realistic Texture in Image Super-resolution by Spatial Feature Modulation |
||
Cube Padding for Weakly-Supervised Saliency Prediction in 360$^{\circ}$ Videos |
||
A Face to Face Neural Conversation Model |
||
SurfConv: Bridging 3D and 2D Convolution for RGBD Images |
||
Dynamic Video Segmentation Network |
||
Multiple Granularity Group Interaction Prediction |
||
Visual Question Reasoning on General Dependency Tree |
||
Visual Question Reasoning on General Dependency Tree |
||
From Lifestyle VLOGs to Everyday Interactions |
||
COCO-Stuff: Thing and Stuff Classes in Context |
||
GANerated Hands for Real-Time 3D Hand Tracking from Monocular RGB |
||
GANerated Hands for Real-Time 3D Hand Tracking from Monocular RGB |
||
Non-local Neural Networks |
||
Zero-shot Recognition via Semantic Embeddings and Knowledge Graphs |
||
Taskonomy: Disentangling Task Transfer Learning |
||
Taskonomy: Disentangling Task Transfer Learning |
||
Embodied Real-World Active Perception |
||
Embodied Real-World Active Perception |
||
SfSNet : Learning Shape, Reflectance and Illuminance of Faces `in the wild' |
||
SfSNet : Learning Shape, Reflectance and Illuminance of Faces `in the wild' |
||
End-to-end Recovery of Human Shape and Pose |
||
Factoring Shape, Pose, and Layout from the 2D Image of a 3D Scene |
||
Multi-view Consistency as Supervisory Signal for Learning Shape and Pose Prediction |
||
A Fast Resection-Intersection Method for the Known Rotation Problem |
||
Image Generation from Scene Graphs |
||
What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets |
||
What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets |
||
PointFusion: Deep Sensor Fusion for 3D Bounding Box Estimation |
||
High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs |
||
High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs |
||
Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks |
||
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference |
||
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference |
||
Finding It": Weakly-Supervised Reference-Aware Visual Grounding in Instructional Video" |
||
Finding It": Weakly-Supervised Reference-Aware Visual Grounding in Instructional Video" |
||
Unsupervised Cross-dataset Person Re-identification by Transfer Learning of Spatio-temporal Patterns |
||
Kernelized Subspace Pooling for Deep Local Descriptors |
||
Video Rain Removal By Multiscale Convolutional Sparse Coding |
||
Learning from Millions of 3D Scans for Large-scale 3D Face Recognition |
||
Referring Relationships |
||
Improving Object Localization with Fitness NMS and Bounded IoU Loss |
||
Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination |
||
Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination |
||
CVM-Net: Cross-View Matching Network for Image-Based Ground-to-Aerial Geo-Localization |
||
CVM-Net: Cross-View Matching Network for Image-Based Ground-to-Aerial Geo-Localization |
||
Visual Question Generation as Dual Task of Visual Question Answering |
||
Visual Question Generation as Dual Task of Visual Question Answering |
||
Revisiting Dilated Convolution: A Simple Approach for Weakly- and Semi- Supervised Semantic Segmentation |
||
Revisiting Dilated Convolution: A Simple Approach for Weakly- and Semi- Supervised Semantic Segmentation |
||
Learning Dual Convolutional Neural Networks for Low-Level Vision |
||
Deep Video Super-Resolution Network Using Dynamic Upsampling Filters Without Explicit Motion Compensation |
||
MegDet: A Large Mini-Batch Object Detector |
||
MegDet: A Large Mini-Batch Object Detector |
||
AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks |
||
TOM-Net: Learning Transparent Object Matting from a Single Image |
||
TOM-Net: Learning Transparent Object Matting from a Single Image |
||
End-to-End Deep Kronecker-Product Matching for Person Re-identification |
||
Semantic Visual Localization |
||
Joint Cuts and Matching of Partitions in One Graph |
||
Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions |
||
Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions |
||
Crowd Counting via Adversarial Cross-Scale Consistency Pursuit |
||
Deep Group-shuffling Random Walk for Person Re-identification |
||
Learning to Detect Features in Texture Images |
||
Learning to Detect Features in Texture Images |
||
Transferable Joint Attribute-Identity Deep Learning for Unsupervised Person Re-Identification |
||
CarFusion: Combining Point Tracking and Part Detection for Dynamic 3D Reconstruction of Vehicles |
||
Context-aware Deep Feature Compression for High-speed Visual Tracking |
||
Deep Material-aware Cross-spectral Stereo Matching |
||
Deep Extreme Cut: From Extreme Points to Object Segmentation |
||
Label Denoising Adversarial Network (LDAN) for Inverse Lighting of Face Images |
||
Label Denoising Adversarial Network (LDAN) for Inverse Lighting of Face Images |
||
Harmonious Attention Network for Person Re-Identication |
||
Unsupervised Deep Generative Adversarial Hashing Network |
||
Unsupervised Deep Generative Adversarial Hashing Network |
||
Pseudo-Mask Augmented Object Detection |
||
LSTM stack-based Neural Multi-sequence Alignment TeCHnique (NeuMATCH) |
||
LSTM stack-based Neural Multi-sequence Alignment TeCHnique (NeuMATCH) |
||
Adversarial Complementary Learning for Weakly Supervised Object Localization |
||
Unsupervised Discovery of Object Landmarks as Structural Representations |
||
Unsupervised Discovery of Object Landmarks as Structural Representations |
||
DeLS-3D: Deep Localization and Segmentation with a 3D Semantic Map |
||
Monocular Relative Depth Perception with Web Stereo Data Supervision |
||
Image-Image Domain Adaptation with Preserved Self-Similarity and Domain-Dissimilarity for Person Re-identification |
||
Objects as context for detecting their semantic parts |
||
Camera Style Adaptation for Person Re-identification |
||
Conditional Generative Adversarial Network for Structured Domain Adaptation |
||
Rotation-sensitive Regression for Oriented Scene Text Detection |
||
Residual Parameter Transfer for Deep Domain Adaptation |
||
SGPN: Similarity Group Proposal Network for 3D Point Cloud Instance Segmentation |
||
SGPN: Similarity Group Proposal Network for 3D Point Cloud Instance Segmentation |
||
Weakly Supervised Instance Segmentation using Class Peak Response |
||
Weakly Supervised Instance Segmentation using Class Peak Response |
||
Robust Facial Landmark Detection via a Fully-Convolutional Local-Global Context Network |
||
Rotation Averaging and Strong Duality |
||
Rotation Averaging and Strong Duality |
||
PackNet: Adding Multiple Tasks to a Single Network by Iterative Pruning |
||
Im2Flow: Motion Hallucination from Static Images for Action Recognition |
||
Im2Flow: Motion Hallucination from Static Images for Action Recognition |
||
Feature Quantization for Defending Against Distortion of Images |
||
End-to-end weakly-supervised semantic alignment |
||
PointGrid: A Deep Network for 3D Shape Understanding |
||
PointGrid: A Deep Network for 3D Shape Understanding |
||
Imagine it for me: Generative Adversarial Approach for Zero-Shot Learning from Noisy Texts |
||
A Minimalist Approach to Type-Agnostic Detection of Quadrics in Point Clouds |
||
A Benchmark for Articulated Human Pose Estimation and Tracking |
||
Boosting Self-Supervised Learning via Knowledge Transfer |
||
PPFNet: Global Context Aware Local Features for Robust 3D Point Matching |
||
PPFNet: Global Context Aware Local Features for Robust 3D Point Matching |
||
Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments |
||
Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments |
||
Fast Video Object Segmentation by Reference-Guided Mask Propagation |
||
Fast Video Object Segmentation by Reference-Guided Mask Propagation |
||
Super-Resolving Very Low-Resolution Face Images with Supplementary Attributes |
||
Video Person Re-identification with Competitive Snippet-similarity Aggregation and Co-attentive Snippet Embedding |
||
One-shot Action Localization by Sequence Matching Network |
||
Efficient Subpixel Refinement with Symbolic Linear Predictors |
||
Distort-and-Recover: Color Enhancement using Deep Reinforcement Learning |
||
Group Consistent Similarity Learning via Deep CRFs for Person Re-Identification |
||
Group Consistent Similarity Learning via Deep CRFs for Person Re-Identification |
||
Single Image Reflection Separation with Perceptual Losses |
||
AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions |
||
AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions |
||
Recognize Actions by Disentangling Components of Dynamics |
||
Zoom and Learn: Generalizing Deep Stereo Matching to Novel Domains |
||
Attention-aware Compositional Network for Person Re-Identification |
||
HATS: Histograms of Averaged Time Surfaces for Robust Event-based Object Classification |
||
Mask-guided Contrastive Attention Model for Person Re-Identification |
||
Pose-Guided Photorealistic Face Rotation |
||
Pose-Guided Photorealistic Face Rotation |
||
Automatic 3D Indoor Scene Modeling from Single Panorama |
||
Automatic 3D Indoor Scene Modeling from Single Panorama |
||
SobolevFusion: 3D Reconstruction of Scenes Undergoing Free Non-rigid Motion |
||
SobolevFusion: 3D Reconstruction of Scenes Undergoing Free Non-rigid Motion |
||
A Biresolution Spectral framework for Product Quantization |
||
Dynamic Zoom-in Network for Fast Object Detection in Large Images |
||
On the Importance of Label Quality for Semantic Segmentation |
||
EPINET: A Fully-Convolutional Neural Network for Light Field Depth Estimation by Using Epipolar Geometry |
||
A Pose-Sensitive Embedding for Person Re-Identification with Expanded Cross Neighborhood Re-Ranking |
||
Erase or Fill? Deep Joint Recurrent Rain Removal and Reconstruction in Videos |
||
Scalable and Effective Deep CCA via Soft Decorrelation |
||
High-order tensor regularization with application to attribute ranking |
||
3D-RCNN: Instance-level 3D Scene Understanding via Render-and-Compare |
||
3D-RCNN: Instance-level 3D Scene Understanding via Render-and-Compare |
||
FoldingNet: Interpretable Unsupervised Learning on 3D Point Clouds |
||
FoldingNet: Interpretable Unsupervised Learning on 3D Point Clouds |
||
Defocus Blur Detection via Multi-Stream Bottom-Top-Bottom Fully Convolutional Network |
||
Decorrelated Batch Normalization |
||
Unsupervised Textual Grounding: Linking Words to Image Concepts |
||
Unsupervised Textual Grounding: Linking Words to Image Concepts |
||
Scale-recurrent Network for Deep Image Deblurring |
||
Low-Shot Recognition with Imprinted Weights |
||
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering |
||
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering |
||
Cross-Domain Weakly-Supervised Object Detection through Progressive Domain Adaptation |
||
Facelet-Bank for Fast Portrait Manipulation |
||
Duplex Generative Adversarial Network for Unsupervised Domain Adaptation |
||
Quantization of Fully Convolutional Networks for Accurate Biomedical Image Segmentation |
||
Real-Time Rotation-Invariant Face Detection with Progressive Calibration Networks |
||
Structure Preserving Video Prediction |
||
Tagging Like Humans: Diverse and Distinct Image Annotation |
||
Learning to Sketch with Shortcut Cycle Consistency |
||
GroupCap: Group-based Image Captioning with Structured Relevance and Diversity Constraints |
||
Dynamic Scene Deblurring Using Spatially Variant Recurrent Neural Networks |
||
Dynamic Scene Deblurring Using Spatially Variant Recurrent Neural Networks |
||
Hyperparameter Optimization for Tracking with Continuous Deep Q-Learning |
||
Deep Unsupervised Saliency Detection: A Multiple Noisy Labeling Perspective |
||
Deep Unsupervised Saliency Detection: A Multiple Noisy Labeling Perspective |
||
NeuralNetwork-Viterbi: A Framework for Weakly Supervised Video Learning |
||
NeuralNetwork-Viterbi: A Framework for Weakly Supervised Video Learning |
||
Detecting and Recognizing Human-Object Interactions |
||
Detecting and Recognizing Human-Object Interactions |
||
Augmenting Crowd-Sourced 3D Reconstructions using Semantic Detections |
||
Visual Relationship Learning with a Factorization-based Prior |
||
Re-weighted Adversarial Adaptation Network for Unsupervised Domain Adaptation |
||
Flow Guided Recurrent Neural Encoder for Video Salient Object Detection |
||
Disentangling 3D Pose in A Dendritic CNN for Unconstrained 2D Face Alignment |
||
Progressive Attention Guided Recurrent Network for Salient Object Detection |
||
Answer with Grounding Snippets: Focal Visual-Text Attention for Visual Question Answering |
||
Answer with Grounding Snippets: Focal Visual-Text Attention for Visual Question Answering |
||
Unsupervised Learning of Depth and Egomotion from Monocular Video Using 3D Geometric Constraints |
||
Repulsion Loss: Detecting Pedestrians in a Crowd |
||
PU-Net: Point Cloud Upsampling Network |
||
Video Object Segmentation via Inference in A CNN-Based Higher-Order Spatio-Temporal MRF |
||
Video Object Segmentation via Inference in A CNN-Based Higher-Order Spatio-Temporal MRF |
||
PiCANet: Learning Pixel-wise Contextual Attention for Saliency Detection |
||
Gated Fusion Network for Single Image Dehazing |
||
Interleaved Structured Sparse Convolutional Neural Networks |
||
Interleaved Structured Sparse Convolutional Neural Networks |
||
Where and Why Are They Looking? Jointly Inferring Human Attention and Intentions in Complex Tasks |
||
End-to-end Flow Correlation Tracking with Spatial-temporal Attention |
||
Left/Right Asymmetric Layer Skippable Networks |
||
Context Contrasted Feature and Gated Multi-scale Aggregation for Scene Segmentation |
||
Context Contrasted Feature and Gated Multi-scale Aggregation for Scene Segmentation |
||
VITAL: VIsual Tracking via Adversarial Learning |
||
VITAL: VIsual Tracking via Adversarial Learning |
||
RotationNet: Joint Object Categorization and Pose Estimation Using Multiviews from Unsupervised Viewpoints |
||
Action Sets: Weakly Supervised Action Segmentation without Ordering Constraints |
||
Action Sets: Weakly Supervised Action Segmentation without Ordering Constraints |
||
Squeeze-and-Excitation Networks |
||
Squeeze-and-Excitation Networks |
||
Edit Probability for Scene Text Recognition |
||
Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning |
||
Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning |
||
Exploit the Unknown Gradually:~ One-Shot Video-Based Person Re-Identification by Stepwise Learning |
||
Learning to Localize Sound Source in Visual Scenes |
||
Dynamic Few-Shot Visual Learning without Forgetting |
||
Weakly-Supervised Semantic Segmentation by Iteratively Mining Common Object Features |
||
SINT++: Robust Visual Tracking via Adversarial Hard Positive Generation |
||
Real-Time Monocular Depth Estimation using Synthetic Data with Domain Adaptation via Image Style Transfer |
||
Fast and Accurate Single Image Super-Resolution via Information Distillation Network |
||
Low-Latency Video Semantic Segmentation |
||
Low-Latency Video Semantic Segmentation |
||
Domain Adaptive Faster R-CNN for Object Detection in the Wild |
||
DoubleFusion: Real-time Capture of Human Performance with Inner Body Shape from a Single Depth Sensor |
||
DoubleFusion: Real-time Capture of Human Performance with Inner Body Shape from a Single Depth Sensor |
||
Lean Multiclass Crowdsourcing |
||
Lean Multiclass Crowdsourcing |
||
Tell Me Where To Look: Guided Attention Inference Network |
||
Tell Me Where To Look: Guided Attention Inference Network |
||
Residual Dense Network for Image Super-Resolution |
||
Residual Dense Network for Image Super-Resolution |
||
Look at Boundary: A Boundary-Aware Face Alignment Algorithm |
||
Imagination-IQA: No-reference Image Quality Assessment via Adversarial Learning |
||
Memory Matching Networks for One-Shot Image Recognition |
||
3D Human Pose Estimation in the Wild by Adversarial Learning |
||
Unsupervised Training for 3D Morphable Model Regression |
||
Unsupervised Training for 3D Morphable Model Regression |
||
Scalable Dense Non-rigid Structure-from-Motion: A Grassmannian Perspective |
||
IQA: Visual Question Answering in Interactive Environments |
||
Learning Spatial-Temporal Regularized Correlation Filters for Visual Tracking |
||
Low-shot Learning from Imaginary Data |
||
Low-shot Learning from Imaginary Data |
||
Deep Regression Forests for Age Estimation |
||
Partial Transfer Learning with Selective Adversarial Networks |
||
Partial Transfer Learning with Selective Adversarial Networks |
||
A Bi-directional Message Passing Model for Salient Object Detection |
||
Transductive Unbiased Embedding for Zero-Shot Learning |
||
Scale-Transferrable Object Detection |
||
Crowd Counting with Deep Negative Correlation Learning |
||
Deep Cauchy Hashing for Hamming Space Retrieval |
||
Demo2Vec: Reasoning Object Affordances from Online Videos |
||
GVCNN: Group-View Convolutional Neural Networks for 3D Shape Recognition |
||
An End-to-End TextSpotter with Explicit Alignment and Attention |
||
Stereoscopic Neural Style Transfer |
||
Bootstrapping the Performance of Webly Supervised Semantic Segmentation |
||
Learning Markov Clustering Networks for Scene Text Detection |
||
Collaborative and Adversarial Network for Unsupervised domain adaptation |
||
Collaborative and Adversarial Network for Unsupervised domain adaptation |
||
Reflection Removal for Large-Scale 3D Point Clouds |
||
Pose Transferrable Person Re-Identification |
||
Learning to Adapt Structured Output Space for Semantic Segmentation |
||
Learning to Adapt Structured Output Space for Semantic Segmentation |
||
Efficient Diverse Ensemble for Discriminative Co-Tracking |
||
Learning a Single Convolutional Super-Resolution Network for Multiple Degradations |
||
Probabilistic Plant Modeling via Multi-View Image-to-Image Translation |
||
Learning to Parse Wireframes in Images of Man-Made Environments |
||
A Variational U-Net for Conditional Appearance and Shape Generation |
||
A Variational U-Net for Conditional Appearance and Shape Generation |
||
Learning to Find Good Correspondences |
||
Learning to Find Good Correspondences |
||
Actor and Action Video Segmentation from a Sentence |
||
Actor and Action Video Segmentation from a Sentence |
||
Towards a Mathematical Understanding of the Difficulty in Learning with Feedforward Neural Networks |
||
Weakly-supervised Deep Convolutional Neural Network Learning for Facial Action Unit Intensity Estimation |
||
Maximum Classifier Discrepancy for Unsupervised Domain Adaptation |
||
Maximum Classifier Discrepancy for Unsupervised Domain Adaptation |
由于微信字数限制,没有全部显示,详细 list 请查看 Amusi 整理的
https://github.com/amusi/daily-paper-computer-vision
【加入社群】
新智元 AI 技术 + 产业社群招募中,欢迎对 AI 技术 + 产业落地感兴趣的同学,加小助手微信号: aiera2015_3 入群;通过审核后我们将邀请进群,加入社群后务必修改群备注(姓名 - 公司 - 职位;专业群审核较严,敬请谅解)。