CVPR 2018, a Top Computer Vision Conference: List of Accepted Papers

May 26, 2018, Zhuanzhi (专知)

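[Introduction] IEEE CVPR, one of the most influential academic conferences in computer vision, will be held June 18-22, 2018, in Salt Lake City, USA. According to the CVPR website, the conference received more than 3,300 paper submissions this year, of which 979 were accepted; compared with last year's 783 papers, that is an increase of about 25%.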

The detailed list of accepted papers has been released; see: http://cvpr2018.thecvf.com/files/cvpr_2018_final_accept_list.txt

https://github.com/amusi/daily-paper-computer-vision/blob/master/2018/cvpr2018-paper-list.csv

▌Paper list:

CVPR 2018 Accepted Papers

Single-Shot Refinement Neural Network for Object Detection
Video Captioning via Hierarchical Reinforcement Learning
DensePose: Multi-Person Dense Human Pose Estimation In The Wild
Frustum PointNets for 3D Object Detection from RGB-D Data
Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge
Rethinking the Faster R-CNN Architecture for Temporal Action Localization
Shape from Shading through Shape Evolution
A High-Quality Denoising Dataset for Smartphone Cameras
Improving Color Reproduction Accuracy in the Camera Imaging Pipeline
End-to-End Dense Video Captioning with Masked Transformer
pOSE: Pseudo Object Space Error for Initialization-Free Bundle Adjustment
Learning to Segment Every Thing
Density-aware Single Image De-raining using a Multi-stream Dense Network
Densely Connected Pyramid Dehazing Network
Embodied Question Answering
TieNet: Text-Image Embedding Network for Common Thorax Disease Classification and Reporting in Chest X-rays
Towards Open-Set Identity Preserving Face Synthesis
Baseline Desensitizing In Translation Averaging
Learning from the Deep: A Revised Underwater Image Formation Model
Context Encoding for Semantic Segmentation
Deep Texture Manifold for Ground Terrain Recognition
DS*: Tighter Lifting-Free Convex Relaxations for Quadratic Matching Problems
Sparse, Smart Contours to Represent and Edit Images
Every Smile is Unique: Landmark-guided Diverse Smile Generation
Generative Non-Rigid Shape Completion with Graph Convolutional Autoencoders
Learning a Discriminative Prior for Blind Image Deblurring
Attentional ShapeContextNet for Point Cloud Recognition
Learning Superpixels with Segmentation-Aware Affinity Loss
Real-World Repetition Estimation by Div, Grad and Curl
Recurrent Saliency Transformation Network: Incorporating Multi-Stage Visual Cues for Small Organ Segmentation
MegaDepth: Learning Single-View Depth Prediction from Internet Photos
Learning Intrinsic Image Decomposition from Watching the World
Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering
Human-centric Indoor Scene Synthesis Using Stochastic Grammar
Learning by Asking Questions
Instance Embedding Transfer to Unsupervised Video Object Segmentation
Detect-and-Track: Efficient Pose Estimation in Videos
Self-Supervised Adversarial Hashing Networks for Cross-Modal Retrieval
Guided Proofreading of Automatic Segmentations for Connectomics
Augmented Skeleton Space Transfer for Depth-based Hand Pose Estimation
Context-aware Synthesis for Video Frame Interpolation
2D/3D Pose Estimation and Action Recognition using Multitask Deep Learning
NAG: Network for Adversary Generation
LiteFlowNet: A Lightweight Convolutional Neural Network for Optical Flow Estimation
Avatar-Net: Multi-scale Zero-shot Style Transfer by Feature Decoration
Multi-view Harmonized Bilinear Network for 3D Object Recognition
Tangent Convolutions for Dense Prediction in 3D
Semi-parametric Image Synthesis
Interactive Image Segmentation with Latent Diversity
3D Hand Pose Estimation: From Current Achievements to Future Goals
W2F: A Weakly-Supervised to Fully-Supervised Framework for Object Detection
BlockDrop: Dynamic Inference Paths in Residual Networks
MapNet: Geometry-Aware Learning of Maps for Camera Localization
BPGrad: Towards Global Optimality in Deep Learning via Branch and Pruning
Salient Object Detection Driven by Fixation Prediction
3D Object Detection with Latent Support Surfaces
Practical Block-wise Neural Network Architecture Generation
Glimpse Clouds: Human Activity Recognition from Unstructured Feature Points
Are You Talking to Me? Reasoned Visual Dialog Generation through Adversarial Learning
Visual Grounding via Accumulated Attention
Supervision-by-Registration: An Unsupervised Approach to Improve the Precision of Facial Landmark Detectors
ISTA-Net: Interpretable Optimization-Inspired Deep Network for Image Compressive Sensing
Perturbative Neural Networks: Rethinking Convolution in CNNs
Nonlinear 3D Face Morphable Model
Neural Baby Talk
Towards Pose Invariant Face Recognition in the Wild
MoNet: Deep Motion Exploitation for Video Object Segmentation
Exploring Disentangled Feature Representation Beyond Face Identification
Towards Effective Low-bitwidth Convolutional Neural Networks
Parallel Attention: A Unified Framework for Visual Object Discovery through Dialogs and Queries
Learning Facial Action Units from Web Images with Scalable Weakly Supervised Clustering
Few-Shot Image Recognition by Predicting Parameters from Activations
Single-Shot Object Detection with Enriched Semantics
Unifying Identification and Context Learning for Person Recognition
Separating Self-Expression and Visual Content in Hashtag Supervision
Multi-Cue Correlation Filters for Robust Visual Tracking
Beyond Trade-off: Accelerate FCN-based Face Detection with Higher Accuracy
On the Robustness of Semantic Segmentation Models to Adversarial Attacks
PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume
Illuminant Spectra-based Source Separation Using Flash Photography
Tracking Multiple Objects Outside the Line of Sight using Speckle Imaging
Improved Human Pose Estimation through Adversarial Data Augmentation
Generative Adversarial Learning Towards Fast Weakly Supervised Detection
Audio to Body Dynamics
The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
Frame-Recurrent Video Super-Resolution
Deep Mutual Learning
Real-world Anomaly Detection in Surveillance Videos
Soccer on Your Tabletop
Diversity Regularized Spatiotemporal Attention for Video-based Person Re-identification
HashGAN: Deep Learning to Hash with Pair Conditional Wasserstein GAN
Excitation Backprop for RNNs
Dynamic-Structured Semantic Propagation Network
Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation
SPLATNet: Sparse Lattice Networks for Point Cloud Processing
Video Representation Learning Using Discriminative Pooling
Attend and Interact: Higher-Order Object Interactions for Video Understanding
Human Pose Estimation with Parsing Induced Learner
4D Human Body Correspondences from Panoramic Depth Maps
Recognizing Human Actions as Evolution of Pose Estimation Maps
GraphBit: Bitwise Interaction Mining via Deep Reinforcement Learning
Deep Adversarial Metric Learning
Revisiting Video Saliency: A Large-scale Benchmark and a New Model
Graph-Cut RANSAC
Five-point Fundamental Matrix Estimation for Uncalibrated Cameras
Hashing as Tie-Aware Learning to Rank
Optimizing Local Feature Descriptors for Nearest Neighbor Matching
Total Capture: A 3D Deformation Model for Tracking Faces, Hands, and Bodies
Consensus Maximization for Semantic Region Correspondences
ST-GAN: Spatial Transformer Generative Adversarial Networks for Image Compositing
Motion-Guided Cascaded Refinement Network for Video Object Segmentation
Zigzag Learning for Weakly Supervised Object Detection
Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models
VITON: An Image-based Virtual Try-on Network
Cross-Domain Self-supervised Multi-task Feature Learning Using Synthetic Game Imagery
LayoutNet: Reconstructing the 3D Room Layout from a Single RGB Image
Thoracic Disease Identification and Localization with Limited Supervision
Stochastic Downsampling for Cost-Adjustable Inference and Improved Regularization in Convolutional Networks
Learning Pixel-level Semantic Affinity with Image-level Supervision for Weakly Supervised Semantic Segmentation
Deep End-to-End Time-of-Flight Imaging
Fast and Accurate Online Video Object Segmentation via Tracking Parts
Min-Entropy Latent Model for Weakly Supervised Object Detection
Future Frame Prediction for Anomaly Detection - A New Baseline
Face Aging with Identity-Preserved Conditional Generative Adversarial Networks
Learning to Compare: Relation Network for Few-Shot Learning
Deep Layer Aggregation
Style Aggregated Network for Facial Landmark Detection
M3: Multimodal Memory Modelling for Video Captioning
Classification Driven Dynamic Image Enhancement
Generative Image Inpainting with Contextual Attention
Iterative Visual Reasoning Beyond Convolutions
Dual Attention Matching Network for Context-Aware Feature Sequence based Person Re-Identification
Textbook Question Answering under Teacher Guidance with Memory Networks
Multi-Level Factorisation Net for Person Re-Identification
Functional Map of the World
A Two-Step Disentanglement Method
Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization
Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?
Left-Right Comparative Recurrent Model for Stereo Matching
Analytic Expressions for Probabilistic Moments of PL-DNN with Gaussian Input
Zero-Shot Sketch-Image Hashing
Interpretable Convolutional Neural Networks
Reconstructing Thin Structures of Manifold Surfaces by Integrating Spatial Curves
Enhancing the Spatial Resolution of Stereo Images using a Parallax Prior
Anticipating Traffic Accidents with Adaptive Loss and Large-scale Incident DB
Generating Synthetic X-ray Images of a Person from the Surface Geometry
Attentive Fashion Grammar Network for Fashion Landmark Detection and Clothing Category Classification
Unsupervised CCA
Discovering Point Lights with Intensity Distance Fields
Universal Denoising Networks: A Novel CNN-based Network Architecture for Image Denoising
Easy Identification from Better Constraints: Multi-Shot Person Re-Identification from Reference Constraints
Recurrent Pixel Embedding for Instance Grouping
Recurrent Scene Parsing with Perspective Understanding in the Loop
Learning to Hash by Discrepancy Minimization
Fast End-to-End Trainable Guided Filter
Disentangling Structure and Aesthetics for Content-aware Image Completion
An Analysis of Scale Invariance in Object Detection - SNIP
CSGNet: Neural Shape Parser for Constructive Solid Geometry
Finding Tiny Faces in the Wild with Generative Adversarial Network
SSNet: Scale Selection Network for Online 3D Action Prediction
Integrated facial landmark localization and super-resolution of real-world very low resolution faces in arbitrary poses with GANs
The Best of Both Worlds: Combining CNNs and Geometric Constraints for Hierarchical Motion Segmentation
In-Place Activated BatchNorm for Memory-Optimized Training of DNNs
Wing Loss for Robust Facial Landmark Localisation with Convolutional Neural Networks
Deep Cross-media Knowledge Transfer
Coupled End-to-end Transfer Learning with Generalized Fisher Information
Knowledge Aided Consistency for Weakly Supervised Phrase Grounding
Viewpoint-aware Attentive Multi-view Inference for Vehicle Re-identification
MatNet: Modular Attention Network for Referring Expression Comprehension
CBMV: A Coalesced Bidirectional Matching Volume for Disparity Estimation
NISP: Pruning Networks using Neuron Importance Score Propagation
Who Let The Dogs Out? Modeling Dog Behavior From Visual Data
Efficient Video Object Segmentation via Network Modulation
Learning Deep Models for Face Anti-Spoofing: Binary or Auxiliary Supervision
Feedback-prop: Convolutional Neural Network Inference under Partial Evidence
A Memory Network Approach for Story-based Temporal Summarization of 360° Videos
Improving Occlusion and Hard Negative Handling for Single-Stage Object Detectors
UV-GAN: Adversarial Facial UV Map Completion for Pose-invariant Face Recognition
Learning a Toolchain for Image Restoration
Learning to Act Properly: Predicting and Explaining Affordances from Images
Learning a Discriminative Feature Network for Semantic Segmentation
Optimizing Video Object Detection via a Scale-Time Lattice
ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
Cascaded Pyramid Network for Multi-Person Pose Estimation
Seeing Temporal Modulation of Lights from Standard Cameras
Point-wise Convolutional Neural Networks
Fine-grained Video Captioning for Sports Narrative
Dense 3D Regression for Hand Pose Estimation
Missing Slice Recovery for Tensors Using a Low-rank Model in Embedded Space
Learning Convolutional Networks for Content-weighted Image Compression
Learning Attentions: Residual Attentional Siamese Network for High Performance Online Visual Tracking
Deep Cost-Sensitive and Order-Preserving Feature Learning for Cross-Population Age Estimation
First-Person Hand Action Benchmark with RGB-D Videos and 3D Hand Pose Annotations
Hand PointNet: 3D Hand Pose Estimation using Point Sets
Recovering Realistic Texture in Image Super-resolution by Spatial Feature Modulation
Cube Padding for Weakly-Supervised Saliency Prediction in 360° Videos
A Face to Face Neural Conversation Model
SurfConv: Bridging 3D and 2D Convolution for RGBD Images
Dynamic Video Segmentation Network
Multiple Granularity Group Interaction Prediction
Visual Question Reasoning on General Dependency Tree
From Lifestyle VLOGs to Everyday Interactions
COCO-Stuff: Thing and Stuff Classes in Context
GANerated Hands for Real-Time 3D Hand Tracking from Monocular RGB
Non-local Neural Networks
Zero-shot Recognition via Semantic Embeddings and Knowledge Graphs
Taskonomy: Disentangling Task Transfer Learning
Embodied Real-World Active Perception
SfSNet: Learning Shape, Reflectance and Illuminance of Faces 'in the wild'
End-to-end Recovery of Human Shape and Pose
Factoring Shape, Pose, and Layout from the 2D Image of a 3D Scene
Multi-view Consistency as Supervisory Signal for Learning Shape and Pose Prediction
A Fast Resection-Intersection Method for the Known Rotation Problem
Image Generation from Scene Graphs
What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets
PointFusion: Deep Sensor Fusion for 3D Bounding Box Estimation
High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs
Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
Finding "It": Weakly-Supervised Reference-Aware Visual Grounding in Instructional Video
Unsupervised Cross-dataset Person Re-identification by Transfer Learning of Spatio-temporal Patterns
Kernelized Subspace Pooling for Deep Local Descriptors
Video Rain Removal By Multiscale Convolutional Sparse Coding
Learning from Millions of 3D Scans for Large-scale 3D Face Recognition
Referring Relationships
Improving Object Localization with Fitness NMS and Bounded IoU Loss
Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination
CVM-Net: Cross-View Matching Network for Image-Based Ground-to-Aerial Geo-Localization
Visual Question Generation as Dual Task of Visual Question Answering
Revisiting Dilated Convolution: A Simple Approach for Weakly- and Semi- Supervised Semantic Segmentation
Learning Dual Convolutional Neural Networks for Low-Level Vision
Deep Video Super-Resolution Network Using Dynamic Upsampling Filters Without Explicit Motion Compensation
MegDet: A Large Mini-Batch Object Detector
AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks
TOM-Net: Learning Transparent Object Matting from a Single Image
End-to-End Deep Kronecker-Product Matching for Person Re-identification
Semantic Visual Localization
Joint Cuts and Matching of Partitions in One Graph
Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions
Crowd Counting via Adversarial Cross-Scale Consistency Pursuit
Deep Group-shuffling Random Walk for Person Re-identification
Learning to Detect Features in Texture Images
Transferable Joint Attribute-Identity Deep Learning for Unsupervised Person Re-Identification
CarFusion: Combining Point Tracking and Part Detection for Dynamic 3D Reconstruction of Vehicles
Context-aware Deep Feature Compression for High-speed Visual Tracking
Deep Material-aware Cross-spectral Stereo Matching
Deep Extreme Cut: From Extreme Points to Object Segmentation
Label Denoising Adversarial Network (LDAN) for Inverse Lighting of Face Images
Harmonious Attention Network for Person Re-Identification
Unsupervised Deep Generative Adversarial Hashing Network
Pseudo-Mask Augmented Object Detection
LSTM stack-based Neural Multi-sequence Alignment TeCHnique (NeuMATCH)
Adversarial Complementary Learning for Weakly Supervised Object Localization
Unsupervised Discovery of Object Landmarks as Structural Representations
DeLS-3D: Deep Localization and Segmentation with a 3D Semantic Map
Monocular Relative Depth Perception with Web Stereo Data Supervision
Image-Image Domain Adaptation with Preserved Self-Similarity and Domain-Dissimilarity for Person Re-identification
Objects as context for detecting their semantic parts
Camera Style Adaptation for Person Re-identification
Conditional Generative Adversarial Network for Structured Domain Adaptation
Rotation-sensitive Regression for Oriented Scene Text Detection
Residual Parameter Transfer for Deep Domain Adaptation
SGPN: Similarity Group Proposal Network for 3D Point Cloud Instance Segmentation
Weakly Supervised Instance Segmentation using Class Peak Response
Robust Facial Landmark Detection via a Fully-Convolutional Local-Global Context Network
Rotation Averaging and Strong Duality
PackNet: Adding Multiple Tasks to a Single Network by Iterative Pruning
Im2Flow: Motion Hallucination from Static Images for Action Recognition
Feature Quantization for Defending Against Distortion of Images
End-to-end weakly-supervised semantic alignment
PointGrid: A Deep Network for 3D Shape Understanding
Imagine it for me: Generative Adversarial Approach for Zero-Shot Learning from Noisy Texts
A Minimalist Approach to Type-Agnostic Detection of Quadrics in Point Clouds
A Benchmark for Articulated Human Pose Estimation and Tracking
Boosting Self-Supervised Learning via Knowledge Transfer
PPFNet: Global Context Aware Local Features for Robust 3D Point Matching
Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments
Fast Video Object Segmentation by Reference-Guided Mask Propagation
Super-Resolving Very Low-Resolution Face Images with Supplementary Attributes
Video Person Re-identification with Competitive Snippet-similarity Aggregation and Co-attentive Snippet Embedding
One-shot Action Localization by Sequence Matching Network
Efficient Subpixel Refinement with Symbolic Linear Predictors
Distort-and-Recover: Color Enhancement using Deep Reinforcement Learning
Group Consistent Similarity Learning via Deep CRFs for Person Re-Identification
Single Image Reflection Separation with Perceptual Losses
AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions
Recognize Actions by Disentangling Components of Dynamics
Zoom and Learn: Generalizing Deep Stereo Matching to Novel Domains
Attention-aware Compositional Network for Person Re-Identification
HATS: Histograms of Averaged Time Surfaces for Robust Event-based Object Classification
Mask-guided Contrastive Attention Model for Person Re-Identification
Pose-Guided Photorealistic Face Rotation
Automatic 3D Indoor Scene Modeling from Single Panorama
SobolevFusion: 3D Reconstruction of Scenes Undergoing Free Non-rigid Motion
A Biresolution Spectral framework for Product Quantization
Dynamic Zoom-in Network for Fast Object Detection in Large Images
On the Importance of Label Quality for Semantic Segmentation
EPINET: A Fully-Convolutional Neural Network for Light Field Depth Estimation by Using Epipolar Geometry
A Pose-Sensitive Embedding for Person Re-Identification with Expanded Cross Neighborhood Re-Ranking
Erase or Fill? Deep Joint Recurrent Rain Removal and Reconstruction in Videos
Scalable and Effective Deep CCA via Soft Decorrelation
High-order tensor regularization with application to attribute ranking
3D-RCNN: Instance-level 3D Scene Understanding via Render-and-Compare
FoldingNet: Interpretable Unsupervised Learning on 3D Point Clouds
Defocus Blur Detection via Multi-Stream Bottom-Top-Bottom Fully Convolutional Network
Decorrelated Batch Normalization
Unsupervised Textual Grounding: Linking Words to Image Concepts
Scale-recurrent Network for Deep Image Deblurring
Low-Shot Recognition with Imprinted Weights
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
Cross-Domain Weakly-Supervised Object Detection through Progressive Domain Adaptation
Facelet-Bank for Fast Portrait Manipulation
Duplex Generative Adversarial Network for Unsupervised Domain Adaptation
Quantization of Fully Convolutional Networks for Accurate Biomedical Image Segmentation
Real-Time Rotation-Invariant Face Detection with Progressive Calibration Networks
Structure Preserving Video Prediction
Tagging Like Humans: Diverse and Distinct Image Annotation
Learning to Sketch with Shortcut Cycle Consistency
GroupCap: Group-based Image Captioning with Structured Relevance and Diversity Constraints
Dynamic Scene Deblurring Using Spatially Variant Recurrent Neural Networks
Hyperparameter Optimization for Tracking with Continuous Deep Q-Learning
Deep Unsupervised Saliency Detection: A Multiple Noisy Labeling Perspective
NeuralNetwork-Viterbi: A Framework for Weakly Supervised Video Learning
Detecting and Recognizing Human-Object Interactions
Augmenting Crowd-Sourced 3D Reconstructions using Semantic Detections
Visual Relationship Learning with a Factorization-based Prior
Re-weighted Adversarial Adaptation Network for Unsupervised Domain Adaptation
Flow Guided Recurrent Neural Encoder for Video Salient Object Detection
Disentangling 3D Pose in A Dendritic CNN for Unconstrained 2D Face Alignment
Progressive Attention Guided Recurrent Network for Salient Object Detection
Answer with Grounding Snippets: Focal Visual-Text Attention for Visual Question Answering
Unsupervised Learning of Depth and Egomotion from Monocular Video Using 3D Geometric Constraints
Repulsion Loss: Detecting Pedestrians in a Crowd
PU-Net: Point Cloud Upsampling Network
Video Object Segmentation via Inference in A CNN-Based Higher-Order Spatio-Temporal MRF
PiCANet: Learning Pixel-wise Contextual Attention for Saliency Detection
Gated Fusion Network for Single Image Dehazing
Interleaved Structured Sparse Convolutional Neural Networks
Where and Why Are They Looking? Jointly Inferring Human Attention and Intentions in Complex Tasks
End-to-end Flow Correlation Tracking with Spatial-temporal Attention
Left/Right Asymmetric Layer Skippable Networks
Context Contrasted Feature and Gated Multi-scale Aggregation for Scene Segmentation
VITAL: VIsual Tracking via Adversarial Learning
RotationNet: Joint Object Categorization and Pose Estimation Using Multiviews from Unsupervised Viewpoints
Action Sets: Weakly Supervised Action Segmentation without Ordering Constraints
Squeeze-and-Excitation Networks
Edit Probability for Scene Text Recognition
Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning
Exploit the Unknown Gradually: One-Shot Video-Based Person Re-Identification by Stepwise Learning
Learning to Localize Sound Source in Visual Scenes
Dynamic Few-Shot Visual Learning without Forgetting
Weakly-Supervised Semantic Segmentation by Iteratively Mining Common Object Features
SINT++: Robust Visual Tracking via Adversarial Hard Positive Generation
Real-Time Monocular Depth Estimation using Synthetic Data with Domain Adaptation via Image Style Transfer
Fast and Accurate Single Image Super-Resolution via Information Distillation Network
Low-Latency Video Semantic Segmentation
Domain Adaptive Faster R-CNN for Object Detection in the Wild
DoubleFusion: Real-time Capture of Human Performance with Inner Body Shape from a Single Depth Sensor
Lean Multiclass Crowdsourcing
Tell Me Where To Look: Guided Attention Inference Network
Residual Dense Network for Image Super-Resolution
Look at Boundary: A Boundary-Aware Face Alignment Algorithm
Imagination-IQA: No-reference Image Quality Assessment via Adversarial Learning
Memory Matching Networks for One-Shot Image Recognition
3D Human Pose Estimation in the Wild by Adversarial Learning
Unsupervised Training for 3D Morphable Model Regression
Scalable Dense Non-rigid Structure-from-Motion: A Grassmannian Perspective
IQA: Visual Question Answering in Interactive Environments
Learning Spatial-Temporal Regularized Correlation Filters for Visual Tracking
Low-shot Learning from Imaginary Data
Deep Regression Forests for Age Estimation
Partial Transfer Learning with Selective Adversarial Networks
A Bi-directional Message Passing Model for Salient Object Detection
Transductive Unbiased Embedding for Zero-Shot Learning
Scale-Transferrable Object Detection
Crowd Counting with Deep Negative Correlation Learning
Deep Cauchy Hashing for Hamming Space Retrieval
Demo2Vec: Reasoning Object Affordances from Online Videos
GVCNN: Group-View Convolutional Neural Networks for 3D Shape Recognition
An End-to-End TextSpotter with Explicit Alignment and Attention
Stereoscopic Neural Style Transfer
Bootstrapping the Performance of Webly Supervised Semantic Segmentation
Learning Markov Clustering Networks for Scene Text Detection
Collaborative and Adversarial Network for Unsupervised domain adaptation
Reflection Removal for Large-Scale 3D Point Clouds
Pose Transferrable Person Re-Identification
Learning to Adapt Structured Output Space for Semantic Segmentation
Efficient Diverse Ensemble for Discriminative Co-Tracking
Learning a Single Convolutional Super-Resolution Network for Multiple Degradations
Probabilistic Plant Modeling via Multi-View Image-to-Image Translation
Learning to Parse Wireframes in Images of Man-Made Environments
A Variational U-Net for Conditional Appearance and Shape Generation
Learning to Find Good Correspondences
Actor and Action Video Segmentation from a Sentence
Towards a Mathematical Understanding of the Difficulty in Learning with Feedforward Neural Networks
Weakly-supervised Deep Convolutional Neural Network Learning for Facial Action Unit Intensity Estimation
Maximum Classifier Discrepancy for Unsupervised Domain Adaptation