Ranjay Krishna, Michael Bernstein, Li Fei-Fei: Information Maximizing Visual Question Generation. CVPR 2019: [https://arxiv.org/abs/1903.11207]
Amir Zadeh, Michael Chan, Paul Pu Liang, Edmund Tong, Louis-Philippe Morency: Social-IQ: A Question Answering Benchmark for Artificial Social Intelligence. CVPR 2019: [http://openaccess.thecvf.com/content_CVPR_2019/html/Zadeh_Social-IQ_A_Question_Answering_Benchmark_for_Artificial_Social_Intelligence_CVPR_2019_paper.html]
Kaihua Tang, Hanwang Zhang, Baoyuan Wu, Wenhan Luo, Wei Liu: Learning to Compose Dynamic Tree Structures for Visual Contexts. CVPR 2019: [https://arxiv.org/abs/1812.01880]
Hyeonwoo Noh, Taehoon Kim, Jonghwan Mun, Bohyung Han: Transfer Learning via Unsupervised Task Discovery for Visual Question Answering. CVPR 2019: [https://arxiv.org/abs/1810.02358]
Yao-Hung Hubert Tsai, Santosh Kumar Divvala, Louis-Philippe Morency, Ruslan Salakhutdinov, Ali Farhadi: Video Relationship Reasoning Using Gated Spatio-Temporal Energy Graph. CVPR 2019: [https://arxiv.org/abs/1903.10547] [https://github.com/yaohungt/Gated-Spatio-Temporal-Energy-Graph]
Jiaxin Shi, Hanwang Zhang, Juanzi Li: Explainable and Explicit Visual Reasoning Over Scene Graphs. CVPR 2019: [https://arxiv.org/abs/1812.01855] [https://github.com/shijx12/XNM-Net]
Rémi Cadène, Hedi Ben-younes, Matthieu Cord, Nicolas Thome: MUREL: Multimodal Relational Reasoning for Visual Question Answering. CVPR 2019: [https://arxiv.org/abs/1902.09487] [https://github.com/Cadene/murel.bootstrap.pytorch]
Dalu Guo, Chang Xu, Dacheng Tao: Image-Question-Answer Synergistic Network for Visual Dialog. CVPR 2019: [https://arxiv.org/abs/1902.09774]
Chenfei Wu, Jinlai Liu, Xiaojie Wang, Ruifan Li: Differential Networks for Visual Question Answering. AAAI 2019: [https://www.aaai.org/Papers/AAAI/2019/AAAI-WuC.76.pdf]
Hedi Ben-younes, Rémi Cadène, Nicolas Thome, Matthieu Cord: BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection. AAAI 2019.
Chenfei Wu, Jinlai Liu, Xiaojie Wang, Xuan Dong: Chain of Reasoning for Visual Question Answering. NeurIPS 2018: [https://papers.nips.cc/paper/7311-chain-of-reasoning-for-visual-question-answering]
Will Norcliffe-Brown, Stathis Vafeias, Sarah Parisot: Learning Conditioned Graph Structures for Interpretable Visual Question Answering. NeurIPS 2018: [https://papers.nips.cc/paper/8054-learning-conditioned-graph-structures-for-interpretable-visual-question-answering] [https://github.com/aimbrain/vqa-project]
Medhini Narasimhan, Svetlana Lazebnik, Alexander G. Schwing: Out of the Box: Reasoning with Graph Convolution Nets for Factual Visual Question Answering. NeurIPS 2018: [https://papers.nips.cc/paper/7531-out-of-the-box-reasoning-with-graph-convolution-nets-for-factual-visual-question-answering]
Sainandan Ramakrishnan, Aishwarya Agrawal, Stefan Lee: Overcoming Language Priors in Visual Question Answering with Adversarial Regularization. NeurIPS 2018: [https://papers.nips.cc/paper/7427-overcoming-language-priors-in-visual-question-answering-with-adversarial-regularization]
Somak Aditya, Yezhou Yang, Chitta Baral: Explicit Reasoning over End-to-End Neural Architectures for Visual Question Answering. AAAI 2018: [https://aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16446]
Zhihao Fan, Zhongyu Wei, Piji Li, Yanyan Lan, Xuanjing Huang: A Question Type Driven Framework to Diversify Visual Question Generation. IJCAI 2018: [https://www.ijcai.org/proceedings/2018/563]
Damien Teney, Peter Anderson, Xiaodong He, Anton van den Hengel: Tips and Tricks for Visual Question Answering: Learnings From the 2017 Challenge. CVPR 2018: [http://openaccess.thecvf.com/content_cvpr_2018/html/Teney_Tips_and_Tricks_CVPR_2018_paper.html]
Ishan Misra, Ross B. Girshick, Rob Fergus, Martial Hebert, Abhinav Gupta, Laurens van der Maaten: Learning by Asking Questions. CVPR 2018: [http://openaccess.thecvf.com/content_cvpr_2018/html/Misra_Learning_by_Asking_CVPR_2018_paper.html]
Abhishek Das, Samyak Datta, Georgia Gkioxari, Stefan Lee, Devi Parikh, Dhruv Batra: Embodied Question Answering. CVPR 2018: [http://openaccess.thecvf.com/content_cvpr_2018/html/Das_Embodied_Question_Answering_CVPR_2018_paper.html]
Danna Gurari, Qing Li, Abigale J. Stangl, Anhong Guo, Chi Lin, Kristen Grauman, Jiebo Luo, Jeffrey P. Bigham: VizWiz Grand Challenge: Answering Visual Questions From Blind People. CVPR 2018: [http://openaccess.thecvf.com/content_cvpr_2018/html/Gurari_VizWiz_Grand_Challenge_CVPR_2018_paper.html]
Aishwarya Agrawal, Dhruv Batra, Devi Parikh, Aniruddha Kembhavi: Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering. CVPR 2018: [http://openaccess.thecvf.com/content_cvpr_2018/html/Agrawal_Dont_Just_Assume_CVPR_2018_paper.html]
Hexiang Hu, Wei-Lun Chao, Fei Sha: Learning Answer Embeddings for Visual Question Answering. CVPR 2018: [http://openaccess.thecvf.com/content_cvpr_2018/html/Hu_Learning_Answer_Embeddings_CVPR_2018_paper.html]
Wei-Lun Chao, Hexiang Hu, Fei Sha: Cross-Dataset Adaptation for Visual Question Answering. CVPR 2018: [http://openaccess.thecvf.com/content_cvpr_2018/html/Chao_Cross-Dataset_Adaptation_for_CVPR_2018_paper.html]
Unnat Jain, Svetlana Lazebnik, Alexander G. Schwing: Two Can Play This Game: Visual Dialog With Discriminative Question Generation and Answering. CVPR 2018: [http://openaccess.thecvf.com/content_cvpr_2018/html/Jain_Two_Can_Play_CVPR_2018_paper.html]
Qingxing Cao, Xiaodan Liang, Bailing Li, Guanbin Li, Liang Lin: Visual Question Reasoning on General Dependency Tree. CVPR 2018: [http://openaccess.thecvf.com/content_cvpr_2018/html/Cao_Visual_Question_Reasoning_CVPR_2018_paper.html]
Feng Liu, Tao Xiang, Timothy M. Hospedales, Wankou Yang, Changyin Sun: IVQA: Inverse Visual Question Answering. CVPR 2018: [http://openaccess.thecvf.com/content_cvpr_2018/html/Liu_IVQA_Inverse_Visual_CVPR_2018_paper.html]
Andrew Shin, Yoshitaka Ushiku, Tatsuya Harada: Customized Image Narrative Generation via Interactive Visual Question Generation and Answering. CVPR 2018: [http://openaccess.thecvf.com/content_cvpr_2018/html/Shin_Customized_Image_Narrative_CVPR_2018_paper.html]
Chenfei Wu, Jinlai Liu, Xiaojie Wang, Xuan Dong: Object-Difference Attention: A Simple Relational Attention for Visual Question Answering. ACM Multimedia 2018: [https://dl.acm.org/citation.cfm?doid=3240508.3240513]
Zhiwei Fang, Jing Liu, Yanyuan Qiao, Qu Tang, Yong Li, Hanqing Lu: Enhancing Visual Question Answering Using Dropout. ACM Multimedia 2018: [https://doi.org/10.1145/3240508.3240662]
Xuanyi Dong, Linchao Zhu, De Zhang, Yi Yang, Fei Wu: Fast Parameter Adaptation for Few-shot Image Captioning and Visual Question Answering. ACM Multimedia 2018: [https://doi.org/10.1145/3240508.3240527]
Xiaomeng Song, Yucheng Shi, Xin Chen, Yahong Han: Explore Multi-Step Reasoning in Video Question Answering. ACM Multimedia 2018: [https://doi.org/10.1145/3240508.3240563]
Damien Teney, Anton van den Hengel: Visual Question Answering as a Meta Learning Task. ECCV (15) 2018: [http://openaccess.thecvf.com/content_ECCV_2018/html/Damien_Teney_Visual_Question_Answering_ECCV_2018_paper.html]
Peng Gao, Hongsheng Li, Shuang Li, Pan Lu, Yikang Li, Steven C. H. Hoi, Xiaogang Wang: Question-Guided Hybrid Convolution for Visual Question Answering. ECCV (1) 2018: [http://openaccess.thecvf.com/content_ECCV_2018/html/gao_peng_Question-Guided_Hybrid_Convolution_ECCV_2018_paper.html]
Youngjae Yu, Jongseok Kim, Gunhee Kim: A Joint Sequence Fusion Model for Video Question Answering and Retrieval. ECCV (7) 2018: [http://openaccess.thecvf.com/content_ECCV_2018/html/Youngjae_Yu_A_Joint_Sequence_ECCV_2018_paper.html]
Medhini Narasimhan, Alexander G. Schwing: Straight to the Facts: Learning Knowledge Base Retrieval for Factual Visual Question Answering. ECCV (8) 2018: [http://openaccess.thecvf.com/content_ECCV_2018/html/Medhini_Gulganjalli_Narasimhan_Straight_to_the_ECCV_2018_paper.html]
Kohei Uehara, Antonio Tejero-de-Pablos, Yoshitaka Ushiku, Tatsuya Harada: Visual Question Generation for Class Acquisition of Unknown Objects. ECCV (12) 2018: [http://openaccess.thecvf.com/content_ECCV_2018/html/Kohei_Uehara_Visual_Question_Generation_ECCV_2018_paper.html]
Peng Wang, Qi Wu, Chunhua Shen, Anthony R. Dick, Anton van den Hengel: FVQA: Fact-Based Visual Question Answering. IEEE Trans. Pattern Anal. Mach. Intell. 40(10), 2018: [https://arxiv.org/abs/1606.05433]
Alexander Trott, Caiming Xiong, Richard Socher: Interpretable Counting for Visual Question Answering. ICLR (Poster) 2018: [https://arxiv.org/abs/1712.08697]
Yan Zhang, Jonathon S. Hare, Adam Prügel-Bennett: Learning to Count Objects in Natural Images for Visual Question Answering. ICLR (Poster) 2018: [https://openreview.net/forum?id=B12Js_yRb]
Kushal Kafle, Christopher Kanan: Visual Question Answering: Datasets, Algorithms, and Future Challenges. Computer Vision and Image Understanding, 2017.
Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick, CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning, CVPR 2017.
Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Judy Hoffman, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick, Inferring and Executing Programs for Visual Reasoning, arXiv:1705.03633, 2017. [https://arxiv.org/abs/1705.03633]
Ronghang Hu, Jacob Andreas, Marcus Rohrbach, Trevor Darrell, Kate Saenko, Learning to Reason: End-to-End Module Networks for Visual Question Answering, arXiv:1704.05526, 2017. [https://arxiv.org/abs/1704.05526]
Adam Santoro, David Raposo, David G.T. Barrett, Mateusz Malinowski, Razvan Pascanu, Peter Battaglia, Timothy Lillicrap, A simple neural network module for relational reasoning, arXiv:1706.01427, 2017. [https://arxiv.org/abs/1706.01427]
Hedi Ben-younes, Rémi Cadène, Matthieu Cord, Nicolas Thome: MUTAN: Multimodal Tucker Fusion for Visual Question Answering. ICCV 2017: [https://arxiv.org/pdf/1705.06676.pdf] [https://github.com/Cadene/vqa.pytorch]
Vahid Kazemi, Ali Elqursh, Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering, arXiv:1704.03162, 2017. [https://arxiv.org/abs/1704.03162] [https://github.com/Cyanogenoid/pytorch-vqa]
Kushal Kafle, Christopher Kanan, An Analysis of Visual Question Answering Algorithms, arXiv:1703.09684, 2017. [https://arxiv.org/abs/1703.09684]
Hyeonseob Nam, Jung-Woo Ha, Jeonghee Kim, Dual Attention Networks for Multimodal Reasoning and Matching, arXiv:1611.00471, 2016. [https://arxiv.org/abs/1611.00471]
Jin-Hwa Kim, Kyoung Woon On, Jeonghee Kim, Jung-Woo Ha, Byoung-Tak Zhang, Hadamard Product for Low-rank Bilinear Pooling, arXiv:1610.04325, 2016. [https://arxiv.org/abs/1610.04325]
Akira Fukui, Dong Huk Park, Daylen Yang, Anna Rohrbach, Trevor Darrell, Marcus Rohrbach, Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding, arXiv:1606.01847, 2016. [https://arxiv.org/abs/1606.01847] [https://github.com/akirafukui/vqa-mcb]
Kuniaki Saito, Andrew Shin, Yoshitaka Ushiku, Tatsuya Harada, DualNet: Domain-Invariant Network for Visual Question Answering. arXiv:1606.06108v1, 2016. [https://arxiv.org/pdf/1606.06108.pdf]
Arijit Ray, Gordon Christie, Mohit Bansal, Dhruv Batra, Devi Parikh, Question Relevance in VQA: Identifying Non-Visual And False-Premise Questions, arXiv:1606.06622, 2016. [https://arxiv.org/pdf/1606.06622v1.pdf]
Hyeonwoo Noh, Bohyung Han, Training Recurrent Answering Units with Joint Loss Minimization for VQA, arXiv:1606.03647, 2016. [http://arxiv.org/abs/1606.03647v1]
Jiasen Lu, Jianwei Yang, Dhruv Batra, Devi Parikh, Hierarchical Question-Image Co-Attention for Visual Question Answering, arXiv:1606.00061, 2016. [https://arxiv.org/pdf/1606.00061v2.pdf] [https://github.com/jiasenlu/HieCoAttenVQA]
Jin-Hwa Kim, Sang-Woo Lee, Dong-Hyun Kwak, Min-Oh Heo, Jeonghee Kim, Jung-Woo Ha, Byoung-Tak Zhang, Multimodal Residual Learning for Visual QA, arXiv:1606.01455, 2016. [https://arxiv.org/pdf/1606.01455v1.pdf]
Peng Wang, Qi Wu, Chunhua Shen, Anton van den Hengel, Anthony Dick, FVQA: Fact-based Visual Question Answering, arXiv:1606.05433, 2016. [https://arxiv.org/pdf/1606.05433.pdf]
Ilija Ilievski, Shuicheng Yan, Jiashi Feng, A Focused Dynamic Attention Model for Visual Question Answering, arXiv:1604.01485, 2016. [https://arxiv.org/pdf/1604.01485v1.pdf]
Yuke Zhu, Oliver Groth, Michael Bernstein, Li Fei-Fei, Visual7W: Grounded Question Answering in Images, CVPR 2016. [http://arxiv.org/abs/1511.03416]
Hyeonwoo Noh, Paul Hongsuck Seo, Bohyung Han, Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction, CVPR 2016. [http://arxiv.org/pdf/1511.05756.pdf]
Jacob Andreas, Marcus Rohrbach, Trevor Darrell, Dan Klein, Learning to Compose Neural Networks for Question Answering, NAACL 2016. [http://arxiv.org/pdf/1601.01705.pdf]
Jacob Andreas, Marcus Rohrbach, Trevor Darrell, Dan Klein, Deep compositional question answering with neural module networks, CVPR 2016. [https://arxiv.org/abs/1511.02799]
Zichao Yang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Smola, Stacked Attention Networks for Image Question Answering, CVPR 2016. [http://arxiv.org/abs/1511.02274] [https://github.com/JamesChuanggg/san-torch]
Kevin J. Shih, Saurabh Singh, Derek Hoiem, Where To Look: Focus Regions for Visual Question Answering, CVPR 2016. [http://arxiv.org/pdf/1511.07394v2.pdf]
Kan Chen, Jiang Wang, Liang-Chieh Chen, Haoyuan Gao, Wei Xu, Ram Nevatia, ABC-CNN: An Attention Based Convolutional Neural Network for Visual Question Answering, arXiv:1511.05960v1, Nov 2015. [http://arxiv.org/pdf/1511.05960v1.pdf]
Huijuan Xu, Kate Saenko, Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering, arXiv:1511.05234v1, Nov 2015. [http://arxiv.org/abs/1511.05234]
Kushal Kafle and Christopher Kanan, Answer-Type Prediction for Visual Question Answering, CVPR 2016. [http://www.cv-foundation.org/openaccess/content_cvpr_2016/html/Kafle_Answer-Type_Prediction_for_CVPR_2016_paper.html]
Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh, VQA: Visual Question Answering, ICCV 2015. [http://arxiv.org/pdf/1505.00468] [https://github.com/JamesChuanggg/VQA-tensorflow]
Bolei Zhou, Yuandong Tian, Sainbayar Sukhbaatar, Arthur Szlam, Rob Fergus, Simple Baseline for Visual Question Answering, arXiv:1512.02167v2, Dec 2015. [http://arxiv.org/abs/1512.02167]
Haoyuan Gao, Junhua Mao, Jie Zhou, Zhiheng Huang, Lei Wang, Wei Xu, Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering, NIPS 2015. [http://arxiv.org/pdf/1505.05612.pdf]
Mateusz Malinowski, Marcus Rohrbach, Mario Fritz, Ask Your Neurons: A Neural-based Approach to Answering Questions about Images, ICCV 2015. [http://arxiv.org/pdf/1505.01121v3.pdf]
Mengye Ren, Ryan Kiros, Richard Zemel, Exploring Models and Data for Image Question Answering, ICML 2015. [http://arxiv.org/pdf/1505.02074.pdf]
Mateusz Malinowski, Mario Fritz, Towards a Visual Turing Challenge, NIPS Workshop 2015. [http://arxiv.org/abs/1410.8027]
Mateusz Malinowski, Mario Fritz, A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input, NIPS 2014. [http://arxiv.org/pdf/1410.0210v4.pdf]
Zhou Yu, Jun Yu, Yuhao Cui, Dacheng Tao, Qi Tian: Deep Modular Co-Attention Networks for Visual Question Answering. CVPR 2019: [http://openaccess.thecvf.com/content_CVPR_2019/papers/Yu_Deep_Modular_Co-Attention_Networks_for_Visual_Question_Answering_CVPR_2019_paper.pdf] [https://github.com/MILVLG/mcan-vqa]
Yiyi Zhou, Rongrong Ji, Jinsong Su, Xiaoshuai Sun, Weiqiu Chen: Dynamic Capsule Attention for Visual Question Answering. AAAI 2019: [https://www.aaai.org/Papers/AAAI/2019/AAAI-ZhouYiyi2.3610.pdf] [https://github.com/XMUVQA/CapsAtt]
Lianli Gao, Pengpeng Zeng, Jingkuan Song, Yuan-Fang Li, Wu Liu, Tao Mei, Heng Tao Shen: Structured Two-Stream Attention Network for Video Question Answering. AAAI 2019: [https://www.aaai.org/ojs/index.php/AAAI/article/view/4602]
Xiangpeng Li, Jingkuan Song, Lianli Gao, Xianglong Liu, Wenbing Huang, Xiangnan He, Chuang Gan: Beyond RNNs: Positional Self-Attention with Co-Attention for Video Question Answering. AAAI 2019: [https://www.semanticscholar.org/paper/Beyond-RNNs%3A-Positional-Self-Attention-with-for-Li-Song/565359aac8914505e6b02db05822ee63d3ffd03a] [https://github.com/lixiangpengcs/PSAC]
Pan Lu, Hongsheng Li, Wei Zhang, Jianyong Wang, Xiaogang Wang: Co-Attending Free-Form Regions and Detections With Multi-Modal Multiplicative Feature Embedding for Visual Question Answering. AAAI 2018: [https://aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16249] [https://github.com/lupantech/dual-mfa-vqa]
Tingting Qiao, Jianfeng Dong, Duanqing Xu: Exploring Human-Like Attention Supervision in Visual Question Answering. AAAI 2018: [https://aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16485]
Yuetan Lin, Zhangyang Pang, Donghui Wang, Yueting Zhuang: Feature Enhancement in Attention for Visual Question Answering. IJCAI 2018: [https://www.ijcai.org/proceedings/2018/586]
Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, Lei Zhang: Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering. CVPR 2018: [http://openaccess.thecvf.com/content_cvpr_2018/html/Anderson_Bottom-Up_and_Top-Down_CVPR_2018_paper.html] [https://github.com/peteanderson80/bottom-up-attention] [https://github.com/facebookresearch/pythia] [https://github.com/hengyuan-hu/bottom-up-attention-vqa]
Duy-Kien Nguyen, Takayuki Okatani: Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering. CVPR 2018: [http://openaccess.thecvf.com/content_cvpr_2018/html/Nguyen_Improved_Fusion_of_CVPR_2018_paper.html]
Yikang Li, Nan Duan, Bolei Zhou, Xiao Chu, Wanli Ouyang, Xiaogang Wang, Ming Zhou: Visual Question Generation as Dual Task of Visual Question Answering. CVPR 2018: [http://openaccess.thecvf.com/content_cvpr_2018/html/Li_Visual_Question_Generation_CVPR_2018_paper.html]
Junwei Liang, Lu Jiang, Liangliang Cao, Li-Jia Li, Alexander G. Hauptmann: Focal Visual-Text Attention for Visual Question Answering. CVPR 2018: [http://openaccess.thecvf.com/content_cvpr_2018/html/Liang_Focal_Visual-Text_Attention_CVPR_2018_paper.html]
Badri N. Patro, Vinay P. Namboodiri: Differential Attention for Visual Question Answering. CVPR 2018: [http://openaccess.thecvf.com/content_cvpr_2018/html/Patro_Differential_Attention_for_CVPR_2018_paper.html]
Yang Shi, Tommaso Furlanello, Sheng Zha, Animashree Anandkumar: Question Type Guided Attention in Visual Question Answering. ECCV (4) 2018: [http://openaccess.thecvf.com/content_ECCV_2018/html/Yang_Shi_Question_Type_Guided_ECCV_2018_paper.html]
Mateusz Malinowski, Carl Doersch, Adam Santoro, Peter W. Battaglia: Learning Visual Question Answering by Bootstrapping Hard Attention. ECCV (6) 2018: [http://openaccess.thecvf.com/content_ECCV_2018/html/Mateusz_Malinowski_Learning_Visual_Question_ECCV_2018_paper.html]
Pan Lu, Lei Ji, Wei Zhang, Nan Duan, Ming Zhou, Jianyong Wang: R-VQA: Learning Visual Relation Facts with Semantic Attention for Visual Question Answering. KDD 2018: [https://dl.acm.org/citation.cfm?doid=3219819.3220036]
Visual7W: Grounded Question Answering in Images
DAQUAR
COCO-QA
The VQA Dataset
FM-IQA
Visual Genome
RAVEN: A Dataset for Relational and Analogical Visual rEasoNing
KVQA: Knowledge-Aware Visual Question Answering.
Qi Wu
Dr. Qi Wu is a Senior Lecturer at the University of Adelaide and an ARC Senior Research Associate at the Australian Centre for Robotic Vision (ACRV). Before that, he was a postdoctoral researcher at the Australian Centre for Visual Technologies (ACVT). He received an MSc in Global Computing and Media Technology in 2011 and a PhD in Computer Science in 2015, both from the University of Bath, UK. His research interests include cross-depiction object modeling, object detection, and vision-to-language, with a particular focus on image captioning and visual question answering. His image captioning model achieved the best results in last year's Microsoft COCO Image Captioning Challenge, and his VQA model is the current state of the art. His papers have appeared in leading journals and conferences such as TPAMI, CVPR, ICCV, and ECCV.
Bolei Zhou 周博磊
Assistant Professor in the Department of Information Engineering at CUHK, working on machine perception and decision-making, computer vision, and related areas. He received his PhD from MIT in 2018, advised by Antonio Torralba. His honors include the Facebook Ph.D. Fellowship in Computer Vision, the BRC Fellowship Award, and the MIT Greater China Computer Science Fellowship. He has published nearly 50 papers in journals and conferences such as NeurIPS, TPAMI, IJCV, ICCV, CVPR, and ECCV.
Stanislaw Antol
Currently works on the autonomous vehicle team in Mercedes-Benz Research & Development. From March 2018 to March 2019, he was a computer vision engineer at Traptic, working on strawberry-picking robots. From July 2016 to February 2018, he was a research engineer in the Computer Vision Lab of the Think Tank Team at Samsung Research America.
Jin-Hwa Kim
Research Scientist at SK T-Brain since August 2018, working on multimodal deep learning, mainly visual question answering and related topics. In September 2017 he received a 2017 Google Ph.D. Fellowship in Machine Learning. He completed his Ph.D. at Seoul National University in 2018, advised by Professor Byoung-Tak Zhang, and interned at Facebook AI Research from January to May 2017 under the guidance of Yuandong Tian. He received an MS in Engineering from Seoul National University in 2015 and a BS in Engineering (with honors) from Kwangwoon University in 2011. From 2011 to 2012, he was a software engineer on the search infrastructure development team at SK Communications, South Korea.
Justin Johnson
Assistant Professor in Computer Science and Engineering at the University of Michigan, working on visual reasoning, language and vision, and image generation with deep networks. He received his PhD from Stanford University, advised by Fei-Fei Li, and was one of the instructors of CS231n.
Ilija Ilievski
PhD from the National University of Singapore, with research focused on visual question answering. Placed third in the CVPR 2017 VQA Challenge.
This is a preliminary version and, given our limited expertise, may contain errors or omissions; suggestions and additions are welcome, and the list will be kept up to date. This article is original content from the Zhuanzhi content team and may not be reproduced without permission; for reprint requests, email fangquanyi@gmail.com or contact the Zhuanzhi assistant on WeChat (Rancho_Fang).
Please follow http://www.zhuanzhi.ai and the Zhuanzhi WeChat official account for first-hand AI knowledge.
Last updated: 2019-12-10