Data augmentation and image understanding

From arXiv. Digital version of the PhD thesis by Alex Hernandez-Garcia, defended on 27 November 2020 at the Institute of Cognitive Science of the University of Osnabrueck, Germany. PhD advisor: Prof. Peter Koenig. Contributors are acknowledged at the beginning of each chapter.

Interdisciplinary research is often at the core of scientific progress. This dissertation explores some beneficial synergies between machine learning, cognitive science and neuroscience, focusing in particular on vision and images. The human visual system has been studied extensively from both behavioural and neuroscientific perspectives, since vision is the dominant sense for most people. Machine vision, in turn, has also been an active area of research, currently dominated by the use of artificial neural networks. This work focuses on learning representations that are more closely aligned with visual perception and biological vision. To that end, I have studied tools and aspects of cognitive science and computational neuroscience and attempted to incorporate them into machine learning models of vision. A central subject of this dissertation is data augmentation, a technique commonly used in training artificial neural networks to enlarge the data set by applying transformations to the images. Although often overlooked, an important property of data augmentation is that it implements perceptually plausible transformations, since they correspond to the transformations we see in our visual world, such as changes in viewpoint or illumination. Furthermore, neuroscientists have found that the brain represents objects invariantly under these transformations. Throughout this dissertation, I use these insights to analyse data augmentation as a particularly useful inductive bias, as a more effective regularisation method for artificial neural networks, and as a framework for analysing and improving the invariance of vision models to perceptually plausible transformations. Overall, this work aims to shed more light on the properties of data augmentation and to demonstrate the potential of interdisciplinary research.
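As a rough illustration of the idea summarised above, the sketch below enlarges a toy data set of grayscale images (lists of pixel rows with values in [0, 1]) using two simple transformations: a horizontal flip, loosely standing in for a change in viewpoint, and a brightness scaling, loosely standing in for a change in illumination. The function names, the 0.5 flip probability and the 0.8 to 1.2 brightness range are hypothetical choices made for this example only; they are not the transformations or parameters used in the thesis.

```python
import random
from typing import List

# A grayscale image as a list of pixel rows, values assumed to lie in [0, 1].
Image = List[List[float]]


def horizontal_flip(image: Image) -> Image:
    """Mirror the image left-right (a crude stand-in for a change in viewpoint)."""
    return [list(reversed(row)) for row in image]


def adjust_brightness(image: Image, factor: float) -> Image:
    """Scale pixel intensities by `factor` (a crude stand-in for a change in
    illumination), clipping the result to [0, 1]."""
    return [[min(1.0, max(0.0, p * factor)) for p in row] for row in image]


def augment(image: Image, rng: random.Random) -> Image:
    """Apply a random flip (probability 0.5) followed by a random brightness change."""
    out = image
    if rng.random() < 0.5:
        out = horizontal_flip(out)
    factor = rng.uniform(0.8, 1.2)  # hypothetical range, not taken from the thesis
    return adjust_brightness(out, factor)


def augment_dataset(images: List[Image], copies: int, seed: int = 0) -> List[Image]:
    """Return the original images followed by `copies` augmented versions of each."""
    rng = random.Random(seed)
    augmented = list(images)
    for img in images:
        for _ in range(copies):
            augmented.append(augment(img, rng))
    return augmented


if __name__ == "__main__":
    img = [[0.0, 0.5, 1.0],
           [0.2, 0.4, 0.6]]
    print(horizontal_flip(img))  # [[1.0, 0.5, 0.0], [0.6, 0.4, 0.2]]
    print(len(augment_dataset([img], copies=4)))  # 5
```

In practice, deep learning libraries usually apply such transformations on the fly during training instead of storing the augmented copies; materialising them here simply makes the increase in data set size explicit.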

翻译：跨学科研究往往是科学进步的核心。这部论文探索了机器学习、认知科学和神经科学之间的一些有利的协同作用。特别是, 这部论文侧重于视觉和图像。人类视觉系统从行为和神经科学的角度进行了广泛研究, 因为视觉是大多数人的主导感知。反过来, 机器视觉也是一个积极的研究领域, 目前以人工神经网络为主。这项工作侧重于学习更符合视觉感知和生物视觉的表征。为此, 我研究了认知科学和计算神经科学的工具和方面,并试图将它们纳入视觉的机器学习模型。这一论文的核心主题是数据增强,这是培训人工神经网络以通过图像转换扩大数据集规模的一种常用技术。尽管数据增强过程常常被忽略, 实施感知性更合理的转变, 因为它们与我们在视觉世界中看到的转变相对应 -- 观点和光度的变化, 例如。此外, 神经科学家们发现, 大脑的直观和直观神经科学, 更能用直观性变异性模型, 来分析这些直观性变异性模型, 以直观性模型为常规的直观性变, 直观分析这些直观数据分析, 直观性分析这些直观的直观模型中, 直观的直观的直观分析, 直观分析, 直观直观直观直观直观直观直观直观直观直观直观直观直观直观直观直观直观直观直观直观直观直观直观直观直观直观直观直观直观直观直观直观直观直观直观直观直观直观直观直观直观直观直观直观直直直直直直直直直观直观直直观直观直观直观直观直观直观直观直观直观直观直直观直观直观直观直观直观直观直观直观直观直观直观直观直观