In recent years, deep neural networks (DNNs) have achieved state-of-the-art performance on several computer vision tasks. However, a typical drawback of these DNNs is their requirement for massive amounts of labeled data. Although few-shot learning methods address this problem, they often rely on techniques such as meta-learning and metric learning on top of existing methods. In this work, we address the problem from a neuroscience perspective by proposing a hypothesis named Ikshana, which is supported by several findings in neuroscience. Our hypothesis approximates the process by which the human brain refines the conceptual gist while understanding a natural scene/image. While our hypothesis holds no particular novelty in neuroscience, it provides a novel perspective for designing DNNs for vision tasks. Following the Ikshana hypothesis, we design a novel neural-inspired CNN architecture named IkshanaNet. The empirical results demonstrate the effectiveness of our method, which outperforms several baselines on the full Cityscapes and CamVid semantic segmentation benchmarks as well as on their subsets.