We examine how the saccade mechanism from biological vision can be used to make deep neural networks more efficient for classification and object detection problems. Our proposed approach is based on attention-driven visual processing and saccades, the miniature eye movements influenced by attention. Our experiments analyze: i) the robustness of different deep neural network (DNN) feature extractors to partially-sensed images for image classification and object detection, and ii) the utility of saccades in masking image patches for image classification and object tracking. We evaluate convolutional networks (ResNet-18) and transformer-based models (ViT, DETR, TransTrack) on several datasets (CIFAR-10, DAVSOD, MSCOCO, and MOT17). The results show that learning to mimic human saccades enables intelligent data reduction when used in conjunction with state-of-the-art DNNs for classification, detection, and tracking tasks. We observe a minimal drop in performance for the classification and detection tasks while using only about 30\% of the original sensor data. We discuss how the saccade mechanism can inform hardware design via ``in-pixel'' processing.