Object recognition in humans depends primarily on shape cues. We have developed a new approach to measuring the shape recognition performance of a vision system, based on nearest-neighbor view matching within the system's embedding space. Our performance benchmark, ShapeY, allows precise control of task difficulty by requiring that view matching span a specified degree of 3D viewpoint change and/or appearance change. As a first test case, we measured the performance of ResNet50 pre-trained on ImageNet. Matching error rates were high: for example, a 27-degree change in object pitch led ResNet50 to match the incorrect object 45% of the time. Appearance changes were also highly disruptive. Examination of false matches indicates that ResNet50's embedding space is severely "tangled". These findings suggest that ShapeY can be a useful tool for charting the progress of artificial vision systems toward human-level shape recognition capabilities.
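
To make the matching idea concrete, the following is a minimal sketch of nearest-neighbor view matching with an enforced viewpoint gap. It is an illustration under stated assumptions, not the ShapeY implementation: the function name `matching_error_rate`, the integer `view_idx` encoding of viewpoint steps, the use of cosine similarity, and the toy 2D embeddings are all hypothetical choices, and appearance changes are not modeled.

```python
import numpy as np


def matching_error_rate(emb, obj_ids, view_idx, min_view_gap):
    """Fraction of reference views whose nearest allowed neighbor is a different object.

    Assumption (not from the source): same-object candidates fewer than
    `min_view_gap` viewpoint steps from the reference are excluded, so a
    correct match must span at least that much viewpoint change. Views of
    other objects always remain candidates.
    """
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = emb @ emb.T  # cosine similarity between all pairs of views

    same_obj = obj_ids[:, None] == obj_ids[None, :]
    gap = np.abs(view_idx[:, None] - view_idx[None, :])
    excluded = same_obj & (gap < min_view_gap)
    excluded |= np.eye(len(emb), dtype=bool)  # a view never matches itself

    sim = np.where(excluded, -np.inf, sim)
    nearest = sim.argmax(axis=1)
    return float(np.mean(obj_ids[nearest] != obj_ids))


# Toy data (hypothetical): two objects, five views each, embedded on the unit
# circle so that neighboring views of one object are 20 degrees apart.
angles = np.deg2rad(np.array([0, 20, 40, 60, 80, 110, 130, 150, 170, 190], dtype=float))
emb = np.stack([np.cos(angles), np.sin(angles)], axis=1)
obj_ids = np.array([0] * 5 + [1] * 5)
view_idx = np.array(list(range(5)) * 2)

for min_gap in (1, 2):
    print(min_gap, matching_error_rate(emb, obj_ids, view_idx, min_gap))
```

In this toy setup, raising `min_view_gap` removes the closest same-object views from the candidate pool, so views near the boundary between the two objects start matching the wrong object. This mirrors, in miniature, how requiring a larger viewpoint change makes the matching task harder.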