The initial analysis of any large data set can be divided into two phases: (1) the identification of common trends or patterns and (2) the identification of anomalies or outliers that deviate from those trends. We focus on the goal of detecting observations with novel content, which can alert us to artifacts in the data set or, potentially, to previously unknown phenomena. To aid in interpreting and diagnosing the novel aspect of these selected observations, we recommend the use of novelty detection methods that generate explanations. In the context of large image data sets, these explanations should highlight what aspect of a given image is new (color, shape, texture, content) in a human-comprehensible form. We propose DEMUD-VIS, the first method for providing visual explanations of novel image content. It combines three components: a convolutional neural network (CNN) to extract image features, a novelty detection method based on reconstruction error, and an up-convolutional network to convert CNN feature representations back into image space. We demonstrate this approach on diverse images from ImageNet, freshwater streams, and the surface of Mars.
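The reconstruction-error idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain low-rank SVD model fit to feature vectors already seen, and it uses synthetic 5-dimensional vectors in place of real CNN features. The function names (`fit_subspace`, `reconstruction_residuals`, `rank_by_novelty`) and the rank `k` are hypothetical choices made for illustration. The residual of the top-ranked item shows which feature dimensions the model could not reconstruct; in the approach described above, that kind of feature-space information would then be mapped back into image space by an up-convolutional network, a step not shown here.

```python
import numpy as np


def fit_subspace(X, k):
    """Return the mean and the top-k principal directions of the rows of X."""
    mu = X.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]


def reconstruction_residuals(X, mu, U):
    """Return x minus its low-rank reconstruction for each row x of X."""
    centered = X - mu
    reconstructed = centered @ U.T @ U
    return centered - reconstructed


def rank_by_novelty(X_seen, X_candidates, k):
    """Rank candidates by squared reconstruction error, most novel first."""
    mu, U = fit_subspace(X_seen, k)
    residuals = reconstruction_residuals(X_candidates, mu, U)
    scores = np.sum(residuals ** 2, axis=1)
    order = np.argsort(-scores)
    return order, scores, residuals


# Synthetic stand-in for CNN features (an assumption for this sketch):
# previously seen items vary only along the first two of five dimensions.
rng = np.random.default_rng(0)
basis = np.array([[1.0, 0.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0, 0.0]])
seen = rng.normal(size=(50, 2)) @ basis

# Four ordinary candidates plus one (index 2) with content in dimension 3.
candidates = rng.normal(size=(5, 2)) @ basis
candidates[2, 3] = 5.0

order, scores, residuals = rank_by_novelty(seen, candidates, k=2)
most_novel = order[0]
# The residual is the raw material for an explanation: its large entries
# mark the feature dimensions that the model failed to reconstruct.
explaining_dim = int(np.argmax(np.abs(residuals[most_novel])))
print("most novel candidate:", most_novel, "unexplained dimension:", explaining_dim)
```

Scoring by residual rather than by distance to the mean means an item is flagged only for content the current model cannot express, which is what makes the residual usable as an explanation.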