Evaluating generative image models remains a difficult problem. This is due to the high dimensionality of the outputs, the challenging task of representing but not replicating training data, and the lack of metrics that fully correspond to human perception and capture all the properties we want these models to exhibit. Therefore, qualitative evaluation of model outputs is an important part of model development and research publication practice. Qualitative evaluation is currently under-served by existing tools, which do not easily facilitate structured exploration of a large number of examples across the latent space of the model. To address this issue, we present Ravel, a visual analytics system that enables qualitative evaluation of model outputs on the order of hundreds of thousands of images. Ravel allows users to discover phenomena such as mode collapse and to find areas of the training data that the model has failed to capture. It allows users to evaluate both the quality and the diversity of generated images in comparison to real images or to the output of another model that serves as a baseline. Our paper describes three case studies demonstrating the key insights made possible with Ravel, supported by a domain expert user study.