Visual content often contains recurring elements: text is made up of glyphs from the same font; animations, such as cartoons or video games, are composed of sprites moving around the screen; and natural videos frequently contain repeated views of the same objects. In this paper, we propose a deep learning approach for obtaining a graphically disentangled representation of recurring elements in a completely self-supervised manner. By jointly learning a dictionary of texture patches and training a network that places them onto a canvas, we deconstruct sprite-based content into a sparse, consistent, and interpretable representation that can be readily used in downstream tasks. Our framework offers a promising approach for discovering recurring patterns in image collections without supervision.
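To make the idea of "placing dictionary patches onto a canvas" concrete, the following is a minimal sketch of the compositing step only. It is not the method from the paper: the function name `render`, the `(intensity, alpha)` patch layout, and the integer placements are all assumptions made for illustration, and nothing here is learned. In the proposed approach, both the dictionary and the placements would come from training rather than being written by hand.

```python
from typing import List, Tuple

# Hypothetical data layout (an assumption, not taken from the paper):
# a patch is a 2D grid of (intensity, alpha) cells, alpha in [0, 1].
Patch = List[List[Tuple[float, float]]]
# A placement is (dictionary index, top row, left column) on the canvas.
Placement = Tuple[int, int, int]


def render(height: int, width: int, dictionary: List[Patch],
           placements: List[Placement],
           background: float = 0.0) -> List[List[float]]:
    """Composite dictionary patches onto a canvas with 'over' alpha blending."""
    canvas = [[background] * width for _ in range(height)]
    for index, top, left in placements:
        patch = dictionary[index]
        for dy, row in enumerate(patch):
            for dx, (value, alpha) in enumerate(row):
                y, x = top + dy, left + dx
                # Cells that fall outside the canvas are simply skipped.
                if 0 <= y < height and 0 <= x < width:
                    canvas[y][x] = alpha * value + (1.0 - alpha) * canvas[y][x]
    return canvas


# One hand-written 2x2 "sprite" whose bottom-right cell is transparent,
# reused at two locations: the recurring element is stored only once.
sprite: Patch = [
    [(1.0, 1.0), (1.0, 1.0)],
    [(1.0, 1.0), (1.0, 0.0)],
]
canvas = render(3, 4, [sprite], [(0, 0, 0), (0, 1, 2)])
for row in canvas:
    print(row)
```

The image is described here by one patch and two placements, which illustrates why such a representation can be sparse and interpretable when elements recur.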