A recent trend in generative modeling is building 3D-aware generators from 2D image collections. To induce the 3D bias, such models typically rely on volumetric rendering, which is expensive to employ at high resolutions. In recent months, more than ten works have appeared that address this scaling issue by training a separate 2D decoder to upsample a low-resolution image (or a feature tensor) produced by a pure 3D generator. But this solution comes at a cost: not only does it break multi-view consistency (i.e. shape and texture change when the camera moves), but it also learns the geometry only at low fidelity. In this work, we show that it is possible to obtain a high-resolution 3D generator with SotA image quality by following a completely different route: simply training the model patch-wise. We revisit and improve this optimization scheme in two ways. First, we design a location- and scale-aware discriminator that works on patches of different scales and spatial positions. Second, we modify the patch sampling strategy, basing it on an annealed beta distribution, to stabilize training and accelerate convergence. The resulting model, named EpiGRAF, is an efficient, high-resolution, pure 3D generator, and we test it on four datasets (two introduced in this work) at $256^2$ and $512^2$ resolutions. It obtains state-of-the-art image quality, high-fidelity geometry and trains ${\approx} 2.5 \times$ faster than upsampler-based counterparts. Project website: https://universome.github.io/epigraf.
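To make the patch-wise training idea concrete, below is a minimal sketch of annealed beta-distribution patch sampling. The parameter names (`s_min`, `beta_start`), the linear annealing schedule, and the direction of the anneal (from near-full-image patches toward uniform scales) are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def sample_patch_scale(step, total_steps, s_min=0.125, beta_start=8.0):
    """Draw a patch scale s in [s_min, 1] from an annealed Beta distribution.

    Hypothetical schedule: the Beta shape parameter is annealed from
    `beta_start` (samples concentrated near s = 1, i.e. near-full-image
    patches early in training) toward 1 (uniform over the scale range).
    """
    t = min(step / total_steps, 1.0)
    alpha = beta_start + t * (1.0 - beta_start)   # anneal: beta_start -> 1
    u = np.random.beta(alpha, 1.0)                # in [0, 1], skewed toward 1 early on
    return s_min + (1.0 - s_min) * u              # map to the allowed scale range

def sample_patch(step, total_steps, s_min=0.125):
    """Sample a patch: a scale and a random offset keeping it inside the unit image."""
    s = sample_patch_scale(step, total_steps, s_min)
    x0 = np.random.uniform(0.0, 1.0 - s)          # normalized top-left corner
    y0 = np.random.uniform(0.0, 1.0 - s)
    return s, (x0, y0)
```

Under this scheme, the generator and discriminator operate on a fixed-resolution crop defined by the sampled scale and offset, so the per-step cost stays roughly constant regardless of the target image resolution.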