We observe that despite their hierarchical convolutional nature, the synthesis process of typical generative adversarial networks depends on absolute pixel coordinates in an unhealthy manner. This manifests itself as, e.g., detail appearing to be glued to image coordinates instead of the surfaces of depicted objects. We trace the root cause to careless signal processing that causes aliasing in the generator network. Interpreting all signals in the network as continuous, we derive generally applicable, small architectural changes that guarantee that unwanted information cannot leak into the hierarchical synthesis process. The resulting networks match the FID of StyleGAN2 but differ dramatically in their internal representations, and they are fully equivariant to translation and rotation even at subpixel scales. Our results pave the way for generative models better suited for video and animation.
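The link between aliasing and the loss of subpixel translation equivariance can be illustrated with a minimal 1D sketch. It is not the authors' architecture. It assumes a periodic signal, uses an ideal FFT-based low-pass filter as a stand-in for a properly designed one, and all function names (`fourier_shift`, `naive_downsample`, `lowpass_downsample`, `equivariance_error`, `make_test_signal`) are hypothetical. The sketch compares downsampling a translated signal with translating the downsampled signal. The two agree only when content above the new Nyquist limit is removed before subsampling.

```python
import numpy as np


def make_test_signal(n=256, seed=0):
    """Periodic white-noise signal with the Nyquist bin removed (assumed test input)."""
    rng = np.random.default_rng(seed)
    spectrum = np.fft.fft(rng.standard_normal(n))
    spectrum[n // 2] = 0.0  # keeps fractional shifts well defined
    return np.fft.ifft(spectrum).real


def fourier_shift(x, delta):
    """Translate a periodic signal by `delta` samples; `delta` may be fractional."""
    freqs = np.fft.fftfreq(x.size)  # frequencies in cycles per sample
    phase = np.exp(-2j * np.pi * freqs * delta)
    return np.fft.ifft(np.fft.fft(x) * phase).real


def naive_downsample(x):
    """Keep every second sample without any filtering (this aliases)."""
    return x[::2]


def lowpass_downsample(x):
    """Zero all content at or above the new Nyquist limit, then subsample."""
    spectrum = np.fft.fft(x)
    freqs = np.fft.fftfreq(x.size)
    spectrum[np.abs(freqs) >= 0.25] = 0.0
    return np.fft.ifft(spectrum).real[::2]


def equivariance_error(downsample, x, delta):
    """Max difference between downsample(shift(x)) and shift(downsample(x)).

    A shift of `delta` input samples corresponds to `delta / 2` output samples.
    """
    shifted_then_down = downsample(fourier_shift(x, delta))
    down_then_shifted = fourier_shift(downsample(x), delta / 2.0)
    return float(np.max(np.abs(shifted_then_down - down_then_shifted)))


if __name__ == "__main__":
    x = make_test_signal()
    for name, op in [("naive", naive_downsample), ("low-pass", lowpass_downsample)]:
        print(name, equivariance_error(op, x, delta=0.5))
```

Under these assumptions, the low-pass variant should commute with a half-sample shift up to floating-point error, while the naive variant should not. The high-frequency content folds back into the retained band with the wrong phase, so the result changes with the signal's position on the sampling grid.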