Training a Generative Adversarial Network (GAN) on a video dataset is challenging because of the sheer size of the dataset and the complexity of each observation. In general, the computational cost of training a GAN scales exponentially with the resolution. In this study, we present a novel memory-efficient method for unsupervised learning of high-resolution video datasets whose computational cost scales only linearly with the resolution. We achieve this by designing the generator as a stack of small sub-generators and training the model in a specific way. Each sub-generator is trained with its own discriminator, and during training we insert between each pair of consecutive sub-generators an auxiliary subsampling layer that reduces the frame rate by a fixed ratio. This procedure allows each sub-generator to learn the distribution of the video at a different level of resolution. We also need only a few GPUs to train a highly complex generator that far outperforms its predecessor in terms of inception score.
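The sketch below illustrates the kind of stacked sub-generator with auxiliary frame-subsampling layers that the abstract describes, written in PyTorch for concreteness. The module names, channel widths, subsampling ratio, and the per-stage RGB heads are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal sketch of a stacked generator with auxiliary frame subsampling.
# All sizes and module names are hypothetical; only the overall structure
# (sub-generators, per-level outputs, training-time frame subsampling)
# follows the description in the abstract.
import torch
import torch.nn as nn


class SubGenerator(nn.Module):
    """One stage: doubles the spatial resolution of a video tensor (N, C, T, H, W)."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Upsample(scale_factor=(1, 2, 2), mode="nearest"),  # keep T, double H and W
            nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm3d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)


class FrameSubsample(nn.Module):
    """Auxiliary layer used only at training time: keeps every `ratio`-th frame."""

    def __init__(self, ratio: int = 2):
        super().__init__()
        self.ratio = ratio

    def forward(self, x):
        return x[:, :, :: self.ratio]  # subsample along the temporal axis


class StackedGenerator(nn.Module):
    """Stack of sub-generators; each intermediate output feeds its own discriminator."""

    def __init__(self, channels=(256, 128, 64, 32), subsample_ratio: int = 2):
        super().__init__()
        self.stages = nn.ModuleList(
            SubGenerator(channels[i], channels[i + 1]) for i in range(len(channels) - 1)
        )
        self.subsample = FrameSubsample(subsample_ratio)
        # One RGB head per stage so each level can be scored by its own discriminator.
        self.to_rgb = nn.ModuleList(nn.Conv3d(c, 3, kernel_size=1) for c in channels[1:])

    def forward(self, z, training: bool = True):
        outputs, h = [], z
        for i, (stage, head) in enumerate(zip(self.stages, self.to_rgb)):
            h = stage(h)
            outputs.append(head(h))  # intermediate video for this level's discriminator
            if training and i < len(self.stages) - 1:
                h = self.subsample(h)  # reduce the frame rate before the next sub-generator
        return outputs


if __name__ == "__main__":
    z = torch.randn(2, 256, 16, 4, 4)  # (batch, channels, frames, height, width)
    videos = StackedGenerator()(z, training=True)
    for v in videos:
        print(v.shape)  # spatial size doubles and frame count halves at each level
```

Because later (higher-resolution) sub-generators only see subsampled clips during training, each stage processes far fewer frames than a monolithic generator would, which is the source of the memory savings claimed above.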