Machine learning methods for conditional data generation usually build a mapping from source conditional data X to target data Y. The target Y (e.g., text, speech, music, image, video) is usually high-dimensional and complex, and contains information that does not exist in the source data, which hinders effective and efficient learning of the source-target mapping. In this paper, we present a learning paradigm called regeneration learning for data generation, which first generates Y' (an abstraction/representation of Y) from X and then generates Y from Y'. During training, Y' is obtained from Y through either handcrafted rules or self-supervised learning, and is used to learn X-->Y' and Y'-->Y. Regeneration learning extends the concept of representation learning to data generation tasks and can be regarded as a counterpart of traditional representation learning, since 1) regeneration learning handles the abstraction (Y') of the target data Y for data generation, while traditional representation learning handles the abstraction (X') of the source data X for data understanding; 2) both the process Y'-->Y in regeneration learning and the process X-->X' in representation learning can be learned in a self-supervised way (e.g., pre-training); and 3) both the mapping from X to Y' in regeneration learning and the mapping from X' to Y in representation learning are simpler than the direct mapping from X to Y. We show that regeneration learning can be a widely used paradigm for data generation (e.g., text generation, speech recognition, speech synthesis, music composition, image generation, and video generation) and can provide valuable insights into developing data generation methods.
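The two-stage X-->Y'-->Y pipeline can be illustrated with a deliberately tiny sketch. Everything in it is an illustrative assumption rather than the paper's method: the handcrafted rule that derives Y' from Y is block averaging (`downsample`), the X-->Y' model is a per-condition average, and the Y'-->Y model is a nearest-neighbour lookup fitted on target data alone. Real systems would use learned neural models in both stages, and the class and method names are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, Hashable, Iterable, List, Sequence, Tuple

Vector = Tuple[float, ...]


def downsample(y: Sequence[float], factor: int = 2) -> Vector:
    """Hypothetical handcrafted rule Y -> Y': average each block of `factor` values."""
    return tuple(mean(y[i:i + factor]) for i in range(0, len(y), factor))


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


class RegenerationPipeline:
    """Toy sketch (assumed, not the paper's code): generate Y via X -> Y' -> Y."""

    def __init__(self, abstract: Callable[[Sequence[float]], Vector]):
        self.abstract = abstract                       # rule that maps Y to Y'
        self.x_to_abs: Dict[Hashable, Vector] = {}     # stage 1: X -> Y'
        self.bank: List[Tuple[Vector, Vector]] = []    # stage 2: (Y', Y) pairs

    def fit_first_stage(self, pairs: Iterable[Tuple[Hashable, Sequence[float]]]) -> None:
        """Learn X -> Y' from paired data: store the mean abstraction per condition."""
        groups: Dict[Hashable, List[Vector]] = defaultdict(list)
        for x, y in pairs:
            groups[x].append(self.abstract(y))
        self.x_to_abs = {
            x: tuple(mean(col) for col in zip(*abstractions))
            for x, abstractions in groups.items()
        }

    def fit_second_stage(self, targets: Iterable[Sequence[float]]) -> None:
        """Learn Y' -> Y from target data only (no X needed, i.e. self-supervised)."""
        self.bank = [(self.abstract(y), tuple(y)) for y in targets]

    def generate(self, x: Hashable) -> Vector:
        """Predict Y' from X, then return the stored Y whose Y' is closest to it."""
        if not self.bank:
            raise RuntimeError("call fit_second_stage before generate")
        target_abs = self.x_to_abs[x]
        return min(self.bank, key=lambda pair: sq_dist(pair[0], target_abs))[1]
```

For example, after `pipe.fit_first_stage([("up", [0, 1, 2, 3]), ("down", [3, 2, 1, 0])])` and `pipe.fit_second_stage([[0, 1, 2, 3], [3, 2, 1, 0], [5, 5, 5, 5]])`, calling `pipe.generate("up")` returns `(0, 1, 2, 3)`. Note that `fit_second_stage` sees only target samples, including one with no paired condition, which mirrors the self-supervised Y'-->Y step described above.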