Attribute extrapolation in sample generation is challenging for deep neural networks operating beyond the training distribution. We formulate a new task for extrapolation in sequence generation, focusing on natural language and proteins, and propose GENhance, a generative framework that enhances attributes through a learned latent space. Trained on movie reviews and a computed protein stability dataset, GENhance can generate strongly positive text reviews and highly stable protein sequences without being exposed to similar data during training. We release our benchmark tasks and models to contribute to the study of generative modeling extrapolation and data-driven design in biology and chemistry.