We report the Regression Transformer (RT), a method that abstracts regression as a conditional sequence modeling problem. The RT casts continuous properties as sequences of numerical tokens and encodes them jointly with conventional tokens. This yields a dichotomous model that can seamlessly transition between solving regression tasks and conditional generation tasks, governed solely by the mask location. We propose several extensions to the XLNet objective and adopt an alternating training scheme to concurrently optimize property prediction and conditional text generation based on a self-consistency loss. Our experiments on both chemical and protein languages demonstrate that the RT can surpass the performance of traditional regression models despite being trained with a cross-entropy loss. Importantly, priming the same model with continuous properties yields a highly competitive conditional generative model that outperforms specialized approaches in a constrained property optimization benchmark. In sum, the Regression Transformer opens the door for "Swiss army knife" models that excel at both regression and conditional generation. This finds application particularly in property-driven, local exploration of the chemical or protein space.
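To make the phrase "casts continuous properties as sequences of numerical tokens" concrete, the sketch below shows one way such a tokenization can work: each digit of a property value becomes a token encoding both the digit and its decimal place, so the vocabulary stays small while preserving magnitude information. This is a minimal illustration under assumed conventions (the token format `_d_p_` and the hypothetical property tag `<qed>` are for exposition, not the paper's exact implementation):

```python
def tokenize_property(name: str, value: float) -> list[str]:
    """Turn a continuous property value into numerical tokens.

    Each digit is mapped to a token carrying the digit and its decimal
    place, e.g. 0.82 -> ['_0_0_', '_._', '_8_-1_', '_2_-2_'].
    Token format is illustrative, not the paper's exact scheme.
    """
    tokens = [f"<{name}>"]  # property identifier token (assumed convention)
    text = str(value)
    int_part, _, frac_part = text.partition(".")
    # Integer digits: decimal places len-1 down to 0.
    for i, digit in enumerate(int_part):
        tokens.append(f"_{digit}_{len(int_part) - 1 - i}_")
    if frac_part:
        tokens.append("_._")
        # Fractional digits: decimal places -1, -2, ...
        for i, digit in enumerate(frac_part):
            tokens.append(f"_{digit}_{-(i + 1)}_")
    return tokens
```

A sequence like `tokenize_property("qed", 0.82)` can then be concatenated with the conventional (e.g. SMILES or amino-acid) tokens of the molecule, so that masking the numerical tokens poses a regression task while masking the molecular tokens poses a conditional generation task.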