New astronomical tasks are often related to earlier tasks for which labels have already been collected. We adapt the contrastive framework BYOL to leverage those labels as a pretraining task while also enforcing augmentation invariance. For large-scale pretraining, we introduce GZ-Evo v0.1, a set of 96.5M volunteer responses for 552k galaxy images plus a further 1.34M comparable unlabelled galaxies. Most of the 206 GZ-Evo answers are unknown for any given galaxy, and so our pretraining task uses a Dirichlet loss that naturally handles unknown answers. GZ-Evo pretraining, with or without hybrid learning, improves on direct training even with plentiful downstream labels (+4% accuracy with 44k labels). Our hybrid pretraining/contrastive method further improves downstream accuracy vs. pretraining or contrastive learning, especially in the low-label transfer regime (+6% accuracy with 750 labels).
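The "naturally handles unknown answers" property follows from the Dirichlet-Multinomial likelihood itself: a question with zero recorded votes has likelihood exactly 1, so it contributes nothing to the loss. The sketch below illustrates this idea under assumptions of our own; the function names (`dirichlet_multinomial_log_prob`, `gz_evo_style_loss`) and the per-question tensor layout are illustrative and not taken from the paper's released code.

```python
# Minimal sketch (not the authors' implementation) of a Dirichlet-Multinomial loss
# that ignores unanswered questions automatically: when a question has zero votes,
# its log-likelihood is exactly zero, so "unknown" answers add nothing to the loss.

import torch


def dirichlet_multinomial_log_prob(votes, concentrations):
    """Log-likelihood of observed vote counts under a Dirichlet-Multinomial.

    votes:          (batch, n_answers) non-negative vote counts (float) for one question
    concentrations: (batch, n_answers) positive Dirichlet parameters predicted by the model
    """
    n = votes.sum(dim=-1)            # total votes recorded for this question
    a0 = concentrations.sum(dim=-1)  # total predicted concentration
    # log DirMult(votes | n, alpha), up to the multinomial coefficient (constant in alpha)
    log_prob = (
        torch.lgamma(a0) - torch.lgamma(n + a0)
        + (torch.lgamma(votes + concentrations) - torch.lgamma(concentrations)).sum(dim=-1)
    )
    return log_prob  # exactly zero wherever n == 0, i.e. for unanswered questions


def gz_evo_style_loss(votes_per_question, concentrations_per_question):
    """Sum the negative log-likelihood over every question in the decision tree.

    Both arguments are lists of (batch, n_answers_q) tensors, one entry per question.
    Questions a galaxy was never asked simply have all-zero vote rows.
    """
    nll = 0.0
    for votes, conc in zip(votes_per_question, concentrations_per_question):
        nll = nll - dirichlet_multinomial_log_prob(votes, conc)
    return nll.mean()
```

In a hybrid setup of the kind the abstract describes, a supervised term like this could simply be added to BYOL's latent-prediction loss, so the encoder is trained for both label prediction and augmentation invariance; the exact weighting is a design choice not specified here.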