Astronomers have typically set out to solve supervised machine learning problems by creating their own representations from scratch. We show that deep learning models trained to answer every Galaxy Zoo DECaLS question learn meaningful semantic representations of galaxies that are useful for new tasks on which the models were never trained. We exploit these representations to outperform several recent approaches at practical tasks crucial for investigating large galaxy samples. The first task is identifying galaxies of similar morphology to a query galaxy. Given a single galaxy assigned a free text tag by humans (e.g. "#diffuse"), we can find galaxies matching that tag for most tags. The second task is identifying the anomalies most interesting to a particular researcher. Our approach is 100% accurate at identifying the 100 most interesting anomalies (as judged by Galaxy Zoo 2 volunteers). The third task is adapting a model to solve a new task using only a small number of newly labelled galaxies. Models fine-tuned from our representation are better able to identify ring galaxies than models fine-tuned from terrestrial images (ImageNet) or trained from scratch. We solve each task with very few new labels: either one (for the similarity search) or several hundred (for anomaly detection or fine-tuning). This challenges the longstanding view that deep supervised methods require new large labelled datasets for practical use in astronomy. To help the community benefit from our pretrained models, we release our fine-tuning code Zoobot. Zoobot is accessible to researchers with no prior experience in deep learning.
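As an illustration of the first task, the similarity search can be pictured as a nearest-neighbour lookup in representation space. The sketch below is not the authors' implementation and does not use Zoobot: the function name `most_similar`, the choice of cosine similarity, the 64-dimensional vectors, and the random placeholder data are all assumptions made for the example.

```python
import numpy as np


def most_similar(representations: np.ndarray, query_index: int, k: int = 5) -> np.ndarray:
    """Return indices of the k rows closest to the query row by cosine similarity."""
    # Normalise each row so that a dot product equals cosine similarity.
    norms = np.linalg.norm(representations, axis=1, keepdims=True)
    unit = representations / np.clip(norms, 1e-12, None)
    similarity = unit @ unit[query_index]
    # Exclude the query itself before ranking.
    similarity[query_index] = -np.inf
    # Sort by descending similarity and keep the top k.
    return np.argsort(-similarity)[:k]


# Placeholder data (assumption): 1000 hypothetical galaxies, 64-dimensional representations.
rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 64))
# Plant a near-duplicate of galaxy 0 at index 1 so the sketch has a known answer.
features[1] = features[0] + 0.01 * rng.normal(size=64)

neighbours = most_similar(features, query_index=0, k=5)
```

In practice the rows of `features` would be the representation vectors extracted from a pretrained model for each galaxy, and the query would be the single tagged galaxy.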