In the era of large astronomical surveys, our ability to apply artificial intelligence algorithms across multiple datasets simultaneously will open new avenues for scientific discovery. Unfortunately, a deep neural network trained on images from one data domain often performs very poorly on any other dataset. Here we develop a Universal Domain Adaptation method, DeepAstroUDA, which performs semi-supervised domain alignment and can be applied to datasets with different types of class overlap. Extra classes can be present in either of the two datasets, and the method can be used even in the presence of unknown classes. For the first time, we demonstrate the successful use of domain adaptation on two very different observational datasets (from SDSS and DECaLS). We show that our method bridges the gap between two astronomical surveys and also performs well for anomaly detection and for clustering of unknown data in the unlabeled dataset. We apply our model to two galaxy morphology classification tasks with anomaly detection: 1) classifying spiral and elliptical galaxies with detection of merging galaxies (three classes, including one unknown anomaly class); 2) a more granular problem in which the classes describe more detailed morphological properties of galaxies, with detection of gravitational lenses (ten classes, including one unknown anomaly class).
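To make the notion of "different types of class overlap" concrete, the following is a minimal illustrative sketch, not part of DeepAstroUDA itself. It partitions the label sets of two datasets into shared, source-only, and target-only classes. The names `ClassOverlap` and `class_overlap` are hypothetical, and the scenario labels (closed-set, partial, open-set, open-partial) follow common domain adaptation terminology rather than wording from this abstract. In practice the target labels are unknown; they are written out here only to illustrate the first task, under the reading that merging galaxies appear only in the unlabeled dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassOverlap:
    """Partition of class labels between a labeled source and an unlabeled target."""
    shared: frozenset
    source_private: frozenset
    target_private: frozenset

    def scenario(self) -> str:
        # Scenario names are common domain adaptation terms (an assumption here,
        # not taken from the abstract).
        if self.source_private and self.target_private:
            return "open-partial"
        if self.target_private:
            return "open-set"
        if self.source_private:
            return "partial"
        return "closed-set"


def class_overlap(source_classes, target_classes) -> ClassOverlap:
    """Split two label sets into shared, source-only, and target-only classes."""
    s, t = frozenset(source_classes), frozenset(target_classes)
    return ClassOverlap(shared=s & t, source_private=s - t, target_private=t - s)


# Illustration of the first task: spirals and ellipticals are labeled in the
# source; mergers are assumed to occur only in the target as the unknown
# anomaly class.
overlap = class_overlap(
    {"spiral", "elliptical"},
    {"spiral", "elliptical", "merger"},
)
print(overlap.scenario())
print(sorted(overlap.target_private))
```

A universal method, in this sense, is one that does not need to be told in advance which of these four scenarios applies.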