Data processing and analysis pipelines in cosmological survey experiments introduce data perturbations that can significantly degrade the performance of deep learning-based models. Given the increased adoption of supervised deep learning methods for processing and analysis of cosmological survey data, assessing the effects of data perturbations and developing methods that increase model robustness are increasingly important. In the context of morphological classification of galaxies, we study the effects of perturbations in imaging data. In particular, we examine what happens when neural networks are trained on baseline data and tested on perturbed data. We consider perturbations associated with two primary sources: 1) increased observational noise, represented by higher levels of Poisson noise, and 2) data processing noise incurred by steps such as image compression or telescope errors, represented by one-pixel adversarial attacks. We also test the efficacy of domain adaptation techniques in mitigating the perturbation-driven errors. We use classification accuracy, latent space visualizations, and latent space distance to assess model robustness. Without domain adaptation, we find that pixel-level processing errors easily flip the classification into an incorrect class and that higher observational noise makes a model trained on low-noise data unable to classify galaxy morphologies. On the other hand, we show that training with domain adaptation improves model robustness and mitigates the effects of these perturbations, improving the classification accuracy by 23% on data with higher observational noise. Domain adaptation also increases the latent space distance between the baseline image and the incorrectly classified one-pixel perturbed image by a factor of ~2.3, making the model more robust to inadvertent perturbations.
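As a rough illustration of the two perturbation types and the distance metric named above, the following minimal sketch uses NumPy. The function names, the `exposure_scale` parameter, and the random stand-in image are assumptions made for illustration and are not taken from the paper. The one-pixel function only applies a given change; an actual one-pixel attack would also search for the pixel and value that flip the classifier, and the distance is assumed here to be Euclidean.

```python
import numpy as np

def add_poisson_noise(image, exposure_scale, rng):
    """Resample pixel values as Poisson counts.

    exposure_scale is a hypothetical knob: smaller values mean fewer
    counts per pixel and therefore relatively noisier images.
    """
    counts = rng.poisson(np.clip(image, 0, None) * exposure_scale)
    return counts / exposure_scale

def one_pixel_perturb(image, row, col, value):
    """Return a copy of the image with a single pixel overwritten."""
    out = image.copy()
    out[row, col] = value
    return out

def latent_distance(z_a, z_b):
    """Euclidean distance between two latent vectors (assumed metric)."""
    return float(np.linalg.norm(np.asarray(z_a) - np.asarray(z_b)))

rng = np.random.default_rng(0)
baseline = rng.uniform(0.0, 1.0, size=(64, 64))  # stand-in for a galaxy image
noisy = add_poisson_noise(baseline, exposure_scale=10.0, rng=rng)
attacked = one_pixel_perturb(baseline, row=5, col=7, value=1.0)
```

In practice, the latent vectors passed to `latent_distance` would come from a trained network's penultimate layer, evaluated on the baseline and perturbed images.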