Compact stellar systems such as ultra-compact dwarfs (UCDs) and globular clusters (GCs) around galaxies are known to trace the merger events that formed these galaxies. Identifying such systems therefore makes it possible to study the mass assembly, formation and evolution of galaxies. However, in the absence of spectroscopic information, detecting UCDs/GCs from imaging data is very uncertain. Here, we aim to train a machine learning model to separate these objects from foreground stars and background galaxies using multi-wavelength imaging data of the Fornax galaxy cluster in six filters, namely u, g, r, i, J and Ks. The object classes are highly imbalanced, which is problematic for many automatic classification techniques. Hence, we employ Synthetic Minority Over-sampling to handle the imbalance of the training data. We then compare two classifiers, namely Localized Generalized Matrix Learning Vector Quantization (LGMLVQ) and Random Forest (RF). Both methods identify UCDs/GCs with a precision and a recall of >93 percent and provide relevances that reflect the importance of each feature dimension (colors and angular sizes) for the classification. Both methods detect angular sizes as important markers for this classification problem. While the astronomical expectation is that the color indices u-i and i-Ks are the most important colors, our analysis shows that colors such as g-r are more informative, potentially because of their higher signal-to-noise ratio. Besides its excellent performance, the LGMLVQ method allows further interpretability by providing the feature importance for each individual class, class-wise representative samples and the possibility of non-linear visualization of the data, as demonstrated in this contribution. We conclude that employing machine learning techniques to identify UCDs/GCs can lead to promising results.
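To illustrate the over-sampling step named above, the following is a minimal sketch of the interpolation idea behind Synthetic Minority Over-sampling, written with the Python standard library only. It is not the authors' implementation: the function name `smote`, its parameters `n_new`, `k` and `seed`, and the toy feature values are all assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base sample (excluding itself)
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # The new point lies at a random position on the segment base -> neighbour
        t = rng.random()
        synthetic.append([b + t * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority-class feature vectors (assumed values, e.g. two colour indices)
ucd_gc = [[0.9, 0.6], [1.0, 0.7], [1.1, 0.65], [0.95, 0.72]]
new_points = smote(ucd_gc, n_new=6, k=2)
print(len(new_points))  # prints 6
```

Because every synthetic point is a convex combination of two real minority samples, the new points stay inside the region already occupied by the minority class. In practice one would over-sample only the training split, so that the evaluation data contain real objects only.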