The 21 cm spectral line emission of atomic neutral hydrogen (HI) lies at one of the primary wavelengths observed in radio astronomy. However, the signal is intrinsically faint and the HI content of galaxies depends on the cosmic environment, so investigating the HI Universe requires surveys that are both large in volume and deep. As the amount of data from these surveys continues to increase with technological improvements, so does the need for automatic techniques that identify and characterise HI sources while balancing the tradeoff between completeness and purity. This study aimed to find the optimal pipeline for finding and masking the most sources, with the best mask quality and the fewest artefacts, in 3D neutral hydrogen 21 cm spectral line data cubes, and explored various existing methods to that end. Two traditional source-finding methods, SoFiA and MTObjects, were tested, as well as a new supervised deep learning approach based on a 3D convolutional neural network architecture known as V-Net. Each of these three source-finding methods was then extended with a classical machine learning classifier as a post-processing step to remove false positive detections. The pipelines were tested on HI data cubes from the Westerbork Synthesis Radio Telescope with additional inserted mock galaxies. SoFiA combined with a random forest classifier provided the best results, with the V-Net-random forest combination a close second. We suspect this is because the training set contains many more mock sources than real sources. There is, therefore, room to improve the quality of the V-Net network with better-labelled data such that it can potentially outperform SoFiA.
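The tradeoff between completeness and purity mentioned above can be made concrete with a minimal sketch. It assumes the usual definitions (completeness as the fraction of real sources recovered, purity as the fraction of detections that are real). The function name and all counts are hypothetical and chosen only for illustration; they are not results from this study.

```python
def completeness_and_purity(true_positives, false_positives, false_negatives):
    """Return (completeness, purity) for a set of detections.

    completeness = TP / (TP + FN): fraction of real sources recovered.
    purity       = TP / (TP + FP): fraction of detections that are real.
    """
    n_real = true_positives + false_negatives
    n_detected = true_positives + false_positives
    completeness = true_positives / n_real if n_real else 0.0
    purity = true_positives / n_detected if n_detected else 0.0
    return completeness, purity


# Hypothetical source finder output before any post-processing.
before = completeness_and_purity(
    true_positives=80, false_positives=120, false_negatives=20
)

# Hypothetical effect of a post-processing classifier that rejects
# 100 false detections but also discards 5 real sources.
after = completeness_and_purity(
    true_positives=75, false_positives=20, false_negatives=25
)

print(f"before: completeness={before[0]:.2f}, purity={before[1]:.2f}")
print(f"after:  completeness={after[0]:.2f}, purity={after[1]:.2f}")
```

In this illustrative case the filter raises purity substantially at a small cost in completeness, which is the kind of exchange a false-positive classifier is meant to make.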