Machine learning (ML) is becoming a critical tool for interrogation of large complex data. Labeling, defined as the process of adding meaningful annotations, is a crucial step of supervised ML. However, labeling datasets is time consuming. Here we show that convolutional neural networks (CNNs), trained on crudely labeled astronomical videos, can be leveraged to improve the quality of data labeling and reduce the need for human intervention. We use videos of the solar magnetic field, crudely labeled into two classes: emergence or non-emergence of bipolar magnetic regions (BMRs), based on their first detection on the solar disk. We train CNNs using crude labels, manually verify, correct labeling vs. CNN disagreements, and repeat this process until convergence. Traditionally, flux emergence labelling is done manually. We find that a high-quality labeled dataset, derived through this iterative process, reduces the necessary manual verification by 50%. Furthermore, by gradually masking the videos and looking for maximum change in CNN inference, we locate BMR emergence time without retraining the CNN. This demonstrates the versatility of CNNs for simplifying the challenging task of labeling complex dynamic events.
翻译:暂无翻译