Machine learning plays an important role in the automatic classification of variable stars, and several classifiers have been proposed over the last decade. These classifiers have achieved impressive performance on several astronomical catalogues. However, several studies have shown that the training data they rely on contain multiple sources of bias. Hence, the performance of those classifiers on objects outside the training data is uncertain, which can lead to the selection of incorrect models and to the deployment of misleading classifiers. An example of the latter is the creation of open-source labelled catalogues with biased predictions. In this paper, we develop a method based on an informative marginal likelihood to evaluate variable star classifiers. We collect deterministic rules based on physical descriptors of RR Lyrae stars and, to mitigate the biases, introduce those rules into the marginal likelihood estimation. We perform experiments with a set of Bayesian logistic regressions trained to classify RR Lyrae stars, and we find that our method outperforms traditional non-informative cross-validation strategies, even when penalized models are assessed. Our methodology provides a more rigorous alternative for assessing machine learning models using astronomical knowledge. The approach can be extended to other classes of variable stars and leaves room for algorithmic improvements.
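To make the idea of scoring a classifier by its marginal likelihood concrete, the following is a minimal sketch and not the method of the paper. It assumes a Bayesian logistic regression with an independent Gaussian prior on the weights and estimates the log marginal likelihood by naive Monte Carlo over prior draws. The abstract does not say how the physical rules enter the estimation; here we simply assume they shape the prior mean. The function name `log_marginal_likelihood`, the synthetic data, and the prior values are all hypothetical.

```python
import numpy as np


def log_marginal_likelihood(X, y, prior_mean, prior_std, n_samples=50_000, seed=0):
    """Naive Monte Carlo estimate of log p(y | X) for logistic regression
    with an independent Gaussian prior on the weights (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Draw weight vectors from the prior: shape (n_samples, n_features).
    w = rng.normal(prior_mean, prior_std, size=(n_samples, X.shape[1]))
    logits = w @ X.T  # shape (n_samples, n_objects)
    # Bernoulli log-likelihood per draw: sum_i [y_i * z_i - log(1 + exp(z_i))].
    log_lik = (y * logits - np.logaddexp(0.0, logits)).sum(axis=1)
    # Log of the mean likelihood over prior draws, computed stably.
    m = log_lik.max()
    return m + np.log(np.mean(np.exp(log_lik - m)))


# Synthetic two-feature data standing in for physical descriptors (assumption).
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
true_w = np.array([2.0, -1.0])
y = (rng.random(100) < 1.0 / (1.0 + np.exp(-X @ true_w))).astype(float)

# Broad, non-informative prior versus a hypothetical "rule-derived" prior
# whose mean reflects what domain knowledge might suggest.
flat = log_marginal_likelihood(X, y, np.zeros(2), 5.0)
informed = log_marginal_likelihood(X, y, np.array([2.0, -1.0]), 1.0)
print(f"log evidence, flat prior:     {flat:.2f}")
print(f"log evidence, informed prior: {informed:.2f}")
```

In this toy setting, a prior concentrated near plausible weights yields a higher evidence than a broad one, which is the kind of comparison a marginal-likelihood-based model selection relies on. Naive prior sampling scales poorly with dimension, so a practical estimator would likely need a more careful scheme.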