Scoring rules promote rational decision making and good predictions by models, which is increasingly important for automated 'auto-ML' procedures. The Brier score and log loss are well-established scoring rules for classification and regression, and they possess the 'strict properness' property that encourages optimal predictions. In this paper we survey proposed scoring rules for survival analysis, establish the first clear definition of '(strict) properness' for survival scoring rules, and determine which losses are proper and which are improper. We prove that commonly used scoring rules that are claimed to be proper are in fact improper. We further prove that, under a strict set of assumptions, a class of scoring rules is strictly proper for what we term 'approximate' survival losses. We hope these findings encourage further research into robust validation of survival models and promote honest evaluation.
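As a rough illustration of what strict properness means in the familiar binary classification setting (not the survival setting, and not the paper's own definitions), the sketch below checks numerically that the expected Brier score and expected log loss are both minimised when the forecast equals the true event probability. The function names, the grid of candidate forecasts, and the value 0.3 are assumptions chosen for this example.

```python
import numpy as np

def expected_brier(q, p):
    """Expected Brier score of forecast q when the event has true probability p."""
    # With probability p the outcome is 1 (loss (1 - q)^2), otherwise 0 (loss q^2).
    return p * (1.0 - q) ** 2 + (1.0 - p) * q ** 2

def expected_log_loss(q, p):
    """Expected log loss of forecast q when the event has true probability p."""
    return -(p * np.log(q) + (1.0 - p) * np.log(1.0 - q))

# Hypothetical true probability and a grid of candidate forecasts (illustrative values).
p_true = 0.3
grid = np.linspace(0.01, 0.99, 99)

# For a strictly proper rule, the expected loss has a unique minimiser at q = p_true.
best_brier = grid[np.argmin(expected_brier(grid, p_true))]
best_log = grid[np.argmin(expected_log_loss(grid, p_true))]

print(best_brier, best_log)
```

Both minimisers coincide with the true probability on this grid, which is the behaviour a strictly proper scoring rule is meant to guarantee: a forecaster cannot lower the expected loss by reporting anything other than the true distribution.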