Deep learning approaches often require large datasets to generalize well. This complicates their use in tasks such as image-based medical diagnosis, where small training datasets are usually insufficient to learn appropriate data representations. For such sensitive tasks it is also important to provide confidence estimates alongside the predictions. Here, we propose a way to learn and use probabilistic labels to train accurate and calibrated deep networks from relatively small datasets. We observe gains of up to 22% in the accuracy of models trained with these labels, compared with traditional approaches, on three classification tasks: diagnosis of hip dysplasia, fatty liver, and glaucoma. The outputs of models trained with probabilistic labels are calibrated, allowing their predictions to be interpreted as proper probabilities. We anticipate that this approach will apply to other tasks where few training instances are available and expert knowledge can be encoded as probabilities.
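To make the core idea concrete, the snippet below is a minimal sketch (not the authors' implementation) of training with probabilistic ("soft") labels: the loss is the cross-entropy between the network's predicted distribution and an expert-derived label distribution such as [0.8, 0.2], rather than a hard class index. The network architecture, data loader, and hyperparameters are hypothetical placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def soft_cross_entropy(logits, target_probs):
    """Cross-entropy against probabilistic targets (each row sums to 1)."""
    log_probs = F.log_softmax(logits, dim=1)
    return -(target_probs * log_probs).sum(dim=1).mean()

# Hypothetical setup: a small CNN for 2-class diagnosis and a loader that
# yields (image, prob_label) pairs, where prob_label encodes expert belief.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 2),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_epoch(loader):
    model.train()
    for images, prob_labels in loader:
        optimizer.zero_grad()
        loss = soft_cross_entropy(model(images), prob_labels)
        loss.backward()
        optimizer.step()
```

Because the targets are full distributions rather than one-hot vectors, the softmax outputs of a model trained this way can be read directly as class probabilities, which is what enables the calibration reported in the abstract.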