Bootstrapping labels from radiology reports has become a scalable way to obtain inexpensive ground truth for medical imaging. Because report language is highly domain-specific, state-of-the-art report labeling tools are predominantly rule-based. These tools, however, typically yield a binary 0 or 1 prediction that indicates the presence or absence of abnormalities. These hard targets are then used as ground truth to train downstream image models, forcing the models to express a high degree of certainty even on cases where specificity is low. This could negatively impact the statistical efficiency of image models. We address this issue by training a Bidirectional Long Short-Term Memory network to augment heuristic-based discrete labels of X-ray reports from all body regions. The network achieves performance comparable to or better than domain-specific NLP, while also providing uncertainty estimates that enable finer-grained training of downstream image models.
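The difference between hard and soft targets can be illustrated with a minimal sketch. It assumes a binary cross-entropy loss, which the abstract does not name, and a hypothetical soft label of 0.7 standing in for an uncertainty-aware label; the function names are illustrative and not taken from the paper. The sketch searches a grid of candidate predictions for the one that minimizes the loss under each kind of target.

```python
import math


def binary_cross_entropy(target, predicted, eps=1e-12):
    """Binary cross-entropy between a target in [0, 1] and a predicted probability."""
    # Clamp the prediction so that log() never receives 0.
    p = min(max(predicted, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def best_prediction(target, grid_size=999):
    """Return the candidate probability on a uniform grid that minimizes the loss."""
    # Candidates are 0.001, 0.002, ..., 0.999 for the default grid size.
    candidates = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    return min(candidates, key=lambda p: binary_cross_entropy(target, p))


# Hard target, as produced by a rule-based labeler (assumed example value).
hard_label = 1.0
# Hypothetical soft target carrying uncertainty about the same report.
soft_label = 0.7

hard_optimum = best_prediction(hard_label)
soft_optimum = best_prediction(soft_label)

print(f"loss-minimizing prediction for hard label {hard_label}: {hard_optimum}")
print(f"loss-minimizing prediction for soft label {soft_label}: {soft_optimum}")
```

Under the hard label, the loss keeps decreasing as the prediction approaches 1, so the minimizer sits at the edge of the grid. Under the soft label, the minimizer matches the label itself, so a model trained on such targets is not pushed toward full certainty on ambiguous cases.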