Research on decision support applications in healthcare, such as those related to diagnosis, prediction, and treatment planning, has seen enormously increased interest recently. This development is due to the increase in data availability as well as advances in artificial intelligence and machine learning research. Highly promising research examples are published daily. At the same time, however, there are some unrealistic expectations with regard to the requirements for reliable development and objective validation that are needed in healthcare settings. These expectations may lead to missed schedules and to disappointment (or non-uptake) on the end-user side. The aim of this tutorial is to provide practical guidance on how to assess performance reliably and efficiently and how to avoid common traps. Instead of giving a list of do's and don'ts, this tutorial tries to build a better understanding of the reasoning behind them, and presents both the most relevant performance evaluation criteria and how to compute them. Along the way, we indicate common mistakes and provide references that discuss various topics in more depth.