Statistical and computational methods are widely used in modern scientific studies. Using female fertility potential in childhood cancer survivors as an example, we illustrate how these methods can extract insight into biological processes from noisy observational data to inform decision making. We start by contextualizing the computational methods with the working example: modelling acute ovarian failure risk in female childhood cancer survivors, which quantifies the risk of permanent ovarian failure due to exposure to lifesaving but nonetheless toxic cancer treatments. This is followed by a description of the general framework of classification problems. We then provide an overview of the modelling algorithms employed in our example, including one classic model (logistic regression) and two popular modern learning methods (random forest and support vector machines). Using the working example, we show the general steps of data preparation for modelling, the variable selection steps for the classic model, and how visualization tools can help improve model performance. We end with a note on the importance of model evaluation.