Accurate and robust prediction of patients' responses to drug treatments is critical for developing precision medicine. However, it is often difficult to obtain enough coherent drug response data directly from patients to train a generalized machine learning model. Although rich cell line data offer an alternative, transferring the knowledge obtained from cell lines to patients is challenging because of various confounding factors. Few existing transfer learning methods can reliably disentangle common intrinsic biological signals from confounding factors in cell line and patient data. In this paper, we develop a Coherent Deconfounding Autoencoder (CODE-AE) that can extract both the common biological signals shared by incoherent samples and the private representations unique to each data set, transfer knowledge learned from cell line data to tissue data, and separate confounding factors from the biological signals. Extensive studies on multiple data sets demonstrate that CODE-AE significantly improves accuracy and robustness over state-of-the-art methods in both predicting patient drug response and deconfounding biological signals. Thus, CODE-AE provides a useful framework for taking advantage of in vitro omics data to develop generalized patient predictive models. The source code is available at https://github.com/XieResearchGroup/CODE-AE.