Survival analysis is the branch of statistics that studies the relation between the characteristics of living entities and their respective survival times, taking into account the partial information held by censored cases. A good analysis can, for example, determine whether one medical treatment for a group of patients is better than another. With the rise of machine learning, survival analysis can be modeled as learning a function that maps patient characteristics to survival times. To succeed at this, three crucial issues must be tackled. First, some patient data is censored: we do not know the true survival times of all patients. Second, data is scarce, which has led past research to treat different illness types as domains in a multi-task setup. Third, models must adapt to new or extremely rare illness types, for which few or no labels are available. In contrast to previous multi-task setups, we investigate how to efficiently adapt from multiple survival source domains to a new survival target domain. For this, we introduce a new survival metric and a corresponding discrepancy measure between survival distributions. These allow us to define domain adaptation for survival analysis while incorporating censored data, which would otherwise have to be dropped. Our experiments on two cancer data sets show superb performance on the target domains, better treatment recommendations, and a learned weight matrix with a plausible explanation.