We propose a reinforcement learning method for estimating an optimal dynamic treatment regime for survival outcomes with dependent censoring. The estimator allows the failure time to be conditionally independent of censoring and dependent on the treatment decision times, supports a flexible number of treatment arms and treatment stages, and can maximize either the mean survival time or the survival probability at a given time point. The estimator is constructed using generalized random survival forests and can have polynomial rates of convergence. Simulation and data analysis results suggest that the new estimator yields higher expected outcomes than existing methods in a variety of settings. An R package, dtrSurv, is available on CRAN.
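For readers who wish to try the method, a minimal usage sketch in R is given below. The simulated data, the argument names (data, txName, models, criticalValue), and the single-stage setup are illustrative assumptions based on my reading of the package documentation, not the paper's own example; the CRAN manual for dtrSurv is the authoritative reference.

```r
## A minimal sketch (assumed interface): fit a single-stage regime with
## dtrSurv, maximizing mean survival time.
library(dtrSurv)
library(survival)

set.seed(1)
n <- 200
dat <- data.frame(
  X1      = rnorm(n),              # baseline covariate
  A       = rbinom(n, 1, 0.5),     # binary treatment at the single stage
  obsTime = rexp(n, rate = 0.5),   # observed follow-up time
  event   = rbinom(n, 1, 0.7)      # event indicator (1 = failure observed)
)

## Argument names below follow my recollection of the CRAN documentation
## and should be checked against ?dtrSurv. criticalValue = "mean" targets
## mean survival time; per the abstract, a survival-probability criterion
## at a user-specified evaluation time is also supported.
fit <- dtrSurv(
  data          = dat,
  txName        = "A",
  models        = Surv(obsTime, event) ~ X1 + A,
  criticalValue = "mean"
)

fit  # prints the fitted object, including the estimated value of the regime
```

Multi-stage regimes would follow the same pattern, with one treatment variable and one model per stage, but the exact multi-stage syntax should be taken from the package documentation rather than this sketch.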