Identifying subgroups of patients who benefit from a treatment is a key aspect of personalized medicine; these subgroups can be used to develop individualized treatment rules (ITRs). Many machine learning methods have been proposed to create such rules. However, the extent to which these methods lead to the same ITRs, i.e., recommend the same treatment for the same individuals, is unclear. To assess whether methods lead to similar ITRs, we compared the most common approaches in two randomized controlled trials. Two classes of methods can be distinguished. The first class predicts individualized treatment effects, from which an ITR is derived by recommending the evaluated treatment to individuals with a predicted benefit. The second class directly estimates the ITR without estimating individualized treatment effects. For each trial, the performance of the ITRs was assessed with various metrics, and the pairwise agreement between ITRs was calculated. The ITRs obtained by the different methods generally disagreed considerably on which individuals should be treated. Concordance was better among related methods. Overall, when evaluated in a validation sample, all methods produced ITRs with limited performance, suggesting a high potential for overfitting. The different methods do not lead to similar ITRs and are therefore not interchangeable. The choice of method strongly influences which patients are recommended a given treatment, which raises concerns about the practical use of these methods.
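As an illustration of the comparison described above, the following minimal sketch derives ITRs from predicted individualized treatment effects and computes pairwise agreement between rules. It rests on assumptions not stated in the abstract: a positive predicted effect is taken to mean benefit, agreement is measured as the simple proportion of individuals receiving the same recommendation (the study may have used a different statistic), and all names and numbers are hypothetical.

```python
from itertools import combinations


def itr_from_effects(predicted_effects, threshold=0.0):
    """First class of methods: recommend treatment (1) when the predicted
    benefit exceeds the threshold, otherwise recommend control (0).
    Assumes larger predicted effects mean greater benefit."""
    return [1 if effect > threshold else 0 for effect in predicted_effects]


def agreement(rule_a, rule_b):
    """Proportion of individuals given the same recommendation by both rules.
    This simple proportion is an assumed metric, chosen for illustration."""
    if len(rule_a) != len(rule_b):
        raise ValueError("rules must cover the same individuals")
    return sum(a == b for a, b in zip(rule_a, rule_b)) / len(rule_a)


def pairwise_agreement(rules):
    """Agreement for every pair of named rules."""
    return {
        (name_1, name_2): agreement(rules[name_1], rules[name_2])
        for name_1, name_2 in combinations(rules, 2)
    }


# Hypothetical predicted treatment effects for five patients.
effects_method_a = [0.12, -0.05, 0.30, 0.01, -0.20]
effects_method_b = [0.08, 0.02, -0.10, 0.04, -0.15]
# Second class of methods: the rule is estimated directly (hypothetical output).
direct_rule_c = [1, 0, 1, 0, 0]

rules = {
    "method_a": itr_from_effects(effects_method_a),
    "method_b": itr_from_effects(effects_method_b),
    "method_c": direct_rule_c,
}
result = pairwise_agreement(rules)
for pair, value in result.items():
    print(pair, round(value, 2))
```

Even in this toy setting the three rules disagree on several patients, which is the kind of discordance the comparison is designed to quantify.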