I congratulate Profs. Binyan Jiang, Rui Song, Jialiang Li, and Donglin Zeng (JSLZ) on an exciting development: conducting inference on optimal dynamic treatment regimes (DTRs) learned via empirical risk minimization with the entropy loss as a surrogate. JSLZ's approach builds on a rejection-and-importance-sampling estimate of the value of a given decision rule, based on inverse probability weighting (IPW), and on the interpretation of this estimate as a weighted (or cost-sensitive) classification problem. Their use of a smooth classification surrogate is what makes their careful analysis of asymptotic distributions possible. However, even for evaluation alone, the IPW estimate is problematic. It assigns zero weight to every trajectory whose observed treatments deviate from the regime at any stage, which discards most of the data. The weights on whatever remains are products of inverse propensities across stages and are therefore extremely variable. In this comment, I discuss an optimization-based alternative for evaluating DTRs, review several connections, and suggest directions forward. This alternative extends the balanced policy evaluation approach of Kallus (2018a) to the longitudinal setting.
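To make the data-discarding and weight-variability points concrete, here is a minimal Python sketch (assuming NumPy) of the standard IPW value estimate for a T-stage regime d. Each trajectory i receives the weight W_i = ∏_t 1{A_it = d_t(H_it)} / π_t(A_it | H_it). The indicator plays the role of rejection, and the inverse propensity plays the role of importance sampling. The function names (`ipw_value`, `effective_sample_size`, `simulate`), the toy data-generating process (one covariate per stage, a logistic logging policy, and a regime that tends to disagree with it), and all parameter values are hypothetical illustrations. They are not JSLZ's setup, and they are not the balanced alternative discussed in this comment.

```python
import numpy as np


def ipw_value(actions, regime_actions, propensities, outcomes, normalize=True):
    """IPW estimate of the value of a dynamic treatment regime d.

    actions, regime_actions, propensities have shape (n, T):
      actions[i, t]        observed treatment A_it
      regime_actions[i, t] d_t(H_it), the treatment the regime would assign
      propensities[i, t]   pi_t(A_it | H_it) under the data-generating policy
    outcomes has shape (n,). Returns (estimate, weights).
    """
    actions = np.asarray(actions)
    regime_actions = np.asarray(regime_actions)
    propensities = np.asarray(propensities, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    # "Rejection": keep only trajectories that follow d at every stage.
    follows = np.all(actions == regime_actions, axis=1)
    # "Importance sampling": reweight survivors by the product of inverse propensities.
    weights = follows / np.prod(propensities, axis=1)
    if not normalize:
        return float(np.mean(weights * outcomes)), weights  # Horvitz-Thompson form
    total = weights.sum()
    if total == 0:
        return float("nan"), weights  # no trajectory follows d
    return float(np.sum(weights * outcomes) / total), weights  # self-normalized (Hajek) form


def effective_sample_size(weights):
    """Kish's effective sample size, (sum w)^2 / sum w^2."""
    w = np.asarray(weights, dtype=float)
    total = w.sum()
    return 0.0 if total == 0 else float(total**2 / np.sum(w**2))


def simulate(n=10_000, T=3, seed=0):
    """Hypothetical toy data: one covariate per stage, logistic logging policy."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, T))
    p_treat = 1.0 / (1.0 + np.exp(-2.0 * X))  # P(A_t = 1 | X_t)
    A = (rng.uniform(size=(n, T)) < p_treat).astype(int)
    prop = np.where(A == 1, p_treat, 1.0 - p_treat)
    # Regime to evaluate: treat when X_t < 0, which tends to disagree with the logging policy.
    D = (X < 0).astype(int)
    # Outcome rewards agreement with D, so D's true value is T by construction.
    Y = (A == D).sum(axis=1) + rng.normal(size=n)
    return A, D, prop, Y


if __name__ == "__main__":
    A, D, prop, Y = simulate()
    est, w = ipw_value(A, D, prop, Y)
    nonzero = w[w > 0]
    print(f"self-normalized IPW estimate: {est:.3f} (true value 3)")
    print(f"trajectories with nonzero weight: {nonzero.size} of {w.size}")
    print(f"max / median nonzero weight: {nonzero.max() / np.median(nonzero):.1f}")
    print(f"Kish effective sample size: {effective_sample_size(w):.1f}")
```

In this toy setting, only trajectories that agree with d at all three stages receive nonzero weight. Kish's effective sample size is at most the number of such trajectories, and it is strictly smaller whenever their weights differ. Both quantities make visible the two problems of IPW evaluation described above.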