Conservative Policy Construction Using Variational Autoencoders for Logged Data with Missing Values

In high-stakes applications of data-driven decision making such as healthcare, it is of paramount importance to learn a policy that maximizes the reward while avoiding potentially dangerous actions under uncertainty. Two main challenges are usually associated with this problem. First, learning through online exploration is not possible because of the critical nature of such applications, so we must resort to observational datasets with no counterfactuals. Second, such datasets are usually imperfect and often have missing values in the attributes of features. In this paper, we consider the problem of constructing personalized policies from logged data when there are missing values in the attributes of features in both training and test data. The goal is to recommend an action (treatment) when $\Xt$, a degraded version of $\Xb$ with missing values, is observed. We consider three strategies for dealing with missingness. In particular, we introduce the \textit{conservative strategy}, where the policy is designed to safely handle the uncertainty due to missingness. Implementing this strategy requires an estimate of the posterior distribution $p(\Xb|\Xt)$, which we obtain with a variational autoencoder. Specifically, our method is based on partial variational autoencoders (PVAE), which are designed to capture the underlying structure of features with missing values.
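
To make the idea of acting on posterior samples concrete, the following is a minimal Python sketch and not the paper's method. It rests on several assumptions that do not come from the abstract: `sample_posterior` fills missing entries with standard-normal draws as a stand-in for sampling from a trained PVAE, `reward_fn` is a hypothetical reward model, and "conservative" is read here as choosing the action with the best low-quantile reward across the sampled completions of $\Xt$.

```python
import numpy as np

def sample_posterior(x_tilde, n_samples, rng):
    """Stand-in for p(X | X~): keep observed entries, draw missing ones.

    Assumption: missing entries are N(0, 1). A trained PVAE would replace this.
    """
    x_tilde = np.asarray(x_tilde, dtype=float)
    missing = np.isnan(x_tilde)
    samples = np.tile(x_tilde, (n_samples, 1))
    samples[:, missing] = rng.standard_normal((n_samples, int(missing.sum())))
    return samples

def reward_fn(x, action):
    """Hypothetical reward model: action 0 is safe, action 1 depends on x[1]."""
    return 1.0 if action == 0 else 1.5 + 2.0 * x[1]

def conservative_action(samples, reward_fn, n_actions, quantile=0.1):
    """Pick the action whose low-quantile reward over the samples is largest."""
    # rewards[a, s] is the predicted reward of action a on sample s.
    rewards = np.array([[reward_fn(x, a) for x in samples]
                        for a in range(n_actions)])
    lower = np.quantile(rewards, quantile, axis=1)
    return int(np.argmax(lower)), rewards

rng = np.random.default_rng(0)
samples = sample_posterior([0.3, np.nan], n_samples=2000, rng=rng)
action, rewards = conservative_action(samples, reward_fn, n_actions=2)
print("conservative choice:", action)
print("mean-reward choice:", int(np.argmax(rewards.mean(axis=1))))
```

In this toy setting the risky action has the higher average reward, but its outcome depends on the missing feature, so a rule that looks at a low quantile prefers the safe action. When the feature is observed, all samples coincide and the two rules agree.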
