Survival-Supervised Topic Modeling with Anchor Words: Characterizing Pancreatitis Outcomes

We introduce a new approach for topic modeling that is supervised by survival analysis. Specifically, we build on recent work on unsupervised topic modeling with so-called anchor words by providing supervision through an elastic-net regularized Cox proportional hazards model. In short, an anchor word being present in a document provides strong indication that the document is partially about a specific topic. For example, by seeing "gallstones" in a document, we are fairly certain that the document is partially about medicine. Our proposed method alternates between learning a topic model and learning a survival model to find a local minimum of a block convex optimization problem. We apply our proposed approach to predicting how long patients with pancreatitis admitted to an intensive care unit (ICU) will stay in the ICU. Our approach is as accurate as the best of a variety of baselines while being more interpretable than any of the baselines.
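To make the survival-supervision term concrete, the following is a minimal sketch of an elastic-net regularized Cox objective, not the authors' implementation. It assumes that each document's topic proportions (rows of `X`) serve as the covariates, that the observed event is ICU discharge, and that ties in event times are ignored. The function name `cox_elastic_net_loss`, the parameters `lam` and `alpha`, and the glmnet-style penalty weighting are illustrative choices that do not come from the abstract.

```python
import numpy as np

def cox_elastic_net_loss(beta, X, times, events, lam=0.1, alpha=0.5):
    """Elastic-net penalized negative Cox partial log-likelihood.

    Assumptions (hypothetical, for illustration only):
      X      : (n, k) array, e.g. per-document topic proportions
      times  : (n,) observed durations (e.g. ICU length of stay)
      events : (n,) 1 if the event was observed, 0 if censored
      lam    : overall penalty strength
      alpha  : mix between L1 (alpha=1) and L2 (alpha=0) penalties
    Ties in event times are not corrected for.
    """
    beta = np.asarray(beta, dtype=float)
    X = np.asarray(X, dtype=float)
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)

    # Log relative hazard of each subject under the Cox model.
    scores = X @ beta

    # at_risk[i, j] is True when subject j is still at risk at subject i's time.
    at_risk = times[None, :] >= times[:, None]

    # log(sum over the risk set of exp(score)), shifted for numerical stability.
    shift = scores.max()
    weighted = at_risk * np.exp(scores - shift)[None, :]
    log_risk = shift + np.log(weighted.sum(axis=1))

    # Only subjects with an observed event contribute to the partial likelihood.
    neg_log_pl = -(scores[events] - log_risk[events]).sum()

    # Elastic-net penalty: a blend of the L1 norm and the squared L2 norm.
    l1 = np.abs(beta).sum()
    l2 = 0.5 * (beta ** 2).sum()
    penalty = lam * (alpha * l1 + (1.0 - alpha) * l2)

    return neg_log_pl / len(times) + penalty
```

In an alternating scheme like the one described above, a loss of this form would be minimized over `beta` while the topic model is held fixed, and the topic model would then be updated with `beta` held fixed. How the two blocks are coupled in the proposed method is not specified in the abstract.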
