In this paper, we attempt to use reinforcement learning to model the proper dosage of Warfarin for patients. We first examine two baselines: a fixed-dose model that prescribes 35 mg/week and a linear model that relies on patient data. We then implement a LinUCB bandit that improves performance as measured by regret and percentage of incorrect dosing decisions. On top of the LinUCB bandit, we experiment with online supervised learning and reward reshaping to boost performance. Our results clearly beat the baselines and show the promise of using multi-armed bandits and artificial intelligence to aid physicians in deciding proper dosages.
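To make the LinUCB bandit and the two metrics named above concrete, the following is a minimal sketch of the standard disjoint LinUCB algorithm in Python. It is not the implementation used in this paper: the number of arms (three, standing in for discretized dose ranges), the feature dimension, the exploration parameter `alpha`, the 0/-1 reward, and the synthetic data in `run_synthetic` are all assumptions made for illustration, and no real patient data or reported results are reproduced.

```python
import numpy as np


class LinUCB:
    """Disjoint LinUCB: one ridge-regression model per arm (illustrative sketch)."""

    def __init__(self, n_arms, n_features, alpha=1.0):
        self.alpha = alpha  # exploration strength (assumed value)
        # Per-arm design matrix A (starts as identity) and response vector b.
        self.A = [np.eye(n_features) for _ in range(n_arms)]
        self.b = [np.zeros(n_features) for _ in range(n_arms)]

    def select(self, x):
        """Return the arm with the highest upper confidence bound for context x."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                      # ridge estimate for this arm
            bonus = self.alpha * np.sqrt(x @ A_inv @ x)
            scores.append(theta @ x + bonus)
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        """Update only the chosen arm with the observed reward."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x


def run_synthetic(n_rounds=2000, n_features=5, n_arms=3, seed=0):
    """Run LinUCB on SYNTHETIC contexts (hypothetical stand-in for patient data).

    Reward is 0 for the correct arm and -1 otherwise (an assumed reward scheme),
    so cumulative regret equals the number of incorrect choices.
    """
    rng = np.random.default_rng(seed)
    true_w = rng.normal(size=(n_arms, n_features))  # hidden synthetic ground truth
    bandit = LinUCB(n_arms, n_features)
    mistakes = 0
    for _ in range(n_rounds):
        x = rng.normal(size=n_features)
        best = int(np.argmax(true_w @ x))
        arm = bandit.select(x)
        reward = 0.0 if arm == best else -1.0
        bandit.update(arm, x, reward)
        mistakes += arm != best
    return mistakes, mistakes / n_rounds  # (regret, fraction incorrect)
```

Calling `regret, frac_incorrect = run_synthetic()` returns the cumulative regret and the fraction of incorrect choices on the synthetic stream. Under the assumed 0/-1 reward the two metrics differ only by a factor of the number of rounds; a reshaped reward that penalizes some mistakes more heavily than others would decouple them.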