Reinforcement Learning under Model Risk for Biomanufacturing Fermentation Control

In biopharmaceutical manufacturing, the fermentation process plays a critical role in determining productivity and profit. Because biotherapeutics are produced in living cells whose biological mechanisms are complex and whose outputs are highly variable, in this paper we introduce a model-based reinforcement learning framework that accounts for model risk, supports online learning of the bioprocess, and guides an optimal, robust, and customized stopping policy for the fermentation process. Specifically, building on the dynamic mechanisms of protein and impurity generation, we first construct a probabilistic model characterizing the impact of the underlying bioprocess stochastic uncertainty on the impurity and protein growth rates. Since biopharmaceutical manufacturing often has very limited data during development and the early stages of production, we derive the posterior distribution quantifying the process model risk and further develop a knowledge update based on Bayes' rule to support online learning of the underlying stochastic process. Because its predictions account for both bioprocess stochastic uncertainty and model risk, the proposed reinforcement learning framework can proactively hedge against all sources of uncertainty and support optimal, robust, and customized decision making. We conduct a structural analysis of the optimal policy and study the impact of model risk on policy selection. We show that the policy asymptotically converges to the optimal policy obtained under perfect information about the underlying stochastic process. Our case studies demonstrate that the proposed framework can substantially improve industrial biomanufacturing practice.
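To make the interplay of Bayesian knowledge update, model risk, and stopping decision concrete, here is a minimal sketch under loudly stated assumptions that are not taken from the paper. Log protein and log impurity are each assumed to grow by a per-step increment that is normal with an unknown mean (the growth rate) and a known noise variance, with a conjugate normal prior on that mean. The stopping reward is a hypothetical linear trade-off between protein revenue and impurity cost. The decision rule is a one-step-lookahead (myopic) rule, a simplification rather than the optimal policy derived in the paper. All names (`NormalPosterior`, `should_continue`, `price`, `impurity_cost`, `step_cost`) are hypothetical. The point the sketch illustrates is that the predictive variance is the sum of model risk (posterior variance) and stochastic uncertainty (noise variance), and that the model-risk term shrinks as data accumulate, so the decision approaches the one made under perfect information.

```python
import math
from dataclasses import dataclass


@dataclass
class NormalPosterior:
    """Posterior N(mean, var) over an unknown per-step log growth rate mu.

    Hypothetical model: each observed log-increment x ~ N(mu, noise_var),
    noise_var known; prior (or current posterior) mu ~ N(mean, var).
    """
    mean: float
    var: float
    noise_var: float

    def update(self, increments):
        """Conjugate normal-normal Bayesian update with observed log-increments."""
        n = len(increments)
        if n == 0:
            return self
        precision = 1.0 / self.var + n / self.noise_var
        mean = (self.mean / self.var + sum(increments) / self.noise_var) / precision
        return NormalPosterior(mean, 1.0 / precision, self.noise_var)

    def predictive_var(self):
        """Variance of the next increment: model risk (var) + stochastic noise."""
        return self.var + self.noise_var

    def expected_growth_factor(self):
        """E[exp(x)] for x ~ N(mean, predictive_var), i.e. the lognormal mean."""
        return math.exp(self.mean + 0.5 * self.predictive_var())


def should_continue(protein, impurity, post_p, post_i,
                    price=1.0, impurity_cost=1.0, step_cost=0.0):
    """Hypothetical one-step-lookahead stopping rule (not the optimal policy).

    Stop-now reward = price * protein - impurity_cost * impurity.
    Continue if the posterior-predictive expected reward after one more step,
    minus that step's operating cost, exceeds the reward from stopping now.
    """
    stop_now = price * protein - impurity_cost * impurity
    next_expected = (price * protein * post_p.expected_growth_factor()
                     - impurity_cost * impurity * post_i.expected_growth_factor())
    return next_expected - step_cost > stop_now


if __name__ == "__main__":
    prior = NormalPosterior(mean=0.0, var=1.0, noise_var=0.01)
    post_p = prior.update([0.1] * 4)   # observed protein log-increments
    post_i = prior.update([0.3] * 4)   # observed impurity log-increments
    print(should_continue(10.0, 1.0, post_p, post_i))  # → True
    print(should_continue(10.0, 8.0, post_p, post_i))  # → False
```

In this toy setting, fermentation continues while protein gains outweigh faster-growing impurity costs and stops once the impurity burden dominates. With many observations, `predictive_var()` approaches `noise_var`, mirroring (only informally) the abstract's claim that the policy converges to the perfect-information policy.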
