In biopharmaceutical manufacturing, the fermentation process plays a critical role in productivity and profit. Because biotherapeutics are manufactured in living cells whose biological mechanisms are complex and whose outputs are highly variable, we introduce in this paper a model-based reinforcement learning framework that accounts for model risk, supports bioprocess online learning, and guides an optimal, reliable, and customized stopping policy for the fermentation process. Specifically, building on the dynamic mechanisms of protein and impurity generation, we first construct a probabilistic model characterizing the impact of the underlying bioprocess stochastic uncertainty on the impurity and protein growth rates. Because biopharmaceutical manufacturing often has very limited batch data during development and early-stage production, we derive the posterior distribution quantifying the process model risk and further develop a Bayesian knowledge update to support bioprocess online learning. With the prediction risk accounting for both bioprocess stochastic uncertainty and model risk, the proposed reinforcement learning framework can support optimal and reliable decision making. We conduct a structural analysis of the optimal policy and study the impact of model risk on policy selection. We show that the policy obtained from the proposed framework asymptotically converges to the optimal policy obtained under perfect information about the underlying stochastic process. Our case studies demonstrate that the proposed framework can substantially improve on current biomanufacturing industrial practice.
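The Bayesian knowledge update mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the paper's actual model, so everything here is an assumption made for illustration: a single growth rate with a normal prior, per-step observations with known Gaussian noise variance, and the hypothetical names `NormalBelief`, `update`, and `predictive_var`. The posterior variance stands in for model risk, the noise variance for bioprocess stochastic uncertainty, and their sum for the prediction risk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalBelief:
    """Normal belief over an unknown growth rate (illustrative assumption)."""
    mean: float
    var: float  # variance of the belief, used here as a proxy for model risk


def update(prior: NormalBelief, observations: list[float], noise_var: float) -> NormalBelief:
    """Conjugate normal-normal update with known observation noise variance."""
    n = len(observations)
    if n == 0:
        return prior
    # Posterior precision is the prior precision plus the data precision.
    precision = 1.0 / prior.var + n / noise_var
    mean = (prior.mean / prior.var + sum(observations) / noise_var) / precision
    return NormalBelief(mean=mean, var=1.0 / precision)


def predictive_var(belief: NormalBelief, noise_var: float) -> float:
    """Variance of the next observation: stochastic uncertainty plus model risk."""
    return noise_var + belief.var


if __name__ == "__main__":
    belief = NormalBelief(mean=0.0, var=1.0)
    # Made-up growth-rate observations, processed one batch at a time.
    for batch in ([0.9, 1.1], [1.0, 1.2, 0.8]):
        belief = update(belief, batch, noise_var=0.25)
        print(belief, predictive_var(belief, noise_var=0.25))
```

In this sketch the belief variance shrinks as batches accumulate, so the predictive variance approaches the noise variance alone. That mirrors, in a much simpler setting, the abstract's statement that decisions made under model risk approach those made under perfect information as data grow.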