Biopharmaceutical manufacturing is a rapidly growing industry with impact in virtually all branches of medicine. Biomanufacturing processes require close monitoring and control in the presence of complex bioprocess dynamics with many interdependent factors, and data are extremely limited because experiments are costly and personalized bio-drugs are novel. We develop a novel model-based reinforcement learning framework that can achieve human-level control in low-data environments. The model uses a dynamic Bayesian network to capture causal interdependencies between factors and to predict how the effects of different inputs propagate through the pathways of the bioprocess mechanisms. This enables the design of process control policies that are both interpretable and robust against model risk. We present a computationally efficient, provably convergent stochastic gradient method for optimizing such policies. Validation is conducted on a realistic application with a multi-dimensional, continuous state variable.
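To make the idea of stochastic gradient policy optimization under model risk concrete, the following is a minimal illustrative sketch and not the method of this work. It assumes a toy one-dimensional linear Gaussian transition model as a stand-in for the dynamic Bayesian network, a linear policy `a = theta * s`, and a quadratic cost. The posterior over the transition coefficients, the function names (`sample_model`, `rollout_cost_and_grad`, `optimize_policy`, `average_cost`), and all numerical values are hypothetical.

```python
import numpy as np


def sample_model(rng):
    """Draw transition coefficients from an assumed (hypothetical) posterior.

    Sampling a fresh model per iteration is how this sketch represents
    model risk: the policy is optimized against many plausible models.
    """
    b = rng.normal(0.9, 0.05)  # effect of the current state on the next state
    c = rng.normal(0.5, 0.05)  # effect of the action on the next state
    return b, c


def rollout_cost_and_grad(theta, b, c, rng, horizon=5, lam=0.1, noise_sd=0.1):
    """Simulate one trajectory; return its cost and pathwise gradient in theta."""
    s, ds = 1.0, 0.0  # state and its derivative with respect to theta
    cost, grad = 0.0, 0.0
    for _ in range(horizon):
        a = theta * s               # linear policy
        da = s + theta * ds         # derivative of the action
        s_next = b * s + c * a + rng.normal(0.0, noise_sd)
        ds_next = b * ds + c * da   # derivative of the next state
        cost += s_next**2 + lam * a**2
        grad += 2.0 * s_next * ds_next + 2.0 * lam * a * da
        s, ds = s_next, ds_next
    return cost, grad


def optimize_policy(n_iters=500, seed=0):
    """Stochastic gradient descent with a diminishing step size."""
    rng = np.random.default_rng(seed)
    theta = 0.0
    for k in range(n_iters):
        b, c = sample_model(rng)
        _, g = rollout_cost_and_grad(theta, b, c, rng)
        g = float(np.clip(g, -10.0, 10.0))  # guard against rare large gradients
        theta -= 0.05 / np.sqrt(k + 1.0) * g
    return theta


def average_cost(theta, n=2000, seed=1):
    """Monte Carlo estimate of the expected cost over sampled models."""
    rng = np.random.default_rng(seed)
    costs = [rollout_cost_and_grad(theta, *sample_model(rng), rng)[0]
             for _ in range(n)]
    return float(np.mean(costs))
```

Calling `optimize_policy()` and comparing `average_cost` at the learned parameter against `average_cost(0.0)` shows the learned feedback reducing expected cost across the sampled models. A realistic bioprocess model would have a multi-dimensional state and a network structure over many factors, which this scalar sketch deliberately omits.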