To date, a large number of experiments are performed to develop a biochemical process. The generated data are used only once, to make decisions during development. If we could exploit data from already developed processes to make predictions for a novel process, we could significantly reduce the number of experiments needed. Processes for different products differ in behaviour, and typically only a subset behave similarly. Effective learning on process data spanning multiple products therefore requires a sensible representation of the product identity. We propose to represent the product identity (a categorical feature) by embedding vectors that serve as input to a Gaussian Process regression model. We demonstrate how the embedding vectors can be learned from process data and show that they capture an interpretable notion of product similarity. The improvement in performance over traditional one-hot encoding is evaluated on a simulated cross-product learning task. All in all, the proposed method could enable significant reductions in wet-lab experiments.
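To illustrate why the representation of the product identity matters, the following is a minimal sketch, not the implementation behind the results above. It assumes a squared-exponential kernel on the concatenation of one continuous process input and the product code, a fixed noise level, and toy sine-shaped data. The embedding vectors are set by hand here purely for illustration, whereas the proposed method learns them from process data. All names (`rbf_kernel`, `gp_posterior_mean`, `build_inputs`) are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq_dist = (np.sum(A**2, axis=1)[:, None]
               + np.sum(B**2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / lengthscale**2)

def gp_posterior_mean(X_train, y_train, X_test, noise=1e-2):
    """Posterior mean of a zero-mean GP with Gaussian observation noise."""
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_star = rbf_kernel(X_test, X_train)
    return K_star @ np.linalg.solve(K, y_train)

def build_inputs(process_inputs, product_ids, product_codes):
    """Append each row's product code to its continuous process inputs."""
    return np.hstack([process_inputs, product_codes[product_ids]])

# Assumed product codes for three products (illustrative values only).
# Products 0 and 1 are placed close together, product 2 far away.
embedding = np.array([[0.0, 0.0], [0.1, 0.0], [3.0, 3.0]])
one_hot = np.eye(3)  # every pair of distinct products is equally distant

# Toy data: products 0 and 2 have been developed and behave in opposite ways.
x = np.linspace(0.0, 6.0, 15)[:, None]
x_train = np.vstack([x, x])
ids_train = np.array([0] * len(x) + [2] * len(x))
y_train = np.concatenate([np.sin(x[:, 0]), -np.sin(x[:, 0])])

# Novel product 1 has no data; assume it truly behaves like product 0.
x_test = np.linspace(0.5, 5.5, 20)[:, None]
ids_test = np.full(len(x_test), 1)
y_true = np.sin(x_test[:, 0])

errors = {}
for name, codes in [("embedding", embedding), ("one_hot", one_hot)]:
    mean = gp_posterior_mean(build_inputs(x_train, ids_train, codes),
                             y_train,
                             build_inputs(x_test, ids_test, codes))
    errors[name] = float(np.mean(np.abs(mean - y_true)))

print(errors)
```

With one-hot codes the kernel treats the novel product as equally related to both developed products, so their opposite trends cancel in the prediction. With the embedding, the prediction for product 1 borrows mainly from the nearby product 0.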