Compounding plastics with recycled material remains a practical challenge, because the properties of the processed material are harder to control than those of virgin raw materials. For a data scientist, it makes sense to plan the experiments needed to develop new compounds with Bayesian Optimization. This optimization approach is built on a surrogate model and is known for its data efficiency, so it is well suited to data obtained from costly experiments. Furthermore, when historical data and expert knowledge are available, including them in the surrogate model is expected to accelerate the convergence of the optimization. In this article, we describe a use case in which adding such data and knowledge impaired the optimization. We also describe the unsuccessful methods we tried before we identified the reasons for the poor performance and achieved a satisfactory result. We conclude with a lesson learned: additional knowledge and data are beneficial only if they do not complicate the underlying optimization goal.
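To make the mechanism concrete, here is a minimal sketch of Bayesian Optimization in which historical data are included in the surrogate simply by adding them to the initial observations. It is not the authors' implementation. It assumes a single hypothetical recipe variable on [0, 1] (for example, a recycled-material fraction), a hypothetical objective `measure_property` that stands in for a lab measurement, a Gaussian-process surrogate with fixed RBF kernel hyperparameters, and expected improvement as the acquisition function.

```python
import math

import numpy as np


def rbf_kernel(a, b, length_scale=0.2, variance=1.0):
    """Squared-exponential kernel between two 1-D arrays of inputs."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_train, y_train, x_query, noise=1e-4):
    """Exact GP regression (zero prior mean on standardized targets); returns mean and std."""
    mu_y, sd_y = y_train.mean(), y_train.std() + 1e-12
    y = (y_train - mu_y) / sd_y
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    K_s = rbf_kernel(x_train, x_query)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    # Prior variance is 1.0 (the kernel variance) minus what the data explain.
    var = np.clip(1.0 - np.sum(v**2, axis=0), 1e-12, None)
    return mean * sd_y + mu_y, np.sqrt(var) * sd_y


_erf = np.vectorize(math.erf)


def expected_improvement(mean, std, best):
    """Expected improvement over the best observed value (maximization)."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    # Clip tiny negative values caused by floating-point round-off.
    return np.maximum((mean - best) * cdf + std * pdf, 0.0)


def bayes_opt(objective, x_init, y_init, n_iter=10):
    """Run BO on [0, 1], starting from the given (e.g. historical) observations."""
    x_grid = np.linspace(0.0, 1.0, 201)
    x_obs, y_obs = list(x_init), list(y_init)
    for _ in range(n_iter):
        mean, std = gp_posterior(np.array(x_obs), np.array(y_obs), x_grid)
        ei = expected_improvement(mean, std, max(y_obs))
        x_next = x_grid[np.argmax(ei)]
        x_obs.append(x_next)
        y_obs.append(objective(x_next))  # one (costly) experiment per iteration
    best = int(np.argmax(y_obs))
    return x_obs[best], y_obs[best]


def measure_property(x):
    """Hypothetical stand-in for a lab measurement as a function of recycled fraction x."""
    return np.sin(3.0 * x) * (1.0 - x)


# Hypothetical historical recipes used to warm-start the surrogate.
hist_x = np.array([0.1, 0.5, 0.9])
hist_y = measure_property(hist_x)
x_best, y_best = bayes_opt(measure_property, hist_x, hist_y, n_iter=10)
print(f"best recycled fraction ~ {x_best:.3f}, property ~ {y_best:.3f}")
```

Seeding `x_init`/`y_init` with historical measurements is the simplest way to feed prior data into the surrogate. Following the lesson stated in the abstract, this helps only if the added data do not complicate the underlying optimization goal.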