Profile extrusion is a continuous production process for manufacturing plastic profiles from molten polymer. Of particular interest is the design of the die, through which the melt is pressed to attain the desired shape. However, due to an inhomogeneous velocity distribution at the die exit or residual stresses inside the extrudate, the final shape of the manufactured part often deviates from the desired one. To avoid these deviations, the shape of the die can be computationally optimized, which has already been investigated in the literature using classical optimization approaches. A new approach in the field of shape optimization is the utilization of Reinforcement Learning (RL) as a learning-based optimization algorithm. RL is based on trial-and-error interactions of an agent with an environment. For each action, the agent is rewarded and informed about the subsequent state of the environment. While not necessarily superior to classical, e.g., gradient-based or evolutionary, optimization algorithms for one single problem, RL techniques are expected to perform especially well when similar optimization tasks are repeated, since the agent learns a more general strategy for generating optimal shapes instead of concentrating on just one single problem. In this work, we investigate this approach by applying it to two 2D test cases. The flow-channel geometry can be modified by the RL agent using so-called Free-Form Deformation, a method in which the computational mesh is embedded into a transformation spline, which is then deformed by moving its control points. In particular, we investigate the impact of utilizing different agents on the training progress and the potential for saving wall time by utilizing multiple environments during training.
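To illustrate the Free-Form Deformation idea mentioned above, the following is a minimal sketch: mesh nodes are embedded into a spline box, and moving the box's control points deforms all embedded nodes smoothly. This is a simplified 2D Bézier-type lattice for illustration only; the function names and the choice of a Bézier basis are assumptions, and practical FFD implementations often use B-splines and higher-dimensional lattices.

```python
from math import comb


def bernstein(n, i, t):
    """Bernstein polynomial B_{i,n}(t) on [0, 1]."""
    return comb(n, i) * t**i * (1 - t) ** (n - i)


def ffd_2d(points, control, lx, ly):
    """Map mesh nodes through a 2D Bezier free-form deformation box.

    points  : list of (x, y) mesh nodes inside the [0, lx] x [0, ly] box
    control : (m+1) x (n+1) grid of control-point positions; an RL agent
              (or optimizer) would modify these to reshape the geometry
    """
    m = len(control) - 1
    n = len(control[0]) - 1
    deformed = []
    for x, y in points:
        s, t = x / lx, y / ly  # local spline coordinates in [0, 1]
        px = sum(bernstein(m, i, s) * bernstein(n, j, t) * control[i][j][0]
                 for i in range(m + 1) for j in range(n + 1))
        py = sum(bernstein(m, i, s) * bernstein(n, j, t) * control[i][j][1]
                 for i in range(m + 1) for j in range(n + 1))
        deformed.append((px, py))
    return deformed
```

With the control points left at their uniformly spaced initial positions, the lattice reproduces the undeformed mesh (linear precision of the Bernstein basis); displacing a single control point then bends the embedded flow-channel nodes accordingly, which is the mechanism an RL agent can exploit as its action space.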