The objective of this study is to develop a model-free workspace trajectory planner for space manipulators, based on a Twin Delayed Deep Deterministic Policy Gradient (TD3) agent, that enables safe and reliable debris capture. A local control strategy with singularity avoidance and manipulability enhancement ensures stable execution of the planned trajectories. The manipulator must simultaneously track a capture point on a non-cooperative target, avoid self-collisions, and prevent unintended contact with the target. To address these competing requirements, we propose a curriculum-based multi-critic network in which one critic emphasizes accurate tracking and the other enforces collision avoidance. A prioritized experience replay buffer is also used to accelerate convergence and improve policy robustness. The framework is evaluated in MATLAB/Simulink on a simulated seven-degree-of-freedom KUKA LBR iiwa mounted on a free-floating base, demonstrating safe and adaptive trajectory generation for debris-removal missions.
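The abstract names two training mechanisms: a TD3 agent with separate tracking and collision-avoidance critics, and a prioritized experience replay buffer. It does not say how they interact. The Python sketch below shows one plausible way to combine them, under stated assumptions. It implements proportional prioritized replay, using linear-time sampling instead of a sum tree for clarity, and the TD3 clipped double-Q target. It also adds a hypothetical `combined_priority` rule that ranks each transition by the larger absolute TD error of the two critic groups. The class and function names, the transition layout, the hyperparameters (`alpha`, `beta`, `gamma`), and the random placeholder critic outputs are all illustrative assumptions, not the study's implementation. The curriculum schedule and the actor update are omitted.

```python
import numpy as np


class PrioritizedReplayBuffer:
    """Proportional prioritized replay; linear-time sampling kept simple for clarity."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6, seed=0):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.data = [None] * capacity
        self.priorities = np.zeros(capacity)
        self.pos, self.size = 0, 0
        self.rng = np.random.default_rng(seed)

    def add(self, transition):
        # New transitions get the current max priority so each is replayed at least once.
        max_p = self.priorities[: self.size].max() if self.size else 1.0
        self.data[self.pos] = transition
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity  # overwrite the oldest entry when full
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size, beta=0.4):
        # Sampling probability is proportional to priority ** alpha.
        p = self.priorities[: self.size] ** self.alpha
        probs = p / p.sum()
        idx = self.rng.choice(self.size, size=batch_size, p=probs)
        # Importance-sampling weights undo the bias of non-uniform sampling.
        weights = (self.size * probs[idx]) ** (-beta)
        return idx, [self.data[i] for i in idx], weights / weights.max()

    def update_priorities(self, idx, td_errors):
        # eps keeps every transition sampleable even when its TD error is zero.
        self.priorities[idx] = np.abs(td_errors) + self.eps


def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q target: bootstrap from the smaller of two target critics."""
    return reward + gamma * (1.0 - done) * np.minimum(q1_next, q2_next)


def combined_priority(delta_track, delta_coll):
    """ASSUMPTION (hypothetical rule): prioritize by the larger TD error of the two critics."""
    return np.maximum(np.abs(delta_track), np.abs(delta_coll))


if __name__ == "__main__":
    buf = PrioritizedReplayBuffer(capacity=1000)
    for t in range(64):
        # Hypothetical transition layout: (r_track, r_coll, done); states/actions omitted.
        buf.add((-0.01 * t, -1.0 if t % 10 == 0 else 0.0, 0.0))
    idx, batch, w = buf.sample(batch_size=16)
    r_tr, r_co, done = (np.array(col) for col in zip(*batch))
    rng = np.random.default_rng(1)
    # Placeholder target-critic outputs standing in for neural-network evaluations.
    y_tr = td3_target(r_tr, done, rng.normal(size=16), rng.normal(size=16))
    y_co = td3_target(r_co, done, rng.normal(size=16), rng.normal(size=16))
    q_tr, q_co = rng.normal(size=16), rng.normal(size=16)  # current critic estimates
    buf.update_priorities(idx, combined_priority(y_tr - q_tr, y_co - q_co))
    print(w.round(3))
```

In a full agent, target-network evaluations at smoothed target actions would replace the placeholder arrays. The importance weights `w` would then scale each critic's squared TD-error loss.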