Perception of deformable linear objects (DLOs), such as cables, ropes, and wires, is the cornerstone of successful downstream manipulation. Although vision-based methods have been extensively explored, they remain highly vulnerable to occlusions, which commonly arise in constrained manipulation environments from surrounding obstacles, large and varying deformations, and limited viewpoints. Moreover, the high dimensionality of the state space, the lack of distinctive visual features, and the presence of sensor noise further compound the challenges of reliable DLO perception. To address these open issues, this paper presents UniStateDLO, the first complete deep-learning-based DLO perception pipeline that achieves robust performance under severe occlusion, covering both single-frame state estimation and cross-frame state tracking from partial point clouds. Both tasks are formulated as conditional generative problems, leveraging the strong capability of diffusion models to capture the complex mapping between highly partial observations and high-dimensional DLO states. UniStateDLO effectively handles a wide range of occlusion patterns, including initial occlusion, self-occlusion, and occlusion caused by multiple objects. In addition, it is data-efficient with respect to real-world data: the entire network is trained solely on a large-scale synthetic dataset and generalizes zero-shot from simulation to the real world without any real-world training data. Comprehensive simulation and real-world experiments demonstrate that UniStateDLO outperforms all state-of-the-art baselines in both estimation and tracking, producing globally smooth yet locally precise DLO state predictions in real time, even under substantial occlusions. Its integration as the front-end module of a closed-loop DLO manipulation system further demonstrates its ability to support stable feedback control in complex, constrained 3-D environments.