Unsupervised online 3D instance segmentation is a fundamental yet challenging task, as it requires maintaining consistent object identities across consecutive LiDAR scans without relying on annotated training data. Existing methods, such as UNIT, have made progress in this direction but remain constrained by limited training diversity, rigid temporal sampling, and heavy dependence on noisy pseudo-labels. We propose a new framework that enriches the training distribution through synthetic point cloud sequence generation, achieving greater diversity without manual labels or simulation engines. To better capture temporal dynamics, our method incorporates a flexible sampling strategy that leverages both adjacent and non-adjacent frames, allowing the model to learn from long-range dependencies as well as short-term variations. In addition, a dynamic-weighting loss emphasizes confident and informative samples, guiding the network toward more robust representations. Extensive experiments on SemanticKITTI, nuScenes, and PandaSet show that our method consistently outperforms UNIT and other unsupervised baselines, achieving higher segmentation accuracy and stronger temporal associations. The code will be publicly available at github.com/Eaphan/SFT3D.
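To make the flexible sampling strategy concrete, the sketch below shows one plausible way to draw training frame pairs that mix adjacent and non-adjacent frames. It is a minimal illustration only: the gap distribution and the `p_adjacent` and `max_gap` parameters are assumptions for exposition, not values taken from the paper.

```python
import random

def sample_frame_pair(num_frames: int, max_gap: int = 10, p_adjacent: float = 0.5):
    """Sample an (anchor, paired) frame index pair from a LiDAR sequence.

    With probability p_adjacent the pair is temporally adjacent (gap = 1),
    exposing the model to short-term variation; otherwise a larger gap up
    to max_gap is drawn uniformly, covering long-range dependencies.
    All parameter choices here are illustrative assumptions.
    """
    anchor = random.randrange(num_frames - 1)
    if random.random() < p_adjacent:
        gap = 1
    else:
        gap = random.randint(2, max_gap)
    paired = min(anchor + gap, num_frames - 1)
    return anchor, paired
```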
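Similarly, the dynamic-weighting loss can be sketched as a confidence-weighted objective over pseudo-labeled points, down-weighting noisy samples. The use of softmax confidence as the weight and the mean normalization below are illustrative choices, assuming a per-point classification head; the paper's exact weighting rule may differ.

```python
import torch
import torch.nn.functional as F

def dynamic_weighted_loss(logits: torch.Tensor, pseudo_labels: torch.Tensor) -> torch.Tensor:
    """Confidence-weighted cross-entropy over pseudo-labeled points.

    logits: (N, C) per-point class scores; pseudo_labels: (N,) noisy labels.
    Each point's loss is scaled by the model's own confidence in its
    pseudo-label, emphasizing confident, informative samples (an assumed
    instantiation of the dynamic weighting, not the paper's exact form).
    """
    per_point = F.cross_entropy(logits, pseudo_labels, reduction="none")
    with torch.no_grad():
        probs = logits.softmax(dim=-1)
        conf = probs.gather(-1, pseudo_labels.unsqueeze(-1)).squeeze(-1)
    weights = conf / conf.mean().clamp_min(1e-8)  # normalize weights to mean 1
    return (weights * per_point).mean()
```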