Previous dominant methods for scene flow estimation focus mainly on input from two consecutive frames, neglecting valuable information in the temporal domain. While recent work has shifted towards multi-frame reasoning, these methods suffer from rapidly escalating computational costs as the number of frames grows. To leverage temporal information more efficiently, we propose DeltaFlow ($\Delta$Flow), a lightweight 3D framework that captures motion cues via a $\Delta$ scheme, extracting temporal features at minimal computational cost regardless of the number of frames. Additionally, scene flow estimation faces challenges such as imbalanced object class distributions and motion inconsistency. To tackle these issues, we introduce a Category-Balanced Loss to enhance learning on underrepresented classes and an Instance Consistency Loss to enforce coherent object motion, improving flow accuracy. Extensive evaluations on the Argoverse 2, Waymo, and nuScenes datasets show that $\Delta$Flow achieves state-of-the-art performance, with up to 22% lower error and $2\times$ faster inference than the next-best multi-frame supervised method, while also demonstrating strong cross-domain generalization. The code and trained model weights are open-sourced at https://github.com/Kin-Zhang/DeltaFlow.
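The Category-Balanced Loss is only named at a high level here; as a rough illustration of the general idea of up-weighting underrepresented classes, the sketch below re-weights a per-point endpoint error by inverse class frequency. The function name, tensor shapes, and exact weighting scheme are assumptions for illustration, not the paper's actual formulation.

```python
import torch

def category_balanced_epe(pred_flow, gt_flow, class_ids, num_classes):
    """Illustrative per-point endpoint error, re-weighted inversely to class frequency.

    pred_flow, gt_flow: (N, 3) per-point flow vectors
    class_ids:          (N,)  integer class label per point
    num_classes:        total number of classes
    """
    # Per-point endpoint error (L2 distance between predicted and GT flow).
    epe = torch.linalg.norm(pred_flow - gt_flow, dim=-1)              # (N,)

    # Count points per class and derive inverse-frequency weights, so that
    # rare classes (e.g. pedestrians) contribute as much as common ones.
    counts = torch.bincount(class_ids, minlength=num_classes).clamp(min=1)
    weights = counts.sum() / (num_classes * counts.float())           # (C,)

    # Weight each point's error by its class weight and average.
    return (weights[class_ids] * epe).mean()
```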