We introduce optical Flow transFormer, dubbed FlowFormer, a transformer-based neural network architecture for learning optical flow. FlowFormer tokenizes the 4D cost volume built from an image pair, encodes the cost tokens into a cost memory with alternate-group transformer (AGT) layers in a novel latent space, and decodes the cost memory via a recurrent transformer decoder with dynamic positional cost queries. On the Sintel benchmark, FlowFormer achieves 1.159 and 2.088 average end-point error (AEPE) on the clean and final passes, a 16.5% and 15.5% error reduction from the best published results (1.388 and 2.47). FlowFormer also achieves strong generalization performance: without being trained on Sintel, it achieves 0.64 and 1.50 AEPE on the clean and final passes of the Sintel training set, outperforming the best published results (1.29 and 2.74) by 50.4% and 45.3%.
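To make the tokenize-encode-decode pipeline concrete, the following is a minimal, hypothetical sketch of the data flow the abstract describes: an all-pairs 4D cost volume is built from two feature maps, projected into cost tokens, encoded into a cost memory, and decoded recurrently into a flow field. The class name `FlowFormerSketch`, all layer sizes, and the use of plain PyTorch self-attention in place of the paper's AGT layers and dynamic positional cost queries are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class FlowFormerSketch(nn.Module):
    """Hypothetical sketch of the FlowFormer pipeline from the abstract.
    Internals are simplified stand-ins, not the paper's actual modules."""

    def __init__(self, token_dim=64, n_layers=3, n_heads=4):
        super().__init__()
        # Tokenization: project each source pixel's cost map into one
        # cost token (an assumed scheme; the paper patchifies cost maps
        # before projecting them into the latent space).
        self.tokenizer = nn.LazyLinear(token_dim)
        # Stand-in for the alternate-group transformer (AGT) encoder:
        # here, plain self-attention layers over the cost tokens.
        enc_layer = nn.TransformerEncoderLayer(
            d_model=token_dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=n_layers)
        # Stand-in for the recurrent decoder with dynamic positional cost
        # queries: per-pixel queries cross-attend to the cost memory and
        # are refined over several iterations.
        self.decoder = nn.MultiheadAttention(
            token_dim, n_heads, batch_first=True)
        self.flow_head = nn.Linear(token_dim, 2)  # per-pixel (dx, dy)

    def forward(self, feat1, feat2, iters=4):
        B, C, H, W = feat1.shape
        # 4D cost volume as all-pairs similarities between the two
        # feature maps, flattened to shape (B, H*W, H*W).
        f1 = feat1.flatten(2).transpose(1, 2)            # (B, H*W, C)
        f2 = feat2.flatten(2).transpose(1, 2)            # (B, H*W, C)
        cost = torch.einsum('bic,bjc->bij', f1, f2) / C**0.5
        tokens = self.tokenizer(cost)                    # (B, H*W, token_dim)
        memory = self.encoder(tokens)                    # cost memory
        # Recurrent decoding: queries attend to the cost memory and are
        # additively updated each iteration (a simplification).
        queries = torch.zeros(B, H * W, memory.size(-1),
                              device=feat1.device)
        for _ in range(iters):
            attended, _ = self.decoder(queries, memory, memory)
            queries = queries + attended
        flow = self.flow_head(queries)                   # (B, H*W, 2)
        return flow.transpose(1, 2).reshape(B, 2, H, W)

# Usage on dummy features from a 1/8-resolution backbone (assumed sizes):
net = FlowFormerSketch()
f1, f2 = torch.randn(1, 128, 8, 8), torch.randn(1, 128, 8, 8)
flow = net(f1, f2)  # (1, 2, 8, 8) flow field
```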