We introduce an on-ground Pedestrian World Model, a computational model that can predict how pedestrians move around an observer in a crowd on the ground plane, using only the observer's egocentric views. Our model, InCrowdFormer, fully leverages the Transformer architecture: it models both pedestrian interactions and the egocentric-to-top-down view transformation with attention, and it autoregressively predicts the on-ground positions of a variable number of people with an encoder-decoder architecture. We encode the uncertainties arising from unknown pedestrian heights with latent codes to predict the posterior distributions of pedestrian positions. We validate the effectiveness of InCrowdFormer on a novel prediction benchmark of real movements. The results show that InCrowdFormer accurately predicts the future coordination of pedestrians. To the best of our knowledge, InCrowdFormer is the first pedestrian world model of its kind, and we believe it will benefit a wide range of egocentric-view applications, including crowd navigation, tracking, and synthesis.
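The abstract does not specify InCrowdFormer's layers, so the following is only a minimal sketch of two generic ideas it names. The first is attention over a variable number of pedestrians, handled here with a padding mask. The second is autoregressive rollout, in which each predicted position is fed back as the next input. It uses fixed, unlearned dot-product attention on 2D ground-plane positions rather than a learned encoder-decoder. It also omits the egocentric-to-top-down transformation and the latent height codes. The functions `masked_attention` and `rollout` are hypothetical names for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float]


def masked_attention(queries: Sequence[Vec], keys: Sequence[Vec],
                     values: Sequence[Vec], mask: Sequence[bool]) -> List[Vec]:
    """Scaled dot-product attention; keys whose mask entry is False are ignored.

    Assumes at least one key is valid (mask contains a True).
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    out: List[Vec] = []
    for q in queries:
        # Padded (masked) pedestrians get a score of -inf, i.e. zero weight.
        scores = [sum(a * b for a, b in zip(q, k)) * scale if m else -math.inf
                  for k, m in zip(keys, mask)]
        top = max(scores)
        weights = [math.exp(s - top) if s != -math.inf else 0.0 for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of value vectors, one output per query.
        out.append(tuple(sum(w * v[i] for w, v in zip(weights, values))
                         for i in range(len(values[0]))))
    return out


def rollout(positions: Sequence[Vec], velocities: Sequence[Vec],
            mask: Sequence[bool], steps: int, dt: float = 1.0) -> List[List[Vec]]:
    """Toy autoregressive rollout on the ground plane (hypothetical, not the paper's model)."""
    pos = list(positions)
    vel = list(velocities)
    traj = [list(pos)]
    for _ in range(steps):
        # Each pedestrian attends over all valid pedestrians' velocities,
        # using positions as queries and keys (a crude interaction term).
        mixed = masked_attention(pos, pos, vel, mask)
        new_pos = [(p[0] + dt * v[0], p[1] + dt * v[1]) if m else p
                   for p, v, m in zip(pos, mixed, mask)]
        # Feed the prediction back in: the next step's velocity is derived
        # from the positions this step just predicted.
        vel = [((n[0] - p[0]) / dt, (n[1] - p[1]) / dt)
               for n, p in zip(new_pos, pos)]
        pos = new_pos
        traj.append(list(pos))
    return traj
```

For example, with one real pedestrian and one padded slot, `rollout([(0.0, 0.0), (100.0, 100.0)], [(1.0, 0.0), (5.0, 5.0)], [True, False], steps=3)` moves the real pedestrian to `(3.0, 0.0)` and leaves the padded slot untouched. The mask is what lets a fixed-size batch represent a varying number of people.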