Pedestrian behavior prediction is one of the major challenges for intelligent driving systems. Pedestrians often exhibit complex behaviors influenced by various contextual elements. To address this problem, we propose BiPed, a multitask learning framework that simultaneously predicts trajectories and actions of pedestrians by relying on multimodal data. Our method benefits from 1) a bifold encoding approach where different data modalities are processed independently allowing them to develop their own representations, and jointly to produce a representation for all modalities using shared parameters; 2) a novel interaction modeling technique that relies on categorical semantic parsing of the scenes to capture interactions between target pedestrians and their surroundings; and 3) a bifold prediction mechanism that uses both independent and shared decoding of multimodal representations. Using public pedestrian behavior benchmark datasets for driving, PIE and JAAD, we highlight the benefits of the proposed method for behavior prediction and show that our model achieves state-of-the-art performance and improves trajectory and action prediction by up to 22% and 9% respectively. We further investigate the contributions of the proposed reasoning techniques via extensive ablation studies.