Predicting accurate future trajectories of multiple agents is essential for autonomous systems, but is challenging due to complex agent interactions and the uncertainty in each agent's future behavior. Forecasting multi-agent trajectories requires modeling two key dimensions: (1) the time dimension, where we model the influence of past agent states on future states; and (2) the social dimension, where we model how the state of each agent affects others. Most prior methods model these two dimensions separately, e.g., first using a temporal model to summarize features over time for each agent independently and then modeling the interaction of the summarized features with a social model. This approach is suboptimal since independent feature encoding over either the time or social dimension can result in a loss of information. Instead, we would prefer a method that allows an agent's state at one time to directly affect another agent's state at a future time. To this end, we propose a new Transformer, AgentFormer, that jointly models the time and social dimensions. The model leverages a sequence representation of multi-agent trajectories by flattening trajectory features across time and agents. Since standard attention operations disregard the agent identity of each element in the sequence, AgentFormer uses a novel agent-aware attention mechanism that preserves agent identities by attending to elements of the same agent differently than elements of other agents. Based on AgentFormer, we propose a stochastic multi-agent trajectory prediction model that can attend to features of any agent at any previous timestep when inferring an agent's future position. The latent intent of all agents is also jointly modeled, allowing the stochasticity in one agent's behavior to affect other agents. Our method significantly improves the state of the art on well-established pedestrian and autonomous driving datasets.
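The flattened sequence representation and the idea of attending to same-agent elements differently from other-agent elements can be illustrated with a minimal NumPy sketch. This is not the AgentFormer implementation: the abstract does not specify how the two cases are distinguished, so the sketch assumes one plausible realization, with separate query/key projections for same-agent and other-agent pairs selected by an agent-identity mask. The function names (`flatten_trajectories`, `agent_aware_attention`), the time-major ordering, and all weight matrices are hypothetical and chosen only for illustration.

```python
import numpy as np


def flatten_trajectories(feats):
    """Flatten per-agent features of shape (T, N, D) into one sequence (T*N, D).

    Assumed ordering is time-major: all agents at t=0, then all agents at t=1, ...
    Also returns the agent index and timestep of every sequence element.
    """
    T, N, D = feats.shape
    seq = feats.reshape(T * N, D)
    agent_ids = np.tile(np.arange(N), T)      # 0..N-1 repeated for each timestep
    timesteps = np.repeat(np.arange(T), N)    # each timestep repeated N times
    return seq, agent_ids, timesteps


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def agent_aware_attention(seq, agent_ids, W_self, W_other, W_v):
    """Single-head attention whose scores depend on agent identity (illustrative).

    W_self and W_other are hypothetical (W_query, W_key) pairs: W_self scores
    pairs of elements belonging to the same agent, W_other scores pairs that
    belong to different agents.
    """
    d = W_self[0].shape[1]
    q_self, k_self = seq @ W_self[0], seq @ W_self[1]
    q_other, k_other = seq @ W_other[0], seq @ W_other[1]

    # same[i, j] is True when elements i and j belong to the same agent.
    same = agent_ids[:, None] == agent_ids[None, :]

    # Pick the same-agent score or the other-agent score for each pair.
    scores = np.where(same, q_self @ k_self.T, q_other @ k_other.T) / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ (seq @ W_v), weights
```

Because every element of the flattened sequence can attend to every other element, one agent's state at one timestep can directly influence another agent's state at a different timestep, which is the joint time-and-social modeling the abstract argues for. When the two projection pairs are identical, the sketch reduces to standard scaled dot-product attention, which ignores agent identity. A forecasting model would additionally need a causal mask over timesteps and a timestep encoding, both omitted here.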