Multi-agent spatiotemporal modeling is a challenging task from the perspective of both algorithmic design and computational complexity. Recent work has explored the efficacy of traditional deep sequential models in this domain, but these architectures are slow and cumbersome to train, particularly as model size increases. Further, prior attempts to model interactions between agents across time have limitations, such as imposing an order on the agents or making assumptions about their relationships. In this paper, we introduce baller2vec, a multi-entity generalization of the standard Transformer that, with minimal assumptions, can simultaneously and efficiently integrate information across entities and time. We test the effectiveness of baller2vec for multi-agent spatiotemporal modeling by training it to perform two different basketball-related tasks: (1) simultaneously forecasting the trajectories of all players on the court and (2) forecasting the trajectory of the ball. Not only does baller2vec learn to perform these tasks well, but it also appears to "understand" the game of basketball, encoding idiosyncratic qualities of players in its embeddings and performing basketball-relevant functions with its attention heads.
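The abstract does not describe the mechanism by which information is integrated across entities and time. As an illustrative sketch only, and not the paper's implementation, one way a Transformer could do this without imposing an order on the agents is to flatten the (time step, entity) grid into a single token sequence and use a block-causal attention mask, so that every entity can attend to every entity at the same or an earlier time step. The function name `block_causal_mask`, the time-major token layout, and the mask rule itself are assumptions made for this example.

```python
def block_causal_mask(num_steps: int, num_entities: int) -> list[list[bool]]:
    """Build a hypothetical block-causal attention mask.

    Tokens are laid out time-major: index = t * num_entities + k.
    mask[i][j] is True when token i is allowed to attend to token j,
    i.e. when token j belongs to the same or an earlier time step.
    """
    size = num_steps * num_entities
    mask = []
    for i in range(size):
        t_i = i // num_entities  # time step of the querying token
        # Allow attention to every entity at time steps <= t_i.
        row = [(j // num_entities) <= t_i for j in range(size)]
        mask.append(row)
    return mask


# Small example: 3 time steps, 2 entities -> a 6 x 6 mask.
mask = block_causal_mask(num_steps=3, num_entities=2)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Within a time step the mask is fully open, so no ordering among entities is implied; across time steps it is causal, so no token sees the future. In a real model, a mask of this shape would typically be passed to the attention layers in place of the usual lower-triangular mask.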