Video anomaly detection is an ill-posed problem because it depends on many parameters, such as appearance, pose, camera angle, and background. We reduce the problem to anomaly detection on human pose, which lowers the risk that nuisance parameters such as appearance affect the result. Focusing on pose alone also has the side benefit of reducing bias against distinct minority groups. Our model works directly on human pose graph sequences and is exceptionally lightweight ($\sim1K$ parameters), so it can run with negligible additional resources on any machine capable of running the pose estimator. We leverage the highly compact pose representation in a normalizing flows framework, which we extend to handle the unique characteristics of spatio-temporal pose data, and we show its advantages in this use case. Our algorithm uses normalizing flows built from spatio-temporal graph convolution blocks to learn a bijective mapping between the pose data distribution and a Gaussian distribution. The algorithm is general: it can be trained on data containing only normal examples, as well as on a supervised dataset of labeled normal and abnormal examples. We report state-of-the-art results on two anomaly detection benchmarks: the unsupervised ShanghaiTech dataset and the recent supervised UBnormal dataset.
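The core mechanism described above, a normalizing flow that maps pose data bijectively to a standard Gaussian so that a low likelihood flags an anomaly, can be illustrated with a minimal NumPy sketch. This is not the authors' model. The spatio-temporal graph convolution blocks are swapped for a per-dimension affine layer fitted from data statistics plus one RealNVP-style affine coupling layer whose conditioning maps are fixed random matrices rather than trained networks. The names `ActNorm`, `AffineCoupling`, `transform`, and `log_likelihood`, the single-frame 34-dimensional pose vectors (a hypothetical 17 keypoints × 2 coordinates), and the synthetic data are all assumptions. Only the unsupervised setting (training on normal examples only) is covered.

```python
import numpy as np

LOG_2PI = np.log(2.0 * np.pi)


class ActNorm:
    """Per-dimension affine bijection z = (x - loc) * exp(-log_scale).

    Parameters come from data statistics (Glow-style data-dependent init),
    which is also the maximum-likelihood fit for this layer on its own.
    """

    def fit(self, x):
        self.loc = x.mean(axis=0)
        self.log_scale = np.log(x.std(axis=0) + 1e-6)
        return self

    def forward(self, x):
        z = (x - self.loc) * np.exp(-self.log_scale)
        # log|det dz/dx| is identical for every sample.
        return z, np.full(len(x), -self.log_scale.sum())

    def inverse(self, z):
        return z * np.exp(self.log_scale) + self.loc


class AffineCoupling:
    """RealNVP-style coupling: the first half of x sets an affine map of the second half."""

    def __init__(self, dim, rng):
        self.d = dim // 2
        # Hypothetical stand-ins for learned networks: fixed random linear maps.
        self.w_s = 0.1 * rng.standard_normal((self.d, dim - self.d))
        self.w_t = 0.1 * rng.standard_normal((self.d, dim - self.d))

    def _params(self, x1):
        s = np.tanh(x1 @ self.w_s)  # bounded log-scale keeps the map well conditioned
        t = x1 @ self.w_t
        return s, t

    def forward(self, x):
        x1, x2 = x[:, :self.d], x[:, self.d:]
        s, t = self._params(x1)
        z2 = x2 * np.exp(s) + t
        return np.concatenate([x1, z2], axis=1), s.sum(axis=1)

    def inverse(self, z):
        z1, z2 = z[:, :self.d], z[:, self.d:]
        s, t = self._params(z1)  # z1 == x1, so the same s and t are recovered
        x2 = (z2 - t) * np.exp(-s)
        return np.concatenate([z1, x2], axis=1)


def transform(layers, x):
    """Push x through the flow, accumulating log|det J| per sample."""
    z, total_log_det = x, np.zeros(len(x))
    for layer in layers:
        z, log_det = layer.forward(z)
        total_log_det += log_det
    return z, total_log_det


def invert(layers, z):
    for layer in reversed(layers):
        z = layer.inverse(z)
    return z


def log_likelihood(layers, x):
    """Change of variables: log p(x) = log N(f(x); 0, I) + log|det df/dx|."""
    z, log_det = transform(layers, x)
    log_pz = -0.5 * (z ** 2).sum(axis=1) - 0.5 * z.shape[1] * LOG_2PI
    return log_pz + log_det


# Hypothetical synthetic setup: training data contains only normal poses.
rng = np.random.default_rng(0)
DIM = 34
train = rng.normal(3.0, 0.5, size=(500, DIM))
layers = [ActNorm().fit(train), AffineCoupling(DIM, rng)]

test_normal = rng.normal(3.0, 0.5, size=(50, DIM))
test_abnormal = test_normal[:5] + 4.0  # poses far from the training distribution

# Anomaly score = negative log-likelihood under the flow.
normal_scores = -log_likelihood(layers, test_normal)
abnormal_scores = -log_likelihood(layers, test_abnormal)
```

Because the flow is bijective, each pose has an exact likelihood, so no reconstruction or threshold on a latent distance is needed: poses the model considers unlikely simply receive a high negative log-likelihood score.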