We propose a fully convolutional multi-person pose estimation framework using dynamic instance-aware convolutions, termed FCPose. Unlike existing methods, which often require ROI (Region of Interest) operations and/or grouping post-processing, FCPose eliminates both by using dynamic instance-aware keypoint estimation heads. The dynamic keypoint heads are conditioned on each instance (person) and encode the instance concept in the dynamically generated weights of their filters. Moreover, thanks to the strong representation capacity of dynamic convolutions, the keypoint heads in FCPose are designed to be very compact, which results in fast inference and gives FCPose an almost constant inference time regardless of the number of persons in the image. For example, on the COCO dataset, a real-time version of FCPose using the DLA-34 backbone runs about 4.5x faster than Mask R-CNN (ResNet-101) (41.67 FPS vs. 9.26 FPS) while achieving improved performance. FCPose also offers a better speed/accuracy trade-off than other state-of-the-art methods. Our experimental results show that FCPose is a simple yet effective multi-person pose estimation framework. Code is available at: https://git.io/AdelaiDet
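To make the idea of an instance-conditioned keypoint head concrete, the following is a minimal NumPy sketch, not the released FCPose implementation. It assumes the head is a small stack of 1x1 convolutions whose weights and biases are supplied per instance as a flat parameter vector. The function names (`head_param_count`, `dynamic_keypoint_head`), the layer widths, and the use of random numbers in place of network-generated parameters are all hypothetical choices made for illustration.

```python
import numpy as np


def head_param_count(in_channels, hidden, num_keypoints):
    """Number of weights and biases in a stack of 1x1 convolutions (hypothetical layout)."""
    dims = [in_channels, *hidden, num_keypoints]
    return sum(d_in * d_out + d_out for d_in, d_out in zip(dims[:-1], dims[1:]))


def dynamic_keypoint_head(features, params, hidden=(8, 8), num_keypoints=17):
    """Apply a per-instance head of 1x1 convolutions to a shared feature map.

    features: (C, H, W) feature map shared by all instances.
    params:   (N, P) flat filter parameters, one row per instance.
    returns:  (N, num_keypoints, H, W) keypoint heatmaps, one set per instance.
    """
    c, h, w = features.shape
    n = params.shape[0]
    assert params.shape[1] == head_param_count(c, hidden, num_keypoints)

    dims = [c, *hidden, num_keypoints]
    # Every instance sees the same features; only the filters differ.
    x = np.broadcast_to(features.reshape(1, c, h * w), (n, c, h * w))
    offset = 0
    for layer, (d_in, d_out) in enumerate(zip(dims[:-1], dims[1:])):
        # Slice this layer's weights and biases out of each instance's vector.
        weight = params[:, offset:offset + d_in * d_out].reshape(n, d_out, d_in)
        offset += d_in * d_out
        bias = params[:, offset:offset + d_out].reshape(n, d_out, 1)
        offset += d_out
        # A 1x1 convolution is a matrix product over the channel axis.
        x = np.einsum("noi,nip->nop", weight, x) + bias
        if layer < len(dims) - 2:
            x = np.maximum(x, 0.0)  # ReLU on all layers except the last
    return x.reshape(n, num_keypoints, h, w)


rng = np.random.default_rng(0)
features = rng.standard_normal((8, 16, 16))
num_params = head_param_count(8, (8, 8), 17)
# Random values stand in for parameters a network would generate per person.
params = rng.standard_normal((3, num_params))
heatmaps = dynamic_keypoint_head(features, params)
print(heatmaps.shape)  # (3, 17, 16, 16)
```

In this sketch the per-instance cost is a few small matrix products over a shared feature map, with no ROI cropping and no grouping step, which is one way to read the abstract's claim that a compact head keeps inference time nearly independent of the number of persons.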