Most state-of-the-art instance-level human parsing models adopt two-stage anchor-based detectors and therefore cannot avoid heuristic anchor box design or the lack of pixel-level analysis. To address these two issues, we design an instance-level human parsing network that is anchor-free and operates at the pixel level. It consists of two simple sub-networks: an anchor-free detection head for bounding box prediction and an edge-guided parsing head for human segmentation. The anchor-free detection head inherits the merits of per-pixel prediction and effectively avoids sensitivity to hyper-parameters, as demonstrated in object detection applications. By introducing a part-aware boundary clue, the edge-guided parsing head can distinguish adjacent human parts from one another, for up to 58 parts in a single human instance, even when instances overlap. In addition, a refinement head that integrates the box-level score and the part-level parsing quality is used to improve the quality of the parsing results. Experiments on two multiple human parsing datasets (i.e., CIHP and LV-MHP-v2.0) and one video instance-level human parsing dataset (i.e., VIP) show that our method achieves the best global-level and instance-level performance compared with state-of-the-art one-stage top-down alternatives.
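The abstract does not state how boxes are parameterized or how the refinement head fuses its two inputs. As a rough illustration only, the sketch below assumes the per-pixel distance regression common in anchor-free detectors (each location predicts its distances to the four box sides) and a plain product for combining the box-level score with a parsing-quality estimate. The function names `decode_box` and `rescore` and the product rule are hypothetical, not taken from the paper.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def decode_box(x: float, y: float,
               left: float, top: float,
               right: float, bottom: float) -> Box:
    """Turn per-pixel side distances into a box.

    Assumption: the pixel at (x, y) regresses its distance to each of the
    four box sides, so no anchor shapes or scales need to be configured.
    """
    return (x - left, y - top, x + right, y + bottom)


def rescore(box_score: float, parsing_quality: float) -> float:
    """Fuse detection confidence with a parsing-quality estimate.

    Assumption: a simple product, so an instance ranks highly only when
    both the box and its part-level parsing are judged reliable.
    """
    return box_score * parsing_quality


if __name__ == "__main__":
    box = decode_box(10.0, 20.0, 3.0, 4.0, 5.0, 6.0)
    print(box, rescore(0.5, 0.5))
```

Under these assumptions, an instance with a confident box but poor part predictions is ranked lower than it would be by the box score alone, which is the kind of effect the refinement head is described as targeting.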