In this work, we focus on Interactive Human Parsing (IHP), which aims to segment a human image into multiple body parts with guidance from user interactions. Because this new task inherits the class-aware property of human parsing, it cannot be solved well by traditional interactive image segmentation approaches, which are generally class-agnostic. To tackle the task, we first exploit user clicks to identify the different human parts in the given image. These clicks are then transformed into semantic-aware localization maps, which are concatenated with the RGB image to form the input of the segmentation network and generate the initial parsing result. To help the network better perceive the user's intent during the correction process, we investigate several principal refinement strategies and find that random-sampling-based click augmentation is the most effective at improving correction. Furthermore, we propose a semantic-perceiving loss (SP-loss) to augment training, which exploits the semantic relationships among clicks for better optimization. To the best of our knowledge, this work is the first attempt to tackle the human parsing task in the interactive setting. Our IHP solution achieves 85\% mIoU on LIP, 80\% mIoU on PASCAL-Person-Part and CIHP, and 75\% mIoU on Helen with only 1.95, 3.02, 2.84, and 1.09 clicks per class, respectively. These results demonstrate that high-quality human parsing masks can be acquired with only a little human effort. We hope this work will motivate more researchers to develop data-efficient solutions to IHP in the future.
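As a rough illustration of the input construction described above, the following sketch turns class-labelled clicks into one localization map per part class and stacks those maps with the RGB channels. The abstract does not specify how clicks are encoded, so the Gaussian heatmap, the `sigma` value, the channel-first layout, and the function names `clicks_to_maps` and `build_network_input` are all assumptions made for this example, not the authors' implementation.

```python
import numpy as np


def clicks_to_maps(clicks, num_classes, height, width, sigma=10.0):
    """Encode clicks as per-class heatmaps (hypothetical Gaussian encoding).

    clicks: iterable of (class_id, row, col) tuples.
    Returns a float32 array of shape (num_classes, height, width).
    """
    maps = np.zeros((num_classes, height, width), dtype=np.float32)
    rows = np.arange(height, dtype=np.float32)[:, None]
    cols = np.arange(width, dtype=np.float32)[None, :]
    for class_id, r, c in clicks:
        # Assumed encoding: an isotropic Gaussian centred on the click.
        bump = np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * sigma ** 2))
        # Several clicks of the same class are merged with a pixelwise max.
        maps[class_id] = np.maximum(maps[class_id], bump)
    return maps


def build_network_input(image, clicks, num_classes, sigma=10.0):
    """Concatenate the RGB image with the click maps along the channel axis.

    image: uint8 array of shape (height, width, 3).
    Returns a float32 array of shape (3 + num_classes, height, width).
    """
    height, width, _ = image.shape
    rgb = image.astype(np.float32).transpose(2, 0, 1) / 255.0
    maps = clicks_to_maps(clicks, num_classes, height, width, sigma)
    return np.concatenate([rgb, maps], axis=0)


# Usage: a blank 64x48 image, 4 hypothetical part classes, two clicks.
image = np.zeros((64, 48, 3), dtype=np.uint8)
clicks = [(0, 10, 12), (2, 40, 30)]
x = build_network_input(image, clicks, num_classes=4)
print(x.shape)  # (7, 64, 48)
```

Keeping one channel per class is what makes the maps semantic-aware in this sketch: the network sees which part a click refers to as well as where it was placed, which a single class-agnostic click channel could not convey.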