Keypoint detection plays an important role in a wide range of applications. However, predicting the keypoints of small objects such as human hands remains challenging. Recent works fuse feature maps of deep Convolutional Neural Networks (CNNs), either via multi-level feature integration or multi-resolution aggregation. Despite achieving some success, these feature fusion approaches increase the complexity and opacity of CNNs. To address this issue, we propose a novel CNN model named Multi-Scale Deep Supervision Network (P-MSDSNet) that learns feature maps at different scales under deep supervision to produce attention maps for adaptive feature propagation from layer to layer. P-MSDSNet has a multi-stage architecture, which makes it scalable, while its deep supervision with spatial attention improves the transparency of feature learning at each stage. We show that P-MSDSNet outperforms state-of-the-art approaches on benchmark datasets while requiring fewer parameters. We also show an application of P-MSDSNet to quantifying finger-tapping hand movements in a neuroscience study.