Animal tracking and pose-estimation systems such as STEP (Simultaneous Tracking and Pose Estimation) and ViTPose suffer substantial performance drops on images and videos containing cage structures and the systematic occlusions they introduce. We present a three-stage preprocessing pipeline that addresses this limitation: (1) cage segmentation with a Gabor-enhanced ResNet-UNet architecture using tunable orientation filters, (2) cage inpainting with CRFill for content-aware reconstruction of the occluded regions, and (3) evaluation of pose estimation and tracking on the resulting cage-free frames. Our Gabor-enhanced segmentation model exploits orientation-aware features from a bank of 72 directional kernels to accurately identify and segment the cage structures that severely impair existing methods. Experimental validation demonstrates that removing cage occlusions with our pipeline yields pose-estimation and tracking performance comparable to that in occlusion-free environments, along with significant improvements in keypoint detection accuracy and trajectory consistency.
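The orientation-aware front end described above can be illustrated with a small sketch: a bank of 72 real-valued Gabor kernels whose orientations cover 180° in equal steps, of the kind that could feed directional responses into a segmentation backbone. The kernel size, wavelength, and bandwidth parameters below are illustrative assumptions, not the paper's actual settings.

```python
import numpy as np

def gabor_kernel(size=15, sigma=3.0, theta=0.0, lambd=8.0, gamma=0.5, psi=0.0):
    """Real-valued Gabor kernel at orientation theta (radians).

    All parameter defaults are assumed for illustration only.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(np.float64)
    # Rotate the coordinate grid to the filter orientation.
    x_t = x * np.cos(theta) + y * np.sin(theta)
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    # Gaussian envelope modulated by a cosine carrier along x_t.
    envelope = np.exp(-(x_t**2 + gamma**2 * y_t**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * x_t / lambd + psi)
    return envelope * carrier

def gabor_bank(n_orientations=72, size=15):
    """Stack of kernels with orientations evenly spaced over 180 degrees."""
    thetas = np.arange(n_orientations) * np.pi / n_orientations
    return np.stack([gabor_kernel(size=size, theta=t) for t in thetas])

bank = gabor_bank()
print(bank.shape)  # (72, 15, 15)
```

In a learned variant ("tunable orientation filters"), parameters such as `sigma`, `lambd`, and `theta` would be trainable tensors rather than fixed constants, so the bank can adapt to the dominant bar orientations of a given cage.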