Existing approaches for autonomous control of pan-tilt-zoom (PTZ) cameras use multiple stages in which object detection and localization are performed separately from control of the PTZ mechanism. These approaches require manual labels and suffer from performance bottlenecks caused by error propagation across the multi-stage flow of information. The large size of object detection neural networks also makes prior solutions infeasible for real-time deployment on resource-constrained devices. We present Eagle, an end-to-end deep reinforcement learning (RL) solution that trains a neural network policy to take images directly as input and control the PTZ camera. Training RL policies in the real world is cumbersome due to labeling effort, runtime environment stochasticity, and fragile experimental setups, so we introduce a photo-realistic simulation framework for training and evaluating PTZ camera control policies. Eagle achieves superior camera control by keeping the object of interest close to the center of the captured images at high resolution, sustaining tracking for up to 17% longer than the state of the art. Eagle policies are lightweight (90x fewer parameters than Yolo5s) and run on embedded camera platforms such as the Raspberry Pi (33 FPS) and Jetson Nano (38 FPS), enabling real-time PTZ tracking in resource-constrained environments. With domain randomization, Eagle policies trained in our simulator transfer directly to real-world scenarios.
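To make the end-to-end formulation concrete, the following is a minimal sketch, not the paper's architecture, of a policy that maps a camera frame directly to a discrete PTZ command, with no separate detection or localization stage. All layer sizes, the action set, and the class name `TinyPTZPolicy` are illustrative assumptions.

```python
# Hypothetical sketch of an end-to-end PTZ control policy: a tiny network
# maps a downscaled grayscale frame directly to a pan/tilt/zoom action.
# Layer sizes and the action vocabulary are illustrative, not Eagle's.
import numpy as np

rng = np.random.default_rng(0)

class TinyPTZPolicy:
    """Maps an image observation to one discrete PTZ action."""
    ACTIONS = ["pan_left", "pan_right", "tilt_up", "tilt_down",
               "zoom_in", "zoom_out", "stay"]

    def __init__(self, obs_size=32, hidden=64):
        d = obs_size * obs_size
        # Two small weight matrices; in practice these would be trained
        # with RL in simulation, and kept far smaller than a detector.
        self.w1 = rng.standard_normal((d, hidden)) * 0.01
        self.w2 = rng.standard_normal((hidden, len(self.ACTIONS))) * 0.01

    def act(self, frame):
        x = frame.reshape(-1) / 255.0        # flatten and normalize pixels
        h = np.tanh(x @ self.w1)             # single hidden layer
        logits = h @ self.w2                 # one score per PTZ action
        return self.ACTIONS[int(np.argmax(logits))]

policy = TinyPTZPolicy()
frame = rng.integers(0, 256, size=(32, 32))  # stand-in for a camera frame
action = policy.act(frame)                   # e.g. "pan_left"
```

Because inference is a single forward pass over a compact network, a policy of this shape is the kind of model that can plausibly sustain real-time rates on embedded hardware, in contrast to running a full object detector per frame.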