For action understanding in indoor environments, human pose and actions must be recognized while preserving privacy. Camera images enable highly accurate human action recognition, but they do not preserve privacy. We therefore propose a new task: human instance segmentation from invisible information, specifically airborne ultrasound, for action recognition. To perform instance segmentation from this invisible information, we first convert the sound waves into reflected sound directional images (sound images). Although sound images can roughly localize a person, the detailed shape remains ambiguous. To address this problem, we propose a collaborative learning variational autoencoder (CL-VAE) that uses sound and RGB images jointly during training. At inference time, instance segmentation results can be obtained from sound images alone. In our performance evaluation, CL-VAE estimated human instance segmentations more accurately than a conventional variational autoencoder and several other models. Since this method obtains each person's segmentation individually, it can be applied to privacy-preserving human action recognition tasks.
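The training/inference asymmetry described above (two modalities during training, sound-only at inference) can be illustrated with a minimal sketch. This is not the paper's implementation: all dimensions, the linear encoders/decoder, and the mean-squared latent alignment loss are illustrative assumptions standing in for the actual CL-VAE architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: flattened sound image, flattened RGB image,
# and a shared latent space (all sizes are illustrative).
SOUND_DIM, RGB_DIM, LATENT_DIM = 64, 192, 8

def linear(in_dim, out_dim):
    """Random linear map standing in for a learned encoder/decoder layer."""
    return rng.standard_normal((in_dim, out_dim)) * 0.1

# Two modality-specific encoders map into one shared latent space.
W_sound_mu, W_sound_lv = linear(SOUND_DIM, LATENT_DIM), linear(SOUND_DIM, LATENT_DIM)
W_rgb_mu, W_rgb_lv = linear(RGB_DIM, LATENT_DIM), linear(RGB_DIM, LATENT_DIM)
# One shared decoder produces a per-pixel segmentation mask
# (mask size chosen arbitrarily for this sketch).
W_dec = linear(LATENT_DIM, SOUND_DIM)

def encode(x, W_mu, W_lv):
    return x @ W_mu, x @ W_lv          # mean and log-variance

def reparameterize(mu, logvar):
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z):
    return 1.0 / (1.0 + np.exp(-(z @ W_dec)))  # sigmoid -> mask in [0, 1]

# Training: both branches are encoded, and an alignment term pulls the
# sound latent toward the (more informative) RGB latent.
sound_x = rng.standard_normal((1, SOUND_DIM))
rgb_x = rng.standard_normal((1, RGB_DIM))
mu_s, lv_s = encode(sound_x, W_sound_mu, W_sound_lv)
mu_r, lv_r = encode(rgb_x, W_rgb_mu, W_rgb_lv)
align_loss = float(np.mean((mu_s - mu_r) ** 2))

# Inference: the RGB branch is dropped; the mask comes from sound alone.
mask = decode(reparameterize(mu_s, lv_s))
print(mask.shape)  # (1, 64)
```

The key design point the sketch captures is that the RGB encoder exists only to shape the shared latent space during training; nothing on the inference path depends on it, so deployment needs no camera.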