Domain shifts, such as appearance changes, are a key challenge in real-world applications of activity recognition models, which range from assistive robotics and smart homes to driver observation in intelligent vehicles. For example, while simulations are an excellent way of economical data collection, a Synthetic-to-Real domain shift leads to a > 60% drop in accuracy when recognizing Activities of Daily Living (ADLs). We tackle this challenge and introduce an activity domain generation framework which creates novel ADL appearances (novel domains) from different existing activity modalities (source domains) inferred from video training data. Our framework computes human poses, heatmaps of body joints, and optical flow maps and uses them alongside the original RGB videos to learn the essence of source domains in order to generate completely new ADL domains. The model is optimized by maximizing the distance between the existing source appearances and the generated novel appearances while ensuring that the semantics of an activity are preserved through an additional classification loss. While source data multimodality is an important concept in this design, our setup does not rely on multi-sensor setups (i.e., all source modalities are inferred from a single video only). The newly created activity domains are then integrated into the training of the ADL classification networks, resulting in models far less susceptible to changes in data distributions. Extensive experiments on the Synthetic-to-Real benchmark Sims4Action demonstrate the potential of the domain generation paradigm for cross-domain ADL recognition, setting new state-of-the-art results. Our code is publicly available at https://github.com/Zrrr1997/syn2real_DG
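The optimization objective described above (maximize the appearance distance between source and generated domains while a classification loss preserves the activity label) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function names, the choice of mean L2 distance as the appearance metric, and the weighting factor `lam` are all assumptions made here for clarity.

```python
import numpy as np

def novelty_loss(source_feats, novel_feats):
    """Negative mean L2 distance between source and generated appearance
    features: minimizing this loss *maximizes* the distance, pushing the
    generated appearances away from the existing source domains.
    (L2 distance is an assumed metric for illustration.)"""
    dists = np.linalg.norm(source_feats - novel_feats, axis=1)
    return -float(dists.mean())

def classification_loss(logits, labels):
    """Cross-entropy on the generated samples, ensuring the semantics of
    the activity (its class label) are preserved."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -float(log_probs[np.arange(len(labels)), labels].mean())

def domain_generation_loss(source_feats, novel_feats, logits, labels, lam=1.0):
    """Combined objective: appearance novelty plus semantic preservation.
    `lam` is a hypothetical trade-off weight."""
    return novelty_loss(source_feats, novel_feats) + lam * classification_loss(logits, labels)
```

In this sketch, driving `domain_generation_loss` down simultaneously spreads the generated appearances away from the sources and keeps them classifiable as the original activity.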