Recording the dynamics of unscripted human interactions in the wild is challenging due to the delicate trade-offs between several factors: participant privacy, ecological validity, data fidelity, and logistical overheads. To address these, following a 'datasets for the community by the community' ethos, we propose the Conference Living Lab (ConfLab): a new concept for multimodal, multisensor data collection of in-the-wild free-standing social conversations. For the first instantiation of ConfLab described here, we organized a real-life professional networking event at a major international conference. Involving 48 conference attendees, the dataset captures a diverse mix of statuses, acquaintance levels, and networking motivations. Our capture setup improves upon the data fidelity of prior in-the-wild datasets while retaining privacy sensitivity: 8 videos (1920x1080, 60 fps) from a non-invasive overhead view, and custom wearable sensors with onboard recording of body motion (full 9-axis IMU), privacy-preserving low-frequency audio (1250 Hz), and Bluetooth-based proximity. Additionally, we developed custom solutions for distributed hardware synchronization at acquisition, and for time-efficient continuous annotation of body keypoints and actions at high sampling rates. Our benchmarks showcase some of the open research tasks related to in-the-wild privacy-preserving social data analysis: keypoint detection from overhead camera views, skeleton-based no-audio speaker detection, and F-formation detection.
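To make the F-formation detection task concrete, the sketch below shows a minimal, illustrative baseline in the spirit of classic o-space voting approaches: each person casts a "vote" for a conversational o-space centre a fixed stride ahead along their body orientation, and votes that land close together are merged into one group. This is not the benchmark method used in the paper; the function name, stride, and merge threshold are hypothetical choices for illustration only.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def detect_f_formations(positions, orientations, stride=0.75, merge_dist=0.6):
    """Toy F-formation grouping via o-space voting (illustrative only).

    positions    : (N, 2) floor-plane coordinates in metres
    orientations : (N,)   body orientations in radians (0 = +x direction)
    stride       : assumed distance from a person to their o-space centre
    merge_dist   : assumed maximum distance between votes in the same group
    """
    # Each person votes for an o-space centre ahead of their body orientation.
    votes = positions + stride * np.stack(
        [np.cos(orientations), np.sin(orientations)], axis=1)

    if len(votes) == 1:
        return [[0]]

    # Single-linkage clustering of the votes: people whose votes fall within
    # `merge_dist` of each other are assigned to the same conversation group.
    labels = fcluster(linkage(votes, method="single"),
                      t=merge_dist, criterion="distance")

    groups = {}
    for person, label in enumerate(labels):
        groups.setdefault(label, []).append(person)
    return list(groups.values())


# Toy example: two people facing each other, plus one person standing apart.
pos = np.array([[0.0, 0.0], [1.5, 0.0], [5.0, 5.0]])
ori = np.array([0.0, np.pi, np.pi / 2])
print(detect_f_formations(pos, ori))  # -> [[0, 1], [2]]
```

In practice, benchmark methods operate on noisier inputs (estimated positions and head/body orientations from the overhead views and wearables) and use more robust grouping criteria; the sketch only conveys the structure of the task.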