In this paper, we propose EmbodiedSense, a sensing system built on commercial earphones that produces fine-grained activity logs using only their existing sensors. The activity logs record both the user's activities and the scenarios in which those activities took place, supporting detailed behavior understanding. By modeling both the user and the environment, EmbodiedSense addresses three main challenges: limited recognition capability caused by information-scarce configurations (i.e., few available sensors), ineffective fusion for extracting ambient information such as contextual scenarios, and interference from ambient noise. Specifically, EmbodiedSense consists of a context-aware scenario recognition module and a spatial-aware activity detection module, whose outputs are further integrated with other attributes using expert knowledge. We implement our system on commercial earphones equipped with binaural microphones and an Inertial Measurement Unit (IMU). By distinguishing usage scenarios and identifying the sources of sounds, EmbodiedSense produces fine-grained activity logs in a zero-shot manner (evaluated with up to 41 categories) and outperforms strong baselines such as ImageBind-LLM by 38% in F1-score. Extensive evaluations demonstrate that EmbodiedSense is a promising solution for both long-term and short-term activity logging and provides significant benefits in monitoring the wearer's daily life.