Advanced multimodal AI agents can now collaborate with users to solve real-world challenges. Yet these emerging contextual AI systems rely on explicit communication channels between the user and the system. We hypothesize that implicit communication of the user's interests and intent would reduce friction and improve the user experience in contextual AI. In this work, we explore the potential of wearable eye tracking to convey user attention to such agents. We measure the eye tracking signal quality required to effectively map gaze traces to physical objects, then conduct experiments in which visual scanpath history is provided as additional context when querying multimodal agents. Our results show that eye tracking provides high value as a user attention signal and can convey information about the user's current task and interests to the agent.