Multimodal health sensing offers rich behavioral signals for assessing mental health, yet translating these numerical time-series measurements into natural language remains challenging. Current LLMs cannot natively ingest long-duration sensor streams, and paired sensor-text datasets are scarce. To address these challenges, we introduce LENS, a framework that aligns multimodal sensing data with language models to generate clinically grounded mental-health narratives. LENS first constructs a large-scale dataset by transforming Ecological Momentary Assessment (EMA) responses related to depression and anxiety symptoms into natural-language descriptions, yielding over 100,000 sensor-text QA pairs from 258 participants. To enable native time-series integration, we train a patch-level encoder that projects raw sensor signals directly into an LLM's representation space. Our results show that LENS outperforms strong baselines on standard NLP metrics and task-specific measures of symptom-severity accuracy. A user study with 13 mental-health professionals further indicates that LENS-produced narratives are comprehensive and clinically meaningful. Ultimately, our approach advances LLMs as interfaces for health sensing, providing a scalable path toward models that can reason over raw behavioral signals and support downstream clinical decision-making.
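The patch-level encoding step mentioned above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the function names, patch length, and toy projection matrix are all assumptions. The idea is that a raw sensor stream is split into fixed-length patches and each patch is linearly projected into the LLM's embedding dimension, so every patch becomes one token-like vector the model can attend over.

```python
# Minimal sketch of patch-level sensor encoding (illustrative only;
# function names, patch length, and weights are hypothetical, not LENS's code).

def patchify(signal, patch_len):
    """Split a 1-D sensor stream into non-overlapping fixed-length patches."""
    n = len(signal) // patch_len
    return [signal[i * patch_len:(i + 1) * patch_len] for i in range(n)]

def project(patch, weights):
    """Linear projection of one patch: one output value per embedding dim."""
    return [sum(w * x for w, x in zip(row, patch)) for row in weights]

def encode(signal, patch_len, weights):
    """Map a raw stream to a sequence of patch embeddings for the LLM."""
    return [project(p, weights) for p in patchify(signal, patch_len)]

# Example: 12 sensor samples, patch length 4 -> 3 patch "tokens",
# each projected into a toy 2-dimensional embedding space.
signal = [0.1 * t for t in range(12)]
weights = [[1.0] * 4, [0.5] * 4]  # toy 2 x 4 projection matrix
tokens = encode(signal, 4, weights)
print(len(tokens), len(tokens[0]))  # prints "3 2"
```

In a trained system this projection would be learned jointly with the language model so that patch embeddings land in the LLM's representation space; the sketch only shows the shape of the computation.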