Recent research has demonstrated that behavior signals captured by smartphones and wearables can support longitudinal behavior modeling. However, there is no comprehensive public dataset that serves as an open testbed for fair comparison among algorithms. Moreover, prior studies mainly evaluate algorithms using data from a single population over a short period, without measuring the cross-dataset generalizability of these algorithms. We present the first multi-year passive sensing datasets, containing over 700 user-years of data from 497 unique users, collected from mobile and wearable sensors, together with a wide range of well-being metrics. Our datasets can support multiple cross-dataset evaluations of behavior modeling algorithms' generalizability across different users and years. As a starting point, we provide benchmark results of 18 algorithms on the task of depression detection. Our results indicate that both prior depression detection algorithms and domain generalization techniques show potential but need further research to achieve adequate cross-dataset generalizability. We envision that our multi-year datasets can support the ML community in developing generalizable longitudinal behavior modeling algorithms.