Mobile health (mHealth) apps such as menstrual trackers provide a rich source of self-tracked health observations that can be leveraged for health-relevant research. However, the reliability of such data streams is questionable because it hinges on user adherence to the app. It is therefore crucial for researchers to separate true behavior from self-tracking artifacts. By taking a machine learning approach to modeling self-tracked cycle lengths, we can both make more informed predictions and learn the underlying structure of the observed data. In this work, we propose and evaluate a hierarchical, generative model that predicts the next cycle length from previously tracked cycle lengths and accounts explicitly for the possibility that users skip tracking a period. Our model offers several advantages: 1) accounting explicitly for self-tracking artifacts yields better prediction accuracy as the likelihood of skipping increases; 2) because it is a generative model, predictions can be updated online as a given cycle evolves, and we can gain interpretable insight into how these predictions change over time; and 3) its hierarchical nature enables modeling of an individual's cycle length history while incorporating population-level information. Our experiments on mHealth cycle length data encompassing over 186,000 menstruators with over 2 million natural menstrual cycles show that our method yields state-of-the-art performance against neural network-based and summary statistic-based baselines, while providing insights on disentangling menstrual patterns from self-tracking artifacts. This work can help users, mHealth app developers, and researchers better understand cycle patterns and user adherence.
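To make the idea of a hierarchical generative model with skipped tracking concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify distributions, so everything here is an assumption chosen for illustration: Poisson-distributed true cycle lengths, a Gamma prior on each user's mean cycle length, a Beta prior on each user's skip probability, a truncated geometric number of skipped cycles, and all hyperparameter values. The names `sample_poisson`, `simulate_user`, and `skip_posterior` are hypothetical.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) sample with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_user(rng, n_cycles, max_skips=3):
    """Simulate one user's tracked cycle lengths, including skipped tracking."""
    # Population level (assumed hyperparameters): each user draws their own
    # mean cycle length and their own probability of skipping a period.
    lam = rng.gammavariate(180.0, 0.16)  # mean about 28.8 days
    pi = rng.betavariate(2.0, 20.0)      # mean skip probability about 0.09
    skip_weights = [pi ** s for s in range(max_skips + 1)]
    history = []
    for _ in range(n_cycles):
        # Truncated geometric number of skipped cycles (assumption).
        skips = rng.choices(range(max_skips + 1), weights=skip_weights)[0]
        # A skipped period merges consecutive true cycles into one observation.
        observed = sum(sample_poisson(rng, lam) for _ in range(skips + 1))
        history.append((observed, skips))
    return lam, pi, history


def skip_posterior(observed, lam, pi, max_skips=3):
    """Posterior over the number of skipped cycles behind one observed length."""
    log_weights = []
    for s in range(max_skips + 1):
        mu = lam * (s + 1)  # a sum of (s + 1) Poisson(lam) draws is Poisson(mu)
        log_lik = observed * math.log(mu) - mu - math.lgamma(observed + 1)
        log_prior = s * math.log(pi)  # truncated geometric, up to a constant
        log_weights.append(log_lik + log_prior)
    top = max(log_weights)
    weights = [math.exp(w - top) for w in log_weights]
    total = sum(weights)
    return [w / total for w in weights]
```

Under these assumptions, `skip_posterior(60, 29.0, 0.1)` places most of its mass on one skipped cycle, while `skip_posterior(29, 29.0, 0.1)` favors no skip. This is the sense in which an explicit skip variable can separate an unusually long observed cycle caused by missed tracking from a genuinely long cycle.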