Behavioral risk factors, i.e., smoking, poor nutrition, alcohol misuse, and physical inactivity (SNAP), are leading contributors to chronic diseases and healthcare costs worldwide. Their prevalence is shaped not only by demographic characteristics but also by contextual ones, such as socioeconomic and occupational environments. In this study, we leverage data from the Italian health and behavioral surveillance system PASSI to model SNAP behaviors through a Bayesian framework that integrates textual information on occupations. We use Structural Topic Modeling (STM) to cluster free-text job descriptions into latent occupational groups, which inform the mixture weights in a multivariate ordered probit model. Covariate effects are allowed to vary across occupational clusters and to evolve over time. To enhance interpretability and perform variable selection, we impose non-local spike-and-slab priors on the regression coefficients. Finally, an online learning algorithm based on sequential Monte Carlo enables efficient updating as new data become available. This dynamic, scalable, and interpretable approach makes it possible to observe how occupational contexts modulate the impact of socio-demographic factors on health behaviors, providing valuable insights for targeted public health interventions.
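
To make the role of the occupational clusters concrete, the following is a minimal sketch of how cluster weights can enter an ordered probit likelihood. It is an illustration under simplifying assumptions, not the model estimated in the study: it treats a single ordinal outcome rather than the multivariate case, it omits time variation and the spike-and-slab priors, and the function names (`ordered_probit_probs`, `mixture_probs`) as well as all numeric values are hypothetical. The `weights` argument stands in for cluster membership proportions of the kind STM would produce from a job description.

```python
import math


def norm_cdf(x):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ordered_probit_probs(eta, cutpoints):
    """Category probabilities P(y = k), k = 0..K-1, for a linear predictor
    `eta` and K-1 strictly increasing cutpoints."""
    bounds = [-math.inf] + list(cutpoints) + [math.inf]
    cdf = []
    for b in bounds:
        if b == -math.inf:
            cdf.append(0.0)
        elif b == math.inf:
            cdf.append(1.0)
        else:
            cdf.append(norm_cdf(b - eta))
    return [cdf[k + 1] - cdf[k] for k in range(len(bounds) - 1)]


def mixture_probs(x, betas, weights, cutpoints):
    """Mix cluster-specific ordered probit probabilities.

    x       : covariate vector for one respondent
    betas   : one coefficient vector per occupational cluster
    weights : cluster weights for this respondent (non-negative, sum to 1)
    """
    probs = [0.0] * (len(cutpoints) + 1)
    for w, beta in zip(weights, betas):
        eta = sum(b * xi for b, xi in zip(beta, x))
        for k, p in enumerate(ordered_probit_probs(eta, cutpoints)):
            probs[k] += w * p
    return probs


# Made-up example: two covariates, two clusters, three ordinal categories.
x = [1.0, 0.5]
betas = [[0.2, -0.4], [-0.3, 0.8]]   # hypothetical cluster-specific effects
weights = [0.7, 0.3]                 # hypothetical topic proportions
cutpoints = [-0.5, 0.5]
example = mixture_probs(x, betas, weights, cutpoints)
```

Because each cluster has its own coefficient vector, the same covariate can push the outcome in different directions depending on which occupational cluster dominates a respondent's weights, which is the kind of modulation the abstract describes.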