We aim to investigate the potential impacts of smart homes on human behavior. To this end, we simulate a series of human models capable of performing various activities inside a reinforcement learning-based smart home. We then investigate whether human behavior can be altered as the smart home and the human model adapt to one another. We design a semi-Markov decision process human task-interleaving model based on hierarchical reinforcement learning that learns to decide whether to pursue or leave an activity. We then integrate our human model into the smart home, which is based on Q-learning. We show that a smart home trained on a generic human model is able to anticipate and learn the thermal preferences of human models whose intrinsic rewards are similar to those of the generic model. The hierarchical human model learns to complete each activity and to set optimal thermal settings for maximum comfort. With the smart home, the number of time steps the human models need to change the thermal settings is reduced. Interestingly, we observe that small variations in the human model reward structures can lead to the opposite behavior, in the form of unexpected switching between activities, which signals changes in human behavior due to the presence of the smart home.
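To make the idea of the smart home learning an occupant's thermal preferences concrete, the following is a minimal, self-contained sketch of a tabular Q-learning update on a toy thermostat task. Everything here is illustrative: the setpoint grid, the hidden preference, and the comfort reward are hypothetical stand-ins, not the environment or reward structure used in the paper.

```python
import random

# Toy tabular Q-learning: a hypothetical thermostat agent learns which
# setpoint a simulated occupant prefers. Names and values are illustrative.

SETPOINTS = [18, 20, 22, 24]   # candidate thermal settings (degrees C)
PREFERRED = 22                 # occupant's hidden preferred setpoint

def reward(setpoint):
    # Comfort reward: highest (zero) when the setpoint matches the preference.
    return -abs(setpoint - PREFERRED)

def train(episodes=2000, alpha=0.1, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {s: 0.0 for s in SETPOINTS}  # single-state, bandit-style Q-table
    for _ in range(episodes):
        if rng.random() < epsilon:   # explore a random setpoint
            a = rng.choice(SETPOINTS)
        else:                        # exploit the current best estimate
            a = max(q, key=q.get)
        # One-step Q-learning update; the toy task has no successor state,
        # so the bootstrapped term is omitted.
        q[a] += alpha * (reward(a) - q[a])
    return q

q = train()
best = max(q, key=q.get)  # the setpoint the agent has learned to prefer
```

After training, `best` converges to the occupant's preferred setpoint; in the paper's setting this role is played by a smart home pre-trained on a generic human model and then exposed to human models with similar intrinsic rewards.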