Smart-home technologies have recently advanced to learn user preferences from feedback in order to enhance user convenience and quality of experience. Most smart homes learn a single uniform model to represent the thermal preferences of users, which generally fails when the pool of occupants includes people with different sensitivities to temperature, for instance due to age or physiological factors. Thus, a smart home with a single optimal policy may fail to provide comfort when a new user with a different preference is integrated into the home. In this paper, we propose a Bayesian reinforcement learning framework that approximates the current occupant's state in a partially observable smart-home environment from their thermal preference, and then identifies the occupant as either a new user or someone already known to the system. The proposed framework can be used to identify users based on their temperature and humidity preferences across different activities, enabling personalization and improved comfort. We compare the proposed framework with a baseline long short-term memory (LSTM) learner that learns the user's thermal preference from the sequence of actions the user takes. We perform these experiments with up to five simulated human models, each based on hierarchical reinforcement learning. The results show that our framework can approximate the belief state of the current user with a high degree of accuracy, using only their temperature and humidity preferences across different activities.
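The central mechanism described above, maintaining a belief over which known occupant (or a new user) is present and updating it from observed temperature and humidity preferences per activity, can be illustrated with a minimal sketch. The class name, the Gaussian observation model, and the nominal comfort values below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

# Minimal sketch (assumed structure, not the authors' code): a discrete Bayesian
# belief update over known-user hypotheses plus a "new user" hypothesis, driven
# by observed temperature/humidity preferences for a given activity.

class UserBeliefTracker:
    def __init__(self, user_profiles, new_user_std=5.0):
        # user_profiles: dict mapping user id -> dict of
        #   activity -> (mean [temp_C, humidity_%], std [temp_C, humidity_%])
        self.user_profiles = user_profiles
        self.new_user_std = new_user_std  # broad observation noise for an unknown occupant
        hypotheses = list(user_profiles) + ["new_user"]
        self.belief = {h: 1.0 / len(hypotheses) for h in hypotheses}

    def _likelihood(self, hypothesis, activity, observation):
        obs = np.asarray(observation, dtype=float)
        if hypothesis == "new_user":
            # Uninformative wide Gaussian centred on a nominal comfort point (assumed values).
            mean, std = np.array([22.0, 45.0]), np.array([self.new_user_std] * 2)
        else:
            mean, std = self.user_profiles[hypothesis][activity]
            mean, std = np.asarray(mean, float), np.asarray(std, float)
        # Independent Gaussian likelihood over temperature and humidity.
        z = (obs - mean) / std
        return float(np.exp(-0.5 * np.sum(z ** 2)) / np.prod(std * np.sqrt(2 * np.pi)))

    def update(self, activity, observation):
        # Bayes rule: posterior proportional to likelihood x prior, normalised over hypotheses.
        posterior = {h: self._likelihood(h, activity, observation) * p
                     for h, p in self.belief.items()}
        total = sum(posterior.values())
        self.belief = {h: p / total for h, p in posterior.items()}
        return self.belief


# Example usage with two hypothetical occupant profiles:
profiles = {"alice": {"sleeping": ([20.0, 40.0], [1.0, 3.0])},
            "bob": {"sleeping": ([24.0, 50.0], [1.0, 3.0])}}
tracker = UserBeliefTracker(profiles)
print(tracker.update("sleeping", [20.3, 41.0]))  # belief shifts toward "alice"
```

As the occupant interacts with the home, repeated calls to update would concentrate the belief on the matching profile, or on the new-user hypothesis when no known profile fits the observed preferences, mirroring the identification behaviour the abstract describes.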