The technology used in smart homes has evolved to learn user preferences from feedback in order to provide convenience in the home environment. Most smart homes learn a single, uniform model to represent the thermal preferences of users, which generally fails when the pool of occupants includes people of different ages, genders, and locations. Because each user has a different thermal sensation, a smart home faces the challenge of learning a personalized preference for each occupant without forgetting the policies learned for others. A smart home with a single optimal policy may fail to provide comfort when a new user with different preferences is integrated into the home. In this paper, we propose POSHS, a Bayesian reinforcement learning algorithm that can approximate the current occupant's state in a partially observable environment from their thermal preference and then decide whether they are a new occupant or belong to the pool of previously observed users. We compare POSHS with an LSTM-based algorithm at learning and estimating the current occupant's state while also taking optimal actions to reduce the number of time steps required to set the preferences. We perform these experiments with up to five simulated human models, each based on hierarchical reinforcement learning. The results show that POSHS can approximate the current user's state from their temperature and humidity preferences alone, and that it reduces the number of time steps the human model needs to set the optimal temperature and humidity in the presence of the smart home.
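To make the occupant-identification idea concrete, the following minimal Python sketch illustrates one way a Bayesian belief over known occupants could be updated from observed temperature and humidity preferences, with a posterior threshold used to flag a previously unseen occupant. This is an illustrative assumption, not the POSHS implementation: the Gaussian observation model, the class and parameter names (OccupantBelief, new_user_threshold), and the example preference values are all hypothetical.

import numpy as np

class OccupantBelief:
    """Hypothetical sketch: Bayesian identification of the current occupant
    from noisy (temperature, humidity) preference observations."""

    def __init__(self, preference_means, preference_stds, new_user_threshold=0.6):
        # Each known occupant is summarized by a mean/std over (temp, humidity).
        self.means = np.asarray(preference_means)   # shape: (n_users, 2)
        self.stds = np.asarray(preference_stds)     # shape: (n_users, 2)
        self.belief = np.full(len(self.means), 1.0 / len(self.means))
        self.threshold = new_user_threshold

    def update(self, observation):
        # Gaussian likelihood of the observed setting under each user model.
        z = (np.asarray(observation) - self.means) / self.stds
        likelihood = np.exp(-0.5 * (z ** 2).sum(axis=1))
        posterior = self.belief * likelihood
        if posterior.sum() < 1e-12:
            return self.belief  # no known user explains the observation
        self.belief = posterior / posterior.sum()
        return self.belief

    def identify(self):
        # Declare a new occupant when no known user is sufficiently probable.
        best = int(np.argmax(self.belief))
        return None if self.belief[best] < self.threshold else best

# Example: three known occupants with distinct comfort preferences.
belief = OccupantBelief(
    preference_means=[(21.0, 0.45), (24.0, 0.50), (19.5, 0.40)],
    preference_stds=[(0.8, 0.05)] * 3,
)
for obs in [(23.6, 0.51), (24.1, 0.49)]:
    belief.update(obs)
print("identified occupant:", belief.identify())  # converges to occupant 1

In the paper's setting this decision would be made inside a partially observable RL loop, with the belief informing which personalized policy the smart home executes; the sketch isolates only the identify-or-add-new-user step.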