Smart services are an important element of smart city and Internet of Things (IoT) ecosystems, where the intelligence behind the services is obtained and improved through sensory data. Providing a large amount of labeled training data is not always feasible; therefore, we need to consider alternative approaches that incorporate unlabeled data as well. In recent years, deep reinforcement learning (DRL) has achieved great success in several application domains. It is well suited to IoT and smart city scenarios, where auto-generated data can be partially labeled by users' feedback for training purposes. In this paper, we propose a semi-supervised deep reinforcement learning model that fits smart city applications, as it consumes both labeled and unlabeled data to improve the performance and accuracy of the learning agent. The model uses variational autoencoders (VAEs) as the inference engine for generalizing optimal policies. To the best of our knowledge, the proposed model is the first to extend deep reinforcement learning to the semi-supervised paradigm. As a case study of smart city applications, we focus on smart buildings and apply the proposed model to the problem of indoor localization based on Bluetooth Low Energy (BLE) signal strength. Indoor localization is a main component of smart city services, since people spend significant time in indoor environments. Our model learns the best action policies that lead to a close estimation of the target locations, with an improvement of 23% in terms of distance to the target and at least 67% more received rewards compared to the supervised DRL model.
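To make the framing of localization as a reinforcement learning problem more concrete, the following is a minimal sketch under stated assumptions, not the model proposed in the paper. It assumes a 2-D floor grid, four unit-move actions, and a reward of +1 or -1 depending on whether a move brings the position estimate closer to a known (labeled) target. The names `ACTIONS`, `step`, and `greedy_episode` are hypothetical, and a hand-written greedy rule stands in for a learned policy; the VAE-based handling of unlabeled data is not shown.

```python
import math

# Hypothetical action set: unit moves on a 2-D floor grid.
# This discretization is an assumption for illustration only.
ACTIONS = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}


def distance(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def step(position, action, target):
    """Apply one action and return (new_position, reward).

    Assumed reward: +1.0 if the move reduces the distance to the
    labeled target, -1.0 otherwise.
    """
    dx, dy = ACTIONS[action]
    new_position = (position[0] + dx, position[1] + dy)
    closer = distance(new_position, target) < distance(position, target)
    return new_position, (1.0 if closer else -1.0)


def greedy_episode(start, target, max_steps=20):
    """Run a greedy stand-in policy; return (final_position, total_reward)."""
    position, total = start, 0.0
    for _ in range(max_steps):
        if position == target:
            break
        # Pick the action whose resulting position is nearest the target.
        best = min(ACTIONS, key=lambda a: distance(
            (position[0] + ACTIONS[a][0], position[1] + ACTIONS[a][1]), target))
        position, reward = step(position, best, target)
        total += reward
    return position, total


if __name__ == "__main__":
    print(greedy_episode((0, 0), (3, 2)))
```

In this toy setting the two quantities reported in the abstract have direct counterparts: the final distance between the estimate and the target, and the total reward accumulated over an episode.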