In this paper, we focus on the problem of modeling dynamic geo-human interactions in streams for online POI recommendation. Specifically, we formulate the in-stream geo-human interaction modeling problem as a novel deep interactive reinforcement learning framework, in which the agent is a recommender and an action is the next POI to visit. We uniquely model the reinforcement learning environment as a joint and connected composition of users and geospatial contexts (POIs, POI categories, and functional zones). Each in-stream event, in which a user visits a POI, updates the states of both users and geospatial contexts; the agent perceives the updated environment state to make online recommendations. In particular, we represent the mixed-user event stream by unifying all users, visits, and geospatial contexts as a dynamic knowledge graph stream, in order to capture human-human, geo-human, and geo-geo interactions. We design an exit mechanism to address the expired-information challenge, devise a meta-path method to generate recommendation candidates, develop a new deep policy network structure to handle the varying action space, and propose an effective adversarial training method for optimization. Finally, we present extensive experiments to demonstrate the enhanced performance of our method.
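To make the environment update concrete, the following is a minimal sketch of a dynamic knowledge graph stream with an exit mechanism that retires expired interactions. All names here (`DynamicKG`, `add_visit`, `exit_expired`, `EXPIRY_SECONDS`) and the fixed expiry window are illustrative assumptions, not the paper's implementation.

```python
from collections import defaultdict

# Assumed expiry window for the exit mechanism (illustrative, not from the paper).
EXPIRY_SECONDS = 7 * 24 * 3600

class DynamicKG:
    """Hypothetical dynamic knowledge graph over users, POIs,
    POI categories, and functional zones, with timestamped edges."""

    def __init__(self):
        # adjacency: node -> list of (neighbor, relation, timestamp)
        self.adj = defaultdict(list)

    def add_visit(self, user, poi, category, zone, ts):
        """A user-visits-POI event updates both the user node
        and the connected geospatial-context nodes."""
        self.adj[user].append((poi, "visits", ts))
        self.adj[poi].append((user, "visited_by", ts))
        self.adj[poi].append((category, "belongs_to", ts))
        self.adj[category].append((zone, "located_in", ts))

    def exit_expired(self, now):
        """Exit mechanism: drop edges older than the expiry window,
        so stale interactions no longer shape the environment state."""
        for node, edges in self.adj.items():
            self.adj[node] = [e for e in edges if now - e[2] <= EXPIRY_SECONDS]
```

In this sketch, each incoming visit event mutates the graph in place, so the agent always perceives the most recent joint state of users and geospatial contexts.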
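The varying action space can likewise be illustrated with a small sketch in which the policy scores a per-step set of candidate POIs (e.g., produced by a meta-path method) against the current environment state, instead of using a fixed-size output layer. `CandidateScoringPolicy` and its dimensions are hypothetical; the paper's actual deep policy network structure may differ.

```python
import torch
import torch.nn as nn

class CandidateScoringPolicy(nn.Module):
    """Hypothetical policy over a varying action space: scores each
    candidate POI embedding jointly with the state embedding, so the
    number of candidates can change at every step."""

    def __init__(self, state_dim, poi_dim, hidden_dim=64):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(state_dim + poi_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, state, candidates):
        # state: (state_dim,); candidates: (num_candidates, poi_dim)
        expanded = state.unsqueeze(0).expand(candidates.size(0), -1)
        logits = self.scorer(torch.cat([expanded, candidates], dim=-1)).squeeze(-1)
        # Probability distribution over whatever candidates exist this step.
        return torch.softmax(logits, dim=-1)

# Usage: the action set size varies freely (here, 5 candidates).
policy = CandidateScoringPolicy(state_dim=32, poi_dim=16)
probs = policy(torch.randn(32), torch.randn(5, 16))
```

Scoring candidates against the state, rather than fixing an output head to a static POI vocabulary, is one common way to accommodate an action set that grows and shrinks as the knowledge graph stream evolves.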