The continuous growth of data production in almost all scientific fields raises new problems in data access and management, especially in scenarios where end users, as well as the resources they can access, are distributed worldwide. This work focuses on data cache management in a Data Lake infrastructure in the context of High Energy Physics. We propose an autonomous method, based on Reinforcement Learning techniques, to improve the user experience and to contain the maintenance costs of the infrastructure.