The XRootD system is used to transfer, store, and cache large datasets from high-energy physics (HEP). In this study we focus on its capability as a distributed on-demand storage cache. By exploring a large set of daily log files collected between 2020 and 2021, we seek to understand the data access patterns that might inform future cache design. Our study begins with a set of summary statistics on file read operations, file lifetimes, and file transfers. We observe that the number of read operations on each file remains nearly constant, while the average size of a read operation grows over time. Furthermore, files tend to remain open and in use for a consistent length of time. Based on this study of the cache access statistics, we developed a cache simulator to explore the behavior of caches of different sizes. Within a certain size range, we find that increasing the XRootD cache size improves the cache hit rate, yielding faster overall file access. In particular, we find that increasing the cache size from 40TB to 56TB could increase the hit rate from 0.62 to 0.89, a substantial gain in cache effectiveness for a modest cost.
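To make the idea of replaying an access trace against caches of different sizes concrete, the following is a minimal sketch rather than the simulator used in the study. The abstract does not state the eviction policy, so least-recently-used (LRU) eviction is an assumption here. The function name `simulate_lru` and the toy trace are hypothetical and are not taken from the XRootD logs.

```python
from collections import OrderedDict


def simulate_lru(accesses, capacity_bytes):
    """Replay (file_id, size_bytes) accesses through an LRU cache.

    Returns the fraction of accesses served from the cache.
    LRU eviction is an assumption of this sketch, not the paper's stated policy.
    """
    cache = OrderedDict()  # file_id -> size_bytes, least recently used first
    used = 0
    hits = 0
    total = 0
    for file_id, size in accesses:
        total += 1
        if file_id in cache:
            hits += 1
            cache.move_to_end(file_id)  # mark as most recently used
            continue
        if size > capacity_bytes:
            continue  # file can never fit: count a miss and bypass the cache
        # Evict least recently used files until the new file fits.
        while used + size > capacity_bytes:
            _, evicted_size = cache.popitem(last=False)
            used -= evicted_size
        cache[file_id] = size
        used += size
    return hits / total if total else 0.0


# Hypothetical toy trace: three equally sized files accessed in a cycle.
trace = [("a", 1), ("b", 1), ("c", 1)] * 4
small = simulate_lru(trace, capacity_bytes=2)  # working set does not fit
large = simulate_lru(trace, capacity_bytes=3)  # working set fits
```

On this toy trace the smaller cache evicts each file just before it is needed again, while the larger cache holds the whole working set. This illustrates how a modest increase in capacity can produce a large jump in hit rate once the cache covers the actively used files.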