Vivan hashing -- -- 改用在Fly上公布数据并改用Skew(扩展版) (VIP Hashing -- Adapting to Skew in Popularity of Data on the Fly (extended version))

All data is not equally popular. Often, some portion of data is more frequently accessed than the rest, which causes a skew in popularity of the data items. Adapting to this skew can improve performance, and this topic has been studied extensively in the past for disk-based settings. In this work, we consider an in-memory data structure, namely hash table, and show how one can leverage the skew in popularity for higher performance. Hashing is a low-latency operation, sensitive to the effects of caching, branch prediction, and code complexity among other factors. These factors make learning in-the-loop especially challenging as the overhead of performing any additional operations can be significant. In this paper, we propose VIP hashing, a fully online hash table method, that uses lightweight mechanisms for learning the skew in popularity and adapting the hash table layout. These mechanisms are non-blocking, and their overhead is controlled by sensing changes in the popularity distribution to dynamically switch-on/off the learning mechanism as needed. We tested VIP hashing against a variety of workloads generated by Wiscer, a homegrown hashing measurement tool, and find that it improves performance in the presence of skew (22% increase in fetch operation throughput for a hash table with one million keys under low skew, 77% increase under medium skew) while being robust to insert and delete operations, and changing popularity distribution of keys. We find that VIP hashing reduces the end-to-end execution time of TPC-H query 9, which is the most expensive TPC-H query, by 20% under medium skew.

翻译：并非所有数据都同样受欢迎。通常, 某些部分的数据访问频率比其它部分更频繁, 这导致数据项目受欢迎程度下降。适应此片可提高性能, 而这个话题在过去曾为基于磁盘的设置进行了广泛研究。在这项工作中, 我们考虑一个模拟数据结构, 即散列表, 并显示如何为更高的性能利用这种受欢迎程度。散列是一个低延迟操作, 敏感于缓冲、分支预测和代码复杂性的影响。这些因素使得当期学习特别具有挑战性, 因为执行任何额外操作的顶部可能很重要。在本文中, 我们提议要使用全线性散列表方法, 使用轻量机制来学习受欢迎程度的斜度, 即散列表。这些机制是无阻力的, 其管理由感应感应感地显示性能分布的变化, 以动态的开关、分支预测和代码的复杂性复杂。我们测试了贵宾在进行一系列的递增量任务中, 在Wisererer公司下, 的递增量运行中, 正在不断测量。