An immutable multi-map is a many-to-many thread-friendly map data structure with expected fast insert and lookup operations. This data structure is used for applications processing graphs or many-to-many relations as applied in static analysis of object-oriented systems. When processing such big data sets the memory overhead of the data structure encoding itself is a memory usage bottleneck. Motivated by reuse and type-safety, libraries for Java, Scala and Clojure typically implement immutable multi-maps by nesting sets as the values with the keys of a trie map. Like this, based on our measurements the expected byte overhead for a sparse multi-map per stored entry adds up to around 65B, which renders it unfeasible to compute with effectively on the JVM. In this paper we propose a general framework for Hash-Array Mapped Tries on the JVM which can store type-heterogeneous keys and values: a Heterogeneous Hash-Array Mapped Trie (HHAMT). Among other applications, this allows for a highly efficient multi-map encoding by (a) not reserving space for empty value sets and (b) inlining the values of singleton sets while maintaining a (c) type-safe API. We detail the necessary encoding and optimizations to mitigate the overhead of storing and retrieving heterogeneous data in a hash-trie. Furthermore, we evaluate HHAMT specifically for the application to multi-maps, comparing them to state-of-the-art encodings of multi-maps in Java, Scala and Clojure. We isolate key differences using microbenchmarks and validate the resulting conclusions on a real world case in static analysis. The new encoding brings the per key-value storage overhead down to 30B: a 2x improvement. With additional inlining of primitive values it reaches a 4x improvement.
翻译:无法变换的多映射是一个多到多线索友好的地图数据结构, 并预期会快速插入和查找操作 。 此数据结构用于应用处理图表或多到多种关系图的应用程序, 用于对对象导向系统进行静态分析。 当处理这样的大数据时, 数据结构编码本身的内存管理是内存使用瓶颈。 受再利用和类型安全驱动的驱动, Java、 Scala 和 Clojure 的图书馆通常会使用不可变的多线索数据结构, 与三角地图的键一起, 嵌入不可变的多线索。 类似地, 根据我们的测量, 每个存储条目的稀释多图或多图的预留图, 添加到大约65B, 这使得它无法在 JVM 上有效地拼凑。 在 JVM 上的 Hash- Array Mapped Tri 库中, 我们提议一个总框架, 存储类型有缺异性的关键键和值 : 肝脏变变变, 将多变变( HHAMTri) 和其他应用程序中, 将一个高效的硬化、 预化、 预置的硬化、 等的硬化、 数据分析, 等的硬化、 的硬化、 等的硬化、 等的硬化、 等的硬化、 等的硬化、 等的硬化、 。