These are the lecture notes for the course CM0622 - Algorithms for Massive Data, Ca' Foscari University of Venice. The goal of this course is to introduce algorithmic techniques for dealing with massive data: data so large that it does not fit in the computer's memory. Broadly speaking, there are two main solutions to deal with massive data: (lossless) compressed data structures and (lossy) data sketches. These notes cover the latter topic: probabilistic filters, sketching under various metrics, Locality Sensitive Hashing, nearest neighbour search, algorithms on streams (pattern matching, counting).
翻译:这些是威尼斯Ca'Foscari大学Ca' Foscari大学的CM0622-大规模数据算法课程的讲课笔记,其目的是引进处理大规模数据的算法技术:大到无法在计算机记忆中保存的大数据。广义而言,处理大规模数据有两个主要解决办法:(无损失的)压缩数据结构和(亏损的)数据草图。这些笔记涉及后一个主题:概率过滤器、根据各种标准绘制草图、地方敏感散列、近邻搜索、溪流算法(模式匹配、计数)。