These are the lecture notes for the course CM0622 - Algorithms for Massive Data, Ca' Foscari University of Venice. The goal of this course is to introduce algorithmic techniques for dealing with massive data: data so large that it does not fit in the computer's memory. Broadly speaking, there are two main solutions to deal with massive data: (lossless) compressed data structures and (lossy) data sketches. These notes cover the latter topic: probabilistic filters, sketching under various metrics, Locality Sensitive Hashing, nearest neighbour search, algorithms on streams (pattern matching, counting).
翻译:这是威尼斯卡·佛斯卡里大学的CM0622-海量数据算法课程讲义。本课程的目标是介绍用于处理海量数据的算法技术:数据量过大,无法存储在计算机的内存中。广义上来说,处理海量数据有两种解决方案:(无损)压缩数据结构和(有损)数据草图。这些讲义涵盖了后一种方法:概率过滤器、各种度量下的草图、局部敏感哈希、最近邻居搜索、流算法(模式匹配、计数)。