These are the lecture notes for the course CM0622 - Algorithms for Massive Data, Ca' Foscari University of Venice. The goal of this course is to introduce algorithmic techniques for dealing with massive data: data so large that it does not fit in the computer's memory. Broadly speaking, there are two main solutions to deal with massive data: (lossless) compressed data structures and (lossy) data sketches. These notes cover the latter topic: probabilistic filters, sketching under various metrics, Locality Sensitive Hashing, nearest neighbour search, algorithms on streams (pattern matching, counting).
翻译:这是CM0622课程(威尼斯Ca' Foscari大学)的讲义,旨在介绍处理大规模数据的算法技术:数据量太大,无法适应计算机内存。广义上,处理大规模数据有两个主要解决方案:(无损)压缩数据结构和(有损)数据草图。这些笔记涵盖了后者:概率过滤器,基于各种度量的草图,局部敏感哈希,最近的邻居搜索,流算法(模式匹配,计数)。