A Simple Data Structure for Maintaining a Discrete Probability Distribution

We revisit the following problem: given a set of indices $S = \{1, \dots, n\}$ and weights $w_1, \dots, w_n \in \mathbb{R}_{> 0}$, provide samples from $S$ with distribution $p(i) = w_i / W$, where $W = \sum_j w_j$ is the normalizing constant. In the static setting, a simple data structure due to Walker, the Alias Table, allows samples to be drawn in constant time. A more challenging task is to maintain the distribution in a dynamic setting, where elements may be added or removed and weights may change over time; here, existing solutions restrict the permissible weights, require rebuilding the associated data structure after a number of updates, or are rather complex. In this paper, we describe, analyze, and engineer a simple data structure for maintaining a discrete probability distribution in the dynamic setting. Constructing the data structure for an arbitrary distribution takes time $O(n)$, sampling takes expected time $O(1)$, and updates of size $\Delta = O(W / n)$ can be processed in time $O(1)$. To evaluate the efficiency of the data structure, we conduct an experimental study. The results suggest that its dynamic sampling performance is comparable to that of the static Alias Table, with a minor slowdown.
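For orientation, the static baseline mentioned above can be sketched as follows. This is a minimal illustration of Walker's Alias Table using Vose's linear-time construction, which is a standard textbook variant; it is not the dynamic data structure proposed in the paper, and the function names `build_alias_table` and `sample` are chosen here for illustration only.

```python
import random


def build_alias_table(weights):
    """Build a static alias table in O(n) time (Vose's construction).

    Returns two lists of length n: prob[i] is the chance of keeping
    bucket i once it has been picked, alias[i] is the fallback index.
    """
    n = len(weights)
    total = sum(weights)  # W, the normalizing constant
    # Rescale so that the average weight is exactly 1: scaled[i] = n * p(i).
    scaled = [w * n / total for w in weights]
    prob = [1.0] * n
    alias = list(range(n))
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]  # bucket s keeps its own mass ...
        alias[s] = l         # ... and is topped up by item l
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    # Indices left over keep prob 1.0; this absorbs floating-point rounding.
    return prob, alias


def sample(prob, alias, rng=random):
    """Draw one index in O(1): pick a bucket uniformly, then flip a biased coin."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]
```

A call such as `prob, alias = build_alias_table([1.0, 2.0, 3.0, 4.0])` followed by repeated `sample(prob, alias)` draws index $i$ with probability $w_i / W$. Any change to a weight invalidates the table in this sketch and forces a full $O(n)$ rebuild, which is the limitation the dynamic setting has to overcome.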
