A classical problem in random number generation is the sampling of elements from a given discrete distribution. Formally, given a set of indices $S = \{1, \dots, n\}$ and sequence of weights $w_1, \dots, w_n \in \mathbb{R}^+$, the task is to provide samples from $S$ with distribution $p(i) = w_i / W$ where $W = \sum_j w_j$. A commonly accepted solution is Walker's Alias Table, which allows for each sample to be drawn in constant time. However, some applications correspond to a dynamic setting, where elements are inserted or removed, or weights change over time. Here, the Alias Table is not efficient, as it needs to be re-built whenever the underlying distribution changes. In this paper, we engineer a simple data structure for maintaining discrete probability distributions in the dynamic setting. Construction of the data structure is possible in time $O(n)$, sampling is possible in expected time $O(1)$, and an update of size $\Delta$ can be processed in time $O(\Delta n / W)$. As a special case, we maintain an urn containing $W$ marbles of $n$ colors where with each update $O(W / n)$ marbles can be added or removed in $O(1)$ time per update. To evaluate the efficiency of the data structure in practice we conduct an empirical study. The results suggest that the dynamic sampling performance is competitive with the static Alias Table. Compared to existing more complex dynamic solutions we obtain a sampling speed-up of up to half an order of magnitude.
翻译:暂无翻译