Binary data are common in many applications and are typically simulated as independent Bernoulli trials with a single probability of success. However, independence does not always reflect the underlying process: the probability of a success can depend on the outcomes of past events. Presented here is a novel approach for simulating binary data where, for a chain of events, successes (1) and failures (0) cluster together according to a distance correlation. The structure is derived from de Bruijn graphs: directed graphs in which, given a set of symbols, V, and a 'word' length, m, the nodes consist of all possible sequences of length m over V. De Bruijn graphs are a generalisation of Markov chains, where the 'word' length controls the number of states that each individual state is dependent on, so that correlation can extend over a wider range. To quantify how clustered a sequence generated from a de Bruijn process is, the run lengths of letters are examined along with run length properties.
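As an illustration of the construction described above, the following is a minimal sketch, not the authors' implementation. It assumes a binary alphabet and that each node (a word of length m) carries a probability that the next letter is 1. The function names `de_bruijn_nodes`, `simulate`, and `run_lengths`, the table `p_one`, and the particular "sticky" probabilities are all hypothetical choices made for the example.

```python
import itertools
import random


def de_bruijn_nodes(symbols, m):
    """Return all words of length m over `symbols` (the graph's nodes)."""
    return ["".join(w) for w in itertools.product(symbols, repeat=m)]


def simulate(p_one, m, n, seed=None):
    """Generate n binary letters from a word-length-m process.

    p_one[word] is the assumed probability that the next letter is '1'
    given that the last m letters spell `word`.
    """
    rng = random.Random(seed)
    word = rng.choice(sorted(p_one))  # arbitrary start word (an assumption)
    letters = list(word)
    while len(letters) < n:
        nxt = "1" if rng.random() < p_one[word] else "0"
        letters.append(nxt)
        word = word[1:] + nxt  # slide the window, i.e. follow one edge
    return "".join(letters[:n])


def run_lengths(seq):
    """Return (letter, run length) pairs for consecutive equal letters."""
    return [(k, len(list(g))) for k, g in itertools.groupby(seq)]


if __name__ == "__main__":
    m = 2
    # Hypothetical "sticky" probabilities: more 1s in the current word
    # make another 1 more likely, which encourages clustering.
    p_one = {w: 0.1 + 0.8 * w.count("1") / m for w in de_bruijn_nodes("01", m)}
    seq = simulate(p_one, m, n=60, seed=1)
    print(seq)
    print(run_lengths(seq))
```

With m = 1 the sketch reduces to an ordinary two-state Markov chain; larger m lets each letter depend on a longer stretch of the past, and the run lengths give a simple summary of how clustered the output is.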