The complexity of a string can be measured by the richness of its substrings. For example in genetics a region of DNA is considered to be highly informative if many of the possible substrings of a certain length actually occur. Abstractly this kind of complexity is captured by the standard string complexity function. When dealing with binary strings, we have the additional feature that substrings can be viewed as subsets of an index set. This allows us to apply measures of subset complexity such as VC dimension. In this paper we define a notion of VC dimension for binary strings and investigate the structure of strings of finite VC dimension.
翻译:字符串的复杂性可以用其子字符串的丰富性来衡量。 例如,在遗传学中,如果实际出现许多可能具有一定长度的子字符串,DNA区域被认为是信息量很高的区域。 简单来说,这种复杂性被标准字符串复杂性函数所捕捉。 在处理二进制字符串时,我们还有另一个特性, 即子字符串可以被视为索引集子集的子字符串。 这使我们能够应用子集复杂性的量度, 如 VC 维度。 在本文中, 我们定义了二进制字符串的 VC 维度概念, 并调查有限 VC 维度的字符串结构 。