This article discusses a particular case of the data clustering problem: finding groups of adjacent text segments of appropriate lengths that match a fuzzy pattern, represented as a sequence of fuzzy properties. To solve this problem, we propose a heuristic algorithm that finds a sufficiently large number of solutions. The key idea of the algorithm is to use a prefix structure to track how text segments are mapped to fuzzy properties. An important special case of this text segmentation problem is fuzzy string matching, in which the adjacent text segments have unit length and the fuzzy pattern is therefore a sequence of fuzzy properties of text characters. We prove that in this case the heuristic segmentation algorithm finds all text segments that match the fuzzy pattern. Finally, we consider the problem of finding the best segmentation of the entire text with respect to a fuzzy pattern, which is solved by dynamic programming.
Keywords: fuzzy clustering, fuzzy string matching, approximate string matching
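The two exactly solvable cases named above can be illustrated with a minimal Python sketch. It is not the paper's prefix-structure heuristic, and it rests on assumptions of our own: a fuzzy property is modeled as a function from a string to a membership degree in [0, 1], the degree of a whole match is the minimum of the per-segment degrees (min used as the fuzzy AND), and each property comes with an admissible segment length range `(lo, hi)`. The names `fuzzy_match_positions`, `best_segmentation`, and `share` are hypothetical. `fuzzy_match_positions` covers the unit-length case (one property per character), and `best_segmentation` splits the entire text with a dynamic program over (pattern prefix, text prefix) pairs.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical model: a fuzzy property maps a text segment to a degree in [0, 1].
FuzzyProperty = Callable[[str], float]


def share(pred: Callable[[str], bool]) -> FuzzyProperty:
    """Hypothetical property: fraction of characters in the segment satisfying pred."""
    return lambda s: sum(map(pred, s)) / len(s) if s else 0.0


def fuzzy_match_positions(text: str, pattern: Sequence[FuzzyProperty],
                          threshold: float) -> List[Tuple[int, float]]:
    """Unit-length case: property i is applied to character text[j + i].

    A window starting at j matches to degree min_i pattern[i](text[j + i]);
    every window whose degree is >= threshold is returned with its degree.
    """
    k = len(pattern)
    hits = []
    for j in range(len(text) - k + 1):
        degree = min((p(text[j + i]) for i, p in enumerate(pattern)), default=1.0)
        if degree >= threshold:
            hits.append((j, degree))
    return hits


def best_segmentation(text: str, pattern: Sequence[FuzzyProperty],
                      lengths: Sequence[Tuple[int, int]]
                      ) -> Optional[Tuple[float, List[str]]]:
    """Split the whole text into len(pattern) adjacent segments, segment i having
    a length in lengths[i] = (lo, hi), maximizing the minimum membership degree.
    Returns None when no admissible split exists."""
    n, k = len(text), len(pattern)
    UNREACHABLE = -1.0  # below any valid degree, so it marks impossible states
    # best[i][j]: best degree for matching pattern[:i] against text[:j]
    best = [[UNREACHABLE] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]  # start of the last segment
    best[0][0] = 1.0  # the empty pattern matches the empty prefix fully
    for i in range(1, k + 1):
        lo, hi = lengths[i - 1]
        for j in range(n + 1):
            for length in range(lo, min(hi, j) + 1):
                prev = best[i - 1][j - length]
                if prev < 0:
                    continue  # pattern[:i-1] cannot cover text[:j-length]
                value = min(prev, pattern[i - 1](text[j - length:j]))
                if value > best[i][j]:
                    best[i][j], cut[i][j] = value, j - length
    if best[k][n] < 0:
        return None
    # Walk the stored cut points backwards to recover the segments.
    segments, j = [], n
    for i in range(k, 0, -1):
        segments.append(text[cut[i][j]:j])
        j = cut[i][j]
    return best[k][n], segments[::-1]


if __name__ == "__main__":
    letters, digits = share(str.isalpha), share(str.isdigit)
    print(fuzzy_match_positions("a1b22c", [letters, digits], 1.0))
    print(best_segmentation("abc123", [letters, digits], [(1, 5), (1, 5)]))
```

With these toy properties, the window scan reports the letter-then-digit positions 0 and 2 in `"a1b22c"`, and the dynamic program splits `"abc123"` into `"abc"` and `"123"` with degree 1.0. The dynamic program performs O(k · n · L) property evaluations, where L is the widest admissible length range; switching to another aggregation (for example a product of degrees) changes only how `prev` is combined with the new segment's degree and, for a sum, the starting value `best[0][0]`.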