Fuzzy Segmentations of a String

Paper title

Fuzzy Segmentations of a String

Authors

Kostanyan, Armen; Harmandayan, Arevik

Abstract

This article discusses a particular case of the data clustering problem: finding groups of adjacent text segments of appropriate length that match a fuzzy pattern, represented as a sequence of fuzzy properties. To solve this problem, a heuristic algorithm that finds a sufficiently large number of solutions is proposed. The key idea of the algorithm is to use a prefix structure to track the process of mapping text segments to fuzzy properties. An important special case of the text segmentation problem is the fuzzy string matching problem, in which adjacent text segments have unit length and the fuzzy pattern is accordingly a sequence of fuzzy properties of text characters. It is proven that in this case the heuristic segmentation algorithm finds all text segments that match the fuzzy pattern. Finally, we consider the problem of finding the best segmentation of the entire text based on a fuzzy pattern, which is solved using dynamic programming.

Keywords: fuzzy clustering, fuzzy string matching, approximate string matching
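To make the unit-length special case from the abstract concrete, here is a minimal brute-force sketch of fuzzy string matching in Python. It is not the paper's heuristic or its prefix structure. Everything named here is an assumption made for illustration: the representation of a fuzzy property as a function from a character to a degree in [0, 1], the use of `min` to combine degrees, the `threshold` parameter, and the sample properties `vowel` and `digit_like`.

```python
from typing import Callable, List, Tuple

# Assumption: a fuzzy property maps one character to a membership degree in [0, 1].
FuzzyProperty = Callable[[str], float]


def vowel(ch: str) -> float:
    """Hypothetical crisp property: 1.0 for vowels, 0.0 otherwise."""
    return 1.0 if ch.lower() in "aeiou" else 0.0


def digit_like(ch: str) -> float:
    """Hypothetical fuzzy property: digits match fully, look-alike letters partly."""
    if ch.isdigit():
        return 1.0
    if ch in "OlI":
        return 0.5
    return 0.0


def fuzzy_matches(
    text: str, pattern: List[FuzzyProperty], threshold: float
) -> List[Tuple[int, float]]:
    """Return (start, degree) for every window whose match degree >= threshold.

    Assumption: the degree of a window is the minimum of the per-character
    membership degrees. This is a naive scan over all windows, not the
    algorithm described in the abstract.
    """
    m = len(pattern)
    if m == 0:
        return []
    results = []
    for start in range(len(text) - m + 1):
        degree = min(prop(text[start + i]) for i, prop in enumerate(pattern))
        if degree >= threshold:
            results.append((start, degree))
    return results


hits = fuzzy_matches("aO1x", [vowel, digit_like], threshold=0.5)
```

Raising `threshold` narrows the result to stronger matches. A different aggregation (for example a product of degrees) could be swapped in for `min` under the same interface.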
