<technical report>
Bit-parallel approach to approximate string matching in compressed texts

Creator
Language
Publisher
Date
Source Title
Vol
Publication Type
Access Rights
Related DOI
Related URI
Relation
Abstract In this paper, we address the problem of approximate string matching on compressed text. We consider this problem for a text string described in terms of collage system, which is a formal system propo...sed by Kida et al. (1999) that captures various dictionary-based compression methods. We present an algorithm that exploits bit-parallelism, assuming that our problem fits in a single machine word, e.g., $ (m − k + 1)(k + 1) leq L $, where m is the pattern length, $ k $ is the number of allowed errors, and $ L $ is the length in bits of the machine word. For a class of simple collage systems, the algorithm runs in $ O(k^2 ( left |D\right |+ mid S mid)+ km) $ time using $ O(k^2 ( left |D\right |) $ space, where $ left |D\right | $ is the size of dictionary $ D $ and $ mid S mid $ is the number of tokens in $ S $. The LZ78 and the LZW compression methods are covered by this class. Since we can regard $ n = left |D\right | + mid S mid $ as the compressed length, the time and the space complexities are $ O(k^2n + km) $ and $ O(k^2n) $, respectively.show more

Hide fulltext details.

gz trcs174.ps gz 360 KB 47  
pdf trcs174 pdf 264 KB 159  

Details

Record ID
Peer-Reviewed
Type
Created Date 2009.04.22
Modified Date 2018.08.31

People who viewed this item also viewed