## Abstract

This paper addresses the problem of approximate string matching on compressed text. We consider this problem for a text string described in terms of a collage system, a formal system proposed by T. Kida et al. (1999) that captures various dictionary-based compression methods. We present an algorithm that exploits bit-parallelism, assuming that our problem fits in a single machine word, i.e., (m − k)(k + 1) ≤ L, where m is the pattern length, k is the number of allowed errors, and L is the length, in bits, of the machine word. For a class of simple collage systems, the algorithm runs in O(k^{2}(∥D∥ + |S|) + km) time using O(k^{2}∥D∥) space, where ∥D∥ is the size of dictionary D and |S| is the number of tokens in the sequence S. This class covers the LZ78 (Lempel-Ziv, 1978) and LZW (Lempel-Ziv-Welch, 1984) compression methods. Since ∥D∥ + |S| can be regarded as the compressed length n, the time and space complexities are O(k^{2}n + km) and O(k^{2}n), respectively. For general k and m, where the problem occupies O(km/L) machine words instead of one, they become O(k^{3}mn/L + km) and O(k^{3}mn/L). Thus, when k = O(√L), our algorithm is competitive with the algorithm proposed by J. Kärkkäinen et al. (2000), which runs in O(kmn) time using O(kmn) space.
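The algorithm above relies on bit-parallelism: the states of the approximate-matching automaton are packed into machine words and updated with a few shifts and bitwise operations per text symbol. As a rough illustration of that idea on *uncompressed* text only, the sketch below implements the classic Wu-Manber bit-parallel (Shift-And) simulation, which keeps k + 1 bit vectors of m bits each. This is a swapped-in, simpler technique: it does not use the (m − k)(k + 1)-bit layout from the condition above, and it does not operate on collage systems or LZ78/LZW dictionaries. The function name `approx_find` is hypothetical.

```python
def approx_find(text: str, pattern: str, k: int) -> list[int]:
    """Return 0-based end positions in `text` where `pattern` matches
    with at most `k` edits (insertions, deletions, substitutions).

    Hypothetical helper: Wu-Manber bit-parallel Shift-And simulation of the
    Levenshtein automaton on plain (uncompressed) text, not the paper's
    compressed-text algorithm.
    """
    m = len(pattern)
    if m == 0 or k < 0:
        raise ValueError("pattern must be non-empty and k non-negative")

    # masks[c] has bit i set iff pattern[i] == c.
    masks: dict[str, int] = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    full = (1 << m) - 1       # keep only m bits per vector
    accept = 1 << (m - 1)     # bit for the whole pattern

    # R[d], bit i: pattern[:i+1] matches a suffix of the text read so far
    # with <= d errors. Before any text is read, only deletions are possible.
    R = [((1 << d) - 1) & full for d in range(k + 1)]
    hits: list[int] = []

    for pos, ch in enumerate(text):
        b = masks.get(ch, 0)
        prev_old = R[0]                   # row d-1 as it was before this step
        R[0] = ((R[0] << 1) | 1) & b      # exact-matching row
        for d in range(1, k + 1):
            cur_old = R[d]
            R[d] = (
                (((cur_old << 1) | 1) & b)  # match pattern[i] with ch
                | prev_old                  # insertion: ch is an extra text symbol
                | (prev_old << 1)           # substitution of pattern[i] by ch
                | (R[d - 1] << 1)           # deletion of pattern[i]
                | 1                         # pattern[0] via substitution or deletion
            ) & full
            prev_old = cur_old
        if R[k] & accept:
            hits.append(pos)
    return hits


print(approx_find("abxd", "bcd", 1))  # [3]: "bxd" is one substitution from "bcd"
```

Python integers are unbounded, so this sketch hides the single-word restriction that the paper's algorithm depends on; in C or Rust, each of the k + 1 vectors would fit in one machine word when m ≤ L.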

Original language | English |
---|---|
Title of host publication | Proceedings - 7th International Symposium on String Processing and Information Retrieval, SPIRE 2000 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 221-228 |
Number of pages | 8 |
ISBN (Electronic) | 0769507468, 9780769507460 |
DOIs | |
Publication status | Published - 2000 |
Event | 7th International Symposium on String Processing and Information Retrieval, SPIRE 2000 - A Coruña, Spain. Duration: Sep 27 2000 → Sep 29 2000 |

### Other

Event | 7th International Symposium on String Processing and Information Retrieval, SPIRE 2000 |
---|---|
Country/Territory | Spain |
City | A Coruña |
Period | 9/27/00 → 9/29/00 |

## All Science Journal Classification (ASJC) codes

- Information Systems
- Signal Processing
- Information Systems and Management