Simple linear-time off-line text compression by longest-first substitution

Ryosuke Nakamura, Hideo Bannai, Shunsuke Inenaga, Masayuki Takeda

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)

Abstract

We consider grammar-based text compression with longest-first substitution, where non-overlapping occurrences of a longest repeating substring of the input text are replaced by a new non-terminal symbol. We present a new text compression algorithm obtained by simplifying the algorithm presented in [4]. We give a new formulation of the correctness proof by introducing the sparse lazy suffix tree data structure. We also present another type of longest-first substitution strategy that allows better compression. We show results of preliminary experiments comparing the grammar sizes produced by the two versions of the longest-first strategy and by the most-frequent strategy.
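The substitution step described in the abstract can be illustrated with a small Python sketch. This is not the linear-time algorithm of the paper and it does not use a sparse lazy suffix tree: it is a naive, roughly cubic-time illustration that repeatedly searches the current start sequence for a longest factor with at least two non-overlapping occurrences. The function names (`find_longest_repeat`, `replace_occurrences`, `longest_first_grammar`, `expand`), the representation of non-terminals as `("N", k)` tuples, and the choice to ignore factors shorter than two symbols are all assumptions made for this example.

```python
def find_longest_repeat(seq):
    """Return a longest factor (as a tuple) of length >= 2 that has two
    non-overlapping occurrences in seq, or None if there is none."""
    n = len(seq)
    for length in range(n // 2, 1, -1):          # try longest candidates first
        first_seen = {}                          # factor -> leftmost start position
        for i in range(n - length + 1):
            key = tuple(seq[i:i + length])
            j = first_seen.setdefault(key, i)
            if i - j >= length:                  # occurrences at j and i do not overlap
                return key
    return None


def replace_occurrences(seq, factor, symbol):
    """Replace non-overlapping occurrences of factor, scanning left to right."""
    out, i, k = [], 0, len(factor)
    while i < len(seq):
        if tuple(seq[i:i + k]) == factor:
            out.append(symbol)
            i += k
        else:
            out.append(seq[i])
            i += 1
    return out


def longest_first_grammar(text):
    """Build (start_sequence, rules) by naive longest-first substitution."""
    seq = list(text)                             # terminals are single characters
    rules = {}
    while True:
        factor = find_longest_repeat(seq)
        if factor is None:
            break
        symbol = ("N", len(rules) + 1)           # hypothetical non-terminal naming
        rules[symbol] = list(factor)
        seq = replace_occurrences(seq, factor, symbol)
    return seq, rules


def expand(seq, rules):
    """Decompress: recursively expand non-terminals back into text."""
    return "".join(expand(rules[s], rules) if s in rules else s for s in seq)


seq, rules = longest_first_grammar("abcabcabc")
print(seq)    # [('N', 1), ('N', 1), ('N', 1)]
print(rules)  # {('N', 1): ['a', 'b', 'c']}
```

For the input "abcabcabc" the longest factor with two non-overlapping occurrences is "abc", so the start sequence becomes three copies of one non-terminal. The sketch only searches the start sequence, never the right-hand sides of existing rules; whether and how to search those is a design choice this example does not attempt to settle.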

Original language: English
Title of host publication: Proceedings - DCC 2007: 2007 Data Compression Conference
Pages: 123-132
Number of pages: 10
DOIs: https://doi.org/10.1109/DCC.2007.70
Publication status: Published - 2007
Event: DCC 2007: 2007 Data Compression Conference - Snowbird, UT, United States
Duration: Mar 27 2007 → Mar 29 2007

Other

Other: DCC 2007: 2007 Data Compression Conference
Country: United States
City: Snowbird, UT
Period: 3/27/07 → 3/29/07

Fingerprint

Substitution reactions
Data structures
Experiments

All Science Journal Classification (ASJC) codes

  • Electrical and Electronic Engineering
  • Hardware and Architecture

Cite this

Nakamura, R., Bannai, H., Inenaga, S., & Takeda, M. (2007). Simple linear-time off-line text compression by longest-first substitution. In Proceedings - DCC 2007: 2007 Data Compression Conference (pp. 123-132). [4148751] https://doi.org/10.1109/DCC.2007.70

Simple linear-time off-line text compression by longest-first substitution. / Nakamura, Ryosuke; Bannai, Hideo; Inenaga, Shunsuke; Takeda, Masayuki.

Proceedings - DCC 2007: 2007 Data Compression Conference. 2007. p. 123-132 4148751.

Nakamura, R, Bannai, H, Inenaga, S & Takeda, M 2007, Simple linear-time off-line text compression by longest-first substitution. in Proceedings - DCC 2007: 2007 Data Compression Conference., 4148751, pp. 123-132, DCC 2007: 2007 Data Compression Conference, Snowbird, UT, United States, 3/27/07. https://doi.org/10.1109/DCC.2007.70
Nakamura R, Bannai H, Inenaga S, Takeda M. Simple linear-time off-line text compression by longest-first substitution. In Proceedings - DCC 2007: 2007 Data Compression Conference. 2007. p. 123-132. 4148751 https://doi.org/10.1109/DCC.2007.70
Nakamura, Ryosuke ; Bannai, Hideo ; Inenaga, Shunsuke ; Takeda, Masayuki. / Simple linear-time off-line text compression by longest-first substitution. Proceedings - DCC 2007: 2007 Data Compression Conference. 2007. pp. 123-132
@inproceedings{3a336c30421149cc8d33ad9ac83ba10c,
title = "Simple linear-time off-line text compression by longest-first substitution",
abstract = "We consider grammar based text compression with longest first substitution where non-overlapping occurrences of a longest repeating substring of the input text are replaced by a new non-terminal symbol. We present a new text compression algorithm by simplifying the algorithm presented in [4]. We give a new formulation of the correctness proof introducing the sparse lazy suffix tree data structure. We also present another type of longest first substitution strategy that allows better compression. We show results of preliminary experiments comparing grammar sizes of the two versions of the longest first strategy and the most frequent strategy.",
author = "Ryosuke Nakamura and Hideo Bannai and Shunsuke Inenaga and Masayuki Takeda",
year = "2007",
doi = "10.1109/DCC.2007.70",
language = "English",
isbn = "0769527914",
pages = "123--132",
booktitle = "Proceedings - DCC 2007: 2007 Data Compression Conference",

}

TY - GEN

T1 - Simple linear-time off-line text compression by longest-first substitution

AU - Nakamura, Ryosuke

AU - Bannai, Hideo

AU - Inenaga, Shunsuke

AU - Takeda, Masayuki

PY - 2007

Y1 - 2007

N2 - We consider grammar based text compression with longest first substitution where non-overlapping occurrences of a longest repeating substring of the input text are replaced by a new non-terminal symbol. We present a new text compression algorithm by simplifying the algorithm presented in [4]. We give a new formulation of the correctness proof introducing the sparse lazy suffix tree data structure. We also present another type of longest first substitution strategy that allows better compression. We show results of preliminary experiments comparing grammar sizes of the two versions of the longest first strategy and the most frequent strategy.

UR - http://www.scopus.com/inward/record.url?scp=34547638395&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34547638395&partnerID=8YFLogxK

U2 - 10.1109/DCC.2007.70

DO - 10.1109/DCC.2007.70

M3 - Conference contribution

AN - SCOPUS:34547638395

SN - 0769527914

SN - 9780769527918

SP - 123

EP - 132

BT - Proceedings - DCC 2007: 2007 Data Compression Conference

ER -