TY - GEN
T1 - Online LZ77 parsing and matching statistics with RLBWTs
AU - Bannai, Hideo
AU - Gagie, Travis
AU - I, Tomohiro
N1 - Funding Information:
Supported by FONDECYT Grant Number 1171058. Supported by JSPS KAKENHI Grant Number JP16K16009.
Publisher Copyright:
© 2018 Yoshifumi Sakai; licensed under Creative Commons License CC-BY.
PY - 2018/5/1
Y1 - 2018/5/1
N2 - Lempel-Ziv 1977 (LZ77) parsing, matching statistics and the Burrows-Wheeler Transform (BWT) are all fundamental elements of stringology. In a series of recent papers, Policriti and Prezza (DCC 2016 and Algorithmica, CPM 2017) showed how we can use an augmented run-length compressed BWT (RLBWT) of the reverse T^R of a text T to compute offline the LZ77 parse of T in O(n log r) time and O(r) space, where n is the length of T and r is the number of runs in the BWT of T^R. In this paper we first extend a well-known technique for updating an unaugmented RLBWT when a character is prepended to a text, to work with Policriti and Prezza's augmented RLBWT. This immediately implies that we can build online the LZ77 parse of T while still using O(n log r) time and O(r) space; it also seems likely to be of independent interest. Our experiments, using an extension of Ohno, Takabatake, I and Sakamoto's (IWOCA 2017) implementation of updating, show our approach is both time- and space-efficient for repetitive strings. We then show how to augment the RLBWT further (albeit making it static again and increasing its space by a factor proportional to the size of the alphabet) such that later, given another string S and O(log log n)-time random access to T, we can compute the matching statistics of S with respect to T in O(|S| log log n) time.
AB - Lempel-Ziv 1977 (LZ77) parsing, matching statistics and the Burrows-Wheeler Transform (BWT) are all fundamental elements of stringology. In a series of recent papers, Policriti and Prezza (DCC 2016 and Algorithmica, CPM 2017) showed how we can use an augmented run-length compressed BWT (RLBWT) of the reverse T^R of a text T to compute offline the LZ77 parse of T in O(n log r) time and O(r) space, where n is the length of T and r is the number of runs in the BWT of T^R. In this paper we first extend a well-known technique for updating an unaugmented RLBWT when a character is prepended to a text, to work with Policriti and Prezza's augmented RLBWT. This immediately implies that we can build online the LZ77 parse of T while still using O(n log r) time and O(r) space; it also seems likely to be of independent interest. Our experiments, using an extension of Ohno, Takabatake, I and Sakamoto's (IWOCA 2017) implementation of updating, show our approach is both time- and space-efficient for repetitive strings. We then show how to augment the RLBWT further (albeit making it static again and increasing its space by a factor proportional to the size of the alphabet) such that later, given another string S and O(log log n)-time random access to T, we can compute the matching statistics of S with respect to T in O(|S| log log n) time.
UR - http://www.scopus.com/inward/record.url?scp=85048304169&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85048304169&partnerID=8YFLogxK
U2 - 10.4230/LIPIcs.CPM.2018.7
DO - 10.4230/LIPIcs.CPM.2018.7
M3 - Conference contribution
AN - SCOPUS:85048304169
T3 - Leibniz International Proceedings in Informatics, LIPIcs
SP - 7:1
EP - 7:12
BT - 29th Annual Symposium on Combinatorial Pattern Matching, CPM 2018
A2 - Zhu, Binhai
A2 - Navarro, Gonzalo
A2 - Sankoff, David
PB - Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
T2 - 29th Annual Symposium on Combinatorial Pattern Matching, CPM 2018
Y2 - 2 July 2018 through 4 July 2018
ER -