TY - JOUR

T1 - The Smallest Grammar Problem Revisited

AU - Bannai, Hideo

AU - Hirayama, Momoko

AU - Hucke, Danny

AU - Inenaga, Shunsuke

AU - Jeż, Artur

AU - Lohrey, Markus

AU - Reh, Carl Philipp

N1 - Funding Information:
Manuscript received January 9, 2018; revised February 14, 2020; accepted September 10, 2020. Date of publication November 16, 2020; date of current version December 21, 2020. The work of Hideo Bannai was supported by JSPS KAKENHI under Grant JP16H02783 and Grant JP20H04141. The work of Shunsuke Inenaga was supported in part by JSPS KAKENHI under Grant JP17H01697 and in part by JST PRESTO under Grant JPMJPR1922. The work of Artur Jeż was supported by the National Science Centre, Poland, under Project 2017/26/E/ST6/00191. The work of Markus Lohrey was supported by the DFG research project LO 748/10-1 (QUANT-KOMP). (Corresponding author: Carl Philipp Reh.) Hideo Bannai is with the Department of Data Science Algorithm Design and Analysis, Tokyo Medical and Dental University, Tokyo 113-8510, Japan.

PY - 2021/1

Y1 - 2021/1

AB - In a seminal paper, Charikar et al. derive upper and lower bounds on the approximation ratios for several grammar-based compressors, but in all cases there is a gap between the lower and upper bound. Here the gaps for LZ78 and BISECTION are closed by showing that the approximation ratio of LZ78 is $\Theta((n/\log n)^{2/3})$, whereas the approximation ratio of BISECTION is $\Theta(\sqrt{n/\log n})$. In addition, the lower bound for RePair is improved from $\Omega(\sqrt{\log n})$ to $\Omega(\log n/\log\log n)$. Finally, results of Arpe and Reischuk relating grammar-based compression for arbitrary alphabets and binary alphabets are improved.

UR - http://www.scopus.com/inward/record.url?scp=85097139422&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85097139422&partnerID=8YFLogxK

U2 - 10.1109/TIT.2020.3038147

DO - 10.1109/TIT.2020.3038147

M3 - Article

AN - SCOPUS:85097139422

VL - 67

SP - 317

EP - 328

JO - IEEE Transactions on Information Theory

JF - IEEE Transactions on Information Theory

SN - 0018-9448

IS - 1

M1 - 9259056

ER -