Document image decoding (DID) is an attempt to understand the contents of an entire document without any reference information about font, language, etc. In practice, however, DID approaches typically assume a correct segmentation of the document and some a priori knowledge about the language or the script. Unfortunately, these assumptions do not hold for more diverse material, such as documents with fonts of varying size, camera-captured documents, free-layout documents, or historical documents. In this paper, we propose a part-based character identification method that requires neither segmentation into characters nor a priori information about the document. The approach clusters similar key points and groups frequently co-occurring neighboring key point clusters. A second iteration is then performed: the groups are themselves clustered and, optionally, frequent pairs of group clusters are detected. Our first experimental results on documents with multiple font sizes are already very promising: we found nearly perfect correspondences between characters and detected group clusters.
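To make the first iteration concrete, the following is a minimal sketch of the two steps named above: clustering similar key points and finding frequent pairs of neighboring key point clusters. It is an illustration under stated assumptions, not the implementation used in this paper. The greedy leader clustering, the Euclidean distance, the parameters `threshold`, `radius`, and `min_count`, and both function names are hypothetical choices made only for this example. The second iteration (clustering the groups themselves) is omitted.

```python
from collections import Counter
from itertools import combinations
from math import dist


def cluster_descriptors(descriptors, threshold):
    """Greedy leader clustering (an assumed stand-in for the clustering step).

    Each descriptor joins the first cluster whose leader lies within
    `threshold`; otherwise it becomes the leader of a new cluster.
    Returns one cluster label per descriptor.
    """
    leaders, labels = [], []
    for d in descriptors:
        for idx, leader in enumerate(leaders):
            if dist(d, leader) <= threshold:
                labels.append(idx)
                break
        else:
            leaders.append(d)
            labels.append(len(leaders) - 1)
    return labels


def frequent_neighbor_groups(positions, labels, radius, min_count):
    """Count unordered pairs of cluster labels whose key points lie within
    `radius` of each other, and keep pairs seen at least `min_count` times."""
    counts = Counter()
    for (i, p), (j, q) in combinations(enumerate(positions), 2):
        if dist(p, q) <= radius:
            counts[tuple(sorted((labels[i], labels[j])))] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Toy data (hypothetical): the same two-part "character" appears twice,
# followed by one isolated key point.
positions = [(0, 0), (1, 0), (10, 0), (11, 0), (20, 0)]
descriptors = [(0.0, 0.0), (5.0, 5.0), (0.1, 0.0), (5.1, 5.0), (9.0, 9.0)]

labels = cluster_descriptors(descriptors, threshold=0.5)
groups = frequent_neighbor_groups(positions, labels, radius=1.5, min_count=2)
```

In this toy input, the recurring pair of parts ends up as a single frequent group, which mirrors the idea that recurring groups of key point clusters should correspond to characters. The pairwise loop is quadratic in the number of key points; a spatial index would be needed for real page images.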