Extracting high-level semantic information from a point cloud is a challenging task with many applications: for instance, it enables a compact representation of the scene structure and an efficient understanding of the scene context. The task is even more challenging with a hand-held monocular visual SLAM system, which outputs a noisy, sparse point cloud. To tackle this issue, we propose an incremental primitive modeling method that applies both geometric and statistical analyses to such point clouds. The main idea is to retain only reliably modeled shapes by analyzing the geometric relationship between the point cloud and the estimated shapes. In addition, a statistical evaluation filters out primitives that are wrongly detected in the noisy point cloud. As a result, our approach substantially improves precision compared with state-of-the-art methods. We also show the impact of segmenting and representing a scene using primitives instead of a point cloud.
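As a rough illustration of combining a geometric check with a statistical one, the following Python sketch tests a single candidate plane against a set of 3D points. It is not the proposed method. The function names, the choice of a plane as the primitive, the inlier-ratio test, the residual-spread test, and all threshold values are assumptions made for illustration only.

```python
import math
import statistics


def plane_from_points(p, q, r):
    """Plane through three points, as (unit normal n, offset d) with n.x + d = 0."""
    u = [q[i] - p[i] for i in range(3)]
    v = [r[i] - p[i] for i in range(3)]
    # The normal is the cross product of two in-plane edge vectors.
    n = [
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    ]
    norm = math.sqrt(sum(c * c for c in n))
    if norm == 0.0:
        raise ValueError("the three points are collinear")
    n = [c / norm for c in n]
    d = -sum(n[i] * p[i] for i in range(3))
    return n, d


def evaluate_plane(points, normal, d,
                   dist_thresh=0.05,       # hypothetical inlier distance
                   min_inlier_ratio=0.6,   # hypothetical geometric threshold
                   max_inlier_std=0.03):   # hypothetical statistical threshold
    """Return (accepted, inlier_ratio, inlier_spread) for a candidate plane."""
    if not points:
        raise ValueError("no points given")
    # Unsigned point-to-plane distances.
    dists = [abs(sum(normal[i] * pt[i] for i in range(3)) + d) for pt in points]
    inliers = [x for x in dists if x <= dist_thresh]

    # Geometric check (assumed): enough of the points lie close to the plane.
    ratio = len(inliers) / len(points)
    geometric_ok = ratio >= min_inlier_ratio

    # Statistical check (assumed): the inlier residuals are tightly clustered.
    spread = statistics.pstdev(inliers) if len(inliers) >= 2 else float("inf")
    statistical_ok = spread <= max_inlier_std

    return geometric_ok and statistical_ok, ratio, spread
```

In this sketch a candidate is kept only when both checks pass, so a plane supported by too few points, or by points with widely scattered residuals, is discarded rather than added to the model.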