Modeling large-scale indoor scenes with rigid fragments using RGB-D cameras

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

Hand-held consumer depth cameras have become a commodity tool for constructing 3D models of indoor environments in real time. Recently, many methods have been proposed to fuse low-quality depth images into a single dense, high-fidelity 3D model. Nonetheless, dealing with large-scale scenes remains a challenging problem. In particular, the accumulation of small errors due to imperfect camera localization becomes critical at large scale and results in severe deformations of the built 3D model. These deformations must be corrected whenever possible (for example, when a loop exists). To facilitate such correction, we use a structured 3D representation in which points are clustered into the planar patches that compose the scene. We then propose a two-stage framework to build a detailed, large-scale 3D model in real time. The first stage (local mapping) generates local structured 3D models with rigidity constraints from short subsequences of RGB-D images. The second stage (global mapping) aggregates all local 3D models into a single global model in a geometrically consistent manner. Thanks to our structured 3D representation, minimizing deformations of the global model reduces to re-positioning the planar patches of the local models, which allows efficient yet accurate computation. Our experiments using real data confirm the effectiveness of the proposed method.
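
The global-mapping step admits a compact illustration. Below is a minimal sketch (Python with NumPy, not the authors' code) of the reduction the abstract describes: once the scene is represented by planar patches, correcting drift in a local fragment amounts to rigidly re-positioning a handful of patches rather than millions of points. Patch extraction and patch matching are assumed to be given; the names rigid_fit and align_fragment and the one-centroid-per-patch representation are illustrative assumptions, and the least-squares alignment used here (the Kabsch algorithm) stands in for the paper's own optimization with rigidity constraints.

    import numpy as np

    def rigid_fit(src, dst):
        # Least-squares rotation R and translation t such that
        # R @ src[i] + t ~ dst[i] (Kabsch algorithm); src, dst are (N, 3).
        src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
        H = (src - src_c).T @ (dst - dst_c)
        U, _, Vt = np.linalg.svd(H)
        d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
        R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
        return R, dst_c - R @ src_c

    def align_fragment(local_patches, global_patches):
        # Re-position a local fragment: fit one rigid transform on the
        # centroids of the patches it shares with the global model, then
        # move every patch of the fragment with that transform.
        ids = sorted(set(local_patches) & set(global_patches))
        src = np.array([local_patches[i] for i in ids])
        dst = np.array([global_patches[i] for i in ids])
        R, t = rigid_fit(src, dst)
        return {i: R @ c + t for i, c in local_patches.items()}

    # Toy usage: a fragment sees the same three planar patches as the
    # global model but has drifted by a 5-degree rotation about z.
    theta = np.deg2rad(5.0)
    drift = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                      [np.sin(theta),  np.cos(theta), 0.0],
                      [0.0,            0.0,           1.0]])
    global_patches = {0: np.array([1.0, 0.0, 0.0]),
                      1: np.array([0.0, 1.0, 0.0]),
                      2: np.array([0.0, 0.0, 1.0])}
    local_patches = {i: drift @ c for i, c in global_patches.items()}
    aligned = align_fragment(local_patches, global_patches)
    print(max(np.linalg.norm(aligned[i] - global_patches[i]) for i in aligned))

The toy example prints an error on the order of machine precision: the simulated drift of the fragment is removed by moving only its three patch centroids, which is the efficiency argument behind the structured representation.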

Original language: English
Pages (from-to): 103-116
Number of pages: 14
Journal: Computer Vision and Image Understanding
Volume: 157
DOIs: 10.1016/j.cviu.2016.11.008
Publication status: Published - Apr 1, 2017

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Computer Vision and Pattern Recognition

Cite this

Modeling large-scale indoor scenes with rigid fragments using RGB-D cameras. / Thomas, Diego Gabriel Francis; Sugimoto, Akihiro.

In: Computer Vision and Image Understanding, Vol. 157, 01.04.2017, p. 103-116.

Research output: Contribution to journal › Article

@article{46f437a5798947a0a0aab3e4ebb64c70,
title = "Modeling large-scale indoor scenes with rigid fragments using RGB-D cameras",
abstract = "Hand-held consumer depth cameras have become a commodity tool for constructing 3D models of indoor environments in real time. Recently, many methods have been proposed to fuse low-quality depth images into a single dense, high-fidelity 3D model. Nonetheless, dealing with large-scale scenes remains a challenging problem. In particular, the accumulation of small errors due to imperfect camera localization becomes critical at large scale and results in severe deformations of the built 3D model. These deformations must be corrected whenever possible (for example, when a loop exists). To facilitate such correction, we use a structured 3D representation in which points are clustered into the planar patches that compose the scene. We then propose a two-stage framework to build a detailed, large-scale 3D model in real time. The first stage (local mapping) generates local structured 3D models with rigidity constraints from short subsequences of RGB-D images. The second stage (global mapping) aggregates all local 3D models into a single global model in a geometrically consistent manner. Thanks to our structured 3D representation, minimizing deformations of the global model reduces to re-positioning the planar patches of the local models, which allows efficient yet accurate computation. Our experiments using real data confirm the effectiveness of the proposed method.",
author = "Thomas, {Diego Gabriel Francis} and Akihiro Sugimoto",
year = "2017",
month = "4",
day = "1",
doi = "10.1016/j.cviu.2016.11.008",
language = "English",
volume = "157",
pages = "103--116",
journal = "Computer Vision and Image Understanding",
issn = "1077-3142",
publisher = "Academic Press Inc.",

}

TY - JOUR

T1 - Modeling large-scale indoor scenes with rigid fragments using RGB-D cameras

AU - Thomas, Diego Gabriel Francis

AU - Sugimoto, Akihiro

PY - 2017/4/1

Y1 - 2017/4/1

AB - Hand-held consumer depth cameras have become a commodity tool for constructing 3D models of indoor environments in real time. Recently, many methods have been proposed to fuse low-quality depth images into a single dense, high-fidelity 3D model. Nonetheless, dealing with large-scale scenes remains a challenging problem. In particular, the accumulation of small errors due to imperfect camera localization becomes critical at large scale and results in severe deformations of the built 3D model. These deformations must be corrected whenever possible (for example, when a loop exists). To facilitate such correction, we use a structured 3D representation in which points are clustered into the planar patches that compose the scene. We then propose a two-stage framework to build a detailed, large-scale 3D model in real time. The first stage (local mapping) generates local structured 3D models with rigidity constraints from short subsequences of RGB-D images. The second stage (global mapping) aggregates all local 3D models into a single global model in a geometrically consistent manner. Thanks to our structured 3D representation, minimizing deformations of the global model reduces to re-positioning the planar patches of the local models, which allows efficient yet accurate computation. Our experiments using real data confirm the effectiveness of the proposed method.

UR - http://www.scopus.com/inward/record.url?scp=85007613460&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85007613460&partnerID=8YFLogxK

DO - 10.1016/j.cviu.2016.11.008

M3 - Article

AN - SCOPUS:85007613460

VL - 157

SP - 103

EP - 116

JO - Computer Vision and Image Understanding

JF - Computer Vision and Image Understanding

SN - 1077-3142

ER -