In this paper, we propose a method for automatically creating multi-layered content from real-world scenes, based on Depth from Focus and Spatio-Temporal Image Analysis. Because the content is generated as a layered representation directly from the real world, the viewpoint can be changed freely, and the labor and cost of creating three-dimensional (3-D) content with computer graphics are reduced. To extract layers from real images, Depth from Focus is used for stationary objects and Spatio-Temporal Image Analysis is used for moving objects. We selected these two methods for the stability of the system: Depth from Focus does not require a search for corresponding points, and Spatio-Temporal Image Analysis uses a relatively simple algorithm. We performed an experiment in which layered content was extracted automatically from stationary and moving objects, and the feasibility of the method was confirmed.