With the recent development of Mixed Reality (MR) devices and advances in 3D scene understanding, MR applications on mobile devices are becoming available to a large part of society. These applications allow users to mix virtual content into the surrounding environment. However, the ability to mediate (i.e., modify or alter) the surrounding environment remains a difficult and unsolved problem that limits the degree of immersion of current MR applications on mobile devices. In this paper, we present a method to mediate 2D views of a real environment using a single consumer-grade RGB-D camera, without the need to pre-scan the scene. Our proposed method creates in real-time a dense and detailed keyframe-based 3D map of the real scene and leverages semantic instance segmentation to isolate target objects. We show that our proposed method can remove target objects from the environment and replace them with their virtual counterparts, which are built on-the-fly. Such an approach is well suited for creating mobile Mediated Reality applications.