We present a technique for capturing facial performance in real time using an RGB-D camera. The method can be applied to face augmentation by leveraging facial expression changes. It performs both 3D facial modeling and facial motion tracking without requiring pre-scanning or training for a specific user. The proposed approach builds on an existing method, which we refer to as FaceCap, that uses a blendshape representation and a Bump image to track facial motion and capture geometric details. The original FaceCap algorithm fails in some scenarios with complex motion and occlusions, mainly due to problems in the face detection and tracking steps. FaceCap also has problems in the Bump image filtering step, which generates outliers that distort the 3D augmented blendshape. To solve these problems, we propose two refinements: (a) a new framework for face detection and landmark localization based on the state-of-the-art methods MTCNN and CE-CLM, respectively; and (b) a simple but effective modification to the filtering step that removes reconstruction failures in the eye region. Experiments show that the proposed approach can handle unconstrained scenarios, such as large head pose variations and partial occlusions, while achieving real-time execution.