Deep learning (DL) has achieved remarkable progress over the past decade and has been widely applied to many industry domains. However, the robustness of DL systems recently becomes great concerns, where minor perturbation on the input might cause the DL malfunction. These robustness issues could potentially result in severe consequences when a DL system is deployed to safety-critical applications and hinder the real-world deployment of DL systems. Testing techniques enable the robustness evaluation and vulnerable issue detection of a DL system at an early stage. The main challenge of testing a DL system attributes to the high dimensionality of its inputs and large internal latent feature space, which makes testing each state almost impossible. For traditional software, combinatorial testing (CT) is an effective testing technique to balance the testing exploration effort and defect detection capabilities. In this paper, we perform an exploratory study of CT on DL systems. We propose a set of combinatorial testing criteria specialized for DL systems, as well as a CT coverage guided test generation technique. Our evaluation demonstrates that CT provides a promising avenue for testing DL systems.