In this paper, we propose a method for classifying the state of spectators in video sequences by voting on facial expressions and face directions. The task is to classify the state of the spectators in a given video sequence as "Positive Scene" or "Negative Scene", and as "Watching Seriously" or "Not Watching Seriously". The proposed classifier follows a "bag-of-visual-words" approach based on face recognition. First, multiview (left-profile, front, right-profile) faces are detected in each image of the given video sequence. Then each detected face is classified into one of two expressions, smile or not smile. The classification results for face direction and facial expression are voted into each class's histogram over the video sequence. Finally, the state of the spectators is classified by applying a kernel SVM to the voted histograms. We conducted experiments using video sequences of spectators captured from TV. Our approach demonstrated promising results for classifying "Positive Scene" versus "Negative Scene" and "Watching Seriously" versus "Not Watching Seriously". The experiments also showed that facial expression is important for distinguishing "Positive" from "Negative", whereas face direction is important for classifying whether the spectators are "Watching Seriously" or not.
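The voting step described above can be illustrated with a minimal sketch. It assumes that face detection and expression classification have already produced one (direction, expression) label pair per detected face. The label names, the function `vote_histogram`, the use of separate direction and expression bins, and the normalization by the number of faces are all illustrative assumptions, not details taken from the paper.

```python
# Hypothetical label sets: three face directions and two expressions.
DIRECTIONS = ("left", "front", "right")
EXPRESSIONS = ("smile", "not_smile")


def vote_histogram(frames):
    """Accumulate per-class votes over all frames of one video sequence.

    `frames` is a list of frames; each frame is a list of
    (direction, expression) pairs, one pair per detected face.
    Returns a 5-bin feature vector: 3 direction bins followed by
    2 expression bins, each divided by the total number of faces
    (the normalization is an assumption of this sketch).
    """
    dir_votes = {d: 0 for d in DIRECTIONS}
    exp_votes = {e: 0 for e in EXPRESSIONS}
    n_faces = 0
    for faces in frames:
        for direction, expression in faces:
            dir_votes[direction] += 1
            exp_votes[expression] += 1
            n_faces += 1
    if n_faces == 0:
        # No faces detected in the whole sequence: return an all-zero vector.
        return [0.0] * (len(DIRECTIONS) + len(EXPRESSIONS))
    return ([dir_votes[d] / n_faces for d in DIRECTIONS]
            + [exp_votes[e] / n_faces for e in EXPRESSIONS])


# Toy sequence of two frames with two detected faces each.
sequence = [
    [("front", "smile"), ("front", "not_smile")],
    [("left", "smile"), ("front", "smile")],
]
features = vote_histogram(sequence)
print(features)  # [0.25, 0.75, 0.0, 0.75, 0.25]
```

A feature vector of this kind, computed for each labeled sequence, could then be passed to a kernel SVM, for example scikit-learn's `SVC` with `kernel="rbf"`. The choice of kernel here is only a placeholder, since the abstract does not name one.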