It is essential for a monitoring system or a communication robot that interacts with an elderly person to accurately understand the user's state and generate actions based on their condition. To ensure elderly welfare, quality of life (QOL) is a useful indicator for determining human physical suffering and mental and social activities in a comprehensive manner. In this study, we hypothesize that visual information is useful for extracting high-dimensional information on QOL from data collected by an agent while interacting with a person. We propose a QOL estimation method to integrate facial expressions, head fluctuations, and eye movements that can be extracted as visual information during the interaction with the communication agent. Our goal is to implement a multiple feature vectors learning estimator that incorporates convolutional 3D to learn spatiotemporal features. However, there is no database required for QOL estimation. Therefore, we implement a free communication agent and construct our database based on information collected through interpersonal experiments using the agent. To verify the proposed method, we focus on the estimation of the mental health QOL scale, which is the most difficult to estimate among the eight scales that compose QOL based on a previous study. We compare the four estimation accuracies: single-modal learning using each of the three features, i.e., facial expressions, head fluctuations, and eye movements and multiple feature vectors learning integrating all the three features. The experimental results show that multiple feature vectors learning has fewer estimation errors than all the other single-modal learning, which uses each feature separately. The experimental results for evaluating the difference between the estimated QOL score by the proposed method and the actual QOL score calculated by the conventional method also show that the average error is less than 10 points and, thus, the proposed system can estimate the QOL score. Thus, it is clear that the proposed new approach for estimating human conditions can improve the quality of human-robot interactions and personalized monitoring.