The medical field is embracing the information age, and the rapidly increasing volume of medical data generated by hospital information systems signals the advent of Big Data in healthcare, such that real-time data are now available to assist many clinical decisions. Real World Data (RWD) from hospital information systems comprise structured numerical data and unstructured text data, and it is imperative that phenotyping reproducibly extracts patients with an accurate phenotype from RWD using a rule-based approach. In this study, we sampled computed tomography reports from 100 patients, 48 of whom were diagnosed with interstitial pneumonia. Three machine learning techniques (Support Vector Machine, Feature Selection, and Gradient Boosting Decision Tree (GBDT)) were combined to develop a text phenotyping method, which was applied to these reports and achieved good predictive performance. We extracted several feature words that predict true cases of interstitial pneumonia, and the effect of feature selection was reflected in the good AUC of the GBDT model. We also found that when machine learning is applied to text-based RWD, the variables have to be narrowed down.
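To illustrate the last point, the following is a minimal sketch of narrowing down bag-of-words variables and evaluating the result by AUC. It is not the method used in this study: a simple document-frequency filter stands in for the SVM-based feature selection, and a weighted keyword sum stands in for GBDT, so that the example runs with the Python standard library alone. The toy reports and all names (`REPORTS`, `select_features`, `score`, `auc`, `leave_one_out_auc`) are hypothetical and are not study data.

```python
from collections import Counter

# Hypothetical toy reports (1 = interstitial pneumonia, 0 = not); NOT study data.
REPORTS = [
    ("bilateral reticular opacities with honeycombing", 1),
    ("ground glass opacities traction bronchiectasis", 1),
    ("subpleural reticular opacities and honeycombing", 1),
    ("no acute abnormality lungs clear", 0),
    ("solitary nodule in right upper lobe lungs otherwise clear", 0),
    ("small pleural effusion no consolidation", 0),
]


def tokenize(text):
    """Return the set of lowercase whitespace-separated words in a report."""
    return set(text.lower().split())


def select_features(reports, k):
    """Keep the k words whose document frequency differs most between classes.

    Assumption: this filter is a stand-in for SVM-based feature selection.
    Returns a dict mapping each kept word to its signed weight.
    """
    pos_df, neg_df = Counter(), Counter()
    for text, label in reports:
        (pos_df if label == 1 else neg_df).update(tokenize(text))
    n_pos = sum(1 for _, y in reports if y == 1)
    n_neg = len(reports) - n_pos
    vocab = set(pos_df) | set(neg_df)
    weight = {w: pos_df[w] / n_pos - neg_df[w] / n_neg for w in vocab}
    # Sort by descending absolute weight; break ties alphabetically.
    ranked = sorted(vocab, key=lambda w: (-abs(weight[w]), w))
    return {w: weight[w] for w in ranked[:k]}


def score(text, weights):
    """Sum the weights of the selected words present in a report."""
    return sum(weights.get(w, 0.0) for w in tokenize(text))


def auc(scores, labels):
    """AUC as the probability that a positive outscores a negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def leave_one_out_auc(reports, k):
    """Select features on all reports but one, score the held-out report, repeat."""
    scores, labels = [], []
    for i, (text, label) in enumerate(reports):
        train = reports[:i] + reports[i + 1:]
        weights = select_features(train, k)
        scores.append(score(text, weights))
        labels.append(label)
    return auc(scores, labels)


if __name__ == "__main__":
    print("selected words:", sorted(select_features(REPORTS, 4)))
    print("leave-one-out AUC:", leave_one_out_auc(REPORTS, 4))
```

Selecting the feature words inside each leave-one-out fold, rather than once on all reports, keeps the held-out report from influencing which words are kept; with only 100 patients, selecting on the full sample would tend to overstate the AUC.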