Predicting bus travel time and/or delay helps passengers plan their journeys, e.g., deciding when to leave for the origin bus stop, what to do after arriving at the destination bus stop, and so on. Many studies have tackled this task using probe data and/or the real-time data provided by automatic vehicle location (AVL) systems. Most of them, however, targeted only a small number of routes over short time periods (e.g., less than one week) and evaluated their methods with only a few machine learning models. Yet routes generally differ in their characteristics; in particular, urban routes and rural routes differ considerably. Furthermore, the performance of a machine learning model also varies with the data it is applied to. In this paper, we propose models that predict bus delay on every interval between pairs of adjacent bus stops. To build the models, we use one month of bus probe data covering more than 80 routes and apply several machine learning models: linear regression (LR), artificial neural network (ANN), support vector regression (SVR), random forest (RF), and gradient boosting decision tree (GBDT). Experimental results show that the GBDT-based prediction model outperforms the other models and demonstrate the effect of taking travel times on prior intervals into account.
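As an illustration only, the idea of considering travel time over prior intervals can be sketched as lagged input features. This sketch assumes that it means the travel times observed on the k intervals immediately upstream of the target interval on the same trip. The function name `prior_interval_features`, the number of lags `k`, and the use of `None` for missing upstream intervals are hypothetical choices, not details given in this abstract.

```python
from typing import List, Optional


def prior_interval_features(travel_times: List[float], k: int) -> List[List[Optional[float]]]:
    """Hypothetical helper: build lagged features for every interval of one trip.

    travel_times[i] is assumed to be the observed travel time (seconds) from
    stop i to stop i + 1. For interval i, the returned row holds the travel
    times on intervals i-1, i-2, ..., i-k, in that order. Intervals that do
    not exist (near the start of the trip) are filled with None.
    """
    rows: List[List[Optional[float]]] = []
    for i in range(len(travel_times)):
        row: List[Optional[float]] = []
        for lag in range(1, k + 1):
            j = i - lag  # index of the upstream interval at this lag
            row.append(travel_times[j] if j >= 0 else None)
        rows.append(row)
    return rows


# Usage: one trip with four intervals, two lags per interval.
trip = [60.0, 75.0, 90.0, 50.0]
print(prior_interval_features(trip, 2))
# [[None, None], [60.0, None], [75.0, 60.0], [90.0, 75.0]]
```

Such rows would be combined with other inputs before being passed to any of the regressors listed above; the `None` entries would first need to be imputed or handled by a learner that supports missing values.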