A large-scale ensemble prediction model for forecasting train delays is presented. The ensemble combines a disparate set of models, two statistical and one simulation-based, to generate forecasts of train delays. The first statistical model is a context-aware random forest that accounts for network traffic states, such as likely stretch conflicts and current headways, as well as exogenous weather, event, and work-zone information. The second is a kernel regression that captures train-specific dynamics. A mesoscopic simulation model is additionally considered; it accounts for variations in travel and dwell times as well as inferred track occupation conflicts, train connections, and rolling stock rotations. The models have been used in a proof of concept to forecast delays for the nationwide passenger service network of Deutsche Bahn, which operates roughly 25,000 trains daily in Germany. Results demonstrate an improvement potential of 25% in forecast correctness (the fraction of predictions within one minute of the observed delay) and a 50% reduction in root mean squared error (RMSE) compared to the published schedule. The paper describes the models along with the big data challenges that were addressed in data storage, feature and model building, and computation.
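The two evaluation metrics can be made concrete with a short sketch. It assumes delays are measured in minutes, reads "within one minute" as an absolute error of at most one minute, and treats the published schedule as a forecast of zero delay. The combiner `ensemble_mean` is a hypothetical stand-in (an unweighted mean), since the abstract does not state how the three model forecasts are combined, and all numbers are made up for illustration.

```python
import math
from typing import List, Sequence


def forecast_correctness(predicted: Sequence[float], actual: Sequence[float],
                         tol_min: float = 1.0) -> float:
    """Fraction of delay predictions within tol_min minutes of the observed delay."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need equally sized, non-empty sequences")
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= tol_min)
    return hits / len(actual)


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean squared error between predicted and observed delays (minutes)."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need equally sized, non-empty sequences")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def ensemble_mean(*model_forecasts: Sequence[float]) -> List[float]:
    """Hypothetical combiner: unweighted mean of each train's forecasts across models."""
    return [sum(vals) / len(vals) for vals in zip(*model_forecasts)]


# Toy data (invented): observed delays and per-model forecasts, in minutes.
actual = [3.0, 0.0, 7.5, 12.0]
schedule = [0.0, 0.0, 0.0, 0.0]   # published schedule implies no delay
random_forest = [2.5, 0.5, 6.0, 10.0]
kernel_regression = [3.5, 0.0, 8.0, 11.0]
simulation = [3.0, 1.0, 7.0, 13.0]

ensemble = ensemble_mean(random_forest, kernel_regression, simulation)

for name, pred in [("schedule", schedule), ("ensemble", ensemble)]:
    print(f"{name}: correctness={forecast_correctness(pred, actual):.2f}, "
          f"rmse={rmse(pred, actual):.3f}")
```

On this toy data the schedule baseline gets only one of four trains within a minute, while the averaged forecast gets all four; a real evaluation would of course use observed delays across the whole network.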