Fidelity evaluations are an important part of simulation design. They identify gaps in fidelity, help establish the simulator’s overall validity, and can significantly improve the correspondence between the simulated environment and the real world. However, fidelity evaluations are easily overwhelmed by the technical focus of the simulator commissioning and acceptance processes, and are too often overlooked or undermined in favour of remaining within the scope of the original specification. This paper presents a fidelity evaluation that was applied to a railway safety research simulator after it was deployed for operational use. The evaluation was stratified according to the physical, functional and task-based strands of fidelity, and undertaken using a collaborative research approach that integrated the respective domain, task, simulation and human factors-based expertise of a team of evaluators. The evaluation also examined the simulator’s tractability for its intended users (researchers), and investigated the scope for scenario development. The findings revealed several opportunities for improving the fidelity of the simulator for research (and training) applications, but also identified a number of critical deficiencies in its underlying architecture. This paper discusses the outcomes of the evaluation in terms of the fidelity expectations of the developer and user, and the tension encountered when trying to adjust these post-deployment. Lastly, it clarifies the distinction between what should be improved and what can be improved when the proverbial train has, for all intents and purposes, left the station.