Design of and comparison with verification and validation benchmarks


Oberkampf,-William-L.; Trucano,-Timothy-G, Verification and validation (V and V) are the primary means to assess accuracy and reliability of computational simulations. V and V methods and procedures have fundamentally improved the credibility of simulations in several high-consequence application areas, such as, nuclear reactor safety, underground storage of nuclear waste, and safety of nuclear weapons. Although the terminology is not uniform across engineering disciplines, code verification deals with the assessment of the reliability of the software coding and solution verification deals with the numerical accuracy of the solution to a computational model. Validation addresses the physics modeling accuracy of a computational simulation by comparing with experimental data. Code verification benchmarks and validation benchmarks have been constructed for a number of years in every field of computational simulation. However, no comprehensive guidelines have been proposed for the construction and use of V and V benchmarks. Some fields, such as nuclear reactor safety, place little emphasis on code verification benchmarks and great emphasis on validation benchmarks that are closely related to actual reactors operating near safety-critical conditions. This paper proposes recommendations for the optimum design and use of code verification benchmarks based on classical analytical solutions, manufactured solutions, and highly accurate numerical solutions. It is believed that these benchmarks will prove useful to both in-house developed codes, as well as commercially licensed codes. In addition, this paper proposes recommendations for the design and use of validation benchmarks with emphasis on careful design of building-block experiments, estimation of experiment measurement uncertainty for both inputs and outputs to the code, validation metrics, and the role of model calibration in validation. It is argued that predictive capability of a computational model is built on both the measurement of achievement in V and V, as well as how closely related are the V and V benchmarks to the actual application of interest, e.g., the magnitude of extrapolation beyond a validation benchmark to a complex engineering system of interest.