This paper describes the difficulties of properly evaluating obstacle-resolving urban CFD models. After a brief description of the evaluation methodology suggested by the European COST Action 732, the focus turns to the question of how to obtain validation data that can be regarded as a reliable standard. An analysis of an entire year of measurements from an urban monitoring station shows a large amount of scatter among seemingly identical cases. The issue of atmospheric variability is discussed in detail, and a concept for providing validation data based on a combination of field and boundary layer wind tunnel experiments is presented. © 2011 Elsevier Ltd. All rights reserved.