WRF-CMAQ v5.3.2 Compiler Test

Revision as of 14:00, 10 August 2020 by Lizadams

Following the directions available on

I made some modifications to the tutorial; they are located under:


I will combine the edits and submit a pull request to Kristen when they are complete.

I was able to get the debug version to run without a floating-point error by changing the debug flag (FCOPTIM) in configure.wrf as follows:


Modifying FCOPTIM in this way means that I don't need to edit Makefile.twoway in the CMAQ code.

When I used the following debug flags, the floating-point error stopped the run as before, almost immediately after it started:

  -ggdb -fbacktrace -fcheck=bounds,do,mem,pointer -ffpe-trap=invalid,zero,overflow
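With -ffpe-trap=invalid,zero,overflow, gfortran aborts at the first invalid operation, division by zero, or overflow. Without trapping, IEEE arithmetic quietly produces NaN or Inf instead, and those values propagate into the output. The helper below is a hypothetical sketch (not part of WRF or CMAQ) that finds the first non-finite value in a flattened field. It assumes the field has already been read into a Python sequence.

```python
import math

def first_nonfinite(field):
    """Return (index, value) of the first NaN or +/-Inf in a flat sequence, or None."""
    for i, value in enumerate(field):
        if not math.isfinite(value):  # False for NaN, +Inf and -Inf
            return i, value
    return None

# Hypothetical flattened field: without trapping, inf - inf silently becomes NaN.
sample = [1.0, 2.5, float("inf") - float("inf"), 3.0]
print(first_nonfinite(sample))
```

Scanning for the first bad value approximates what the trap reports. However, it only sees values that reached the output, not the instruction that produced them.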

The optimized version of WRF-CMAQv5.3.2 does not match the debug version. To check whether WRF alone is sensitive to the optimization level, I ran WRFv4.1.1-CMAQv5.3.2 with the CMAQ driver turned off:

setenv RUN_CMAQ_DRIVER            F   # [F]

This confirmed that the WRF output does not change when WRF is compiled with the optimized -O2 flags instead of the debug flags.

Optimized flag in configure.wrf:
FCOPTIM = -O2 -ftree-vectorize -funroll-loops
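To confirm that two runs really match, as in the WRF-only test, it helps to summarize the difference field rather than eyeball plots. The m3diff output later on this page reports the maximum and minimum difference with their (column, row, layer) locations, followed by what appear to be the mean and standard deviation. The hypothetical helper below computes similar statistics for two flattened fields. It is not m3diff's implementation, and m3diff's conventions (for example, for sigma) may differ.

```python
import math

def diff_stats(a, b):
    """Summarize D = A - B for two flattened fields of equal size.

    Returns (max, index_of_max, min, index_of_min, mean, sigma); sigma is the
    population standard deviation. Identical runs give all zeros.
    """
    if len(a) != len(b) or not a:
        raise ValueError("fields must be non-empty and the same size")
    d = [x - y for x, y in zip(a, b)]
    i_max = max(range(len(d)), key=d.__getitem__)
    i_min = min(range(len(d)), key=d.__getitem__)
    mean = math.fsum(d) / len(d)
    sigma = math.sqrt(math.fsum((v - mean) ** 2 for v in d) / len(d))
    return d[i_max], i_max, d[i_min], i_min, mean, sigma

# Hypothetical values standing in for one WRF variable from the debug and -O2 runs.
debug_run = [287.1, 288.4, 290.0]
opt_run = [287.1, 288.4, 290.0]
print(diff_stats(opt_run, debug_run))
```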

The plots are available here:
The debug version matches Fahim's results for the no-feedback case.
Click the following link and then click Submit; you should be able to scroll through the variables.

The debug version on 16 PE matches Fahim's results for the shortwave feedback case (the results depend on the number of PEs).
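One common reason results depend on the PE count is that floating-point addition is not associative. Changing the domain decomposition changes which grid cells each PE owns, so sums and other reductions can be accumulated in a different order. Whether this is the cause here is what the decomposition investigation would need to establish. The toy example below uses hypothetical values to show a partitioned sum changing with the number of partitions. It uses explicit loops because Python 3.12+ `sum()` applies compensated summation to floats.

```python
import math

def partitioned_sum(values, n_parts):
    """Sum contiguous chunks separately (one per hypothetical PE), then add the partial sums."""
    size = math.ceil(len(values) / n_parts)
    partials = []
    for start in range(0, len(values), size):
        acc = 0.0
        for v in values[start:start + size]:
            acc += v  # plain left-to-right addition, rounding at every step
        partials.append(acc)
    total = 0.0
    for p in partials:
        total += p
    return total

values = [1.0, 1e17, -1e17, 1.0]  # exact sum is 2.0
print(partitioned_sum(values, 1), partitioned_sum(values, 2), math.fsum(values))
```

With one partition the result is 1.0, and with two it is 0.0. `math.fsum` returns the exact 2.0. Neither ordering is "wrong"; they are just different roundings of the same data.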

A side-by-side comparison of the no-feedback and shortwave feedback runs is available here: https://dataviewer-dept-cempd.cloudapps.unc.edu/compare.cfm?back_address2=/WRFv4.1.1-CMAQv5.3.2/fahim_debug_16pe_sf/plots/compiler_sens/base_sf/layer1_only&back_address1=/WRFv4.1.1-CMAQv5.3.2/fahim_debug_16pe_nf/plots/compiler_sens/base_nf/layer1_only

The shortwave feedback percent-difference plot shows that the debug version produces different output on 16 and 32 PE. The no-feedback runs show the same effect, but with a smaller magnitude.
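The page does not say how the plotted percent difference is computed. One common choice measures the difference relative to a reference run and skips points where the reference is (near) zero. The sketch below assumes that definition, treats the 16-PE run as the reference, and uses hypothetical CO-like values. `max_percent_diff` and its `floor` parameter are illustrative names, not part of any CMAQ tool.

```python
def max_percent_diff(a, b, floor=1e-30):
    """Largest 100*|a-b|/|b| and its flat index, skipping points where |b| <= floor.

    b is the reference run; near-zero reference values are skipped to avoid
    dividing by (almost) zero.
    """
    if len(a) != len(b):
        raise ValueError("fields must be the same size")
    best, where = 0.0, None
    for i, (x, y) in enumerate(zip(a, b)):
        if abs(y) <= floor:
            continue
        pct = 100.0 * abs(x - y) / abs(y)
        if pct > best:
            best, where = pct, i
    return best, where

# Hypothetical values from a 32-PE run (a) and a 16-PE run (b).
run_32pe = [0.120, 0.135, 0.098, 0.0]
run_16pe = [0.120, 0.150, 0.098, 0.0]
print(max_percent_diff(run_32pe, run_16pe))
```

Reporting the location along with the maximum makes it easy to check whether the largest differences line up with PE subdomain boundaries.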

David Wong suggested trying -O2 or a less aggressive compiler optimization to see whether that reduces the difference between the debug and optimized versions.

I tried the following compile options:

Caption: Compiler options and run times (seconds) on 16 PE
Option   -O2         -O1         -Og         -O0 -g (debug)
real     2009.05     2230.82     2753.16     8090.41
user     20,597.59   35,559.21   43,522.31   64,661.29
m3diff at 2016183:230000 A:B 1.12602E+02@( 90,30, 1) -1.05625E+01@( 90,32, 1) 3.87858E+01 1.53473E+01 A:B 4.97734E+01@( 99,65, 1) -4.92266E+01@( 99,58, 1) 4.80420E-01 5.49204E+00 A:B 0.000 B
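A quick way to read the timing table is as speedup relative to the debug build, i.e. debug real time divided by each option's real time. The snippet below copies the "real" values from the table. `speedup_vs` is a hypothetical helper name.

```python
# Wall-clock ("real") times in seconds on 16 PE, copied from the table.
real_s = {"-O2": 2009.05, "-O1": 2230.82, "-Og": 2753.16, "-O0 -g (debug)": 8090.41}

def speedup_vs(baseline, times):
    """Return times[baseline] / t for every option; values > 1 mean faster than the baseline."""
    base = times[baseline]
    return {opt: base / t for opt, t in times.items()}

for opt, s in speedup_vs("-O0 -g (debug)", real_s).items():
    print(f"{opt:>15}: {s:.2f}x")
```

With these numbers, -O2 runs about 4.0 times faster than the debug build, -O1 about 3.6 times faster, and -Og about 2.9 times faster.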

Fahim and David will look into the domain decomposition issues. Looking at the CO variable:

Note: the debug version of WRF-CMAQ with shortwave feedback cannot finish on 8 PE within the debug queue's time limit, and it cannot run in the larger queues because 8 PE is too small a configuration for them.

Table of Plots Available

debug base vs debug optimized base vs opt debug base vs debug and opt debug CMAQv5.3.2 vs debug nf
nf (no feedback):
  https://dataviewer-dept-cempd.cloudapps.unc.edu/compare.cfm?back_address1=/WRFv4.1.1-CMAQv5.3.2/fahim_debug_16pe_nf/compiler_sens/base_nf&back_address2=/WRFv4.1.1-CMAQv5.3.2/fahim_debug_16pe_nf/prc_diff/base_nf
  https://dataviewer-dept-cempd.cloudapps.unc.edu/compare.cfm?back_address1=/WRFv4.1.1-CMAQv5.3.2/fahim_opt_16pe_nf/compiler_sens/base_nf/layer1_only&back_address2=/WRFv4.1.1-CMAQv5.3.2/fahim_opt_16pe_nf/prc_diff/base_nf/layer1_only
  https://dataviewer-dept-cempd.cloudapps.unc.edu/compare.cfm?back_address1=/WRFv4.1.1-CMAQv5.3.2/fahim_debug_and_opt_16pe_nf/plots/compiler_sens/base_nf/layer1_only&back_address2=/WRFv4.1.1-CMAQv5.3.2/fahim_debug_and_opt_16pe_nf/plots/prc_diff/base_nf/layer1_only
sf (shortwave feedback):
  https://dataviewer-dept-cempd.cloudapps.unc.edu/compare.cfm?back_address1=/WRFv4.1.1-CMAQv5.3.2/fahim_debug_16pe_sf/compiler_sens/base_sf/layer1_only&back_address2=/WRFv4.1.1-CMAQv5.3.2/fahim_debug_16pe_sf/prc_diff/base_sf/layer1_only
  https://dataviewer-dept-cempd.cloudapps.unc.edu/compare.cfm?back_address1=/WRFv4.1.1-CMAQv5.3.2/fahim_opt_16pe_sf/compiler_sens/base_sf&back_address2=/WRFv4.1.1-CMAQv5.3.2/fahim_opt_16pe_sf/prc_diff/base_sf
  https://dataviewer-dept-cempd.cloudapps.unc.edu/compare.cfm?back_address1=/WRFv4.1.1-CMAQv5.3.2/fahim_debug_and_opt_16pe_sf/plots/compiler_sens/base_sf/layer1_only&back_address2=/WRFv4.1.1-CMAQv5.3.2/fahim_debug_and_opt_16pe_sf/plots/prc_diff/base_sf/layer1_only
Caption: Max %Diff for ACLI
               debug   opt        Opt&Debug vs Debug
WRF-CMAQ NF    120     8          200
WRF-CMAQ SF    200     200        200
CMAQ           0       25 (2x4)   150
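The page does not define "%Diff". If the plots use the symmetric percent difference 200·|A−B|/(|A|+|B|), then 200 is its ceiling: by the triangle inequality |A−B| ≤ |A|+|B|, with equality when one value is zero or the two have opposite signs. Under that assumption, the repeated 200 entries would flag cells where one run is zero and the other is not, rather than a measured relative error. `sym_pct_diff` below is a hypothetical helper illustrating the bound.

```python
def sym_pct_diff(a, b):
    """Symmetric percent difference 200*|a-b|/(|a|+|b|), which can never exceed 200."""
    denom = abs(a) + abs(b)
    if denom == 0.0:
        return 0.0  # both values are zero: no difference
    # |a-b| <= |a|+|b| (triangle inequality), so the ratio is at most 1
    return 200.0 * (abs(a - b) / denom)

print(sym_pct_diff(0.0, 3.7e-9))  # one run is zero -> 200.0
print(sym_pct_diff(1.0e-9, 1.1e-9))
```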