WRF-CMAQ v5.3.2 Compiler Test

From CMASWIKI
Revision as of 14:00, 10 August 2020 by Lizadams (talk | contribs)

I followed the directions available at
https://github.com/kmfoley/CMAQ/blob/v532_20200702/DOCS/Users_Guide/Tutorials/CMAQ_UG_tutorial_WRF-CMAQ_build_gcc.md

I made some modifications to the tutorial. They are located at:

https://github.com/lizadams/CMAQ/blob/patch-3/DOCS/Users_Guide/Tutorials/CMAQ_UG_tutorial_WRF-CMAQ_build_gcc.md
or
https://github.com/lizadams/CMAQ/blob/master/DOCS/Users_Guide/Tutorials/CMAQ_UG_tutorial_WRF-CMAQ_build_gcc.md

I will combine and submit a pull request to Kristen when the edits are complete.

I was able to get the debug version to run without a floating-point error by changing the debug flags in configure.wrf to the following:

FCOPTIM = -O0
FCDEBUG = -g $(FCNOOPT)

Modifying FCOPTIM in this way means that I don't need to edit Makefile.twoway in the CMAQ code.
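The edit to configure.wrf described above could also be scripted. The following is a minimal sketch, not part of the tutorial: the function name `set_debug_flags` is hypothetical, and it assumes that configure.wrf assigns each variable on its own line in the form `NAME = value`.

```python
import re

def set_debug_flags(text):
    """Return configure.wrf text with FCOPTIM and FCDEBUG set to the debug values.

    Assumes each variable is assigned on a single line as NAME = value.
    """
    new_values = {
        "FCOPTIM": "-O0",
        "FCDEBUG": "-g $(FCNOOPT)",
    }
    out = []
    for line in text.splitlines():
        match = re.match(r"^(\w+)\s*=", line)
        if match and match.group(1) in new_values:
            name = match.group(1)
            # Replace the whole assignment; all other lines pass through unchanged.
            line = f"{name} = {new_values[name]}"
        out.append(line)
    return "\n".join(out) + "\n"
```

To apply it, one would read configure.wrf, pass its contents through the function, and write the result back, keeping a backup copy of the original file.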

When I used the following debug flags, the floating-point error stopped the run as before (almost before it got started):

-ggdb -fbacktrace -fcheck=bounds,do,mem,pointer -ffpe-trap=invalid,zero,overflow

The optimized version of WRF-CMAQv5.3.2 does not match the debug version. I ran WRFv4.1.1-CMAQv5.3.2 with the following option:

setenv RUN_CMAQ_DRIVER            F   # [F]

This confirmed that the WRF output does not change when WRF is built with the optimized -O2 option.

Optimized Flag in configure.wrf
FCOPTIM = -O2 -ftree-vectorize -funroll-loops

The plots are available here:
The debug version matches Fahim's results for the no-feedback case.
Click on the following link and then click submit. You should be able to scroll through the variables.
https://dataviewer-dept-cempd.cloudapps.unc.edu/index.cfm?back_address=/WRFv4.1.1-CMAQv5.3.2/fahim_debug_16pe_nf/plots/compiler_sens/base_nf/layer1_only

The debug 16 pe version matches Fahim's results for the shortwave feedback case. (This result depends on the number of pe.)
https://dataviewer-dept-cempd.cloudapps.unc.edu/index.cfm?back_address=/WRFv4.1.1-CMAQv5.3.2/fahim_debug_16pe_sf/plots/compiler_sens/base_sf/layer1_only

A side-by-side comparison of the no-feedback and shortwave feedback runs is available here: https://dataviewer-dept-cempd.cloudapps.unc.edu/compare.cfm?back_address2=/WRFv4.1.1-CMAQv5.3.2/fahim_debug_16pe_sf/plots/compiler_sens/base_sf/layer1_only&back_address1=/WRFv4.1.1-CMAQv5.3.2/fahim_debug_16pe_nf/plots/compiler_sens/base_nf/layer1_only

The shortwave feedback percent difference plot shows that the debug version gives different output on 16 pe and 32 pe. We see the same difference in the no-feedback runs, but at a smaller magnitude.
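For reference, a percent difference between two runs can be computed cell by cell as sketched below. This is only an illustration: the formula (difference relative to the first run) is an assumption, since the exact definition used for the plots is not given here, and the function names and sample values are made up.

```python
def percent_difference(base, test):
    """Percent difference of test relative to base, cell by cell.

    Assumed formula: 100 * (test - base) / base.
    Cells where the base value is zero are returned as None, because the
    ratio is undefined there.
    """
    result = []
    for b, t in zip(base, test):
        if b == 0:
            result.append(None)
        else:
            result.append(100.0 * (t - b) / b)
    return result

def max_abs_percent_difference(base, test):
    """Largest absolute percent difference over all cells with a defined value."""
    values = [abs(v) for v in percent_difference(base, test) if v is not None]
    return max(values) if values else 0.0

# Made-up values standing in for one variable from a 16 pe and a 32 pe run.
run_16pe = [2.0, 4.0, 0.0, 10.0]
run_32pe = [2.0, 5.0, 0.0, 9.0]
print(percent_difference(run_16pe, run_32pe))
print(max_abs_percent_difference(run_16pe, run_32pe))
```

Skipping cells with a zero base value is a design choice for this sketch; a plotting script might instead mask them or use a small threshold.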

David Wong suggested that I use -O2 or a less aggressive compiler optimization to see if that helps reduce the difference between the debug and optimized versions.

I tried the following compile options:

Caption: Compiler Options and Run Times on 16 pe

Compile option    real       user        m3diff at 2016183:230000
-O2               2009.05    20,597.59   A:B 1.12602E+02@( 90,30, 1) -1.05625E+01@( 90,32, 1) 3.87858E+01 1.53473E+01
-O1               2230.82    35,559.21   A:B 4.97734E+01@( 99,65, 1) -4.92266E+01@( 99,58, 1) 4.80420E-01 5.49204E+00
-Og               2753.16    43,522.31   A:B 0.000
-O0 -g (debug)    8090.41    64,661.29   B
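The m3diff entries in the table above appear to follow the usual report layout of a maximum and a minimum of the difference between file A and file B, each with its grid location, followed by a mean and a standard deviation. The sketch below shows how such a summary could be computed for two small grids. It is not m3diff itself: the function name `diff_stats` is hypothetical, the grids are made-up numbers, and the use of the population standard deviation is an assumption.

```python
import math

def diff_stats(grid_a, grid_b):
    """Summarize A - B for two 2-D grids given as lists of rows.

    Returns (max, max_loc, min, min_loc, mean, sigma), where the locations
    are 1-based (column, row) pairs and sigma is the population standard
    deviation (an assumption for this sketch).
    """
    diffs = []
    for r, (row_a, row_b) in enumerate(zip(grid_a, grid_b), start=1):
        for c, (a, b) in enumerate(zip(row_a, row_b), start=1):
            diffs.append((a - b, (c, r)))
    values = [d for d, _ in diffs]
    max_val, max_loc = max(diffs, key=lambda item: item[0])
    min_val, min_loc = min(diffs, key=lambda item: item[0])
    mean = sum(values) / len(values)
    sigma = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return max_val, max_loc, min_val, min_loc, mean, sigma

# Made-up 2 x 2 grids standing in for one layer of one variable.
grid_a = [[1.0, 2.0], [3.0, 8.0]]
grid_b = [[1.0, 1.0], [5.0, 4.0]]
print(diff_stats(grid_a, grid_b))
```

With this summary, two identical grids give zero for every statistic.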

Fahim and David will take a look at the domain decomposition issues. Looking at the CO variable

Note: on 8 pe, the debug version of WRF-CMAQ with shortwave feedback does not finish within the time limit of the debug queue, and it can't run in the larger queues because 8 pe is too small a configuration for them.

Table of Plots Available

nf (no feedback)
  debug base vs debug: https://dataviewer-dept-cempd.cloudapps.unc.edu/compare.cfm?back_address1=/WRFv4.1.1-CMAQv5.3.2/fahim_debug_16pe_nf/compiler_sens/base_nf&back_address2=/WRFv4.1.1-CMAQv5.3.2/fahim_debug_16pe_nf/prc_diff/base_nf
  optimized base vs opt: https://dataviewer-dept-cempd.cloudapps.unc.edu/compare.cfm?back_address1=/WRFv4.1.1-CMAQv5.3.2/fahim_opt_16pe_nf/compiler_sens/base_nf/layer1_only&back_address2=/WRFv4.1.1-CMAQv5.3.2/fahim_opt_16pe_nf/prc_diff/base_nf/layer1_only
  debug base vs debug and opt: https://dataviewer-dept-cempd.cloudapps.unc.edu/compare.cfm?back_address1=/WRFv4.1.1-CMAQv5.3.2/fahim_debug_and_opt_16pe_nf/plots/compiler_sens/base_nf/layer1_only&back_address2=/WRFv4.1.1-CMAQv5.3.2/fahim_debug_and_opt_16pe_nf/plots/prc_diff/base_nf/layer1_only

sf (shortwave feedback)
  debug base vs debug: https://dataviewer-dept-cempd.cloudapps.unc.edu/compare.cfm?back_address1=/WRFv4.1.1-CMAQv5.3.2/fahim_debug_16pe_sf/compiler_sens/base_sf/layer1_only&back_address2=/WRFv4.1.1-CMAQv5.3.2/fahim_debug_16pe_sf/prc_diff/base_sf/layer1_only
  optimized base vs opt: https://dataviewer-dept-cempd.cloudapps.unc.edu/compare.cfm?back_address1=/WRFv4.1.1-CMAQv5.3.2/fahim_opt_16pe_sf/compiler_sens/base_sf&back_address2=/WRFv4.1.1-CMAQv5.3.2/fahim_opt_16pe_sf/prc_diff/base_sf
  debug base vs debug and opt: https://dataviewer-dept-cempd.cloudapps.unc.edu/compare.cfm?back_address1=/WRFv4.1.1-CMAQv5.3.2/fahim_debug_and_opt_16pe_sf/plots/compiler_sens/base_sf/layer1_only&back_address2=/WRFv4.1.1-CMAQv5.3.2/fahim_debug_and_opt_16pe_sf/plots/prc_diff/base_sf/layer1_only

A fourth column, debug CMAQv5.3.2 vs debug nf, has no links listed.

Caption: Max %Diff for ACLI

               debug    opt         Opt&Debug vs Debug
WRF-CMAQ NF    120      8           200
WRF-CMAQ SF    200      200         200
CMAQ           0        25 (2x4)    150