Guidelines for Running WRF on the HPC [DRAFT]

This guide outlines how to run the Weather Research and Forecasting (WRF) model on the CoARE High Performance Computing (HPC) facility. It is not meant to be a step-by-step WRF tutorial; rather, it is intended to facilitate HPC utilization for WRF applications. If you are new to WRF, first familiarize yourself with the application through online tutorials and the WRF documentation. Before continuing, it is important that you read How to Use the HPC, especially the sections Loading Software Into Your Environment and Managing Jobs, to understand how the HPC works and how to use its resources. The instructions there are used here quite extensively.

Getting everything ready

A. Choosing among available WRF packages in the HPC

As of this writing, there are a few versions of WRF installed, built against two MPI implementations (openmpi and mpich). It is highly recommended to use the same version and MPI implementation for both WPS and WRF when running your application. For instance, if you intend to use wrf/3.7-intel-openmpi, use wps/3.7-intel-openmpi as your pre-processing system.

[user@hpc ~]$ module whatis
wps/3.7-intel-openmpi: WRF Preprocessing System
wps/3.7.1-intel-mpich: WRF Preprocessing System
wps/3.7.1-intel-openmpi: WRF Preprocessing System
wrf/3.7-intel-openmpi: WRF
wrf/3.7.1-intel-mpich: WRF
wrf/3.7.1-intel-openmpi: WRF
wrf/3.8.1-intel-mpich: WRF
wrf-chem/3.7-intel-openmpi: WRF Chem
wrfda/3.7-intel-openmpi: WRFDA
wrfplus/3.7-intel-openmpi: WRF Plus

Each module contains the binaries needed to run your WRF application. However, certain processes in WRF read static files that are included in the WPS and WRF source distributions. This means that the binaries loaded by module load <package name> won't be enough to run the application. Once you have chosen a version, download and extract the matching source codes (WPS and WRF) from this download page to your home directory (/home/user).
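Assuming the two source archives have already been fetched from the download page into /home/user, extraction is straightforward. The archive names below are placeholders for whichever version you chose, not the exact filenames served by the download page:

```shell
# Extract the WPS and WRF source archives into the home directory.
# WRFV3.7.TAR.gz and WPSV3.7.TAR.gz are placeholder names -- use the
# files you actually downloaded for your chosen version.
cd ${HOME}
tar -xzf WRFV3.7.TAR.gz    # creates ./WRFV3
tar -xzf WPSV3.7.TAR.gz    # creates ./WPS
```

The static files referenced by the module binaries (e.g. lookup tables under the run/ directories) then live in these extracted trees.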

B. Geographical Input Data for WPS

Geographical input data can be retrieved from this link. The whole dataset is ~49 GB uncompressed, but the files can be downloaded individually as needed. These static files must be kept inside your home directory (e.g. /home/user/geog/).

Running WRF

A. Downloading Initial Conditions

Input datasets (also known as initial conditions) should be stored in the scratch directories (/scratch1/user or /scratch2/user). These files tend to be very large and would likely fill your /home/user directory quickly. Below is a sample slurm script for downloading GFS files:

#SBATCH --partition=batch
#SBATCH --ntasks=1


# Set ${pref} and ${suff} to the URL prefix and suffix of your GFS source
# before submitting this script.
for ((i=0; i<=24; i=i+3)); do
  num=$(printf %03d ${i})
  srun wget -c -O "/scratch2/user/gfs/gfs.t00z.mastergrb2f${num}" "${pref}${num}${suff}"
done

Then, run the slurm script using:

[user@hpc ~]$ sbatch <download_slurm_script>

Alternatively, you can remove the /scratch2/user/gfs path from the sample slurm script and set the working directory at submission time instead:

[user@hpc ~]$ sbatch -D /scratch2/user/gfs <download_slurm_script>

There are multiple ways of improving this script, though. Using job arrays, for instance, handles the downloads in a more elegant fashion. A simpler method, however, is to issue a wget command directly, but you should limit it to at most two instances at once. It is highly recommended to keep the load of the frontend node at a minimum and let the HPC nodes (not the frontend node) handle multiple file downloads.
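As a sketch of the job-array approach, each array task can download one forecast hour. The partition and paths follow the sample script above, and ${pref}/${suff} are still assumed to hold the URL prefix and suffix of your GFS source:

```shell
#SBATCH --partition=batch
#SBATCH --ntasks=1
#SBATCH --array=0-8

# Nine array tasks cover forecast hours 0, 3, ..., 24.
hour=$(( SLURM_ARRAY_TASK_ID * 3 ))
num=$(printf %03d ${hour})
srun wget -c -O "/scratch2/user/gfs/gfs.t00z.mastergrb2f${num}" "${pref}${num}${suff}"
```

Submitted the same way (sbatch -D /scratch2/user/gfs <download_slurm_script>), slurm then schedules the nine downloads as independent tasks rather than one sequential loop.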

B. WRF Pre-processing (geogrid.exe, link_grib.csh, ungrib.exe, metgrid.exe)

Create a WPS working directory (<wps-work-dir>) that must reside on a scratch directory. Make sure it contains all the necessary static files. A simple sample slurm script is written below.

#SBATCH --partition=batch

module load wps/3.7-intel-openmpi

mpirun "$@"

The "$@" after the mpirun command expands to all arguments passed to the slurm script. With this, you can run the following commands in succession:

[user@hpc ~]$ geog=`sbatch -D <wps-work-dir> <wps_slurm_script> "geogrid.exe" | egrep -o -e "\b[0-9]+$"`
[user@hpc ~]$ link=`sbatch -D <wps-work-dir> -d afterok:${geog} <wps_slurm_script> "link_grib.csh <gfs-dir>/gfs.*" | egrep -o -e "\b[0-9]+$"`
[user@hpc ~]$ ugrb=`sbatch -D <wps-work-dir> -d afterok:${link} <wps_slurm_script> "ungrib.exe" | egrep -o -e "\b[0-9]+$"`
[user@hpc ~]$ sbatch -D <wps-work-dir> -d afterok:${ugrb} <wps_slurm_script> "metgrid.exe" 

with egrep -o -e "\b[0-9]+$" capturing the slurm job ID of each submitted job and -d afterok:<id> specifying a job dependency. You can also add other flags to each command to suit your needs. For example, you can add an --ntasks=20 flag when executing metgrid.exe to speed up the generation of intermediate files.
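To see what the job-ID capture does in isolation: sbatch confirms each submission with a line of the form "Submitted batch job <id>", and the egrep pipeline extracts the trailing number. Here 123456 is a made-up ID standing in for whatever sbatch actually prints:

```shell
# Extract the trailing numeric job ID from sbatch's confirmation line,
# so it can be fed to a -d afterok:<id> dependency.
jobid=$(echo "Submitted batch job 123456" | egrep -o -e "\b[0-9]+$")
echo "${jobid}"    # prints 123456
```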

C. WRF execution (real.exe, wrf.exe)

Running WRF binaries follows the same principle discussed in the previous section -- create a working directory with the required static files (<wrf-work-dir>) in a scratch directory. Then do the following:

#SBATCH --partition=batch

module load wrf/3.7-intel-openmpi

mpirun "$@"

And perform:

[user@hpc ~]$ ln -sf <wps-work-dir>/met_em* <wrf-work-dir>
[user@hpc ~]$ real=`sbatch -D <wrf-work-dir> <wrf_slurm_script> "real.exe" | egrep -o -e "\b[0-9]+$"`
[user@hpc ~]$ sbatch -D <wrf-work-dir> -d afterok:${real} <wrf_slurm_script> "wrf.exe" 


There are a number of tools already installed in the HPC for post-processing WRF outputs. Currently, these are Climate Data Operators (CDO), NetCDF Operators (NCO), NetCDF binaries, and scripting languages such as NCAR Command Language (NCL), Python, and R. CDO, NCO, and the NetCDF binaries have built-in support for WRF outputs and are quite handy for simple operations. The scripting language NCL also has built-in support for WRF, which makes it a good starting point for users wanting to do more complex calculations. Python and R, on the other hand, do not, but Python modules and R packages can be installed to enable such support.

[user@hpc ~]$ module whatis
cdo/1.7.1-intel      : CDO
cdo/1.7.2-intel      : CDO
ncl-ncarg/6.3.0-intel: NCL-NCARG
nco/4.5.2-intel      : NCO
nco/4.6.1-intel      : NCO
netcdf/4.4-intel     : NETCDF
python/2.7.11        : PYTHON
python/2.7.11-intel  : PYTHON
python/2.7.12-intel  : PYTHON
python/2.7.6         : PYTHON
python/3.4.5-intel   : PYTHON
r/3.2.5-intel        : R

To use these packages, create a slurm script similar to:

#SBATCH --partition=batch

module load netcdf/4.4-intel
module load nco/4.6.1-intel

# run a netcdf binary
srun ncdump -h "<wrfout-file>" > "wrfout-headers.log" 

# run an nco binary
srun ncks -O -v "RAINC" "<wrfout-file>" "<output-file>" 

Then, run the script with:

[user@hpc ~]$ sbatch <postproc_slurm_script>

If you only wish to inspect the WRF output quickly, you can do it interactively with:

[user@hpc ~]$ salloc -n 1 -p debug
salloc: Granted job allocation <some_id>
[user@hpc ~]$ squeue | grep <some_id>
            <some_id>     debug     bash user      R       0:41      1 tux-01
[user@hpc ~]$ ssh tux-01
Last login: Fri Oct 21 13:31:23 2016 from tux.asti.local

Now that you are inside a node of the HPC, the following commands should work:

[user@tux-01 ~]$ module load netcdf/4.4-intel
[user@tux-01 ~]$ ncdump -h /path/to/wrfout-file

Important Remarks

As a rule of thumb, when running WRF, only your scripts and static files should be in your /home/user directory, while the outputs of your application should be directed to the scratch directories. WRF often operates on multiple, potentially large datasets, so it is imperative that calculations are handled by the HPC nodes and not the frontend node.