To run the TensorFlow application on ASTI's HPC-GPU Facility, follow the instructions below:
- Log in using your ASTI designated user account to 18.104.22.168
$ ssh email@example.com
- Load the “anaconda2” module, which lets you set up your own Python environment. This is necessary because the OS (CentOS 7.2) does not yet provide Python 3.x.
[username@tux-gpu-01-g2 ~]$ module load anaconda2/4.3.0
- Load the latest CUDA.
[username@tux-gpu-01-g2 ~]$ module load cuda/8.0_cudnn-6.0
- Create a new anaconda environment.
$ conda create -n your_environment_name python=3.5
- Activate your newly created conda environment.
$ source activate your_environment_name
- Install the latest TensorFlow (1.4.1 as of this writing).
$ pip install tensorflow-gpu
- Validate your installation. Start an interactive Python session in your shell and import tensorflow. Invoke exit() to leave the interactive session.
(your_environment_name) [username@tux-gpu-01-g2 ~]$ python
Python 3.5.4 |Continuum Analytics, Inc.| (default, Aug 14 2017, 13:26:58)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>>
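The same check can be scripted so that it reports a version string without an interactive session. The following is a minimal sketch: the helper name `module_version` is hypothetical, and it uses only the standard library, so it also runs (and reports a failure) where TensorFlow is not installed.

```python
import importlib


def module_version(name):
    """Return the module's version string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    # Many packages, TensorFlow included, expose a __version__ attribute.
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    version = module_version("tensorflow")
    if version is None:
        print("tensorflow is not importable in this environment")
    else:
        print("tensorflow", version)
```

Run it inside the activated conda environment; a `None` result usually means the environment is not active or the installation step did not complete.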
- To start your first TensorFlow job on the HPC, grab the latest version of the MNIST code, mnist_softmax.py, from https://github.com/tensorflow/tensorflow/blob/r1.4/tensorflow/examples/tutorials/mnist/ and store it in scratch1. It’s a good idea to put it in one of the “scratch[1,2]” directories instead of your home directory. (You’ll learn more about the different storage services attached to the HPC on this link.)
[username@tux-gpu-01-g2 ~]$ cd ~/scratch1
[username@tux-gpu-01-g2 scratch1]$ wget https://raw.githubusercontent.com/tensorflow/tensorflow/r1.4/tensorflow/examples/tutorials/mnist/mnist_softmax.py
- Copy the SLURM batch script below and save it as mnist.slurm in ~/scratch1. The SLURM script specifies the amount and type of computational resources that a particular job/run requires. It also contains the sequence of commands you would normally invoke in an interactive session, so that the batch scheduler can execute the application for you.
#!/bin/bash
#SBATCH --output=my_first_job.out
#SBATCH --gres=gpu:1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2

# check which GPU device was allocated
echo "CUDA_DEVICE=/dev/nvidia$CUDA_VISIBLE_DEVICES"

# prepare working environment
module load anaconda2/4.3.0
module load cuda/8.0_cudnn-6.0

# activate your python environment
source activate your_environment_name

# execute your application
srun python mnist_softmax.py
source deactivate
By now you should have two files in your ~/scratch1 directory:
[username@tux-gpu-01-g2 scratch1]$ ls
mnist.slurm mnist_softmax.py
[username@tux-gpu-01-g2 scratch1]$
- Edit the SLURM script and insert the following line just below the #!/bin/bash directive. Replace the variable <jobname> with any preferred “string”. This string serves as an identifier for your job (especially when you are managing multiple jobs).
#SBATCH -J <jobname>
- If you rename your MNIST Python script, make sure to reflect the change in your SLURM script. Replace all instances of the “print” function in the MNIST script with “logging.debug” so that the messages emitted by the script are properly recorded in the SLURM output files. To call logging.debug you must first import the logging module, so insert the following lines below the lines where you import other Python modules:
…
# load the logging module
import logging
# customize the log message format
logging.basicConfig(format='%(asctime)s %(message)s', datefmt='%Y%m%d%H%M%S', level=logging.DEBUG)
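To illustrate what the replacement of “print” with “logging.debug” produces, here is a small self-contained sketch. It is not the MNIST script itself: the logger name `mnist_demo`, the helper `make_logger`, and the accuracy value are placeholders, and an in-memory stream is used only so the formatted line can be inspected. In the real script, the `logging.basicConfig` call with the same format strings should be sufficient.

```python
import io
import logging


def make_logger(stream):
    """Build a logger that writes timestamped messages to the given stream."""
    logger = logging.getLogger("mnist_demo")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)
    logger.handlers = []  # avoid duplicate handlers on repeated calls
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    # Same message and date formats as the basicConfig call in the guide.
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(message)s", datefmt="%Y%m%d%H%M%S")
    )
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()
log = make_logger(buffer)

accuracy = 0.9156  # placeholder value, not a measurement
# Instead of: print(accuracy)
log.debug("accuracy: %s", accuracy)

captured = buffer.getvalue()  # a 14-digit timestamp followed by the message
```

Each message is prefixed with a compact timestamp, which makes it easier to correlate lines in the SLURM output file with the job's progress.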
A few words about SLURM parameters:
The --gres=gpu:1 option specifies the number of GPU devices that your job requires. The MNIST code used in the lecture needs only one (1) GPU.
The --ntasks=1 option tells the batch scheduler that the job will spawn one process. Note that a TensorFlow program is driven from Python, which is effectively single-threaded. However, the TensorFlow runtime itself is multithreaded, which lets it run operations in parallel to improve run times and scale to more complex models.
The --cpus-per-task=2 option sets the number of processors assigned to each “python” process. Note that the batch scheduler relies on Linux Control Groups (cgroups), a kernel mechanism that isolates user processes from each other, to prevent users from consuming resources beyond their allocations.
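Because the scheduler confines the job to its allocated processors, it can be useful for a script to find out how many CPUs it was actually given instead of assuming the size of the whole node. The sketch below is one way to do that, under stated assumptions: the helper name `allocated_cpus` is hypothetical, it reads the `SLURM_CPUS_PER_TASK` variable that SLURM exports when `--cpus-per-task` is set, and it falls back to the Linux affinity mask or the total CPU count when run outside a job.

```python
import os


def allocated_cpus(env=None):
    """Best-effort count of the CPUs this process is allowed to use."""
    env = os.environ if env is None else env
    value = env.get("SLURM_CPUS_PER_TASK")
    if value is not None and value.isdigit() and int(value) > 0:
        return int(value)
    # Outside a SLURM job: use the affinity mask where the platform has one.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    # Last resort: total CPU count, or 1 if it cannot be determined.
    return os.cpu_count() or 1


if __name__ == "__main__":
    print("CPUs available to this process:", allocated_cpus())
```

In TensorFlow 1.x the result could, for example, be passed to `tf.ConfigProto(intra_op_parallelism_threads=...)` so that the runtime does not start more threads than the job has processors for; whether that helps depends on the workload.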
- Submit your job script to the queue and wait for available resources to turn up.
$ sbatch mnist.slurm
- Check the status of your job, for example with SLURM's squeue command. In the state (ST) column, R means running and PD means pending.
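The status check can also be done from Python. The sketch below assumes that SLURM's `squeue` command is on the PATH and uses its `-h` (no header), `-u` (user) and `-o` (output format) options; the function names `parse_squeue` and `my_jobs` are hypothetical, and only the two state codes mentioned in this guide are translated.

```python
import shutil
import subprocess

# Only the two state codes described in this guide are expanded.
STATE_NAMES = {"R": "Running", "PD": "Pending"}


def parse_squeue(text):
    """Parse lines of '<jobid> <state> <name>' into (id, state, name) tuples."""
    jobs = []
    for line in text.splitlines():
        parts = line.strip().split(None, 2)
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        job_id, state = parts[0], parts[1]
        name = parts[2] if len(parts) > 2 else ""
        jobs.append((job_id, STATE_NAMES.get(state, state), name))
    return jobs


def my_jobs(user):
    """List one user's jobs; returns [] when squeue is not installed."""
    if shutil.which("squeue") is None:
        return []
    result = subprocess.run(
        ["squeue", "-h", "-u", user, "-o", "%i %t %j"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return parse_squeue(result.stdout)
```

Calling `my_jobs("username")` on the cluster would return one tuple per queued or running job; unknown state codes are passed through unchanged.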
- As soon as your job starts to run, all of the console messages generated by the MNIST script will appear in a file named my_first_job.out, as set by the --output parameter in the job script. The file name can be changed by setting the appropriate parameters in the SLURM job script. More information about SLURM commands and the parameters available for configuring job runs is available here.
- To check the “occupancy” or usage of the GPU devices, one can issue:
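A common tool for this, assumed here to be installed on the GPU node, is NVIDIA's `nvidia-smi` utility. The sketch below calls it with its `--query-gpu` and `--format=csv,noheader,nounits` options and turns the result into a list of dictionaries; the function names and dictionary keys are hypothetical choices made for illustration.

```python
import shutil
import subprocess

# Fields requested from nvidia-smi; memory values are reported in MiB.
QUERY = "index,utilization.gpu,memory.used,memory.total"


def parse_gpu_csv(text):
    """Parse 'index, util, used, total' CSV lines into dictionaries."""
    gpus = []
    for line in text.strip().splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 4:
            continue
        try:
            index, util, used, total = (int(field) for field in fields)
        except ValueError:
            continue  # e.g. a field reported as not supported
        gpus.append({
            "index": index,
            "util_percent": util,
            "memory_used_mib": used,
            "memory_total_mib": total,
        })
    return gpus


def query_gpus():
    """Return per-GPU usage, or [] when nvidia-smi is not installed."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + QUERY, "--format=csv,noheader,nounits"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return parse_gpu_csv(result.stdout)
```

Running plain `nvidia-smi` in the shell gives a human-readable table of the same information, including the processes currently using each device.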
- Once the MNIST job is finished, you should see the following content in the my_first_job.out file:
CUDA_DEVICE=/dev/nvidia0
>> anaconda2/4.3.0 has been loaded.
>> cuda-8.0_cudnn-6.0 has been loaded.
2018-01-16 11:37:43.434840: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-01-16 11:37:48.596303: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:04:00.0
totalMemory: 11.17GiB freeMemory: 11.11GiB
2018-01-16 11:37:48.596340: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:04:00.0, compute capability: 3.7)
Extracting /tmp/tensorflow/mnist/input_data/train-images-idx3-ubyte.gz
Extracting /tmp/tensorflow/mnist/input_data/train-labels-idx1-ubyte.gz
Extracting /tmp/tensorflow/mnist/input_data/t10k-images-idx3-ubyte.gz
Extracting /tmp/tensorflow/mnist/input_data/t10k-labels-idx1-ubyte.gz
0.9156
Note that the final value (0.9156 here) is the test accuracy reported by the script, and it varies from run to run.