HPRC TensorFlow Benchmarks
In order to compare different installations of TensorFlow on our clusters (and with other clusters), we conducted a number of benchmarks.
Overview
The benchmark suite came from the TensorFlow Benchmarks git repository, downloaded February 8, 2018.
All runs were done in a single batch job on a single node on the cluster. All runs used every CPU available on the node that was allocated. All GPU runs used 2 GPUs (NVIDIA Tesla K20s on ada and NVIDIA Tesla K80s on terra).
For Anaconda's TensorFlow, we used Anaconda3/5.1.0 with either tensorflow/1.5.0 (CPU only) or tensorflow-gpu/1.4.1 (GPU). For locally built versions, we used TensorFlow/1.5.0-foss-2017b-Python-3.6.3 (CPU only) and TensorFlow/1.5.0-goolfc-2017b-Python-3.6.3 (GPU). Note: we do not yet have a version built with the Intel compilers, nor have we managed to build the local versions on ada or curie yet.
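For reference, the Anaconda environments activated by the run script further down this page could be created roughly as follows. This is only a sketch: the environment prefixes under $SCRATCH match the ones used in that script, the package versions are the ones listed above, and channels/dependencies may need adjusting.

ml purge
ml Anaconda3/5.1.0
# CPU-only environment with tensorflow 1.5.0
conda create -p $SCRATCH/myAnaconda3/5.1.0-tensorflow tensorflow=1.5.0
# GPU environment with tensorflow-gpu 1.4.1
conda create -p $SCRATCH/myAnaconda3/5.1.0-tensorflow-gpu tensorflow-gpu=1.4.1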
For non-GPU (CPU only) runs, we used:
python tf_cnn_benchmarks.py --batch_size=32 --model=$model --variable_update=parameter_server --device=cpu --data_format=NHWC
For GPU runs, we used:
python tf_cnn_benchmarks.py --batch_size=32 --model=$model --variable_update=parameter_server --num_gpus=2
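Putting module selection and the command line together, a single run of one model with the locally built GPU version looked roughly like the following. The path to tf_cnn_benchmarks.py is the one used in the scripts at the bottom of this page; adjust it to your own checkout of the benchmarks repository.

ml purge
ml TensorFlow/1.5.0-goolfc-2017b-Python-3.6.3   # locally built, CUDA-enabled version
model=resnet50
python /scratch/group/hprc/TensorFlowBM/tensorflow_benchmarks-20180208/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --batch_size=32 --model=$model --variable_update=parameter_server --num_gpus=2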
Summary
Unsurprising result #1: the CUDA/GPU versions ran faster than the non-CUDA (CPU-only) versions.
Also unsurprisingly, we found that in most cases the locally built non-GPU version outperformed the Anaconda version by a factor of three (3x) or more. This is because the local build takes advantage of CPU extensions, such as AVX, available on the cluster it was built on.
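A quick way to see which of these vector extensions a node's CPUs advertise (independent of TensorFlow itself) is to look at /proc/cpuinfo, e.g.:

# list the AVX-related CPU flags on the current node
grep -o 'avx[0-9a-z_]*' /proc/cpuinfo | sort -u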
Similarly unsurprising, the two GPU versions showed little difference in performance: since both use the same CUDA code on the GPUs, the difference in CPU performance has much less impact.
It should be noted that we haven't fully explored the CPU/GPU interaction and how it could be exploited, but given the results above, if hybrid CPU/GPU use is desired, the locally built version will likely perform better.
Regardless, for now you should probably stick with the GPU versions. The queues for GPU nodes may be longer, but the SUs saved and the faster turnaround once the job starts can outweigh the extra time spent waiting for a GPU.
Results
Terra
A complete description of Terra can be found in the Terra user's guide.
model | CPUx28 images/sec (local) | CPUx28 images/sec (Anaconda) | CPUx28 time, sec (local) | CPUx28 time, sec (Anaconda) | GPUx2 images/sec (local) | GPUx2 images/sec (Anaconda) | GPUx2 time, sec (local) | GPUx2 time, sec (Anaconda) |
---|---|---|---|---|---|---|---|---|
alexnet | 18.87 | 5.94 | 288.03 | 859.70 | 515.45 | 548.97 | 11.15 | 10.75 |
vgg11 | 2.58 | 0.71 | 2159.76 | 7501.26 | 114.83 | 125.05 | 46.84 | 40.63 |
vgg16 | 1.33 | 0.36 | 4201.86 | 15438.79 | 63.97 | 64.94 | 110.25 | 115.46 |
resnet50 | 3.33 | 0.58 | 1695.20 | 7891.19 | 93.58 | 88.94 | 126.03 | 138.17 |
vgg19 | 1.05 | 0.29 | 5289.64 | 19234.43 | 56.53 | 54.80 | 149.88 | 148.97 |
inception3 | 3.27 | 0.68 | 2747.74 | 9412.49 | 55.41 | 52.86 | 235.85 | 262.88 |
resnet101 | 1.93 | 0.34 | 2827.87 | 13259.07 | 52.87 | 50.73 | 246.10 | 255.40 |
resnet152 | 1.33 | 0.23 | 4020.98 | 19172.05 | 36.57 | 34.89 | 372.59 | 394.98 |
Ada
A complete description of Ada can be found in the Ada user's guide.
Note that we've had some problems installing the non-GPU (CPU-only) versions on ada due to its older RHEL/CentOS 6 system software. Given the results for terra, it doesn't seem worth spending much more time trying to install them, so the CPU columns below are empty.
Also note that we ran into a number of "out of memory" warnings, and eventually errors, probably because the NVIDIA Tesla K20 has less memory than the Tesla K80. The GPUx2 results below are therefore incomplete (resnet101 and resnet152 did not finish).
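We did not re-run with a smaller batch size, but if you hit the same problem the obvious first thing to try is reducing --batch_size (all runs here used 32), e.g.:

# halve the batch size to reduce per-GPU memory use (untested by us on ada)
python tf_cnn_benchmarks.py --batch_size=16 --model=resnet152 --variable_update=parameter_server --num_gpus=2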
model | CPUx20 images/sec (local) | CPUx20 images/sec (Anaconda) | CPUx20 time, sec (local) | CPUx20 time, sec (Anaconda) | GPUx2 images/sec (local) | GPUx2 images/sec (Anaconda) | GPUx2 time, sec (local) | GPUx2 time, sec (Anaconda) |
---|---|---|---|---|---|---|---|---|
alexnet | | | | | 456.37 | 463.81 | 11.63 | 10.81 |
vgg11 | | | | | 104.99 | 105.87 | 51.13 | 43.88 |
vgg16 | | | | | 51.32 | 35.29 | 126.71 | 129.85 |
resnet50 | | | | | 88.47 | 68.00 | 140.74 | 148.59 |
vgg19 | | | | | 46.25 | 28.85 | 135.28 | 162.56 |
inception3 | | | | | 52.93 | 45.42 | 281.28 | 293.71 |
resnet101 | | | | | | | | |
resnet152 | | | | | | | | |
Curie
A complete description of Curie can be found in the Curie user's guide.
This section is included mainly to point out that curie is not a useful platform for TensorFlow at present, so there is little point in asking for it there.
model | CPUx(8 or 32) images/sec (local) | CPUx(8 or 32) images/sec (Anaconda) | CPUx(8 or 32) time, sec (local) | CPUx(8 or 32) time, sec (Anaconda) | GPUx2 images/sec (local) | GPUx2 images/sec (Anaconda) | GPUx2 time, sec (local) | GPUx2 time, sec (Anaconda) |
---|---|---|---|---|---|---|---|---|
alexnet | | N/A(1) | | N/A(1) | N/A(2) | N/A(1)(2) | N/A(2) | N/A(1)(2) |
vgg11 | | N/A(1) | | N/A(1) | N/A(2) | N/A(1)(2) | N/A(2) | N/A(1)(2) |
vgg16 | | N/A(1) | | N/A(1) | N/A(2) | N/A(1)(2) | N/A(2) | N/A(1)(2) |
resnet50 | | N/A(1) | | N/A(1) | N/A(2) | N/A(1)(2) | N/A(2) | N/A(1)(2) |
vgg19 | | N/A(1) | | N/A(1) | N/A(2) | N/A(1)(2) | N/A(2) | N/A(1)(2) |
inception3 | | N/A(1) | | N/A(1) | N/A(2) | N/A(1)(2) | N/A(2) | N/A(1)(2) |
resnet101 | | N/A(1) | | N/A(1) | N/A(2) | N/A(1)(2) | N/A(2) | N/A(1)(2) |
resnet152 | | N/A(1) | | N/A(1) | N/A(2) | N/A(1)(2) | N/A(2) | N/A(1)(2) |
- N/A(1): the only Anaconda builds for ppc64 are "little-endian" (i.e. ppc64le) and will not work on curie at present; making them work would require a complete reinstall, which is not on the list of priorities at this point
- N/A(2): there are no GPUs in curie
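One quick way to confirm a node's architecture and byte order (ppc64 vs. ppc64le) is lscpu, e.g.:

# show architecture and endianness of the current node
lscpu | grep -E 'Architecture|Byte Order'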
Scripts
Common run script
Referred to as bmtf.sh in the job scripts below.
#!/bin/bash
WITH_GPU=0
if [ "$1" == "--with-gpu" ] ; then
    WITH_GPU=1
fi
RUN_SOURCE=1
RUN_ANACONDA=1
for model in alexnet vgg11 vgg16 resnet50 vgg19 inception3 resnet101 resnet152; do
    # Parameters to pass to the benchmark
    if [ 0 -ne $WITH_GPU ] ; then
        TFPARMS="--batch_size=32 --model=$model --variable_update=parameter_server --num_gpus=2"   # from the documentation
    else
        TFPARMS="--batch_size=32 --model=$model --variable_update=parameter_server --device=cpu --data_format=NHWC"   # need last two for non-GPU
    fi
    echo -e "\n### `date` - Running on `hostname` with parameters: python tf_cnn_benchmarks.py $TFPARMS\n"
    # built from scratch
    if [ 0 -ne $RUN_SOURCE ] ; then
        echo -e "\n### Running locally built version of TensorFlow\n"
        ml purge
        if [ 0 -ne $WITH_GPU ] ; then
            ml TensorFlow/1.5.0-goolfc-2017b-Python-3.6.3
        else
            ml TensorFlow/1.5.0-foss-2017b-Python-3.6.3   # not available on ada yet
        fi
        ml
        which python
        /usr/bin/time -av python /scratch/group/hprc/TensorFlowBM/tensorflow_benchmarks-20180208/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py $TFPARMS
    fi
    # installed by conda
    if [ 0 -ne $RUN_ANACONDA ] ; then
        echo -e "\n### Running Anaconda version of TensorFlow\n"
        ml purge
        ml Anaconda3/5.1.0
        if [ 0 -ne $WITH_GPU ] ; then
            source activate $SCRATCH/myAnaconda3/5.1.0-tensorflow-gpu
        else
            source activate $SCRATCH/myAnaconda3/5.1.0-tensorflow
        fi
        conda list
        which python
        /usr/bin/time -av python /scratch/group/hprc/TensorFlowBM/tensorflow_benchmarks-20180208/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py $TFPARMS
        source deactivate
    fi
done
# EOF
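The script runs the CPU-only benchmarks by default; pass --with-gpu for the GPU benchmarks. This is exactly how the batch scripts below invoke it:

./bmtf.sh              # CPU-only benchmarks
./bmtf.sh --with-gpu   # GPU benchmarks on 2 GPUs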
Batch job scripts
Terra
GPU
#!/bin/bash
## ENVIRONMENT SETTINGS; CHANGE WITH CAUTION
#SBATCH --export=NONE                # Do not propagate environment
#SBATCH --get-user-env=L             # Replicate login environment
## NECESSARY JOB SPECIFICATIONS
#SBATCH --job-name=bmtf-gpu
#SBATCH --time=24:00:00
#SBATCH --nodes=1                    # Request 1 node
#SBATCH --ntasks-per-node=28         # Request all avail cores on node
#SBATCH --mem=56G                    # Request all avail memory on node
#SBATCH --output=out.bmtf-gpu.%j
#SBATCH --partition=staff            # Priority testing queue
## OPTIONAL JOB SPECIFICATIONS
#JKPBATCH --account=123456           # Set billing account to 123456
#JKPBATCH --mail-type=ALL            # Send email on all job events
#JKPBATCH --mail-user=j-perdue@tamu.edu   # Where to send email
## GPUS
#SBATCH --gres=gpu:2
echo -e "\n# Job submitted from $SLURM_SUBMIT_HOST:$SLURM_SUBMIT_DIR\n"
echo "#################### Start: Job Script `basename $0` ###########################"
cat $0
cat bmtf.sh
echo "#################### End Job Script `basename $0` ###########################"
./bmtf.sh --with-gpu
echo "### Ending Job"
# EOF
CPU-only
#!/bin/bash
## ENVIRONMENT SETTINGS; CHANGE WITH CAUTION
#SBATCH --export=NONE                # Do not propagate environment
#SBATCH --get-user-env=L             # Replicate login environment
## NECESSARY JOB SPECIFICATIONS
#SBATCH --job-name=bmtf
#SBATCH --time=24:00:00
#SBATCH --nodes=1                    # Request 1 node
#SBATCH --ntasks-per-node=28         # Request all avail cores on node
#SBATCH --mem=56G                    # Request all avail memory on node
#SBATCH --output=out.bmtf.%j
#SBATCH --partition=staff            # Priority testing queue
## OPTIONAL JOB SPECIFICATIONS
#JKPBATCH --account=123456           # Set billing account to 123456
#JKPBATCH --mail-type=ALL            # Send email on all job events
#JKPBATCH --mail-user=j-perdue@tamu.edu   # Where to send email
## GPUS
#SBATCH --gres=gpu:0
echo -e "\n# Job submitted from $SLURM_SUBMIT_HOST:$SLURM_SUBMIT_DIR\n"
echo "#################### Start: Job Script `basename $0` ###########################"
cat $0
cat bmtf.sh
echo "#################### End Job Script `basename $0` ###########################"
./bmtf.sh
echo "### Ending Job"
# EOF
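The terra jobs above were submitted with sbatch in the usual way; the file names here are just examples of what the scripts might be saved as:

sbatch bmtf-gpu.slurm   # GPU benchmark job
sbatch bmtf.slurm       # CPU-only benchmark job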
Ada
GPU
#!/bin/bash
#BSUB -L /bin/bash
#BSUB -J bmtf-gpu
#BSUB -W 24:00
#BSUB -R 'span[ptile=20]'
#BSUB -n 20
#BSUB -R "rusage[mem=2500]"
#BSUB -M 2500
#BSUB -o out.bmtf-gpu.%J
#BSUB -q staff
#BSUB -R 'select[gpu]'
echo -e "\n# Job submitted from $LSB_SUBCWD (`pwd`) using $LSB_MCPU_HOSTS\n"
echo "#################### Start: Job Script `basename $0` ($LSB_JOBNAME) ###########################"
cat $0
cat bmtf.sh
echo "#################### End Job Script `basename $0` ($LSB_JOBNAME) ###########################"
./bmtf.sh --with-gpu
echo "### Ending Job"
# EOF
CPU-only
#!/bin/bash
#BSUB -L /bin/bash
#BSUB -J bmtf
#BSUB -W 24:00
#BSUB -R 'span[ptile=20]'
#BSUB -n 20
#BSUB -R "rusage[mem=2500]"
#BSUB -M 2500
#BSUB -o out.bmtf.%J
#BSUB -q staff
#DISABLEBSUB -R 'select[gpu]'
echo -e "\n# Job submitted from $LSB_SUBCWD (`pwd`) using $LSB_MCPU_HOSTS\n"
echo "#################### Start: Job Script `basename $0` ($LSB_JOBNAME) ###########################"
cat $0
cat bmtf.sh
echo "#################### End Job Script `basename $0` ($LSB_JOBNAME) ###########################"
./bmtf.sh
echo "### Ending Job"
# EOF
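On ada (LSF), the corresponding jobs are submitted by feeding the script to bsub on standard input; again, the file names are just examples:

bsub < bmtf-gpu.lsf     # GPU benchmark job
bsub < bmtf.lsf         # CPU-only benchmark job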