Difference between revisions of "SW:R-CNN"

From TAMU HPRC
Latest revision as of 11:13, 14 August 2020

"Region Based Convolutional Neural Networks (R-CNN) are a family of machine learning models for computer vision and specifically object detection." -- Wikipedia

The page above mentions a number of packages available for using R-CNNs. For now, this page will concentrate on Detectron2.

Note that these instructions are for building from source using a Python virtual environment so we can get optimizations for the current CPU/machine. We explicitly do NOT use Anaconda (Python for newbies), which uses precompiled binaries for CPU architectures from over a decade ago that are poorly suited for high-performance computing (HPC) in the 2020s.

Detectron

We'll start with single-node (no MPI) Detectron, the predecessor to Detectron2, since we've successfully built it on ada (terra test to come). The instructions for Detectron2 (not yet written/tested) will be added below later.

These steps come from Detectron's INSTALL.md and from the Caffe2 instructions for building from source.

Installing Detectron in a Python virtual environment on HPRC clusters

foss/2018b

Download sources

Download, via Git, the needed sources. Note: we used the system git here (no module), but if you have problems you may try loading a Git module.

mkdir -p $SCRATCH/tmp  # -p: no error if the directory already exists
cd $SCRATCH/tmp
git clone https://github.com/facebookresearch/Detectron.git
git clone https://github.com/cocodataset/cocoapi.git
git clone https://github.com/pytorch/pytorch.git # for caffe2
cd pytorch
git submodule update --init --recursive
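
If you repeat these steps after a failed attempt, the clone commands will complain that the directories already exist. A minimal sketch of one way around that is below; `clone_if_missing` is a hypothetical helper name, not something from Detectron or HPRC.

```shell
# Hypothetical helper (not part of the original instructions):
# clone a repository only if the target directory is not already a Git checkout.
clone_if_missing() {
    url=$1
    dir=$2
    if [ -d "$dir/.git" ]; then
        echo "skipping $dir (already cloned)"
    else
        git clone "$url" "$dir"
    fi
}

# Example usage (commented out so that pasting this sketch has no side effects):
# mkdir -p "$SCRATCH/tmp"
# clone_if_missing https://github.com/facebookresearch/Detectron.git "$SCRATCH/tmp/Detectron"
# clone_if_missing https://github.com/cocodataset/cocoapi.git "$SCRATCH/tmp/cocoapi"
# clone_if_missing https://github.com/pytorch/pytorch.git "$SCRATCH/tmp/pytorch"
```

The submodule update in the pytorch directory still needs to be run afterwards.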

Create virtual environment

For a clean (re)start, remove previous virtual environment directory.

rm -rf $SCRATCH/Detectron-foss-2018b # remove previous attempt, if there was one.

Create and activate a Python VE to install into.

ml purge  # module purge
ml Python/3.6.6-foss-2018b  # module load Python/3.6.6-foss-2018b
python -m venv $SCRATCH/Detectron-foss-2018b
source $SCRATCH/Detectron-foss-2018b/bin/activate
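
Before installing anything with pip, it can help to confirm that the VE really is active. The sketch below relies on the `VIRTUAL_ENV` variable that the venv activate script sets; `venv_active` is a hypothetical helper name, and the comparison assumes the path is spelled the same way it was when the VE was created.

```shell
# Hypothetical check: succeed only if the active virtual environment
# (as recorded in VIRTUAL_ENV by the activate script) is the expected one.
venv_active() {
    expected=$1
    [ "${VIRTUAL_ENV:-}" = "$expected" ]
}

# Example usage (commented out; adjust the path to your own VE):
# venv_active "$SCRATCH/Detectron-foss-2018b" && echo "VE active" || echo "VE not active"
```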

Update pip/setuptools.

pip install --upgrade pip setuptools

Install pytorch/caffe2

Load a newer CMake module (the system cmake is too old), which is needed for the steps below.

ml CMake/3.12.1-GCCcore-7.3.0
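
To see whether the cmake on your path is new enough, you can compare version strings. This is only a sketch: `version_ge` is a hypothetical helper, it relies on GNU `sort -V`, and the minimum version in the usage example is a placeholder (the page does not say which version pytorch requires).

```shell
# Hypothetical helper: succeed if version $1 is greater than or equal to version $2.
# Uses GNU sort's version ordering (-V) to find the lower of the two.
version_ge() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Example usage (commented out; 3.5.0 is a made-up minimum, not a documented requirement):
# found=$(cmake --version | head -n 1 | awk '{print $3}')
# version_ge "$found" 3.5.0 || echo "cmake $found is too old"
```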

Install needed module(s).

pip install pyaml

Load cuDNN/CUDA for GPU support (WARNING: loading these currently causes problems on terra).

ml cuDNN/7.6.5.32-CUDA-9.2.148.1  # See WARNING

On ada, edit line 185 of $SCRATCH/tmp/pytorch/torch/csrc/DataLoader.cpp and change:

throw ValueError("Cannot find worker information for _BaseDataLoaderIter with id %" PRId64, key);

to

throw ValueError("Cannot find worker information for _BaseDataLoaderIter with id %" "ld", key);

to avoid an error with PRId64 on RHEL6.
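
The same edit can be scripted instead of done by hand. The sketch below matches on the text of the line rather than on line 185, since the line number may differ between checkouts; `patch_prid64` is a hypothetical helper name.

```shell
# Hypothetical helper: replace the PRId64 format macro with "ld" in the given file.
# A backup copy of the file is kept alongside it with a .bak suffix.
patch_prid64() {
    sed -i.bak 's/with id %" PRId64, key/with id %" "ld", key/' "$1"
}

# Apply it only if the source file is present (path taken from the steps above).
src="${SCRATCH:-}/tmp/pytorch/torch/csrc/DataLoader.cpp"
if [ -f "$src" ]; then
    patch_prid64 "$src"
    grep -n 'BaseDataLoaderIter with id' "$src"  # show the edited line
fi
```

Running it a second time changes nothing in the source file, but it does overwrite the .bak copy with the already-patched version.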

Install pytorch/caffe2.

cd $SCRATCH/tmp/pytorch
rm -rf build  # remove previous attempts
python setup.py install
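
Since the build runs for a long time, you may want to keep its output in a file so a failure can be examined afterwards. A minimal sketch follows; `run_logged` and the log file name are hypothetical, not part of the pytorch build.

```shell
# Hypothetical helper: run a command with stdout and stderr captured in a log file,
# print the last lines of the log, and return the command's own exit status.
run_logged() {
    log=$1
    shift
    "$@" > "$log" 2>&1
    status=$?
    tail -n 20 "$log"
    return "$status"
}

# Example usage (commented out; the log path is a made-up example):
# cd "$SCRATCH/tmp/pytorch" && run_logged "$SCRATCH/tmp/pytorch-build.log" python setup.py install
```

Nothing is shown on screen until the command finishes; use `tail -f` on the log from another terminal to watch progress.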

Go have lunch or something. This will take a while (just under 2 hours on terra).

Test.

cd  # don't run in pytorch directory (fails)

# To check if Caffe2 build was successful
python -c 'from caffe2.python import core' 2>/dev/null && echo "Success" || echo "Failure"

# To check if Caffe2 GPU build was successful
# This must print a number > 0 in order to use Detectron
python -c 'from caffe2.python import workspace; print(workspace.NumCudaDevices())'

Install COCO API

Install the COCO API.

Install needed module(s).

pip install numpy

Build/install.

cd $SCRATCH/tmp/cocoapi/PythonAPI
make install

Build Detectron

Install the needed module(s).

On ada, install opencv-python.

# on ada, we install the binary version of opencv-python (listed in requirements.txt) first since building from source seems to have problems.
pip install --only-binary :all: opencv-python

Install other needed modules.

pip install -r $SCRATCH/tmp/Detectron/requirements.txt

Build.

cd $SCRATCH/tmp/Detectron
make install

Test Detectron.

python $SCRATCH/tmp/Detectron/detectron/tests/test_spatial_narrow_as_op.py

This will fail if pytorch/caffe2 was built without cuDNN/CUDA.
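
To avoid a confusing failure, you can check the GPU count reported by the Caffe2 test above before running the Detectron test. This is a sketch only; `has_gpu` is a hypothetical helper that just inspects the number you pass to it.

```shell
# Hypothetical helper: succeed only if the argument is a whole number greater than 0.
# Pass it the value printed by workspace.NumCudaDevices().
has_gpu() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;  # empty or not a plain number
    esac
    [ "$1" -gt 0 ]
}

# Example usage (commented out; requires the caffe2 build from the steps above):
# n=$(python -c 'from caffe2.python import workspace; print(workspace.NumCudaDevices())' 2>/dev/null | tail -n 1)
# has_gpu "$n" && python "$SCRATCH/tmp/Detectron/detectron/tests/test_spatial_narrow_as_op.py" \
#     || echo "no CUDA devices visible to Caffe2; skipping Detectron test"
```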

Detectron2

"Detectron2 is Facebook AI Research's next generation software system that implements state-of-the-art object detection algorithms. It is a ground-up rewrite of the previous version, Detectron, and it originates from maskrcnn-benchmark." --Detectron2 site

It also includes support for Fast R-CNN, Faster R-CNN and other R-CNNs.

See the Detectron2 site for usage and training. For now, this page will only cover installation.

Installing Detectron2 in a Python virtual environment on HPRC clusters

foss/2019b

This is a basic/starter build. Note that this build does not include a CUDA-enabled OpenMPI, so it is limited to the GPUs on a single node.

Modules used include:

CMake/3.15.3-GCCcore-8.3.0
Python/3.7.4-GCCcore-8.3.0
cuDNN/7.0.5-CUDA-9.0.176
(optional?) Graphviz/2.42.2-foss-2019b

Start with a clean module environment and install directory.

ml purge
rm -rf $SCRATCH/Detectron2-foss-2019b

Create and activate a Python VE to install into.

ml Python/3.7.4-GCCcore-8.3.0
python -m venv $SCRATCH/Detectron2-foss-2019b

fosscuda/2018b

This build includes a CUDA-enabled OpenMPI for using multiple GPU nodes to speed up processing.

CMake/3.12.1-GCCcore-7.3.0
Python/3.6.6-fosscuda-2018b