Deep Networks for HEP
This page provides example code, datasets and recipes for running high-energy physics (HEP) analyses with deep neural networks on Cori. The current scripts are those used for the CNN classification and timing studies reported in this ACAT talk.
The datasets contain simulated data for an ATLAS-like detector and are available from http://portal.nersc.gov/project/mpccc/wbhimji/RPVSusyData/ . A README describing the contents is provided in that directory.
Currently, data binned into 64x64 images is provided. Unbinned data will be provided in due course, together with additional documentation.
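The binned data are distributed as HDF5 files (the batch example further down reads train.h5 and val.h5). The sketch below shows how such a file can be read with h5py; the dataset names ("images", "labels") and array layout used here are illustrative assumptions only — consult the README in the data directory for the actual keys. The snippet writes a small stand-in file first so it is self-contained.

```python
# Hedged sketch: dataset names "images"/"labels" and the (N, 64, 64, 3)
# layout are assumptions for illustration; check the dataset README.
import h5py
import numpy as np

# Create a small stand-in file with the assumed layout (illustration only).
with h5py.File("example.h5", "w") as f:
    f.create_dataset("images", data=np.random.rand(8, 64, 64, 3).astype("float32"))
    f.create_dataset("labels", data=np.random.randint(0, 2, size=8))

# Read it back the way a training script would read train.h5 / val.h5.
with h5py.File("example.h5", "r") as f:
    x = f["images"][:]  # 3-channel 64x64 detector images
    y = f["labels"][:]  # 1 = RPVSusy signal, 0 = QCD background (assumed convention)

print(x.shape, y.shape)
```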
Convolutional Neural Network for Classification
This provides a network for classification (RPVSusy signal vs QCD background) on 3-channel (calorimeter + track) whole-detector images, as presented in this ACAT talk. It implements 3 convolution+pooling units with rectified linear unit (ReLU) activation functions, feeding into two fully connected layers, trained with a cross-entropy loss and the Adam optimizer.
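The architecture described above can be sketched in Keras as below. This is not the repository's script: the layer widths, kernel sizes, and learning rate are illustrative assumptions, and a sigmoid output with binary cross-entropy is one common way to realise the two-class cross-entropy loss.

```python
# Hedged sketch of the network described above, using tf.keras.
# Layer widths, kernel sizes, and learning rate are illustrative assumptions;
# the actual script is in the linked repository.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn(input_shape=(64, 64, 3)):
    """3 conv+pool units with ReLU, then two fully connected layers."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(2),
        layers.Conv2D(128, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(2),
        layers.Conv2D(256, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(512, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # signal vs background score
    ])
    # Cross-entropy loss with the Adam optimizer, as in the talk.
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

model = build_cnn()
print(model.output_shape)  # (None, 1)
```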
Keras code implementing the convolutional neural net is available at https://github.com/eracah/atlas_dl/tree/micky . This single script is fairly self-explanatory and easily run at NERSC following the recipes below.
Code for preselection of data, as well as for Lasagne/Theano implementations, is in the main branch of that repository.
Running at NERSC
An example batch script is given below. Loading the intel-head module sets a variety of KMP* environment variables for best performance, as documented on the NERSC TensorFlow page. Some Python libraries may be missing from the intel-head module; these can be added with pip --user, e.g. in this case pip install --user h5py. For this script and study it was also necessary to set OMP_NUM_THREADS to a different value than the one set by the module (to avoid thread exhaustion); the best value may vary for other workloads.
#SBATCH -N 1 -C knl -t 90 -p regular
module load tensorflow/intel-head
python train.py --nb-epochs 10 --nb-events 999999999 --batch-size 512 train.h5 val.h5 CNN
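The OMP_NUM_THREADS override mentioned above would be placed in the batch script before the python call (after installing any missing packages once with pip install --user h5py from a login node). The value below is illustrative only, not the one used in the study; tune it for your own workload.

```shell
# Override the thread count set by the intel-head module before launching
# training. 66 is an illustrative value for a 68-core KNL node (leaving a
# couple of cores free); the study's actual setting is not reproduced here.
export OMP_NUM_THREADS=66
```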
Timings for the network above on Cori are given below; for more details, please see the ACAT presentation. 'TF (Intel)' corresponds to what is now the tensorflow/intel-1.2 module, while 'latest' is tensorflow/intel-head.