Login versus Compute Nodes
On Cray system login nodes, use Python as you normally would in any Unix environment, but be mindful of resource consumption since login nodes are shared by many users at the same time. To execute a Python script in the Edison or Cori batch/interactive environment (via sbatch/salloc) use srun:
srun -n 1 python ./hello-world.py
Of course, if the script has executable permission and contains "#!/usr/bin/env python" as its first line, the "python" can be omitted:
srun -n 1 ./hello-world.py
Matplotlib on Compute Nodes
Using Matplotlib to plot interactively on the login nodes is easy, especially if you use NX. But if you are running a Python script on compute nodes that imports Matplotlib (even if it doesn't make any plot files), it is important to specify a "backend." There are a few ways to do this. One is to tell Matplotlib to use a particular backend in your script, before pyplot is imported, as below:
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
The "Agg" backend is guaranteed to be available, but there are other choices. If a backend is not specified in some way, Matplotlib will look for an X11 connection on the compute nodes in your job, and as a result your job may simply wait until the wall-clock limit is reached. More technical details are available in the Matplotlib FAQ, "What is a Backend?" and the matplotlib.use API documentation.
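As a fuller illustration of the backend selection described above, here is a minimal sketch of a script that writes a plot file without any display. It assumes Python 3 and an installed Matplotlib; the function name save_line_plot and the output file name example-plot.png are hypothetical and chosen only for this example.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # select the non-interactive backend before importing pyplot
import matplotlib.pyplot as plt


def save_line_plot(path):
    """Draw a small line plot and write it to `path` (hypothetical helper)."""
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2, 3], [0, 1, 4, 9])
    ax.set_xlabel("x")
    ax.set_ylabel("x squared")
    fig.savefig(path)   # the file format is inferred from the extension
    plt.close(fig)      # release the figure's memory
    return path


if __name__ == "__main__":
    # The output location is an assumption; any writable path works.
    out = save_line_plot(os.path.join(tempfile.gettempdir(), "example-plot.png"))
    print("wrote", out)
```

Because the script never opens a window, it can be launched with srun in the same way as any other Python script.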
Parallelism in Python
Many scientists have come to appreciate Python's power for developing scientific computing applications. Creating such applications that scale in modern high-performance computing environments can be a challenge. There are a number of approaches to parallel processing in Python. Here we describe approaches that we know work for users at NERSC. For advice on scaling up Python applications, see this page.
Python's Multiprocessing Module
Python's standard library provides a multiprocessing package that supports spawning of processes. This can be used to achieve some level of parallelism within a single compute node. It cannot be used to achieve parallelism across compute nodes. For that, users are referred to the discussion on mpi4py below. If you are using the multiprocessing module, be sure to tell srun to use all the threads available on the node with the "-c" argument. For example, on Cori use:
srun -n 1 -c 64 python script-using-multiprocessing.py
This makes all 64 logical CPUs on the node (32 physical cores, each with two hyperthreads) available for use by multiprocessing.
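A script launched this way still has to decide how many worker processes to start. The following is a minimal sketch, assuming Python 3, of one way to size a multiprocessing pool to match the "-c" value. It reads the SLURM_CPUS_PER_TASK environment variable, which Slurm sets when "-c" is given, and falls back to the operating system's CPU count otherwise. The function names available_cpus and square are hypothetical.

```python
import multiprocessing
import os


def available_cpus():
    """Return the number of CPUs this process should use (hypothetical helper)."""
    # Slurm sets SLURM_CPUS_PER_TASK when -c/--cpus-per-task is specified.
    from_slurm = os.environ.get("SLURM_CPUS_PER_TASK")
    if from_slurm:
        return int(from_slurm)
    # Outside a Slurm job, fall back to what the operating system reports.
    return os.cpu_count() or 1


def square(x):
    """Stand-in for a real, CPU-bound unit of work."""
    return x * x


def main():
    # One worker process per available CPU; all workers run on the same node.
    with multiprocessing.Pool(processes=available_cpus()) as pool:
        results = pool.map(square, range(16))
    print(results)


if __name__ == "__main__":
    main()
```

The work function is defined at module level so that it can be sent to the worker processes, and the pool is created under the __main__ guard so that workers do not try to start pools of their own.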
MPI for Python (mpi4py, pyMPI)
These packages expose MPI standard bindings to the Python programming language. Documentation on mpi4py is available here and a useful collection of example scripts can be found here. The similar pyMPI package is also provided by NERSC as a module, but it is not as well documented. An example of using mpi4py on an Edison compute node is shown below:
% cat mympi.py
#!/usr/bin/env python
from mpi4py import MPI
me = MPI.COMM_WORLD.Get_rank()
nproc = MPI.COMM_WORLD.Get_size()
print me, nproc
% cat runit
#!/bin/bash
#SBATCH -N 1
#SBATCH -n 24
#SBATCH -t 00:05:00
#SBATCH -p debug
module load python
module load mpi4py
srun -n 24 python-mpi ./mympi.py
% sbatch runit
Submitted batch job 929783
% cat slurm-929783.out
...
9 24
3 24
...
0 24
...