Run serial Python scripts on a login node, or on a compute node in an interactive session (started via salloc) or batch job (submitted via sbatch) as you normally would in any Unix-like environment. On login nodes, please be mindful of resource consumption since those nodes are shared by many users at the same time.
Parallel Python scripts launched in an interactive (salloc) session or batch job (sbatch), such as those using MPI via the mpi4py module, must use srun to launch:
srun -n 64 python ./hello-world.py
Matplotlib on Compute Nodes
Using Matplotlib to plot interactively on the login nodes is easy, especially if you use NX. But if you run a Python script on compute nodes that imports Matplotlib (even if it doesn't make any plot files), it is important to specify a "backend." There are a few ways to do this; one is to tell Matplotlib to use a particular backend in your script, before pyplot is imported, as below:
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
The "Agg" backend is guaranteed to be available, but there are other choices. If a backend is not specified in some way, Matplotlib will seek out an X11 connection on the compute nodes in your job, and as a result your job may simply wait until the wall-clock limit is reached. More technical details are available in the Matplotlib FAQ, "What is a Backend?" and the matplotlib.use API documentation.
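Another way to choose a backend is the MPLBACKEND environment variable, which Matplotlib reads when it is first imported. The following is a minimal sketch, assuming Python 3 and an installed Matplotlib; the function name make_plot and the file name example.png are hypothetical and only illustrate the idea.

```python
import os

# Select the non-interactive "Agg" backend through the environment.
# This must happen before matplotlib is imported anywhere in the process.
os.environ["MPLBACKEND"] = "Agg"


def make_plot(path="example.png"):
    """Write a small line plot to `path` without needing an X11 display.

    `make_plot` and the default file name are hypothetical examples.
    """
    # Imported here so the backend choice above is already in effect.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4])
    ax.set_xlabel("x")
    ax.set_ylabel("x squared")
    fig.savefig(path)
    plt.close(fig)  # release the figure's memory
    return path
```

Calling make_plot() inside a batch job should write the image file without attempting to open a display. The same variable can also be exported in the batch script itself, before the python command runs.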
Parallelism in Python
Many scientists have come to appreciate Python's power for developing scientific computing applications. Creating such applications that scale in modern high-performance computing environments can be a challenge. There are a number of approaches to parallel processing in Python. Here we describe approaches that we know work for users at NERSC. For advice on scaling up Python applications, see this page.
Python's Multiprocessing Module
Python's standard library provides a multiprocessing package that supports spawning of processes. This can be used to achieve some level of parallelism within a single compute node. It cannot be used to achieve parallelism across compute nodes. For that, users are referred to the discussion on mpi4py below. If you are using the multiprocessing module, be sure to tell srun to use all the threads available on the node with the "-c" argument. For example, on Cori use:
srun -n 1 -c 64 python script-using-multiprocessing.py
This makes 32 physical cores and 32 hyperthreads available for use by multiprocessing.
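As an illustration of what such a script might contain, here is a minimal sketch, assuming Python 3. It sizes a process pool from SLURM_CPUS_PER_TASK, the environment variable Slurm sets when "-c" is given, and falls back to the node's CPU count otherwise. The names square, worker_count, and parallel_squares are hypothetical.

```python
import multiprocessing as mp
import os


def square(x):
    """Toy work function; replace with the real per-item computation."""
    return x * x


def worker_count():
    """Number of worker processes to start.

    Uses SLURM_CPUS_PER_TASK when present (set by Slurm when -c is given),
    otherwise the CPU count reported by the operating system.
    """
    return int(os.environ.get("SLURM_CPUS_PER_TASK", mp.cpu_count()))


def parallel_squares(values, processes=None):
    """Apply `square` to every value using a pool of worker processes."""
    with mp.Pool(processes=processes or worker_count()) as pool:
        return pool.map(square, values)


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block.
    print(parallel_squares(range(8)))
```

All workers run on the single node given to the task, which is why this approach does not extend across nodes.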
MPI for Python (mpi4py)
This library exposes MPI standard bindings to the Python programming language. Documentation on mpi4py is available here and a useful collection of example scripts can be found here. An example of using mpi4py on a Cori compute node (using Anaconda Python 2.7) is shown below.
% cat mympi.py
#!/usr/bin/env python
from mpi4py import MPI
me = MPI.COMM_WORLD.Get_rank()
nproc = MPI.COMM_WORLD.Get_size()
print me, nproc

% cat runit
#!/bin/bash
#SBATCH -N 1
#SBATCH -C haswell
#SBATCH -n 32
#SBATCH -t 00:05:00
#SBATCH -p debug
module load python
srun -n 32 python ./mympi.py

% sbatch runit
Submitted batch job 929783
% cat slurm-929783.out
...
9 32
3 32
...
0 32
...
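Beyond printing the rank and size, a script usually uses those two numbers to divide work among processes. The following is a minimal sketch of that pattern, not part of the example above. It assumes Python 3 (hence print as a function), and the names chunk_bounds and main are hypothetical. If mpi4py cannot be imported, the sketch falls back to running serially as rank 0 of 1; that fallback is an illustrative choice, not a recommendation.

```python
def chunk_bounds(n_items, rank, size):
    """Return (start, stop) of the contiguous block owned by `rank`.

    Items are spread as evenly as possible; the first `n_items % size`
    ranks each receive one extra item.
    """
    base, extra = divmod(n_items, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def main(n_items=1000):
    """Sum the integers 0..n_items-1, split across MPI ranks."""
    try:
        from mpi4py import MPI
        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()
    except ImportError:
        # Assumption for illustration: run serially without mpi4py.
        comm, rank, size = None, 0, 1

    start, stop = chunk_bounds(n_items, rank, size)
    local = sum(range(start, stop))

    # Combine the partial sums so every rank holds the total.
    total = comm.allreduce(local, op=MPI.SUM) if comm is not None else local

    if rank == 0:
        print("sum of 0..%d = %d" % (n_items - 1, total))
    return total


if __name__ == "__main__":
    main()
```

Launched with srun in the same way as mympi.py, each rank sums only its own block, and only rank 0 prints the combined result.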