NERSC: Powering Scientific Discovery Since 1974

Python

(documentation under active revision)

Description and Overview

Python is an interpreted, general-purpose high-level programming language. Various versions of Python are installed on NERSC systems, usually along with a number of scientific computing libraries like numpy and scipy, and visualization libraries like matplotlib.  

On Cori and Edison, Python is available either as a NERSC-built module or through the Anaconda distribution.  Both approaches require at least one "module load" command.  Using the system-provided Python (such as from /usr/bin) is not advised except for the simplest tasks, as it is generally a much older version of Python than provided by NERSC.

Python users may also be interested in the IPython/Jupyter notebook web applications hosted at NERSC on an experimental basis.

NERSC-built Python Modules

To use the NERSC-built Python installation, type:

% module load python

This loads the default versions of the following core modules:

  • python_base: Python 2.x or 3.x module.
  • numpy: Defines the numerical array and matrix type and basic operations on them.
  • scipy: Uses numpy to do advanced math, signal processing, optimization, statistics and much more.
  • matplotlib: Plotting/visualization library.
  • ipython: Interactive python shell offering introspection, rich media, shell syntax, tab completion, and history.

For details on these packages and others mentioned on this page, users are referred to their respective documentation web sites.

In this scheme, other individual packages are installed in their own directories, separate from the base Python's site-packages directory. A module file is provided for each individual package to prepend its directory to PYTHONPATH, the search path for Python modules. This allows for multiple versions of each package that users can mix and match by choosing the best or most suitable version for their work with any base Python package.

The version numbers for a Python module and its underlying python_base module (i.e., x.y.z in python/x.y.z and python_base/x.y.z, respectively) are taken from the version number of the distributed Python software package (x.y.z in Python-x.y.z.tgz).  Users can switch a loaded default module to a different version by using a module swap command:

% module swap numpy numpy/1.7.1
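
After loading or swapping modules, it can help to confirm which copy of a package Python actually picks up. The following is a minimal sketch rather than a NERSC-provided tool: the helper names pythonpath_entries and where_is are hypothetical, and the standard-library json module stands in for a package such as numpy so that the script runs anywhere.

```python
from __future__ import print_function

import importlib
import os


def pythonpath_entries():
    """Return the directories listed in PYTHONPATH, in search order."""
    raw = os.environ.get("PYTHONPATH", "")
    return [entry for entry in raw.split(os.pathsep) if entry]


def where_is(module_name):
    """Import a module and return (version, file) for the copy that was found.

    Either value may be None if the module does not define it.
    """
    module = importlib.import_module(module_name)
    return getattr(module, "__version__", None), getattr(module, "__file__", None)


if __name__ == "__main__":
    for entry in pythonpath_entries():
        print("PYTHONPATH entry:", entry)
    # Replace "json" with the package of interest, for example "numpy".
    version, path = where_is("json")
    print("json version:", version)
    print("json loaded from:", path)
```

Running the script before and after a module swap should show the reported path change to the directory of the newly loaded version.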

There are many additional individual packages provided under this framework besides those listed above, all loadable as separate modules. The current list of modules available includes:

  • ase
  • cython
  • fireworks
  • lmfit
  • mpi4py
  • mysqlpython
  • netcdf4-python
  • numexpr
  • phonopy
  • pympi
  • pyyaml
  • scikit-learn

If you need a Python package installed that may be generally useful to the NERSC user community, please contact us and we will work to provide it as a module.  If you would like to install Python packages at NERSC yourself, see below.

Anaconda Python Distribution

Anaconda is a distribution of Python for large-scale data processing, predictive analytics, and scientific computing.  It includes a collection of over 150 open source packages.  Additional packages are available through binstar.  On Edison and Cori, both Anaconda Python 2.7 and 3.4 are available through a module load command.  For example, to load the Python 2.7.10 Anaconda environment on Cori, type:

% module load python/2.7-anaconda

The Anaconda distribution provides access to conda, an open source package management system and environment management system for installing multiple versions of software packages and their dependencies and switching easily between them.

NERSC can install Anaconda-provided packages for users upon request; simply contact us.  To see a list of packages installed under an Anaconda distribution, use the conda tool:

% conda list

You can create a "spec list" to construct an identical environment across machines by typing "conda list -e".  For more information, see the conda documentation.
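
When comparing environments on two machines, it can be handy to diff two saved spec lists. The sketch below assumes each non-comment line has the form name=version=build and that comment lines start with "#". The function names parse_spec_list and diff_specs are hypothetical, and the package versions in the sample text are made up for illustration.

```python
from __future__ import print_function


def parse_spec_list(text):
    """Map package name -> (version, build) from spec-list text.

    Assumes lines of the form name=version=build; lines starting with '#'
    and blank lines are skipped.
    """
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split("=")
        version = parts[1] if len(parts) > 1 else None
        build = parts[2] if len(parts) > 2 else None
        packages[parts[0]] = (version, build)
    return packages


def diff_specs(first, second):
    """Return sorted names that differ between, or are missing from, either spec."""
    names = set(first) | set(second)
    return sorted(name for name in names if first.get(name) != second.get(name))


if __name__ == "__main__":
    # Illustrative sample data only, not real NERSC environments.
    machine_a = parse_spec_list("# platform: linux-64\nnumpy=1.9.3=py27_0\n")
    machine_b = parse_spec_list("numpy=1.10.1=py27_0\nscipy=0.16.0=np110py27_1\n")
    for name in diff_specs(machine_a, machine_b):
        print(name, machine_a.get(name), machine_b.get(name))
```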

You can also install packages yourself using the conda tool to create an environment, switch to it, and populate it with packages.  An environment may be installed and referred to either by path or by name.  Here is how to create an environment by specifying a path (note at least one python package must be specified to create the environment):

% module load python/2.7-anaconda
% conda create -p $SCRATCH/nersc-rocks numpy
% source activate $SCRATCH/nersc-rocks
(/scratch2/scratchdirs/u/user/nersc-rocks)% conda install pyyaml

Alternatively, you may specify your own path for all your environments to be installed under, and refer to them by name.  This is somewhat less cumbersome but requires you to use the "conda config" command to set up a ~/.condarc file (documentation here):

% module load python/2.7-anaconda
% conda config --add envs_dirs $SCRATCH/envs
% conda config --add channels defaults
% conda config --set show_channel_urls yes
% conda create -n nersc-rocks numpy
% source activate nersc-rocks
(nersc-rocks)% conda install ...

An example .condarc file at NERSC might look like the listing below.  Here both the Edison and Cori scratch paths are listed and should not interfere with each other on either system.

envs_dirs:
- /global/cscratch1/sd/username/envs
- /scratch2/scratchdirs/username/envs
channels:
- defaults
show_channel_urls: yes

To leave an active environment, deactivate it:

(nersc-rocks)% source deactivate

Running Python Scripts

On the login nodes, use Python as you normally would in any Unix-like environment.  To execute a Python script in the Edison or Cori batch/interactive environment (via sbatch/salloc) use srun:

srun -n 1 python ./hello-world.py

Of course, if the script has executable permission and contains "#!/usr/bin/env python" as its first line, the "python" can be omitted:

srun -n 1 ./hello-world.py
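
The contents of hello-world.py are not shown on this page. A minimal version, assumed here purely for illustration, could look like the following; it reports the host name and Python version so you can confirm that the script ran on a compute node with the Python you loaded.

```python
#!/usr/bin/env python
"""hello-world.py: illustrative script for the srun examples (hypothetical content)."""
from __future__ import print_function

import platform
import socket


def greeting():
    """Build a message naming the host and the running Python version."""
    return "Hello, world! from %s running Python %s" % (
        socket.gethostname(),
        platform.python_version(),
    )


if __name__ == "__main__":
    print(greeting())
```

To use the second form of the srun command, the file needs executable permission, for example via "chmod +x hello-world.py".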

Parallelism in Python

Many scientists have come to appreciate Python's power for developing scientific computing applications.  Creating such applications that scale in modern high-performance computing environments can be a challenge.  There are a number of approaches to parallel processing in Python.  Here we outline those that NERSC has been able to provide, and strategies for achieving scalability so users can make the most of what NERSC has to offer.

Python's Multiprocessing Module

Python's standard library provides a multiprocessing package that supports spawning of processes.  This can be used to achieve some level of parallelism within a compute node.  It cannot be used to achieve parallelism across compute nodes.  For that, users are referred to the discussion on mpi4py and other modules below.  If you are using the multiprocessing module, be sure to tell srun to use all the threads available on the node with the "-c" argument.  For example, on Cori use:

srun -n 1 -c 64 python script-using-multiprocessing.py

This makes all 64 logical CPUs on a Cori node (32 physical cores with 2 hyperthreads each) available for use by multiprocessing.
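
A script such as script-using-multiprocessing.py might size its worker pool from the CPUs it has actually been given. The sketch below is one possible approach, not NERSC's recommended implementation: it assumes srun restricts the process to the CPUs requested with "-c", and the names usable_cpus, square, and parallel_squares are hypothetical. os.sched_getaffinity exists only on Linux with Python 3.3 or later, so the code falls back to multiprocessing.cpu_count() elsewhere.

```python
from __future__ import print_function

import multiprocessing
import os


def usable_cpus():
    """Number of logical CPUs this process is allowed to run on."""
    try:
        return len(os.sched_getaffinity(0))  # Linux, Python 3.3+
    except AttributeError:
        return multiprocessing.cpu_count()  # total CPUs on the node


def square(x):
    """Toy work function; must be defined at module level so it can be pickled."""
    return x * x


def parallel_squares(values, processes=None):
    """Square each value using a pool of worker processes on one node."""
    pool = multiprocessing.Pool(processes=processes or usable_cpus())
    try:
        return pool.map(square, values)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    print("logical CPUs usable:", usable_cpus())
    print("sum of squares:", sum(parallel_squares(range(1000))))
```

The "if __name__" guard matters here: worker processes may re-import the main module, and without the guard each worker would try to start its own pool.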

MPI for Python (mpi4py, pyMPI)

These packages provide Python bindings to the MPI standard.  Documentation on mpi4py is available here and a useful collection of example scripts can be found here.  The similar pyMPI package is also provided by NERSC as a module, but is not as well-documented.  An example of using mpi4py on an Edison compute node is shown below:

% cat mympi.py
#!/usr/bin/env python
from mpi4py import MPI
me = MPI.COMM_WORLD.Get_rank()
nproc = MPI.COMM_WORLD.Get_size()
print me, nproc

% cat runit
#!/bin/bash
#SBATCH -N 1
#SBATCH -n 24
#SBATCH -t 00:05:00
#SBATCH -p debug
module load python
module load mpi4py
srun -n 24 python-mpi ./mympi.py

% sbatch runit
Submitted batch job 929783

% cat slurm-929783.out
...
9 24
14 24
3 24
...
0 24
...

Scaling Python Package Imports

TBD: Python import can be slow at NERSC.  Leverage /usr/common as much as possible.  Use $SCRATCH for user-supported python software stacks.  Other solutions include DLFM, python-mpi-bcast, pyinstaller, and (soon) shifter.

The DLCache library is a set of functions that can be incorporated into a dynamically-linked application to provide improved performance during the loading of dynamic libraries when running the application at large scale on Cray XE6. To access this library, do

module load dlcache

Please read the user guide before using the tools provided by this library.  The library is still experimental; please report any problems or comments to "consult at nersc dot gov".
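
Before trying any of the approaches in this section, it may be worth measuring how long imports take for your own software stack. The following is a rough sketch only: the function name time_import is hypothetical, and the standard-library modules listed are placeholders for the packages your application really imports. A module that has already been imported in the same process will report a time near zero, so run the script in a fresh interpreter.

```python
from __future__ import print_function

import importlib
from timeit import default_timer


def time_import(module_name):
    """Return the wall-clock seconds spent importing module_name."""
    start = default_timer()
    importlib.import_module(module_name)
    return default_timer() - start


if __name__ == "__main__":
    # Placeholder module names; substitute the imports your application uses.
    for name in ["json", "decimal", "xml.dom.minidom"]:
        print("%-20s %.4f s" % (name, time_import(name)))
```

Comparing the numbers on a login node with those from a job running on many nodes gives a first indication of whether import time is a bottleneck.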

Installing Your Own Python Package

Users can install their own python packages in their home directories. Here is an example installing a python package called setuptools:

    • Un-tar the package and cd into the package directory:

% tar xvf setuptools-0.6c11.tar.gz
% cd setuptools-0.6c11/

    • Create a directory hierarchy for your python module libraries. Example: $HOME/python_modules/machine_name/lib/python

% mkdir -p $HOME/python_modules/machine_name/lib/python

    • Set the PYTHONPATH environment variable (csh/tcsh syntax shown; in bash, use "export PYTHONPATH=..." instead):

% setenv PYTHONPATH $HOME/python_modules/machine_name/lib/python

    • Install your python package with the default version of python in /usr/bin/python:

% python setup.py install --home=~/python_modules/machine_name

    • OR install your python package with an alternate version of python (a more recent version of python may be available as a module):

% module load python
% python setup.py install --home=~/python_modules/machine_name

    • Test your installation:

% python
>>> import setuptools
>>> ...

The pip command does not work out of the box on Hopper or Edison because of an SSL certificate verification issue.  As a workaround, create a certificate bundle, ~/.pip/cabundle, by running the 'mk_pip_cabundle.sh' script once:

% mk_pip_cabundle.sh

Then, use the pip command:

% pip --cert ~/.pip/cabundle install some_package

Alternatively, you can omit the '--cert ~/.pip/cabundle' option if the pip configuration file, ~/.pip/pip.conf, points to this file:

% cat ~/.pip/pip.conf
[global]
cert = /global/homes/s/someone/.pip/cabundle
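
If pip still reports certificate errors, one thing to check is whether the cert entry in pip.conf points at a file that exists. The sketch below reads the file with Python's standard INI parser. The function name pip_cert_path is hypothetical, and the script only inspects the [global] section shown above.

```python
from __future__ import print_function

import os

try:
    import configparser  # Python 3
except ImportError:
    import ConfigParser as configparser  # Python 2


def pip_cert_path(conf_path):
    """Return the 'cert' value from the [global] section, or None if absent."""
    parser = configparser.RawConfigParser()
    parser.read(conf_path)  # a missing file is silently ignored
    if parser.has_option("global", "cert"):
        return os.path.expanduser(parser.get("global", "cert"))
    return None


if __name__ == "__main__":
    cert = pip_cert_path(os.path.expanduser("~/.pip/pip.conf"))
    if cert is None:
        print("pip.conf has no cert entry in [global]")
    elif not os.path.isfile(cert):
        print("cert entry points to a missing file:", cert)
    else:
        print("cert bundle found:", cert)
```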

Availability

You can check the availability of python by using the following command:

module avail python

The table below lists the Python versions installed on each NERSC system, along with the corresponding module names.

Package  Platform  Category  Version  Module  Install Date  Date Made Default
Python babbage applications/ programming 2.7-anaconda python/2.7-anaconda 2015-06-04 2015-06-04
 Python programming language
Python babbage applications/ programming 3.4-anaconda python/3.4-anaconda 2015-06-04
 Python programming language
Python cori applications/ programming 2.7-anaconda python/2.7-anaconda 2015-10-02
 Python programming language
Python cori applications/ programming 2.7.10 python_base/2.7.10 2015-10-02 2015-10-02
 Base package for Python programming language
Python cori applications/ programming 2.7.10 python/2.7.10 2015-10-02 2015-10-02
 Python programming language
Python cori applications/ programming 3.4-anaconda python/3.4-anaconda 2015-10-02
 Python programming language
Python edison applications/ programming 2.7 python/2.7 2015-04-09
 Python programming language
Python edison applications/ programming 2.7-anaconda python/2.7-anaconda 2015-04-30
 Python programming language
Python edison applications/ programming 2.7.3 python/2.7.3 2013-01-17 2013-01-17
 Python programming language
Python edison applications/ programming 2.7.3 python_base/2.7.3 2013-01-17 2013-01-17
 Base package for Python programming language
Python edison applications/ programming 2.7.5 python_base/2.7.5 2013-11-06 2013-11-19
 Base package for Python programming language
Python edison applications/ programming 2.7.5 python/2.7.5 2013-11-06 2013-11-19
 Python programming language
Python edison applications/ programming 2.7.9 python_base/2.7.9 2015-01-09 2015-05-20
 Base package for Python programming language
Python edison applications/ programming 2.7.9 python/2.7.9 2015-05-19 2015-05-20
 Python programming language
Python edison applications/ programming 3.4 python/3.4 2015-04-10
 Python programming language
Python edison applications/ programming 3.4-anaconda python/3.4-anaconda 2015-04-30
 Python programming language
python genepool applications/ programming 2.7.10 python/2.7.10 2015-11-23
 Python Programming Language
python genepool applications/ programming 2.7.3 python/2.7.3 2012-07-05 2012-07-05
 Python Programming Language
python genepool applications/ programming 2.7.3_1 python/2.7.3_1 2012-07-20 2012-07-20
 Python Programming Language
python genepool applications/ programming 2.7.3_2 python/2.7.3_2 2013-01-24 2013-01-27
 Python Programming Language
python genepool applications/ programming 2.7.4 python/2.7.4 2013-04-17 2013-05-21
 Python Programming Language
python genepool applications/ programming 3.2.3_1 python/3.2.3_1 2012-07-28
 Python Programming Language
python genepool applications/ programming 3.4.3 python/3.4.3 2015-06-25
 Python Programming Language
python genepool_sl6 applications/ programming 2.7.4 python/2.7.4 2014-12-12 2014-12-12
 Python Programming Language
python genepool_sl6 applications/ programming 2.7.8 python/2.7.8 2014-12-12
 Python Programming Language
Python pdsf applications/ programming 2.6.2 python/2.6.2 2012-03-12
 Python built in sl44
Python pdsf applications/ programming 2.7 python/2.7 2012-03-12
 Python built in sl44
Python pdsf applications/ programming 2.7.1 python/2.7.1 2012-03-12 2012-03-12
 Python built in sl44
Python pdsf_sl6 applications/ programming 2.7.10 python/2.7.10 2015-08-07
 Python programming language. From anaconda installation, sanitized for NERSC
Python pdsf_sl6 applications/ programming 2.7.3 python/2.7.3 2012-10-08 2012-10-08
 Python programming language. Also includes numpy, matplotlib, mysqlpython.
Python pdsf_sl6 applications/ programming 2.7.3 python_base/2.7.3 2013-07-05 2013-07-05
 Python programming language. Loads only python, no extra modules.
Python pdsf_sl6 applications/ programming 2.7.6 python_base/2.7.6 2014-04-04 2014-07-07
 Python programming language. Loads only python, no extra modules.
Python pdsf_sl6 applications/ programming 2.7.6 python/2.7.6 2014-04-18 2014-07-07
 Python programming language. Also includes numpy, matplotlib, mysqlpython.
Python pdsf_sl6 applications/ programming 2.7.9 python/2.7.9 2015-05-11
 Python programming language. From anaconda installation, sanitized for NERSC
Python pdsf_sl6 applications/ programming 3.4.3 python/3.4.3 2015-08-19
 Python programming language. From anaconda installation, sanitized for NERSC
python phoebe applications/ programming 2.7.3 python/2.7.3 2013-06-18
 Python Programming Language
python phoebe applications/ programming 2.7.3_1 python/2.7.3_1 2013-06-18
 Python Programming Language
python phoebe applications/ programming 2.7.3_2 python/2.7.3_2 2013-06-18
 Python Programming Language
python phoebe applications/ programming 2.7.4 python/2.7.4 2013-06-18 2013-06-26
 Python Programming Language
python phoebe applications/ programming 3.2.3_1 python/3.2.3_1 2013-06-18
 Python Programming Language
Python3 scigate applications/ programming 3.4.2 python/3.4.2 2015-01-08
 Python programming language