Burst Buffer FAQ
The Burst Buffer is available for all to use - and is easy to use! This page is intended as a quick source of information for NERSC users interested in getting started with the Burst Buffer.
Burst Buffer Basics
What is a Burst Buffer?
The Burst Buffer is a layer of SSD storage in Cori that sits within the high-speed network, between the on-node storage and the Lustre and GPFS file systems. It offers high-performance I/O on a per-job or short-term basis.
Can I use the Burst Buffer?
Yes! The Burst Buffer is open to all NERSC users.
Can I use the Burst Buffer on Edison?
No, the Burst Buffer is only available on Cori, because the Burst Buffer nodes are integrated within the fabric of the Cori supercomputer.
Why should I use the Burst Buffer?
The Burst Buffer is most likely to help if your application spends a lot of time in I/O (e.g. in checkpointing), or if its I/O pattern is unfriendly to spinning-disk storage (e.g. reading or writing lots of small files, or random reads from a large database).
What kind of reservation can I get on the Burst Buffer?
There are two types of reservation available:
- per-job or "scratch" reservation: this type of reservation is requested for the duration of a compute job and is not persisted beyond the end of the compute job. You can stage or copy data in and out of this reservation, and access the data in your batch job via an environment variable that points to the mount point of the BB on your compute nodes.
- persistent reservation: this type of reservation is created and destroyed by the user, and will last as long as the user specifies. Multiple compute jobs can access a persistent reservation, and if the permissions are set appropriately, multiple users as well. Users can stage data in/out of a persistent reservation at any point in its lifespan. Note that we ask users to limit usage of a persistent reservation (see next question!).
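As a rough illustration of the two reservation types, the sketch below writes out two example batch scripts: one requesting a per-job reservation and one creating a persistent reservation. The outer commands only save the files. The capacities, time limits, the "regular" QOS, the "haswell" constraint, the name "myBBname" and the application "./my_app" (with its --workdir option) are placeholders chosen for illustration, not recommendations.

```shell
#!/bin/bash
# Sketch only: writes two example batch scripts to the current directory.
# Capacities, QOS, constraint, and application name are placeholders.

# Per-job ("scratch") reservation: exists only for the lifetime of this job.
cat > bb_scratch_job.sh <<'EOF'
#!/bin/bash
#SBATCH -q regular
#SBATCH -N 1
#SBATCH -C haswell
#SBATCH -t 00:10:00
#DW jobdw capacity=10GB access_mode=striped type=scratch
# $DW_JOB_STRIPED points to the reservation's mount point on the compute nodes.
srun ./my_app --workdir "$DW_JOB_STRIPED"
EOF

# Persistent reservation: created here, kept until it is explicitly destroyed.
cat > bb_create_persistent.sh <<'EOF'
#!/bin/bash
#SBATCH -q regular
#SBATCH -N 1
#SBATCH -C haswell
#SBATCH -t 00:05:00
#BB create_persistent name=myBBname capacity=100GB access_mode=striped type=scratch
EOF

echo "wrote bb_scratch_job.sh and bb_create_persistent.sh"
```

Each generated script would then be submitted with sbatch from a login node.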
How long can I keep a persistent reservation?
We ask that you remove your persistent reservation after 6 weeks. We may remove reservations older than 8 weeks (after warning you). If you have a compelling reason to retain your reservation for longer than 6 weeks, please let us know.
How much space can I use on the Burst Buffer?
Users have a quota of 50TB on the Burst Buffer. More space may be available on a temporary basis for special cases - if you need it, please talk to the NERSC consultants.
How can I use the Burst Buffer interactively?
The Burst Buffer is not mounted on the login nodes, so it must be accessed via a compute node. If you want to work interactively with the Burst Buffer, you can request a job in the interactive queue with salloc, supplying a Burst Buffer configuration file.
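A minimal sketch of the interactive workflow, assuming the configuration file holds the same "#DW" directives you would put in a batch script and is passed to salloc with its --bbf option. The file name "bbf.conf", the capacity, and the salloc resource options are placeholders. The salloc command is printed, not run, so the snippet is harmless on any machine.

```shell
#!/bin/bash
# Sketch only: prepare a Burst Buffer configuration file for an interactive job.
# The file name, capacity, and salloc options are placeholders.

cat > bbf.conf <<'EOF'
#DW jobdw capacity=10GB access_mode=striped type=scratch
EOF

# Print (rather than run) the salloc command that would use the file.
echo 'salloc -N 1 -C haswell -q interactive -t 00:30:00 --bbf="bbf.conf"'
```

Once the interactive session starts on a compute node, the reservation should be reachable through $DW_JOB_STRIPED.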
How can I share my Burst Buffer persistent reservation with my collaborators?
If you set access permissions correctly, you can open your Burst Buffer persistent reservation to your collaborators. For example, if your reservation is called "myBBname" you can do: "chmod 770 $DW_PERSISTENT_STRIPED_myBBname/" (note that this opens the reservation to all NERSC users in your unix group). Your collaborators can then access the directory via the variable $DW_PERSISTENT_STRIPED_myBBname in the usual way.
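The chmod step above could be wrapped in a small helper that checks the directory exists first. This is only a sketch: share_with_group is a hypothetical function name, not a NERSC tool, and it must run inside a compute job that has the persistent reservation mounted.

```shell
#!/bin/bash
# Sketch only: open a persistent reservation to the owner's Unix group.
# share_with_group is a hypothetical helper, not a NERSC tool.
share_with_group() {
    local dir="$1"
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    # 770 = read/write/execute for owner and group, nothing for others.
    chmod 770 "$dir"
}

# Inside a job that mounts the reservation "myBBname":
#   share_with_group "$DW_PERSISTENT_STRIPED_myBBname"
```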
How can I tell if I've filled up my Burst Buffer reservation?
If you fill up your reservation on the Burst Buffer, you can run into issues that make your reservation inaccessible until a sys admin clears the problem. We advise requesting more space than you think you'll need to avoid this issue. The only way to tell how much of your reservation you've used is to use the "du" command: e.g. "du -h $DW_JOB_STRIPED" within your compute job.
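To turn the "du" figure into a rough percentage, a job script could compare it with the capacity it requested. The sketch below assumes you pass the capacity in yourself (in KiB). bb_usage_percent is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch only: report how much of a reservation is in use, as a whole percent.
# bb_usage_percent is a hypothetical helper; the caller supplies the capacity.
bb_usage_percent() {
    local dir="$1" capacity_kb="$2" used_kb
    # du -sk prints total usage in 1 KiB blocks; keep the first field.
    used_kb=$(du -sk "$dir" | awk '{print $1}')
    echo $(( used_kb * 100 / capacity_kb ))
}

# Inside a compute job (CAPACITY_KB is a placeholder for the size you requested):
#   du -sh "$DW_JOB_STRIPED"
#   echo "used: $(bb_usage_percent "$DW_JOB_STRIPED" "$CAPACITY_KB")%"
```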
Will my code automagically look for all my data on the Burst Buffer if I request an allocation?
No, your code will not automagically know that you are using the Burst Buffer. If you request a Burst Buffer reservation and stage data to it, you still need to tell your code where to look for that data. The underlying DataWarp software only mounts your reservation on the compute nodes - it won't interact with your code in any way.
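In practice this means passing the Burst Buffer path to your application explicitly. A minimal sketch, assuming a striped scratch reservation and a hypothetical application "./my_app" that accepts an --input option. pick_data_dir is likewise a made-up helper that falls back to another directory when no reservation is mounted.

```shell
#!/bin/bash
# Sketch only: choose the data directory explicitly, because the application
# will not discover the Burst Buffer on its own.
# pick_data_dir is a hypothetical helper.
pick_data_dir() {
    local fallback="$1"
    # DW_JOB_STRIPED is set only inside a job with a striped scratch reservation.
    if [ -n "${DW_JOB_STRIPED:-}" ]; then
        # Strip one trailing slash, if present, before appending the subdirectory.
        echo "${DW_JOB_STRIPED%/}/input"
    else
        echo "$fallback"
    fi
}

# In a batch script (./my_app and --input are placeholders):
#   srun ./my_app --input "$(pick_data_dir "$CSCRATCH/input")"
```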
How can I tell what files I have on my persistent reservation?
Burst Buffer reservations can only be mounted on compute nodes - they cannot be accessed from login nodes. This means you need a compute job in order to see what files you have on your persistent reservation. An easy way to do this is to start a job in the interactive queue using a configuration file.
Why is my IO performance so good/bad in private mode?
Private mode has client-side caching enabled. This will result in an improved IO performance if you have small (kilobyte-scale) or random reads/writes. However, if you have larger (megabyte-scale) transaction sizes then you are likely to see worse IO performance in private mode. To avoid client-side caching in private mode, use access_mode=private(client_cache=no).
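For illustration, the sketch below writes an example batch script that requests a private-mode reservation with client-side caching switched off, using the access_mode setting quoted above. It assumes a private reservation is exposed to the job as $DW_JOB_PRIVATE. The capacity, resource options and "./my_app" are placeholders.

```shell
#!/bin/bash
# Sketch only: write an example batch script for a private-mode reservation
# with client-side caching disabled. Sizes and names are placeholders.
cat > bb_private_job.sh <<'EOF'
#!/bin/bash
#SBATCH -q regular
#SBATCH -N 1
#SBATCH -C haswell
#SBATCH -t 00:10:00
#DW jobdw capacity=10GB access_mode=private(client_cache=no) type=scratch
# Assumption: a private reservation is exposed as $DW_JOB_PRIVATE.
srun ./my_app --workdir "$DW_JOB_PRIVATE"
EOF
echo "wrote bb_private_job.sh"
```

Timing the same job with and without the client_cache=no setting is a simple way to see which suits your transaction sizes.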
How can I put my data onto the Burst Buffer?
The most efficient way to transfer data onto the Burst Buffer is to use the stage_in command as part of your batch script. Using this command, DataWarp will create your BB reservation and transfer your data from $CSCRATCH to that reservation without going through a compute node, so it can stage in very quickly. Then, when your compute job rises to the top of the queue and is allocated compute nodes, your reservation can be mounted on those compute nodes with the data already in place. It also means you won't spend your valuable compute time copying data across the system. You can also copy your data to/from our file systems using "cp" in your compute job.
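A stage_in request might look roughly like the batch script written out below. The username, directory names, capacity and "./my_app" are placeholders. The scratch path is spelled out in full on the assumption that "#DW" lines are read by the scheduler and that shell variables such as $CSCRATCH are not expanded there.

```shell
#!/bin/bash
# Sketch only: write an example batch script that stages a directory in before
# the job starts. The username, paths, capacity, and ./my_app are placeholders.
cat > bb_stage_in_job.sh <<'EOF'
#!/bin/bash
#SBATCH -q regular
#SBATCH -N 1
#SBATCH -C haswell
#SBATCH -t 00:30:00
# Assumption: $DW_JOB_STRIPED is understood inside #DW lines, but other shell
# variables are not, so the scratch path is written in full.
#DW jobdw capacity=100GB access_mode=striped type=scratch
#DW stage_in source=/global/cscratch1/sd/username/input_dir destination=$DW_JOB_STRIPED/input_dir type=directory

# By the time this line runs, input_dir should already be on the Burst Buffer.
srun ./my_app --input "$DW_JOB_STRIPED/input_dir"
EOF
echo "wrote bb_stage_in_job.sh"
```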
How can I remove my data from the Burst Buffer?
You have three options here:
- stage_out your data from the Burst Buffer to $CSCRATCH using the stage_out command. This will execute at the end of your compute job, and will transfer the data quickly without charging you valuable compute time. Note that staging data out copies it and does not remove it from the Burst Buffer - if you are staging out from a persistent reservation, the data will remain in the reservation.
- copy your data within your compute job using "cp". This is the only way to directly transfer data to your home or project directories.
- do nothing. At the end of a compute job that uses a "scratch" reservation on the Burst Buffer, the reservation will be torn down and all data deleted.
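The first two options above can be sketched in one example batch script, written out by the snippet below. The username, directory and file names, capacity, and "./my_app" are placeholders. As with stage_in, the scratch destination is written in full on the assumption that shell variables are not expanded in "#DW" lines.

```shell
#!/bin/bash
# Sketch only: write an example batch script that stages results out at the end
# of the job and also copies one file home with cp. Names are placeholders.
cat > bb_stage_out_job.sh <<'EOF'
#!/bin/bash
#SBATCH -q regular
#SBATCH -N 1
#SBATCH -C haswell
#SBATCH -t 00:30:00
#DW jobdw capacity=100GB access_mode=striped type=scratch
#DW stage_out source=$DW_JOB_STRIPED/output_dir destination=/global/cscratch1/sd/username/output_dir type=directory

mkdir -p "$DW_JOB_STRIPED/output_dir"
srun ./my_app --output "$DW_JOB_STRIPED/output_dir"

# cp runs inside the job, so it uses compute time, but it can reach $HOME.
cp "$DW_JOB_STRIPED/output_dir/summary.txt" "$HOME/"
EOF
echo "wrote bb_stage_out_job.sh"
```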
Can I stage_in or stage_out data in my /project or my home directory?
At present, you cannot stage data in/out of the Burst Buffer from anywhere other than $CSCRATCH. If you need to move data around from one of our other file systems, please use "cp" within your compute job.
Why is my stage_out so slow?
If you are staging out large files (larger than a TB) to a new directory on $CSCRATCH, you may find your stage_out is very slow. DataWarp will not do any striping for you, so each file will be written to a single OST. If you create a directory on Lustre with appropriate striping, and stage_out to that directory, then your files will be striped over multiple OSTs and the writes will be shared between those OSTs. For example, if you stripe over 2 OSTs then your write bandwidth will roughly double, and your stage_out time will be roughly halved.
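Creating the striped destination directory ahead of time could look like the sketch below, which uses the standard Lustre "lfs setstripe -c" command to set the stripe count. prepare_striped_dir is a hypothetical helper name, and the path and stripe count in the usage comment are placeholders. The right stripe count depends on your file sizes.

```shell
#!/bin/bash
# Sketch only: create a Lustre directory with a chosen stripe count so that
# files later staged out into it are spread over several OSTs.
# prepare_striped_dir is a hypothetical helper.
prepare_striped_dir() {
    local dir="$1" stripe_count="$2"
    mkdir -p "$dir" || return 1
    # -c sets how many OSTs new files in this directory are striped over.
    lfs setstripe -c "$stripe_count" "$dir"
}

# Run on a node that mounts $CSCRATCH, before submitting the job whose
# stage_out destination is this directory (path and count are placeholders):
#   prepare_striped_dir "$CSCRATCH/output_dir" 8
```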
Monitoring the Burst Buffer
Why is my Burst Buffer job pending forever?
There are a number of reasons why this might happen - many mistakes in requesting a Burst Buffer reservation do not show up as a failed job, but as a job that won't start. Here are some examples:
- The queue is very busy, and your wait time is not related to the Burst Buffer.
- You have tried to stage_in a file that does not exist. Check for this error using the command "squeue -l -j jobid".
- You have tried to create a persistent reservation with the same name as an existing reservation. Check for this error using the command "squeue -l -j jobid".
- You are staging in vast quantities of data - more than your quota of 50TB, or many millions of files. Check for this error using the command "squeue -l -j jobid".
- You have another running job that is using the Burst Buffer such that the combined requested reservation is over your quota of 50TB. Subsequent jobs will wait in the queue until the earlier jobs have finished - the total amount of BB reservation in use at any one time cannot exceed 50TB per user.
"squeue -l" is the best way to spot if you have any Burst Buffer error messages associated with your job. These will not show up with "sqs".
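If you have several jobs stuck in the queue, the check can be repeated for each job ID. A small sketch follows. show_bb_job_status is a hypothetical helper name and the job ID in the usage comment is a placeholder.

```shell
#!/bin/bash
# Sketch only: show the long-format squeue entry for each job ID given.
# show_bb_job_status is a hypothetical helper.
show_bb_job_status() {
    local jobid
    for jobid in "$@"; do
        # The long format is where Burst Buffer error messages show up.
        squeue -l -j "$jobid"
    done
}

# Example (1234567 is a placeholder job ID):
#   show_bb_job_status 1234567
```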
How much free space is available on the Burst Buffer?
Who is using the Burst Buffer?
The command "scontrol show burst" will tell you everything going on with the Burst Buffer - how much space is used and available, and who is using what.