NERSC: Powering Scientific Discovery Since 1974

Workflow Tools

Supporting data-centric science often involves the movement of data across file systems, multi-stage analytics and visualization. Workflow technologies can improve the productivity and efficiency of data-centric science by orchestrating and automating these steps. NERSC provides support for the TaskFarmer, Swift and Fireworks tools. We also maintain other packages like Tigres that can help users build workflows. 

This page describes the current set of tools and services in the workflow ecosystem at NERSC. The best tool for your workflow depends strongly on your personal preference, but we make the following general recommendations.

  • If you need to write parallel scripts that run many copies of ordinary programs concurrently in various workflow patterns, consider using Swift or TaskFarmer.
  • If you have a large number of single- or multi-core jobs that need to run in parallel and may have varying wall times, consider TaskFarmer.
  • If you have a large number of MPI jobs to orchestrate, consider using Swift or Fireworks. 
  • If you need to run a long-term campaign over diverse compute resources, consider Fireworks. 
  • If you need a complex or dynamic workflow (e.g. a dependent chain of tasks), consider Fireworks.

 

Many workflow tools need to run on a login node for long periods of time while they monitor job execution and throughput. Long-running jobs are discouraged on the login nodes, so NERSC has provisioned a set of dedicated workflow nodes on Cori to run this kind of application. Note that these nodes are still subject to system-wide outages, but they will usually see less resource contention than a login node. If you require access to a workflow node, please email consult@nersc.gov to make the request.

 

 

Swift

Swift is a scientific workflow scripting tool that enables scientists to run parallel workflows at scale at NERSC.

Fireworks

Fireworks is a flexible workflow tool that enables NERSC users to run complex, dynamic workflows, tracking metadata and provenance. It is particularly useful for high throughput workflows.
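
To make the idea of a dependent chain of tasks with provenance tracking concrete, here is a minimal sketch in plain Python. It does not use the Fireworks API; the function name run_workflow, the task names and the record layout are all hypothetical, and it relies only on the standard-library graphlib module (Python 3.9 or later).

```python
import time
from graphlib import TopologicalSorter


def run_workflow(deps, actions):
    """Run tasks in dependency order and return a simple provenance log.

    deps maps each task name to the set of tasks it depends on.
    actions maps each task name to a zero-argument callable.
    Both structures are illustrative, not part of any NERSC tool.
    """
    log = []
    # static_order() yields every task after all of its dependencies.
    for name in TopologicalSorter(deps).static_order():
        started = time.time()
        result = actions[name]()
        log.append({"task": name, "started": started, "result": result})
    return log


# Hypothetical three-stage chain: stage data, analyze it, plot the result.
deps = {"analyze": {"stage"}, "plot": {"analyze"}}
actions = {
    "stage": lambda: "staged",
    "analyze": lambda: "analyzed",
    "plot": lambda: "plotted",
}
log = run_workflow(deps, actions)
```

A real workflow manager would additionally persist this log, submit each task to the batch system, and allow tasks to add new tasks at run time; the sketch only shows the ordering and record-keeping.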

TaskFarmer

TaskFarmer is a utility developed in-house at NERSC to distribute the execution of tasks across compute nodes in a single large batch job. These tasks can be single- or multi-core, but an individual task cannot span nodes (so a multi-node MPI job cannot be run as a task). TaskFarmer tracks which tasks have completed successfully, and allows straightforward re-submission of failed or un-run tasks from a task list.
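
The task-list pattern described above can be illustrated with a small Python sketch. This is not TaskFarmer's implementation or interface; the names farm and run_task, the JSON file of completed tasks, and the use of a thread pool on a single node are assumptions made for illustration only.

```python
import json
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_task(command):
    """Run one shell command; a zero exit code counts as success."""
    return subprocess.run(command, shell=True).returncode == 0


def farm(tasks, done_file, workers=4):
    """Run every task not yet recorded as done; return the tasks that failed.

    Tasks are identified by their command string, so duplicate commands
    are treated as the same task (a simplification in this sketch).
    """
    done = set(json.loads(done_file.read_text())) if done_file.exists() else set()
    pending = [t for t in tasks if t not in done]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run_task, pending))
    # Record only successful tasks, so a rerun retries failed and un-run ones.
    done.update(t for t, ok in zip(pending, results) if ok)
    done_file.write_text(json.dumps(sorted(done)))
    return [t for t, ok in zip(pending, results) if not ok]
```

Calling farm again with the same task list and the same file skips the tasks that already succeeded, which mirrors the re-submission behaviour described above at a much smaller scale.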

Other Workflow Tools

Other tools and services in the workflow ecosystem are in use by other NERSC projects. These include workflow software (Tigres, TaskFarmer, qdo, Galaxy), databases (PostgreSQL, MySQL, SciDB), and data analytics frameworks (Spark).