NERSC: Powering Scientific Discovery Since 1974

Workflow Software


Tags: Data

Hadoop is an open-source implementation of the popular MapReduce programming model. In addition to MapReduce and the Hadoop Distributed File System (HDFS), Hadoop has a rich ecosystem of high-level languages (e.g., Pig) and data storage models (e.g., HBase and Hive). Read More »
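To illustrate the MapReduce model itself, here is a minimal word-count sketch in Python. The map step emits a `(word, 1)` pair for every word, and the reduce step sums the counts for each word; in a real Hadoop Streaming job the two phases would run as separate processes reading sorted input from stdin, whereas here they are chained locally for clarity.

```python
import sys
from itertools import groupby

def mapper(lines):
    """Map step: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield word, 1

def reducer(pairs):
    """Reduce step: sum the counts for each word (input must be sorted by key)."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    # Hadoop sorts map output by key before the reduce phase;
    # sorted() stands in for that shuffle step in this local sketch.
    mapped = sorted(mapper(sys.stdin), key=lambda kv: kv[0])
    for word, total in reducer(mapped):
        print(f"{word}\t{total}")
```

The same mapper and reducer functions could be wrapped as stdin-to-stdout scripts and run under Hadoop Streaming; the point here is only the shape of the model: stateless per-record map, sort by key, per-key reduce.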


Overview: MySGE allows users to create a private Sun Grid Engine (SGE) cluster on large parallel systems like Hopper. Once the cluster is started, users can submit serial jobs, array jobs, and other throughput-oriented workloads to the personal SGE scheduler. The jobs are then run within the user's private cluster. How it works: When the user executes vpc_start, a job is submitted to the standard system scheduler (Moab). The user can specify the requested time and number of cores using the normal… Read More »


Description and Overview: QDO (pronounced "kew-doo") is a toolkit for managing many small tasks within a larger batch framework. QDO separates the queue of tasks to perform from the batch jobs that actually perform the tasks. This simplifies managing tasks as a group, and provides greater flexibility for scaling batch worker jobs up and down, or for adding tasks to the queue even after workers have started processing them. The qdo module provides an API for interacting with task queues. The… Read More »
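The key idea, a task queue that is decoupled from the workers that drain it, can be sketched in plain Python. This is a conceptual illustration using the standard library, not the qdo API: tasks are enqueued independently of the workers, workers can be scaled up or down freely, and new tasks could still be added while workers are running.

```python
import queue
import threading

# Conceptual sketch of the qdo model (not the actual qdo API):
# the task queue exists independently of any particular worker.
tasks = queue.Queue()
results = []
results_lock = threading.Lock()

def worker():
    """Pull tasks from the shared queue until a sentinel arrives."""
    while True:
        task = tasks.get()
        if task is None:          # sentinel: no more work for this worker
            tasks.task_done()
            return
        with results_lock:
            results.append(task * task)   # stand-in for real work
        tasks.task_done()

# Enqueue tasks first; the number of workers is chosen independently,
# and more tasks could still be added after the workers start.
for n in range(10):
    tasks.put(n)

workers = [threading.Thread(target=worker) for _ in range(4)]
for w in workers:
    w.start()
for _ in workers:
    tasks.put(None)               # one sentinel per worker
for w in workers:
    w.join()
```

In QDO's actual setting the queue is persistent and the "workers" are batch jobs, so the queue outlives any single job; the in-memory version above only shows the producer/worker separation.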


FireWorks is a free, open-source code for defining, managing, and executing scientific workflows. It can be used to automate calculations over arbitrary computing resources, including those that have a queueing system. Some features that distinguish FireWorks are dynamic workflows, failure-detection routines, and built-in tools and execution modes for running high-throughput computations at large computing centers. FireWorks is the primary workflow engine for the Materials Project, and is… Read More »
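At its core, a workflow engine of this kind runs a set of tasks in an order that respects their dependencies. The following is a minimal stdlib sketch of that idea, not the FireWorks API; the task names and the `run_workflow` helper are illustrative only.

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

def run_workflow(tasks, deps):
    """Run callables in an order that respects their dependencies.

    tasks: name -> callable; deps: name -> set of prerequisite names.
    A conceptual sketch of a workflow engine, not the FireWorks API.
    """
    order = list(TopologicalSorter(deps).static_order())
    for name in order:
        tasks[name]()
    return order

executed = []
tasks = {
    "fetch":   lambda: executed.append("fetch"),
    "compute": lambda: executed.append("compute"),
    "report":  lambda: executed.append("report"),
}
# "report" depends on "compute", which depends on "fetch".
deps = {"compute": {"fetch"}, "report": {"compute"}}
order = run_workflow(tasks, deps)
```

A production engine like FireWorks adds the pieces this sketch omits: persistent workflow state, failure detection and reruns, and the ability for a task to append new tasks to the graph at runtime (dynamic workflows).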