NERSC: Powering Scientific Discovery Since 1974

Workflow Software

Hadoop

Tags: Data

Hadoop is an open-source implementation of the popular MapReduce programming model. In addition to MapReduce and a distributed file system (HDFS), Hadoop has a rich ecosystem of high-level languages (e.g., Pig) and data storage models (HBase, Hive, etc.).
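To make the MapReduce programming model concrete, here is a minimal single-process sketch of a word count in plain Python. It does not use Hadoop or HDFS; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions, and a real Hadoop job would distribute each phase across many nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of strings."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["a b a", "b c"]))
```

The map and reduce functions never share state, which is what lets a framework such as Hadoop run them in parallel on separate pieces of the input.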

MySGE

Overview: MySGE allows users to create a private Sun GridEngine cluster on large parallel systems like Hopper. Once the cluster is started, users can submit serial jobs, array jobs, and other throughput-oriented workloads to the personal SGE scheduler. The jobs are then run within the user's private cluster. How it works: When the user executes vpc_start, a job is submitted to the standard system scheduler (Moab). The user can specify the requested time and number of cores using the normal…

qdo

Description and Overview: QDO (kew-doo) is a toolkit for managing many small tasks within a larger batch framework. QDO separates the queue of tasks to perform from the batch jobs that actually perform the tasks. This simplifies managing tasks as a group, and provides greater flexibility for scaling batch worker jobs up and down or adding additional tasks to the queue even after workers have started processing them. The qdo module provides an API for interacting with task queues. The…
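The idea of separating a task queue from the workers that drain it can be sketched with Python's standard library alone. This is not the qdo API: `run_tasks` and its parameters are hypothetical names chosen for illustration, and threads stand in for batch worker jobs. Note that the workers are started before any task is queued, mirroring the point that tasks can be added after workers are already running.

```python
import queue
import threading


def run_tasks(tasks, num_workers=2):
    """Run (function, argument) pairs through a shared queue of workers.

    Hypothetical helper for illustration; it is not part of qdo.
    """
    task_queue = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker():
        # Each worker pulls tasks until it receives the None sentinel.
        while True:
            item = task_queue.get()
            if item is None:
                break
            func, arg = item
            outcome = func(arg)
            with results_lock:
                results.append(outcome)

    # Start the workers first; the queue is still empty at this point.
    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in workers:
        thread.start()

    # Tasks are added while the workers are already waiting on the queue.
    for task in tasks:
        task_queue.put(task)

    # One sentinel per worker tells each of them to stop.
    for _ in workers:
        task_queue.put(None)
    for thread in workers:
        thread.join()

    return results


if __name__ == "__main__":
    print(sorted(run_tasks([(abs, -1), (abs, -2), (abs, 3)])))
```

Because the queue is the only coupling between producers and workers, the number of workers can change without touching the code that enqueues tasks. Results arrive in completion order, so the usage example sorts them before printing.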