NERSC: Powering Scientific Discovery Since 1974

Using Hadoop

Quick Start Guide

This guide describes how to:

  1. Run a sample job.
  2. Access the web-based status pages for the NERSC Hadoop cluster.

Please consult the Hadoop website for general information on using Hadoop.

Running an Example

First, load the following modules:

carver% module load tig
carver% module load testbed hadoop-tig

Then submit an interactive batch job. For example:

carver% qsub -I -l nodes=4:ppn=8 -lwalltime=01:00:00 -V -qregular

Once your batch job starts:

c0217% start_hadoop

Once this command returns you can run Hadoop jobs.

Next, let’s run a quick MapReduce job. Here we run teragen to create sample data for a sort. This uses 32 map tasks and generates 102400 rows, about 10 MB of data.

c0217% hadoop jar $HADOOP_HOME/hadoop-examples*.jar teragen -Dmapred.map.tasks=32 102400 file:$SCRATCH/htest1
13/01/22 18:15:40 INFO util.NativeCodeLoader: Loaded the native-hadoop library
Generating 102400 using 32 maps with step of 3200
13/01/22 18:15:40 INFO mapred.JobClient: Running job: job_201301221815_0001
13/01/22 18:15:41 INFO mapred.JobClient:  map 0% reduce 0%
13/01/22 18:15:46 INFO mapred.JobClient:  map 9% reduce 0%
13/01/22 18:15:47 INFO mapred.JobClient:  map 34% reduce 0%
13/01/22 18:15:48 INFO mapred.JobClient:  map 46% reduce 0%
13/01/22 18:15:49 INFO mapred.JobClient:  map 75% reduce 0%
13/01/22 18:15:50 INFO mapred.JobClient:  map 84% reduce 0%
13/01/22 18:15:51 INFO mapred.JobClient:  map 100% reduce 0%
13/01/22 18:15:51 INFO mapred.JobClient: Job complete: job_201301221815_0001
13/01/22 18:15:51 INFO mapred.JobClient: Counters: 12
13/01/22 18:15:51 INFO mapred.JobClient:   Job Counters
13/01/22 18:15:51 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=65503
13/01/22 18:15:51 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/01/22 18:15:51 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/01/22 18:15:51 INFO mapred.JobClient:     Launched map tasks=32
13/01/22 18:15:51 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
13/01/22 18:15:51 INFO mapred.JobClient:   FileSystemCounters
13/01/22 18:15:51 INFO mapred.JobClient:     FILE_BYTES_READ=53248
13/01/22 18:15:51 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=11617558
13/01/22 18:15:51 INFO mapred.JobClient:   Map-Reduce Framework
13/01/22 18:15:51 INFO mapred.JobClient:     Map input records=102400
13/01/22 18:15:51 INFO mapred.JobClient:     Spilled Records=0
13/01/22 18:15:51 INFO mapred.JobClient:     Map input bytes=102400
13/01/22 18:15:51 INFO mapred.JobClient:     Map output records=102400
13/01/22 18:15:51 INFO mapred.JobClient:     SPLIT_RAW_BYTES=2665
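
As a quick sanity check on the numbers in this output, the short sketch below recomputes the expected data size and the rows handled by each map task. The row and map counts come from the teragen command above. The 100-byte row size is an assumption about teragen's record format, not something stated in this guide, and the variable names are only illustrative.

```shell
#!/bin/bash
# Values taken from the teragen command line above.
rows=102400
maps=32

# Assumption: teragen writes fixed-size rows of 100 bytes each.
row_bytes=100

# Total payload size and the share of rows handled by each map task.
expected_bytes=$((rows * row_bytes))
rows_per_map=$((rows / maps))

echo "expected output size: ${expected_bytes} bytes"
echo "rows per map task: ${rows_per_map}"
```

Under that assumption the payload is 10240000 bytes, in line with the "about 10 MB" estimate, and 3200 rows per map matches the "step of 3200" reported in the job output. FILE_BYTES_WRITTEN is somewhat larger than the payload alone.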