Using Hadoop
Quick Start Guide
This guide describes how to:
- Run a sample job.
- Access the web-based status pages for the NERSC Hadoop cluster.
Please consult the Hadoop website for general information on using Hadoop.
Running an Example
First, load the following modules:
carver% module load tig
carver% module load testbed hadoop
Then submit an interactive batch job. For example:
carver% qsub -I -l nodes=4:ppn=8 -lwalltime=01:00:00 -V -qregular
Once your batch job starts:
c0217% start_hadoop
Once this command returns, you can run Hadoop jobs.
Next, let’s run a quick MapReduce job. Here we run teragen to create sample data for a sort. This uses 32 map tasks and creates about 10 MB of data (102400 rows of 100 bytes each).
c0217% hadoop jar $HADOOP_HOME/hadoop-examples*.jar teragen -Dmapred.map.tasks=32 102400 file:$SCRATCH/htest1
13/01/22 18:15:40 INFO util.NativeCodeLoader: Loaded the native-hadoop library
Generating 102400 using 32 maps with step of 3200
13/01/22 18:15:40 INFO mapred.JobClient: Running job: job_201301221815_0001
13/01/22 18:15:41 INFO mapred.JobClient: map 0% reduce 0%
13/01/22 18:15:46 INFO mapred.JobClient: map 9% reduce 0%
13/01/22 18:15:47 INFO mapred.JobClient: map 34% reduce 0%
13/01/22 18:15:48 INFO mapred.JobClient: map 46% reduce 0%
13/01/22 18:15:49 INFO mapred.JobClient: map 75% reduce 0%
13/01/22 18:15:50 INFO mapred.JobClient: map 84% reduce 0%
13/01/22 18:15:51 INFO mapred.JobClient: map 100% reduce 0%
13/01/22 18:15:51 INFO mapred.JobClient: Job complete: job_201301221815_0001
13/01/22 18:15:51 INFO mapred.JobClient: Counters: 12
13/01/22 18:15:51 INFO mapred.JobClient: Job Counters
13/01/22 18:15:51 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=65503
13/01/22 18:15:51 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/01/22 18:15:51 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/01/22 18:15:51 INFO mapred.JobClient: Launched map tasks=32
13/01/22 18:15:51 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
13/01/22 18:15:51 INFO mapred.JobClient: FileSystemCounters
13/01/22 18:15:51 INFO mapred.JobClient: FILE_BYTES_READ=53248
13/01/22 18:15:51 INFO mapred.JobClient: FILE_BYTES_WRITTEN=11617558
13/01/22 18:15:51 INFO mapred.JobClient: Map-Reduce Framework
13/01/22 18:15:51 INFO mapred.JobClient: Map input records=102400
13/01/22 18:15:51 INFO mapred.JobClient: Spilled Records=0
13/01/22 18:15:51 INFO mapred.JobClient: Map input bytes=102400
13/01/22 18:15:51 INFO mapred.JobClient: Map output records=102400
13/01/22 18:15:51 INFO mapred.JobClient: SPLIT_RAW_BYTES=2665
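The numbers in the teragen run above can be checked with a little shell arithmetic. This is a minimal sketch, not part of the NERSC tooling: the variable names are illustrative, and it relies on teragen writing fixed-size 100-byte rows. It reproduces the "step of 3200" reported in the log and the roughly 10 MB data size.

```shell
# Values taken from the teragen command line above.
rows=102400         # row count (first positional argument to teragen)
maps=32             # value of -Dmapred.map.tasks

# teragen writes fixed-size rows of 100 bytes each.
bytes_per_row=100

# Total payload size and the number of rows each map task generates.
total_bytes=$((rows * bytes_per_row))
rows_per_map=$((rows / maps))

echo "total bytes: $total_bytes"
echo "rows per map: $rows_per_map"
```

The FILE_BYTES_WRITTEN counter in the log is somewhat larger than the computed payload, so treat the computed size as an estimate of the data itself rather than of everything written to disk.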


