NERSC: Powering Scientific Discovery Since 1974

Do You Hadoop?

August 2, 2011

Magellan investigators Lavanya Ramakrishnan and Shane Canon presented a Hadoop tutorial at SciDAC 2011, held July 10–14 in Denver, Colorado.

Hadoop is an open source implementation of MapReduce, a programming model that is gaining traction in the scientific community for data-focused scientific applications. However, the needs of scientific applications differ significantly from those of the Web 2.0 applications that have traditionally used Hadoop.

In this tutorial, Ramakrishnan and Canon presented an overview of Hadoop technologies, discussed some science use cases, and outlined the programming challenges involved in using Hadoop for legacy applications. Participants also experimented with the Hadoop system at NERSC in the hands-on component of the tutorial.

» Download the presentation, PDF | 3.6 MB

» Download the exercises, PDF | 3 MB

A NERSC training event on Hadoop is being planned for October 2011. Details and registration information will be announced on the NERSC site and here when they become available.

About NERSC and Berkeley Lab
The National Energy Research Scientific Computing Center (NERSC) is a U.S. Department of Energy Office of Science User Facility that serves as the primary high-performance computing center for scientific research sponsored by the Office of Science. Located at Lawrence Berkeley National Laboratory, the NERSC Center serves more than 6,000 scientists at national laboratories and universities researching a wide range of problems in combustion, climate modeling, fusion energy, materials science, physics, chemistry, computational biology, and other disciplines. Berkeley Lab is a DOE national laboratory located in Berkeley, California. It conducts unclassified scientific research and is managed by the University of California for the U.S. DOE Office of Science. » Learn more about computing sciences at Berkeley Lab.