NERSC Initiative for Scientific Exploration (NISE) 2009 Awards

Accelerating Microbial Genomics with High-Performance Computing

Edward Rubin, DOE Joint Genome Institute

Sponsoring NERSC Project: Optimizing Genomic Data Storage for Wide Accessibility (m342), Principal Investigator: Edward Rubin, DOE Joint Genome Institute

NISE Award: 1,000,000 Hours
Award Date: October 2009

The Integrated Microbial Genomes (IMG) system is a complex data management system that integrates the Joint Genome Institute’s microbial genome data with publicly available microbial genome data and thus provides a powerful comparative context for microbial genome analysis.

One of the more computationally demanding steps in the IMG pipeline is comparing newly sequenced genes against existing reference genes with BLAST (Basic Local Alignment Search Tool). This requires comparing over 12 million genes against nearly 7 million genes. The work is typically performed on local clusters with about 200 cores and takes about two weeks to complete. Running it on NERSC's Franklin system can reduce the turnaround time to one day or less.
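The figures above can be turned into a rough core-hour estimate. The sketch below is a back-of-envelope calculation only: it treats the rounded numbers in the text ("over 12 million", "nearly 7 million", "around 200 cores", "around two weeks") as exact and assumes perfect linear scaling, which real BLAST runs will not achieve. The function names are illustrative and not part of the IMG pipeline.

```python
def core_hours(cores: int, days: float) -> float:
    """Core-hours consumed by `cores` cores running for `days` days."""
    return cores * days * 24


def cores_needed(total_core_hours: float, target_days: float) -> float:
    """Cores required to spend `total_core_hours` within `target_days`,
    assuming ideal (linear) scaling. This is an optimistic assumption."""
    return total_core_hours / (target_days * 24)


# Rounded figures from the text, treated as exact for illustration.
QUERY_GENES = 12_000_000
REFERENCE_GENES = 7_000_000

# Size of the query-by-reference comparison space. BLAST uses heuristics
# and does not fully align every pair, so this only conveys scale.
comparison_space = QUERY_GENES * REFERENCE_GENES

local_cost = core_hours(cores=200, days=14)
one_day_cores = cores_needed(local_cost, target_days=1)

print(f"comparison space: {comparison_space:.1e} gene pairs")
print(f"local cluster run: {local_cost:,.0f} core-hours")
print(f"cores for a one-day turnaround: {one_day_cores:,.0f}")
```

Under these assumptions a single full comparison costs about 67,200 core-hours, so a one-day turnaround needs on the order of 2,800 cores running concurrently, and the 1,000,000-hour award would cover roughly a dozen runs of this size.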

The NISE time is requested to finish porting the entire IMG pipeline to Franklin and to prepare a computational framework for upcoming sequencers, whose output will quickly overwhelm the computational resources at sequencing centers like JGI.
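A gene-versus-reference comparison of this kind is usually easy to parallelize because each query gene can be searched independently. The source does not describe how the IMG pipeline distributes its work, so the following is only a minimal sketch of one common approach: splitting the query set into roughly equal chunks that separate BLAST jobs could then process. The helper names `read_fasta` and `chunk_records` are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    seq_parts: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq_parts)
            header, seq_parts = line[1:], []
        else:
            seq_parts.append(line)
    if header is not None:
        yield header, "".join(seq_parts)


def chunk_records(records: Iterable[Record], n_chunks: int) -> List[List[Record]]:
    """Distribute records round-robin into n_chunks lists of near-equal size."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks: List[List[Record]] = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        chunks[i % n_chunks].append(record)
    return chunks


# Tiny made-up example: five query genes split across two jobs.
example = """>gene1
ATGC
>gene2
GGCC
TTAA
>gene3
ATAT
>gene4
CCGG
>gene5
TTTT
""".splitlines()

chunks = chunk_records(read_fasta(example), n_chunks=2)
for job_id, chunk in enumerate(chunks):
    print(job_id, [header for header, _ in chunk])
```

Round-robin assignment keeps chunk sizes within one record of each other, but it does not balance by sequence length; a production workflow would likely need a smarter partitioning and a scheduler-specific way to launch the jobs.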