Performance Benchmarking of Scientific AI Workloads for Next-Generation HPC Systems


Science/CS domains

Machine learning, HPC performance analysis, and optimization

Project description

Scientific AI workloads are growing rapidly in scale and complexity, especially with the rise of foundation models and large-scale inference/training pipelines. 

This internship project will benchmark and analyze the performance of representative scientific AI workloads (such as materials characterization or weather forecasting) on NERSC systems, with emphasis on Perlmutter and relevance to future Doudna-class platforms. 

Project tasks

The intern will evaluate end-to-end performance across compute, communication, and I/O; study scaling behavior under different parallelism strategies; and profile bottlenecks in both training and inference settings.
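As a rough illustration of the kind of scaling analysis described above, the sketch below computes per-GPU throughput and scaling efficiency relative to the smallest run. The `ScalingPoint` record, the function names, and the numbers in the usage example are hypothetical placeholders, not measurements from Perlmutter or any other system, and an actual study would likely use the mentors' preferred metrics and tooling.

```python
from dataclasses import dataclass


@dataclass
class ScalingPoint:
    """One benchmark run (hypothetical record layout, for illustration only)."""
    gpus: int                # number of GPUs used in the run
    seconds_per_step: float  # measured wall-clock time per training/inference step
    samples_per_step: int    # global batch size processed in each step


def throughput(point: ScalingPoint) -> float:
    """Samples processed per second across all GPUs."""
    return point.samples_per_step / point.seconds_per_step


def scaling_efficiency(points: list[ScalingPoint]) -> dict[int, float]:
    """Per-GPU throughput of each run divided by that of the smallest run.

    A value of 1.0 means ideal scaling; lower values indicate overhead
    (for example from communication or I/O) relative to the baseline.
    """
    baseline = min(points, key=lambda p: p.gpus)
    baseline_per_gpu = throughput(baseline) / baseline.gpus
    return {
        p.gpus: (throughput(p) / p.gpus) / baseline_per_gpu
        for p in points
    }


if __name__ == "__main__":
    # Placeholder timings chosen for illustration; not real benchmark data.
    runs = [
        ScalingPoint(gpus=1, seconds_per_step=1.0, samples_per_step=64),
        ScalingPoint(gpus=4, seconds_per_step=0.3125, samples_per_step=64),
    ]
    for gpus, eff in sorted(scaling_efficiency(runs).items()):
        print(f"{gpus} GPU(s): efficiency {eff:.2f}")
```

Because efficiency is defined from per-GPU throughput, the same function covers both fixed global batch size (strong scaling) and batch size that grows with GPU count (weak scaling).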

Candidate workloads may include science-focused models and service-oriented stacks (e.g., vLLM-based inference), selected in collaboration with mentors and aligned with DOE/NERSC priorities, including the Genesis Mission.

The project will deliver actionable benchmark results and analysis to help guide workload readiness, optimization priorities, and system design decisions for next-generation supercomputing.

Desired skills/background

Required

  • Python
  • Experience with ML workflows and/or performance analysis
  • Software engineering fundamentals

Nice to have

  • Distributed training/inference
  • Profiling tools
  • GPU communication and I/O optimization
  • Familiarity with HPC environments

Apply to join this project

To apply or ask a question about this project, email Steven Farrell.

Project mentors

Steven Farrell

Group Lead (Acting)

National Energy Research Scientific Computing Center (NERSC)

Science Engagement & Workflows Dept.

Data & AI Services Group


Shashank Subramanian

Computer Systems Engineer 4

National Energy Research Scientific Computing Center (NERSC)

Science Engagement & Workflows Dept.

Data & AI Services Group


Corneel Casert

Machine Learning Engineer

National Energy Research Scientific Computing Center (NERSC)

Science Engagement & Workflows Dept.

Data & AI Services Group
