Spin Workload Analysis

Science/CS domains

High performance computing, distributed computing, workflow analysis, data science, observability

Project description

Spin is NERSC’s container service platform for running persistent services such as science gateways, workflow managers, databases, and API endpoints that integrate with NERSC systems and storage. It is Kubernetes-based (managed through Rancher) and has been in production for more than eight years. Users typically build container images, publish them to a registry, and deploy workloads in project namespaces, where services can be operated and maintained for scientific collaborations.

For NERSC’s next-generation HPC system, Doudna, we expect a Spin-like capability to be available on the HPC system itself, running many similar user workloads.

This project will analyze current Spin workloads, starting with storage I/O behavior and possibly extending to network I/O, CPU, and memory utilization. The outcome will be an evidence-based characterization of workload patterns, bottlenecks, and resource requirements to guide capacity planning, platform design, and operational readiness for Doudna.

Project tasks

This project will focus on the following tasks:

  • Collecting and curating workload telemetry from Spin (storage I/O, and optionally network, CPU, and memory metrics)
  • Building a workload taxonomy by application type, usage pattern, and resource profile
  • Identifying common bottlenecks and performance anti-patterns across user workloads
  • Producing dashboards and summary reports that highlight trends, peaks, and long-tail behaviors
  • Translating findings into recommendations for Doudna platform sizing, tuning, and service priorities
  • Documenting the scripts and operating procedures behind the dashboards and summary reports
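As a rough illustration of the taxonomy and summary tasks above, the following minimal Python sketch groups storage I/O samples by workload, computes median and peak throughput, and assigns a coarse label. It is an assumption-laden sketch, not the project's method: the sample layout, the namespace and workload names, and the thresholds (read_heavy, write_heavy, bursty_ratio) are all hypothetical, and real telemetry would come from whatever monitoring stack Spin exposes.

```python
from collections import defaultdict
from statistics import median

# Hypothetical samples: (namespace, workload, read bytes/s, write bytes/s).
# These values are invented for illustration; they are not Spin measurements.
SAMPLES = [
    ("proj-a", "gateway", 2_000_000, 100_000),
    ("proj-a", "gateway", 2_200_000, 100_000),
    ("proj-a", "gateway", 1_800_000, 200_000),
    ("proj-b", "database", 100_000, 1_000_000),
    ("proj-b", "database", 100_000, 1_000_000),
    ("proj-b", "database", 500_000, 20_000_000),
]


def summarize(samples):
    """Group samples per (namespace, workload) and compute basic I/O statistics."""
    grouped = defaultdict(list)
    for namespace, workload, read_bps, write_bps in samples:
        grouped[(namespace, workload)].append((read_bps, write_bps))

    summary = {}
    for key, values in grouped.items():
        reads = sum(r for r, _ in values)
        totals = [r + w for r, w in values]
        all_bytes = sum(totals)
        summary[key] = {
            "median_total": median(totals),
            "peak_total": max(totals),
            # Share of traffic that is reads; 0.0 when the workload is idle.
            "read_fraction": reads / all_bytes if all_bytes else 0.0,
        }
    return summary


def classify(stats, read_heavy=0.7, write_heavy=0.3, bursty_ratio=5.0):
    """Assign a coarse label from read/write mix and peak-to-median ratio.

    The thresholds are illustrative assumptions, not tuned values.
    """
    if stats["read_fraction"] >= read_heavy:
        mix = "read-heavy"
    elif stats["read_fraction"] <= write_heavy:
        mix = "write-heavy"
    else:
        mix = "mixed"

    med = stats["median_total"]
    bursty = med > 0 and stats["peak_total"] / med >= bursty_ratio
    return f"{mix}, {'bursty' if bursty else 'steady'}"


if __name__ == "__main__":
    for key, stats in summarize(SAMPLES).items():
        print(key, classify(stats))
```

A peak-to-median ratio is used here only because it is simple to compute and explain; a real analysis would likely examine full distributions and time-of-day patterns before settling on categories.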

Desired skills/background

  • Familiarity with Linux systems, Kubernetes, and performance monitoring tools.
  • Experience with data analysis and visualization libraries.
  • Basic understanding of I/O performance concepts in distributed systems.
  • Interest in HPC operations, observability, and workload characterization.

Apply to join this project

To apply or ask a question about this project:

Email Pengfei Ding.

Project mentors

Lisa Gerhardt

Deputy of Operations (Acting)

National Energy Research Scientific Computing Center (NERSC)

Science Engagement & Workflows Dept.

Data & AI Services Group

Pengfei Ding

Data Science Workflows Architect

National Energy Research Scientific Computing Center (NERSC)

Science Engagement & Workflows Dept.

Data Science Engagement Group
