Spin Workload Analysis
Science/CS domains
High performance computing, distributed computing, workflow analysis, data science, observability
Project description
Spin is NERSC’s container service platform for running persistent services such as science gateways, workflow managers, databases, and API endpoints that integrate with NERSC systems and storage. It is Kubernetes-based (managed through Rancher) and has been in production for more than eight years. Users typically build container images, publish them to a registry, and deploy workloads in project namespaces, where services can be operated and maintained for scientific collaborations.
For NERSC’s next-generation HPC system, Doudna, we expect a Spin-like capability to run on the HPC system itself and to host many user workloads similar to those on Spin today.
This project will analyze current Spin workloads, starting with storage I/O behavior and possibly extending to network I/O, CPU, and memory utilization. The outcome will be an evidence-based characterization of workload patterns, bottlenecks, and resource requirements to guide capacity planning, platform design, and operational readiness for Doudna.
Project tasks
This project will focus on the following tasks:
- Collecting and curating workload telemetry from Spin (storage I/O, and optionally network, CPU, and memory metrics)
- Building a workload taxonomy by application type, usage pattern, and resource profile
- Identifying common bottlenecks and performance anti-patterns across user workloads
- Producing dashboards and summary reports that highlight trends, peaks, and long-tail behaviors
- Translating findings into recommendations for Doudna platform sizing, tuning, and service priorities
- Documenting how the dashboards and summary reports are operated, along with the scripts that generate them
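
To make the taxonomy and summary-report tasks above more concrete, here is a minimal sketch of one way per-workload storage I/O samples could be reduced to a resource profile. Everything in it is an assumption for illustration: the record layout, the workload names, the sample values, and the 70% threshold are hypothetical and do not reflect Spin's actual telemetry schema or measured data.

```python
from collections import defaultdict
from statistics import median

# Hypothetical telemetry records: (workload name, read MB/s, write MB/s).
# These values are made up for illustration, not measured Spin data.
SAMPLES = [
    ("gateway", 8.0, 2.0),
    ("gateway", 6.0, 1.0),
    ("gateway", 10.0, 1.0),
    ("database", 2.0, 8.0),
    ("database", 1.0, 9.0),
    ("database", 3.0, 12.0),
    ("workflow", 5.0, 5.0),
    ("workflow", 4.0, 6.0),
]


def classify(read_share, threshold=0.7):
    """Label a workload by the fraction of its I/O volume that is reads.

    The threshold is an arbitrary illustrative cutoff.
    """
    if read_share >= threshold:
        return "read-heavy"
    if read_share <= 1 - threshold:
        return "write-heavy"
    return "mixed"


def summarize(samples):
    """Group samples by workload and compute simple I/O statistics."""
    by_workload = defaultdict(list)
    for name, read_mbps, write_mbps in samples:
        by_workload[name].append((read_mbps, write_mbps))

    summary = {}
    for name, rows in by_workload.items():
        totals = [read + write for read, write in rows]
        read_total = sum(read for read, _ in rows)
        write_total = sum(write for _, write in rows)
        volume = read_total + write_total
        read_share = read_total / volume if volume else 0.0
        summary[name] = {
            "median_mbps": median(totals),
            "peak_mbps": max(totals),
            "read_share": read_share,
            "profile": classify(read_share),
        }
    return summary


if __name__ == "__main__":
    for name, stats in sorted(summarize(SAMPLES).items()):
        print(name, stats)
```

A real analysis would likely add percentile statistics, time-of-day patterns, and the optional network, CPU, and memory metrics, but the same group-then-summarize shape applies.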
Desired skills/background
- Familiarity with Linux systems, Kubernetes, and performance monitoring tools
- Experience with data analysis and visualization libraries
- Basic understanding of I/O performance concepts in distributed systems
- Interest in HPC operations, observability, and workload characterization
Apply to join this project
To apply or ask a question about this project:
Project mentors
Lisa Gerhardt
Deputy of Operations (Acting)
National Energy Research Scientific Computing Center (NERSC)
Science Engagement & Workflows Dept.
Data & AI Services Group
Pengfei Ding
Data Science Workflows Architect
National Energy Research Scientific Computing Center (NERSC)
Science Engagement & Workflows Dept.
Data Science Engagement Group