NERSC: Powering Scientific Discovery Since 1974

NERSC Summer Internships

NERSC hosts a number of internships every summer. Applicants must be students actively enrolled in undergraduate or graduate programs. These are paid internships, but we are unable to provide additional support for housing. Desired technical qualifications are specified with each project description. This page will be updated with more projects, so check back for further additions. To see a list of the previous year's internship projects, click here.

In addition to the projects below, NERSC hosts other projects via the Lab's CS Summer Student Program. For all summer positions, including those at NERSC, hourly wages are up to $21.16 for undergraduate students and up to $38.40 for graduate students, depending upon years of education completed. Applicants are responsible for housing and travel expenses.

To apply for one of the internships below, please reach out to the listed NERSC mentors directly and send your CV/resume.

Summer 2023 Internship Projects


Data and Analytics

Accelerating realtime data processing for the DIII-D fusion experiment

Science/CS domain: magnetic confinement fusion, data analysis, code optimization

Project Description: The goal of this project is to speed up the charge-exchange data processing for the DIII-D tokamak at NERSC. The charge-exchange data is required to run a between-shot workflow that can reconstruct the DIII-D plasma profiles every ~10 minutes; this workflow cannot start until the charge-exchange data analysis is complete. We will profile the current charge-exchange data analysis code and identify opportunities for speedup, likely through MPI parallelization and/or code optimization. We will implement these improvements and verify that the application still produces correct results. Once we have achieved some speedup, we will examine the full realtime equilibrium reconstruction workflow, including the cost of transferring the raw input data from DIII-D to NERSC. We will evaluate whether the full workflow is faster with NERSC-based data processing or DIII-D-based data processing. In either case, achieving speedup will be critical to achieving between-shot equilibrium reconstruction.

Desired Skills/Background: experience with C++ and/or FORTRAN and MPI

Nice: code profiling, code optimization, writing and running unit tests for correctness

NERSC/DAS mentor: Laurie Stephey ([email protected])

 


 

Scalable Deployment of Data Services with Helm

Science/CS domain: DevOps, Backend Development, Data Management

Project Description: Linux containers have become an immensely popular software development paradigm due to their ability to provide lightweight and reproducible software runtime encapsulation as well as extremely portable and scalable application deployment. At NERSC, the Data & Analytics services team uses containers to deploy a variety of data services to over 8000 active users. The goal of this project is to migrate NERSC science database and data portal services from monolithic containers to Helm-templated microservices. This will enable several improvements, including easier and more frequent version upgrades and the possibility for users to self-administer these services on NERSC's Kubernetes-as-a-Service platform, Spin.

Desired Skills/Background: Some scripting or programming experience and experience with or interest in using container technology (e.g. Docker, Podman).

Nice: Any experience with Kubernetes, Helm, CI/CD in GitLab/GitHub, web server configuration (NGINX/Apache), databases (Mongo/Postgres/MySQL), web app or microservice development.

NERSC/DAS mentor: Dan Fulton ([email protected])


 

Understanding usage of I/O libraries in supercomputer production workloads

Science/CS domain: Data Analytics

Project Description: Efficiently storing and retrieving data in supercomputers is a tricky problem due to inter-dependencies among multiple layers of input/output (I/O) software, including high-level I/O libraries (e.g., HDF5, netCDF, ROOT), MPI-IO, POSIX, and file systems. Although profiling tools such as Darshan transparently collect metrics, it is unclear how those I/O libraries are used in production workloads and how their I/O accesses are translated before reaching the storage system. In this project, we seek to dive into a year’s worth of data to understand the usage of I/O libraries and the I/O characteristics of the applications relying on those high-level libraries. This would lead us to identify I/O performance bottlenecks and devise tuning strategies, as well as design new features in I/O libraries that are beneficial to the ever-changing application landscape on supercomputers.

Required Skills/Background: Python, Jupyter Notebook, knowledge of statistics, plotting tools 

Nice: Experience with I/O, HPC, MPI

NERSC mentors: Stephen Simms (ATG, [email protected]) and Lisa Gerhardt (DAS, [email protected])


Machine Learning

Deep learning for climate simulations

Science/CS domain: Scientific Machine Learning, HPC, Weather/Climate

Project Description: Simulating the Earth’s climate with high fidelity at high resolution requires significant computational resources. Today, with the advent of deep learning models and the availability of large volumes of simulation and observational data, data-driven models have enormous potential to augment traditional numerical models by providing orders-of-magnitude speedup in compute and, hence, enabling the use of massive ensembles to predict low-likelihood, high-impact extreme events under different climate warming scenarios. In this project, we aim to use state-of-the-art Fourier forecasting networks (based on Transformers) and HPC software to understand and characterize the performance of deep learning models in simulating the physical behavior of Earth’s atmospheric processes and associated extreme events. This will involve exploring climate simulation data, developing model architectures and the underlying foundational aspects of such models, and using NERSC HPC compute resources on the Perlmutter supercomputer to train and analyze these large models.

Desired Skills/Background: Python, deep learning, PyTorch/TensorFlow, interest in climate science

Nice: Distributed training of ML models

NERSC/DAS mentors: Shashank Subramanian ([email protected]) and Peter Harrington ([email protected])