
2021 NERSC Summer Internships

NERSC hosts a number of internships every summer. Applicants must be students actively enrolled in undergraduate or graduate programs. These are paid internships, but we are unable to provide additional support for housing. Desired technical qualifications are specified with each project description.

2021 project descriptions are still being added and will appear here for each group as they become available. Please check back for new additions. To apply for one of these internships, please reach out to the listed NERSC mentors directly and send your CV/resume.


Cybersecurity

Log Analysis for Cybersecurity Operations

CS Domain: Cybersecurity, Center Status

This position has been filled for Summer 2021.

Project Description: NERSC systems generate vast amounts of log data, and quickly extracting meaningful information from it can be difficult. The NERSC cybersecurity team is exploring ways in which we can improve cybersecurity operations through data visualization and in-depth, automated analysis of application logs. This project aims to use machine learning and data mining techniques for automated log review and anomaly detection. The student will collaborate with members of the NERSC cybersecurity team to develop a framework for evaluation and visualization of log data that is relevant to cybersecurity operations. The student will apply data mining and machine learning techniques to uncover patterns and identify anomalies in network and application logs, create visualizations of log data, and assist with developing a security dashboard.

Desired Skills/Background: Familiarity with Linux environments; basic knowledge of statistical techniques, including machine learning and/or data mining; Python and libraries relevant to data science and visualization (e.g., scikit-learn, pandas, and matplotlib).
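
As a rough illustration of the kind of analysis involved, the minimal sketch below flags outlier time windows in pre-parsed log features using scikit-learn's IsolationForest; the file and column names are placeholders, not actual NERSC data.

    import pandas as pd
    from sklearn.ensemble import IsolationForest

    # Hypothetical per-time-window features parsed from application logs.
    df = pd.read_csv("log_features.csv")
    features = df[["failed_logins", "unique_ips", "bytes_out"]]

    # fit_predict returns -1 for outliers and 1 for inliers.
    model = IsolationForest(contamination=0.01, random_state=0)
    df["anomaly"] = model.fit_predict(features)
    print(df[df["anomaly"] == -1])  # windows worth a closer look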

NERSC Mentor: Tiffany Connors ([email protected])


Hardware Evaluation

Evaluating Next Generation Compute in Network

Project Description: High-performance computing has moved toward heterogeneous architectures (e.g., CPUs, GPUs, FPGAs, Data Processing Units (DPUs)) to meet the needs of applications. Simultaneously, there is increased demand for processing at the network edge in order to provide faster response times and alleviate pressure on the data center. All of this has resulted in a proliferation of new hardware with vastly different capabilities. The challenges include: (1) identifying which devices a given piece of code performs best on, and (2) determining where in the data center to physically execute the code. For this project, the student will explore the performance and energy tradeoffs of representative codes on network and edge computing architectures. Project tasks (a brief timing-harness sketch follows the list):

  • Develop benchmarks to evaluate the benefits and drawbacks of new DPU/SmartNIC accelerators.
  • Work with lab staff to develop the appropriate models of DPU architectures.
  • Leverage models to understand how changes in architecture will affect workloads.
  • Work with staff to develop recommendations for upcoming system requirements.
  • [Stretch Goal] Develop your ideas and results into a workshop or conference publication.
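
By way of illustration only, the Python sketch below times a memory-bandwidth-bound copy kernel with a simple median-of-repeats harness; a real evaluation for this project would implement comparable kernels in C and run them on both the host CPU and the DPU/SmartNIC to compare throughput and energy.

    import time
    import statistics
    import numpy as np

    def time_kernel(fn, repeats=10):
        """Run fn several times and return the median wall-clock seconds."""
        samples = []
        for _ in range(repeats):
            t0 = time.perf_counter()
            fn()
            samples.append(time.perf_counter() - t0)
        return statistics.median(samples)

    # Stand-in kernel: a bandwidth-bound array copy (~80 MB each way).
    src = np.ones(10_000_000, dtype=np.float64)
    dst = np.empty_like(src)

    t = time_kernel(lambda: np.copyto(dst, src))
    print(f"copy bandwidth: {2 * src.nbytes / t / 1e9:.2f} GB/s")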

Desired Skills/Background: Ability to develop benchmarks in the C programming language, solid understanding of basic algorithms, knowledge of computer operating systems, Linux, computer networks, shell scripting and/or Python. Motivated to learn new technologies. Knowledge of distributed computing is a plus. Understanding of scientific codes and workflows is a plus. Computer architecture and performance modeling experience such as LogP is a plus.

NERSC mentor: Taylor Groves ([email protected])


Data and Analytics

An Easy Overview of Resource Utilization

CS domain: Application Workflows and Analytics, Center Status

Project Description: Scaling up programs to run at the scale of a modern high-performance computing (HPC) center can be a daunting task. One of the first questions developers ask is: “is my program using all the hardware I have given it?” Many tools can extract detailed performance data from an application, but that level of detail comes at a cost: significant effort and time have to be invested in collecting and analyzing the performance data. To answer a question like “am I using all 4 GPUs per node?”, this level of detail is overkill. This project therefore aims to give new developers a quick, at-a-glance view of their program’s hardware utilization. The student will work with NERSC staff to accomplish the following (a minimal collector sketch follows the list):

  • Assess how much of this data is already being collected by the center’s monitoring and logging systems, and find the least invasive tool with which to capture any data that is not.
  • Develop a light-weight program that collects hardware utilization data from the center’s monitoring and logging systems at regular intervals.
  • Build a web application that displays this information to NERSC users, or work with NERSC staff to integrate the collected data into NERSC's existing web platform, for example via my.nersc.gov.
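
One minimal, non-invasive way to sample GPU utilization, assuming NVIDIA GPUs with nvidia-smi available on the node, is sketched below; a production collector would feed the center's monitoring pipeline rather than a local CSV file.

    import subprocess
    import time

    # Poll per-GPU utilization at a fixed interval and append raw CSV rows.
    QUERY = ["nvidia-smi",
             "--query-gpu=timestamp,index,utilization.gpu,memory.used",
             "--format=csv,noheader"]

    with open("gpu_util.csv", "a") as out:
        while True:  # a collector daemon runs until stopped
            sample = subprocess.run(QUERY, capture_output=True, text=True)
            out.write(sample.stdout)
            out.flush()
            time.sleep(30)  # sampling interval in seconds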

Desired Skills/Background: Familiarity with Linux environments, high-level programming languages, and full-stack web development (e.g., PHP, JavaScript, Python, MySQL). This project is intended as an opportunity to learn about HPC center operations; therefore, knowledge of high-performance computing is not required.

NERSC Mentor: Johannes Blaschke ([email protected])

Tracking and Managing Data Provenance

Science/CS domain: Data Management, Application Workflows, Storage & I/O

Project Description: Tracking and documenting the data life cycle in scientific workflows creates opportunities to optimize it, but managing the tracked provenance, recording changes to the data, and related questions remain open problems. While there are numerous standards and protocols for storing provenance and managing versions of data, many are limited to a specific application or science domain. In this project, we target capturing provenance and managing versions of data in a data life cycle from a file-format perspective, i.e., HDF5. We anticipate that this approach will lead to generalized methods and tools that a broad set of applications and science domains can take advantage of.

Students interested in this project will perform the following R&D tasks (a minimal provenance-recording sketch follows the list):

  • Find 3 application teams that have complex enough workflows or data versioning to work with
    • Ideally, one each of: AI, experimental & observational data (EOD), and HPC simulation
  • Build descriptions of their data processing workflow provenance
  • Experiment with mechanisms for tracking and choosing versions of data / metadata
  • Develop efficient data structures for managing provenance and version control techniques for storing in self-describing scientific file formats (such as HDF5)
  • Write a research paper around this work
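
As one possible starting point, provenance can be attached directly to the data it describes using HDF5 attributes, as in the h5py sketch below; all file, dataset, and field names are illustrative, not a proposed standard.

    import json
    import h5py
    import numpy as np

    # Record simple provenance as attributes on the dataset itself.
    with h5py.File("results.h5", "w") as f:
        dset = f.create_dataset("temperature", data=np.random.rand(1000))
        dset.attrs["created_by"] = "simulation_v2.3"
        dset.attrs["parent_inputs"] = json.dumps(["mesh.h5", "params.yaml"])
        dset.attrs["version"] = 1

    # Later, a tool can walk the file and reconstruct the lineage graph.
    with h5py.File("results.h5", "r") as f:
        prov = dict(f["temperature"].attrs)
        print(prov["created_by"], json.loads(prov["parent_inputs"]))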

Desired Skills/Background: Scientific workflows, version control systems, storage & I/O, HDF5.

NERSC/DAS mentor: Quincey Koziol ([email protected])

A New Way of Representing the HPC Workload

CS Domain: Data visualization, Center status

Project Description: Supercomputers like NERSC's system Cori typically run thousands of compute jobs at any one time, everything from short debugging jobs to long-running, massive full-machine jobs. The only way NERSC staff and users can assess the full workload on Cori is by looking at a simple list of jobs, which limits the insight we can gain into our workload. This project will look at innovative and artistic ways to represent a supercomputer workload, giving NERSC staff and users new insight into the science being performed on our system. This may be through music (e.g., representing different categories of jobs by tones) or through visual representation (e.g., representing different categories of jobs by colors). The end result could be made available to NERSC users on our website. This project would suit someone with basic coding skills and an interest in alternative and artistic representations of data. Required experience: coding for data representation (e.g., using Python or R).

Desired Skills/Background: Previously completed projects in data visualization, experience in designing web interfaces.

NERSC mentor: Debbie Bard ([email protected])


Machine Learning

Scientific Deep Learning Benchmarking and Performance Analysis

CS domain: deep learning, benchmarking

Project Description: Scientific deep learning applications are a transformative emerging workload for supercomputers, and it is critical for HPC centers to have robust methodologies and benchmarks for characterizing this new workload, evaluating system performance, and driving innovation in hardware and system design. Organizations like MLPerf are pushing on this front with state-of-the-art deep learning benchmark applications for industry, and recently for HPC and science workloads as well. We are looking for an enthusiastic candidate to analyze and optimize the performance of scientific AI benchmarks on supercomputers such as Perlmutter, the upcoming system at NERSC designed to support emerging AI and analytics workloads with NVIDIA A100 GPUs.
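
Performance analysis of this kind usually starts with a simple throughput measurement. The PyTorch sketch below times training steps of a stand-in model (ResNet-50 on random data) and reports samples per second; a real MLPerf-style study would sweep models, batch sizes, and node counts.

    import time
    import torch
    import torchvision  # stand-in model source; any benchmark model works

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = torchvision.models.resnet50().to(device)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = torch.nn.CrossEntropyLoss()

    # Synthetic ImageNet-shaped batch; real benchmarks stream real data.
    batch = torch.randn(64, 3, 224, 224, device=device)
    labels = torch.randint(0, 1000, (64,), device=device)

    def step():
        opt.zero_grad()
        loss_fn(model(batch), labels).backward()
        opt.step()

    for _ in range(3):  # warm-up iterations
        step()
    if device == "cuda":
        torch.cuda.synchronize()

    steps = 20
    t0 = time.perf_counter()
    for _ in range(steps):
        step()
    if device == "cuda":
        torch.cuda.synchronize()
    elapsed = time.perf_counter() - t0
    print(f"{steps * batch.shape[0] / elapsed:.1f} samples/sec")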

Desired Skills/Background:

Required: Python, machine learning, experience with PyTorch or TensorFlow

Nice: distributed deep learning, GPU profiling, hyperparameter optimization, model parallelism

NERSC/DAS mentor: Steven Farrell ([email protected])

Deep Learning Anomaly Detection for Fundamental Physics

Science/CS domain: Deep learning-based anomaly detection for discovery in fundamental physics

Project Description: This is an exciting time in fundamental physics, with many current or planned experiments producing complex data. There are many experimental and theoretical hints for new phenomena (such as dark matter), but we do not yet have any significant evidence for new particles or forces of nature since the discovery of the Higgs boson in 2012. This could be because our experiments are not sensitive enough, because the new particles are rare, or because we are not looking in the right place. The goal of this project is to investigate this last possibility. We have developed a variety of deep learning methods to automatically explore high-dimensional data with as little model bias as possible (“less than supervised”). This project will involve developing deep learning-based anomaly detection techniques, integrating them into a variety of physical systems including collider physics (e.g., the Large Hadron Collider) and indirect dark matter detection (e.g., the Gaia space observatory), and deploying them on NERSC supercomputers such as the upcoming Perlmutter system.
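 
One common "less than supervised" approach is an autoencoder trained on background-dominated events only, where events with large reconstruction error become anomaly candidates. The PyTorch sketch below illustrates the idea on placeholder data; the 20-feature event representation and thresholds are assumptions for illustration.

    import torch
    import torch.nn as nn

    class AutoEncoder(nn.Module):
        def __init__(self, n_features=20):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(n_features, 8), nn.ReLU(),
                                         nn.Linear(8, 3))
            self.decoder = nn.Sequential(nn.Linear(3, 8), nn.ReLU(),
                                         nn.Linear(8, n_features))
        def forward(self, x):
            return self.decoder(self.encoder(x))

    background = torch.randn(10000, 20)  # placeholder for real events
    model = AutoEncoder()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for epoch in range(50):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(background), background)
        loss.backward()
        opt.step()

    # Score new events: high per-event reconstruction error = anomalous.
    events = torch.randn(100, 20)
    with torch.no_grad():
        errors = ((model(events) - events) ** 2).mean(dim=1)
    candidates = events[errors > errors.mean() + 3 * errors.std()]
    print(candidates.shape)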

Desired Skills/Background:

Required: Python, GPU, deep learning

Nice: Basic statistics, high energy physics / astrophysics

NERSC/DAS mentor: Wahid Bhimji ([email protected]) co-mentor: Benjamin Nachman ([email protected])

Application Deadline: April 15, 2021

AI for I/O

Science/CS domain: AI/ML/DL, Storage & I/O, Performance

Project Description: Parallel I/O performance tuning is difficult, tedious, and error-prone, due to interdependencies among multiple software layers (high-level I/O libraries, parallel I/O (MPI-IO), parallel file system, etc.) and multiple levels of storage hardware (memory, node-local storage, parallel file system, etc.). In this project, we would like to explore AI methods for finding tuning parameters to achieve superior performance, setting those parameters from high-level libraries (HDF5) and data management runtime systems (Proactive Data Containers (PDC)) and measuring the benefit to applications.

Students interested in this project will perform the following R&D tasks (a brief parameter-tuning sketch follows the list):

  • Determine what AI techniques work well for incorporating within I/O middleware like HDF5 and PDC
    • Which parts of I/O middleware best benefit from AI techniques?
    • Cache entry eviction, parameter setting (parallel file system block size, stripe width), data compression, feature extraction, etc.
  • Determine how to measure improvement
    • Performance, memory usage, space savings with reduced data, etc.
  • Implement prototype(s) for
    • Monitoring I/O libraries, storage systems, etc.
    • AI algorithms
  • Apply prototypes in HDF5 and in PDC
  • Write a research paper describing the observations, implementations, and improvements.
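
To make the parameter-tuning idea concrete, the h5py sketch below times one write pattern under a few candidate chunk shapes; an AI-driven tuner would search this space rather than enumerate it, and the shapes and file names are illustrative.

    import time
    import h5py
    import numpy as np

    data = np.random.rand(4096, 4096)
    candidates = [(256, 256), (1024, 1024), (4096, 512)]

    # Probe each chunk shape with a full write and report the elapsed time.
    for chunks in candidates:
        t0 = time.perf_counter()
        with h5py.File("tune_probe.h5", "w") as f:
            f.create_dataset("x", data=data, chunks=chunks)
        print(chunks, f"{time.perf_counter() - t0:.3f}s")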

Desired Skills/Background: Machine learning, storage & I/O, HDF5, performance tuning.

NERSC/DAS mentor: Quincey Koziol ([email protected])


Data Infrastructure

Enhancing Jupyter Capabilities and Infrastructure at NERSC

CS domain: Software Engineering, Data Engineering

Project description: Scientists love Jupyter because it combines text, visualization, data analytics, and code into a document they can share, modify, and even publish. What about using Jupyter to control experiments in real-time, or steer complex simulations on a supercomputer, or even combining aspects of both workflows—what would it take? We are looking for Python, Jupyter, and JavaScript enthusiasts to help us find ways to expose NERSC's high-performance computing and storage systems through Jupyter, making supercomputing more literate and more user-friendly. The project will involve developing software to extend Jupyter’s core capabilities to take advantage of high-performance computing and data.
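
As one small example of extending Jupyter, the sketch below registers a custom %sbatch line magic that submits a Slurm batch script from a notebook cell; the magic name and behavior are illustrative, not an existing NERSC feature, and it assumes Slurm's sbatch command is on the PATH.

    import subprocess
    from IPython.core.magic import register_line_magic

    # Must be run inside an IPython/Jupyter session for the magic to register.
    @register_line_magic
    def sbatch(line):
        """Submit a Slurm batch script and show sbatch's response."""
        result = subprocess.run(["sbatch", line.strip()],
                                capture_output=True, text=True)
        print(result.stdout or result.stderr)

    # Usage in a notebook cell:  %sbatch my_job.sh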

Desired Skills/Background: Python, Jupyter, JavaScript

NERSC/DAS mentor: Shreyas Cholia ([email protected]), Rollin Thomas ([email protected])

Building a Multi-Petabyte Data Portal

CS domain: Software development, web front end, databases

Project description: Scientists at NERSC have PBs of data they need to manage and search across multiple storage layers of the file system. We would like to work with a motivated intern to help develop a responsive and intuitive web portal that can help scientists manage millions of files and guide them to the appropriate destination. The ideal intern would be able to code for the full web stack, from front-end html/css to back-end data wrangling.

Desired Skills/Background: React, D3, Python, Spark or Dask (or similar), databases

NERSC/DAS mentors: Annette Greiner ([email protected]), Lisa Gerhardt ([email protected])


Application Performance

GPU Accelerated Sequence Alignment Software Suite

Domain: GPU programming, C++ programming, Bioinformatics Algorithms

Project Description: ADEPT is a GPU-accelerated implementation of a dynamic programming-based sequence alignment algorithm that performs local alignments on short DNA reads. ADEPT provides a framework to extend and build other similar algorithms (global alignments, semi-global alignments, long-read alignments, alignments with CIGAR) that are frequently needed in bioinformatics software. The task would be to extend ADEPT into a complete suite of alignment algorithms that is easy to use and customize based on the needs of different developers and bioinformatics packages. This project will help develop skills in low-level GPU programming and optimization, along with knowledge of different profiling and performance analysis tools. It would also provide insight into the world of high-performance bioinformatics software development.
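
For orientation, the pure-Python sketch below computes the classic Smith-Waterman local alignment score; GPU implementations like ADEPT parallelize the anti-diagonals of this same dynamic-programming recurrence. The scoring constants here are illustrative, not ADEPT's defaults.

    # Smith-Waterman local alignment score (match/mismatch/linear gap).
    MATCH, MISMATCH, GAP = 3, -3, -2

    def smith_waterman(a, b):
        rows, cols = len(a) + 1, len(b) + 1
        H = [[0] * cols for _ in range(rows)]
        best = 0
        for i in range(1, rows):
            for j in range(1, cols):
                diag = H[i-1][j-1] + (MATCH if a[i-1] == b[j-1] else MISMATCH)
                # Local alignment: scores never drop below zero.
                H[i][j] = max(0, diag, H[i-1][j] + GAP, H[i][j-1] + GAP)
                best = max(best, H[i][j])
        return best

    print(smith_waterman("GATTACA", "GCATGCU"))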

Desired Skills/Background:

Required: C++ programming, git version control, knowledge of Graphics Processing Units (GPUs).

Good to have but not required: GPU programming experience, knowledge of computational bioinformatics algorithms

NERSC/APG mentor: Muaaz Awan ([email protected])

Not currently accepting new applications. 

Possible publishable outcomes:

  • As an application paper in bioinformatics journals.
  • Integration of the developed tool into existing bioinformatics software, demonstrating new scientific possibilities.
  • Comparative study with some of the existing CPU suites.

NERSC Proxy Suite / Programming Models

Domain: Accelerator programming, programming languages, performance measurement, algorithm/architecture analysis

Project Description: Contribute to the NERSC proxy application suite development effort. This is a new project, and so far some examples have been made public at https://gitlab.com/NERSC/nersc-proxies. Evaluating the new SYCL 2020 standard on NVIDIA GPUs is of particular interest. Potential activities include:

  • Extracting simplified motifs and kernels from science applications
  • Porting new or existing proxies to new programming models
  • Performance measurement and analysis on CPUs and GPUs
  • Test and evaluate the latest features in OpenMP, OpenACC, C++, Fortran, Kokkos, MPI, SYCL, HIP, HPX, and more

Desired Skills/Background: GPU programming, C++

NERSC/APG mentor: Brandon Cook ([email protected])

Not currently accepting new applications.

Performance Metric Collection and Analysis

Domain: analytics, big data, architecture analysis, visualization, machine learning

Project Description: NERSC operates a sampling framework that takes measurements continuously across all compute nodes in Cori; multiple petabytes of this data are currently available. We are extending this framework to GPU-based systems in support of Perlmutter and working to extract insights from this dataset. Potential activities include:

  • defining and validating metrics and collection methods relevant to GPUs
  • time-series analysis / machine learning for classification and anomaly/fault detection (see the rolling z-score sketch after this list)
  • combining data streams from multiple sources for workload analysis
  • analysis of system-wide architectural efficiency
  • incorporating analysis results into a continuous pipeline feeding NERSC's web portal (e.g., click a job ID and get a Roofline plot, memory bandwidth analysis, etc.)
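
As a concrete example of the time-series angle, the pandas sketch below flags samples that deviate sharply from a node's recent behavior using a rolling z-score; the metric and file names are placeholders for the real node-level data streams.

    import pandas as pd

    # Assumes a time-sorted CSV of per-node samples with a datetime column.
    df = pd.read_csv("node_metrics.csv", parse_dates=["time"]).set_index("time")

    # Compare each sample to the trailing one-hour window of the same metric.
    window = df["mem_bw_gbps"].rolling("1h")
    z = (df["mem_bw_gbps"] - window.mean()) / window.std()
    anomalies = df[z.abs() > 4]
    print(anomalies.head())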

Desired Skills/Background: Performance tools, performance measurement, CPU/GPU metrics, data analysis

NERSC/APG mentor: Brandon Cook ([email protected])

Not currently accepting new applications.

Enhancing NERSC's Performance and Regression Testing Framework

Domain: Statistics, testing, analytics

Project Description: NERSC uses ReFrame to monitor our systems and software stack for regressions. NERSC’s documentation (docs.nersc.gov) hosts many examples. The goal of this project is to extend NERSC’s documentation such that, e.g., example jobs are also ReFrame tests, thus ensuring that the documented examples are kept up to date with respect to any system software changes.

ReFrame includes basic performance monitoring, but due to run-to-run variability and the undesirability of false positives, tolerances must be kept high. Such high tolerances, however, let “small” but non-trivial performance regressions go undetected. This effort would extend the ReFrame framework to support more advanced, statistics-based detection of regressions.
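
One possible direction, sketched below with illustrative numbers, is a robust detector that compares the latest sample against the historical median using the median absolute deviation, which tolerates run-to-run noise better than a fixed percentage threshold. A real detector would live in a ReFrame hook; this is only a sketch of the statistic.

    import statistics

    def regressed(history, latest, n_sigma=3.0):
        """Flag `latest` as a regression if it falls more than n_sigma robust
        standard deviations below the historical median (higher = better)."""
        med = statistics.median(history)
        # Median absolute deviation, scaled to approximate a std dev.
        mad = statistics.median(abs(x - med) for x in history) * 1.4826
        return latest < med - n_sigma * max(mad, 1e-9)

    # Example: historical GB/s samples with normal run-to-run noise.
    hist = [152.0, 149.5, 151.2, 150.8, 148.9, 151.7, 150.1]
    print(regressed(hist, 150.0))  # False: within normal variability
    print(regressed(hist, 138.0))  # True: a "small" but real regression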

Desired Skills/Background: Python, shell scripting, statistics

NERSC/APG mentor: Brandon Cook ([email protected]), Brian Friesen ([email protected])

Not currently accepting new applications.


Machine Learning for Workload Characterization

Project Description: Berkeley Lab’s National Energy Research Scientific Computing Center (NERSC) has an opening for a Data Scientist intern. NERSC operates a sampling framework that takes measurements continuously across all compute nodes in Cori; multiple petabytes of this data are currently available. The intern will help analyze this data to increase application performance and throughput, characterize the NERSC workload, analyze the characteristics of scientific application codes and their usage on HPC systems, and monitor NERSC system utilization and capability usage. The intern will also be responsible for managing user data collected from NERSC high-performance computational and data systems and for assisting with operational and system-level data.

What You Will Do:
  • Participate in a team that collects and stores data from NERSC HPC systems relating to applications and their performance, job scheduling, and systems operations.
  • Apply statistical methods to data collected from NERSC HPC systems to draw inferences (e.g. detect anomalies, identify correlations, optimize scheduling and job placement) that guide how NERSC configures systems, makes policy decisions, and acquires systems.
  • Extract insight from the time-series data we have for all Cori compute nodes, using time-series analysis and signal processing (see the periodicity sketch after this list).
  • Implement and maintain application performance monitoring methods and detect and report anomalies and changes.
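
As a small example of the signal-processing angle mentioned above, the NumPy sketch below recovers the dominant periodicity in a synthetic utilization trace with an FFT; real input would be the sampled Cori node metrics.

    import numpy as np

    dt = 60.0                              # one sample per minute
    t = np.arange(0, 7 * 24 * 3600, dt)    # one week of samples
    # Synthetic stand-in: a daily cycle plus noise.
    trace = 50 + 20 * np.sin(2 * np.pi * t / 86400) + np.random.randn(t.size)

    # Remove the mean so the DC bin does not dominate, then find the peak.
    spectrum = np.abs(np.fft.rfft(trace - trace.mean()))
    freqs = np.fft.rfftfreq(t.size, d=dt)
    peak = freqs[spectrum.argmax()]
    print(f"dominant period: {1 / peak / 3600:.1f} hours")  # ~24.0
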
What is Desired:
  • Statistical methods for data analysis, including machine learning and time series.
  • Ability to work with databases.
  • Ability to produce insightful inferences from data and produce clear reports and summaries.
  • Strong communication and interpersonal skills are required as is an ability to work productively in groups.

NERSC/UEG mentor:  Kadidia Konate ([email protected])