NERSC Hosts Virtual GPU Hackathon
Three-day Event Takes Coding Collaborations in New Directions
September 11, 2020
Like most of the conferences, seminars, and workshops taking place this year across the country and around the world, the recent GPU hackathon hosted by NERSC was a fully virtual affair. Held July 13-15 in conjunction with NVIDIA, the Oak Ridge Leadership Computing Facility, and OpenACC as part of the GPU Hackathons series, the event served as an innovative model for what could be the next generation of HPC hackathons.
Traditional hackathons – often referred to as “dungeon sessions” – typically bring together groups of programmers who hunker down in a room for eight hours or more over three to five days to collaborate on finding new ways to improve the performance of their codes (one perk: food and beverages are provided). But given the current pandemic, the GPU Hackathons are now using a variety of online platforms, including Zoom and Slack, to create a new way of collaborating without the need to leave one’s home.
“It’s the exact opposite of a traditional dungeon session in that you’re in your own home, you have your own food, you can get up and move around as you want, you’re not jet-lagged and staying in a hotel for five days – so there’s definitely a positive to that flexibility,” said Kevin Gott, Application Performance Specialist at NERSC, who not only helped organize and manage the July hackathon but also provides continued support to the organizers for other events in the series. Other NERSC participants and mentors at the July hackathon included Jack Deslippe, Mustafa Mustafa, Hugo Brunie, and Brandon Cook.
At present the format for the virtual hackathons features a central Zoom meeting with multiple “breakout rooms” and Slack channels for additional discussions and questions. Mentors are paired with teams based on the needed expertise and are available both in the team room and via the Slack channels, chiming in when they see a question or conversation that they feel they can help address. There is an introductory meeting a week before the event begins, and the actual hackathon takes place over three days.
“That first day is held the week before and is designed to start everyone working on their code. The full week gives participants ample time to address any issues, ensure they have access to the compute resources and tools needed, and develop a good plan for the other three days that follow,” Gott said. “Feedback from our surveys show that many people really like that.”
With the much-anticipated installation of Perlmutter, NERSC’s next supercomputer, approaching, Gott and others want to ensure that the hackathons continue to support the transition to GPU systems.
“At last year’s event, the focus was on NESAP and ECP teams, but this year the theme was ‘HPC for open science’ because we wanted to open it up to a broader set of science applications,” said Gott. “It’s only a few months before Perlmutter comes online, so we wanted to get the entire community together talking about GPUs and collaborating.”
For the July hackathon, NERSC hosted nine teams, with three to seven members per team and up to three mentors per team. Participants hailed from across the U.S. and as far away as Australia.
With participants located across the globe, the time zone variation poses some challenges. But the virtual platform also enables more mentors to participate and make themselves available throughout the day on an as-needed basis.
“Mentors could participate part-time or pop in as needed and follow up with input in other formats, such as offline and in the Slack channels,” Gott said. “As a result, we’re getting a lot more participation from experts in all these events than in previous years.”
This year’s hackathon yielded a range of excellent results, including:
A cumulative 4x speedup in the main kernel of ASGarD’s 6D high-fidelity solver
Accelerating NIMROD’s SuperLU solver to work 40x faster on a V100 than on 16 POWER9 cores
Speeding up EQSIM’s RAJA kernels 15.7x to become 16% faster than their CUDA implementation
Overall, the teams unanimously agreed that the hackathon was a valuable experience.
NERSC has hosted GPU hackathons targeting NESAP application teams since 2019, in which the traditional week of full-time hacking is supplemented by six weeks of preparation together with engineers from Hewlett Packard Enterprise, NVIDIA, and NERSC. While the first fully virtual hackathon was considered a success, NERSC staff are using its lessons to make future events even more effective and work-from-home friendly.
For the remaining NESAP hackathons in 2020, NERSC is modifying this format to expand the six-week-long, high-intensity preparation effort into a several-month-long, slightly lower-intensity effort and to remove the culminating hackathon at the end. This longer engagement among application teams and performance engineers will function more like a performance collaboration than a hackathon “sprint.”
Any teams interested in working with vendor and community experts to port to GPUs, improve their GPU performance, or explore new GPU paradigms are encouraged to go to the GPU hackathon website and apply.
Looking ahead, Gott believes the virtual format could open up new opportunities for similar events in a number of focused areas.
“You could do this throughout the year for different topics – say, hold an OpenMP hackathon in January, and an OpenACC hackathon in April, and then repeat them at other times of the year,” he said. “I think it would work extremely well, especially for highly focused topics. Doing it virtually means almost everyone, anywhere, can participate.”
About NERSC and Berkeley Lab
The National Energy Research Scientific Computing Center (NERSC) is a U.S. Department of Energy Office of Science User Facility that serves as the primary high-performance computing center for scientific research sponsored by the Office of Science. Located at Lawrence Berkeley National Laboratory, the NERSC Center serves more than 7,000 scientists at national laboratories and universities researching a wide range of problems in combustion, climate modeling, fusion energy, materials science, physics, chemistry, computational biology, and other disciplines. Berkeley Lab is a DOE national laboratory located in Berkeley, California. It conducts unclassified scientific research and is managed by the University of California for the U.S. Department of Energy.