A paradigm shift in deep learning for chemistry and materials science
May 26, 2021
By Brandon Wood, Postdoctoral Researcher, NESAP
Deep learning is becoming more prevalent in scientific domains because datasets continue to increase in size. As a NESAP postdoc at NERSC, I have spent the last year helping to create a large catalysis dataset called Open Catalyst 2020, or OC20, and an associated community challenge with Zack Ulissi’s group at Carnegie Mellon University and researchers at Facebook AI Research (FAIR). Our paper describing the dataset and baseline models was just published in ACS Catalysis. More recently, we have been developing, tuning, and scaling graph neural network (GNN) models trained on the OC20 dataset.
During my time as a postdoc, one thing has become obvious: The way we chemists and materials scientists think about and utilize GPU resources is rapidly changing.
Chemistry and materials science have generally been data-poor domains, at least compared to fields such as image processing or natural language processing. When I started working with Zack’s group, we had a dataset of around 50,000 examples, and models such as the crystal graph convolutional neural network (CGCNN), with thousands of parameters, could be trained to convergence on one GPU in a day. With the largest OC20 training dataset (~130M examples) and the largest version of the DimeNet++ model we reported in our paper (~10M parameters), training took 256 GPUs the better part of a week (thousands of GPU-days). DimeNet++ — a GNN that incorporates angular information through directional message passing — is memory intensive, which limits the local batch size per GPU and forces us to use many GPUs to achieve a sufficiently large global batch size. This massive change in compute requirements quickly outpaced the resources we had available and heightened our anticipation for Perlmutter, with its 6,000+ NVIDIA A100 GPUs, each with 40 GB of memory.
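To see why a memory-limited local batch size translates into large GPU counts under data parallelism, here is a minimal back-of-the-envelope sketch. The numbers below are illustrative assumptions, not figures from the OC20 paper:

```python
def gpus_needed(target_global_batch: int, local_batch: int) -> int:
    """GPUs required so that local_batch * n_gpus >= target_global_batch
    in a standard data-parallel setup (ceiling division)."""
    return -(-target_global_batch // local_batch)

# Suppose (hypothetically) a memory-hungry GNN only fits a local batch of
# 8 graphs per GPU, and training is stable at a global batch of 2048:
print(gpus_needed(2048, 8))   # -> 256 GPUs

# A GPU with more memory that fits twice the local batch would halve the
# GPU count needed for the same global batch:
print(gpus_needed(2048, 16))  # -> 128 GPUs
```

This is the basic arithmetic behind our GPU counts: the global batch size is fixed by optimization considerations, so shrinking the per-GPU batch (because the model is memory hungry) directly inflates the number of GPUs required.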
One of our goals as a NESAP project (CatalysisDL) is to understand the limits of deep learning on a large materials dataset. Currently, we see improved model performance as the number of trainable parameters increases, and we are not yet overfitting our data. Perlmutter will allow us to explore questions about model size that have been prohibitively expensive for academic researchers in the past. Beyond training a number of large, state-of-the-art models, the Perlmutter ecosystem will enable other essential parts of modern deep learning workflows, such as model iteration, retraining models with new data, hyperparameter optimization, putting models into production, and incorporating other data science tools such as JupyterHub and Dask.
Moving forward, openly available GPU resources such as Perlmutter are going to be incredibly valuable for the academic community interested in deep learning for science. The trend toward larger datasets, and consequently larger models, is not unique to our project; it is representative of the deep learning field as a whole. That is not to say all AI/ML for science will move toward larger datasets and larger models, but for the sub-field of deep learning the direction is clear, and it is an exciting one.
Overall, the impact of scale remains to be seen for chemistry and materials science applications, along with other domains of science. Our team is excited to investigate these and other questions on Perlmutter.
A special thanks to Zack Ulissi and Javier Heras-Domingo.