NERSC Helps Scientists Build Public Health Datasets from Location Services
Research focuses on human interaction patterns to understand drivers of the COVID-19 spread and mitigation
September 7, 2023
By Keri Troutman
Location-based cellular services can give researchers a sense of human traffic patterns, something that scientists were particularly interested in during the height of the COVID pandemic. However, measuring social proximity within a certain area or business, which is key to monitoring disease spread, is not as simple; it requires analysis of terabytes of data, a computational bottleneck that isn’t easily solved without access to large supercomputing resources.
The data we currently have about people's movement and social distancing is limited. Some datasets provide general information about how far people move around in a region (such as a county or province), while others give us more detailed traffic patterns. However, none of these datasets directly tell us how many times people come into close contact with one another. To address these challenges, a group of scientists used resources at the National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory (Berkeley Lab) to process location-based cell phone data and produce much more detailed datasets that better estimate fine-scaled social contacts within public spaces – datasets that can be used, among other things, to understand the impact of public health interventions and messaging on social distancing.
“In light of public policy around COVID disease spread, we were thinking about how you can change policies and change layouts of buildings so that with the same amount of traffic, people don’t come very close to one another,” said paper co-author Ashok Srinivasan, a computer scientist and professor at the University of West Florida. “You can look at theoretical models, but if you want to really know what’s happening in the real world, you have to know more precisely where people are.”
Fine-scaled Interaction Details
To get a more precise picture of how people move around within a space, the scientists initially used anonymous data from location-based services to create a dataset showing the daily social contact intensity of clusters of people at different types of points of interest (POI) by zip code in Florida and California. They aggregated fine-scaled details of interactions between people at a spatial resolution of 10 m and then normalized the result into a social contact index.
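The article does not give the formula behind the social contact index, so the following is only a minimal sketch of the general idea, under stated assumptions: anonymized pings are placed on a 10 m grid, devices sharing a cell in the same time bin count as one contact pair, and the total is normalized by the number of distinct devices. The record layout, the function names, and the normalization are all hypothetical and are not the authors' method.

```python
CELL_M = 10.0  # spatial resolution mentioned in the article (metres)


def contact_pairs(pings, cell_m=CELL_M):
    """Count co-located device pairs.

    pings: list of (device_id, time_bin, x_m, y_m) tuples, with x/y in a
    projected coordinate system measured in metres (an assumption).
    """
    occupants = {}
    for device, time_bin, x, y in pings:
        # Snap each ping to its grid cell for that time bin.
        key = (time_bin, int(x // cell_m), int(y // cell_m))
        occupants.setdefault(key, set()).add(device)
    # n devices in one cell at one time give n * (n - 1) / 2 pairs.
    return sum(len(d) * (len(d) - 1) // 2 for d in occupants.values())


def contact_index(pings, cell_m=CELL_M):
    """Hypothetical normalization: contact pairs per distinct device."""
    devices = {p[0] for p in pings}
    if not devices:
        return 0.0
    return contact_pairs(pings, cell_m) / len(devices)


sample = [
    ("a", 0, 1.0, 1.0), ("b", 0, 4.0, 9.0), ("c", 0, 25.0, 3.0),
    ("a", 1, 1.0, 1.0), ("c", 1, 2.0, 2.0),
]
print(contact_pairs(sample), contact_index(sample))
```

Counting pairs per cell rather than visits per POI is what lets a quiet location with one crowded corner score higher than a busy but evenly spread one.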
“NERSC was key in setting this project up,” said Srinivasan. His research group was granted a NERSC director’s reserve allocation of 1.5 million CPU hours on the Cori supercomputer and storage for 100 terabytes of data, which allowed them to process their vast amount of data and build a public dataset.
Related work that uses traffic or colocation at a POI to measure contacts between people does not account for differences in population density within that location. A POI with low overall traffic can still have areas of intense interaction: the security check area of an airport, for example, can be crowded even when overall airport traffic is low.
“Our contribution lies in accounting for these finer-scaled variations and also correcting for geographic biases in the raw mobility data,” said paper co-author Suren Byna, who was a computer scientist in the Scientific Data Division at Berkeley Lab during this research and is now a professor at The Ohio State University and a visiting faculty member at Berkeley Lab. “We figured people could potentially use it for lots of other applications, so we created a dataset that’s publicly available.”
Geographic and Demographic Biases
The research team found that the geographic biases they saw reflect demographic biases to some extent, and correcting for these is essential for modeling social trends accurately.
“One of the distinguishing features of our dataset is that we found a way to correct for underserved populations,” said Srinivasan. “You have to correct for it, or you miss these demographic groups, so we used census data and compared it to our location data, which measures people in census block groups. We can then see where people are underrepresented and make sure to correct for this.”
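One common way to make this kind of correction is post-stratification weighting, in which each census block group is weighted by the ratio of its share of the census population to its share of observed devices. The sketch below illustrates that general technique only; the article does not say that this is the exact procedure the team used, and the function name and inputs are hypothetical.

```python
def block_group_weights(census_pop, device_counts):
    """Weight per census block group = census share / device share.

    census_pop:    {block_group_id: census population}
    device_counts: {block_group_id: devices observed in the location data}
    A weight above 1 marks an underrepresented block group.
    """
    total_pop = sum(census_pop.values())
    total_dev = sum(device_counts.get(bg, 0) for bg in census_pop)
    weights = {}
    for bg, pop in census_pop.items():
        n = device_counts.get(bg, 0)
        if n == 0 or total_dev == 0 or total_pop == 0:
            # No observed devices: reweighting cannot recover this group.
            weights[bg] = None
            continue
        weights[bg] = (pop / total_pop) / (n / total_dev)
    return weights


# Two equally sized block groups, one with far fewer observed devices.
print(block_group_weights({"A": 1000, "B": 1000}, {"A": 300, "B": 100}))
```

In this toy case block group B holds half the population but only a quarter of the devices, so its observations are weighted up by a factor of two.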
Byna and Srinivasan point out that researchers can use the social contact index and cluster characterization they created from their data analysis for a variety of purposes that require social interaction patterns – trends in social interactions at different types of businesses as they relate to public health interventions, for example, or in modeling infection outbreaks at events. These types of analyses can also suggest interventions at specific locations that can reduce social interactions, such as changes in queue design at airport security gates.
“What we found is that people’s behavior patterns as a response to policy mattered more than the policy itself,” said Srinivasan. “Having this level of data analysis helped us understand that at a deeper level.”
About NERSC and Berkeley Lab
The National Energy Research Scientific Computing Center (NERSC) is a U.S. Department of Energy Office of Science User Facility that serves as the primary high-performance computing center for scientific research sponsored by the Office of Science. Located at Lawrence Berkeley National Laboratory, the NERSC Center serves almost 10,000 scientists at national laboratories and universities researching a wide range of problems in climate, fusion energy, materials science, physics, chemistry, computational biology, and other disciplines. Berkeley Lab is a DOE national laboratory located in Berkeley, California. It conducts unclassified scientific research and is managed by the University of California for the U.S. Department of Energy.