Taylor Groves is a member of NERSC's Advanced Technology Group, where he focuses on networks and on the modeling and analysis of distributed systems. Prior to joining NERSC, he was a graduate researcher at Sandia National Laboratories' Center for Computing Research.
For a complete CV, see TGroves-cv.pdf.
The most up-to-date list of publications is on Google Scholar.
Personal site: http://taylorgroves.com
Kurt Ferreira, Ryan E. Grant, Michael J. Levenhagen, Scott Levy, Taylor Groves, "Hardware MPI Message Matching: Insights into MPI Matching Behavior to Inform Design", Concurrency and Computation Practice and Experience, December 1, 2018,
Taylor Groves, Ryan Grant, Aaron Gonzales, Dorian Arnold, "Unraveling Network-induced Memory Contention: Deeper Insights with Machine Learning", Transactions on Parallel and Distributed Systems, November 21, 2017, doi: 10.1109/TPDS.2017.2773483
Remote Direct Memory Access (RDMA) is expected to be an integral communication mechanism for future exascale systems, enabling asynchronous data transfers so that applications may fully utilize CPU resources while simultaneously sharing data amongst remote nodes. In this work we examine Network-induced Memory Contention (NiMC) on InfiniBand networks. We expose the interactions between RDMA, main memory and cache when applications and out-of-band services compete for memory resources. We then explore NiMC's resulting impact on application-level performance. For a range of hardware technologies and HPC workloads, we quantify NiMC and show that its impact grows with scale, resulting in up to 3X performance degradation at scales as small as 8K processes, even in applications that have previously been shown to be performance resilient in the presence of noise. Additionally, this work examines the problem of predicting NiMC's impact on applications by leveraging machine learning and easily accessible performance counters. This approach provides additional insights about the root cause of NiMC and facilitates dynamic selection of potential solutions. Lastly, we evaluate three potential techniques to reduce NiMC's impact, namely hardware offloading, core reservation and network throttling.
Abhinav Bhatele, Jayaraman J. Thiagarajan, Taylor Groves, Rushil Anirudh, Staci A. Smith, Brandon Cook, David Lowenthal, "The Case of Performance Variability on Dragonfly-based Systems", IPDPS 2020, May 21, 2020,
George Michelogiannakis, Yiwen Shen, Min Yee Teh, Xiang Meng, Benjamin Aivazi, Taylor Groves, John Shalf, Madeleine Glick, Manya Ghobadi, Larry Dennison, Keren Bergman, "Bandwidth Steering in HPC using Silicon Nanophotonics", International Conference on High Performance Computing, Networking, Storage and Analysis (SC'19), November 17, 2019,
Sudheer Chunduri, Taylor Groves, Peter Mendygral, Brian Austin, Jacob Balma, Krishna Kandalla, Kalyan Kumaran, Glenn Lockwood, Scott Parker, Steven Warren, Nathan Wichmann, Nicholas Wright, "GPCNeT: Designing a Benchmark Suite for Inducing and Measuring Contention in HPC Networks", International Conference on High Performance Computing, Networking, Storage and Analysis (SC'19), November 16, 2019,
Network congestion is one of the biggest problems facing HPC systems today, affecting system throughput, performance, user experience and reproducibility. Congestion manifests as run-to-run variability due to contention for shared resources like filesystems or routes between compute endpoints. Despite its significance, current network benchmarks fail to proxy the real-world network utilization seen on congested systems. We propose a new open-source benchmark suite called the Global Performance and Congestion Network Tests (GPCNeT) to advance the state of the practice in this area. The guiding principles used in designing GPCNeT are described and the methodology employed to maximize its utility is presented. The capabilities of GPCNeT are evaluated by analyzing results from several of the world's largest HPC systems, including an evaluation of congestion management on a next-generation network. The results show that systems of all technologies and scales are susceptible to congestion, and this work motivates the need for congestion control in next-generation networks.
Tiffany Connors, Taylor Groves, Tony Quan, Scott Hemmert, "Simulation Framework for Studying Optical Cable Failures in Dragonfly Topologies", Workshop on Scalable Networks for Advanced Computing Systems in conjunction with IPDPS, May 17, 2019,
Nathan Hjelm, Matthew Dosanjh, Ryan Grant, Taylor Groves, Patrick Bridges, Dorian Arnold, "Improving MPI Multi-threaded RMA Communication Performance", ACM International Conference on Parallel Processing (ICPP), August 1, 2018,
Kurt Ferreira, Ryan E. Grant, Michael J. Levenhagen, Scott Levy, Taylor Groves, "Hardware MPI Message Matching: Insights into MPI Matching Behavior to Inform Design", ExaMPI in association with SC17, November 12, 2017,
Taylor Groves, Yizi Gu, Nicholas J. Wright, "Understanding Performance Variability on the Aries Dragonfly Network", HPCMASPA in association with IEEE Cluster, September 1, 2017,
Matthew GF Dosanjh, Taylor Groves, Ryan E Grant, Ron Brightwell, Patrick G Bridges, "RMA-MT: a benchmark suite for assessing MPI multi-threaded RMA performance", Cluster, Cloud and Grid Computing (CCGrid), 2016 16th IEEE/ACM International Symposium on, IEEE, September 1, 2016, 550--559,
Taylor Groves, Ryan E Grant, Dorian Arnold, "NiMC: Characterizing and eliminating network-induced memory contention", Parallel and Distributed Processing Symposium, 2016 IEEE International, January 1, 2016, 253--262,
Taylor Groves, Ryan E Grant, Scott Hemmert, Simon Hammond, Michael Levenhagen, Dorian C Arnold, "(SAI) Stalled, Active and Idle: Characterizing Power and Performance of Large-Scale Dragonfly Networks", Cluster Computing (CLUSTER), 2016 IEEE International Conference on, January 1, 2016, 50--59,
Taylor Groves, Samuel K Gutierrez, Dorian Arnold, "A LogP Extension for Modeling Tree Aggregation Networks", Cluster Computing (CLUSTER), 2015 IEEE International Conference on, 2015, 666--673,
Joshua D Goehner, Taylor L Groves, Dorian C Arnold, Dong H Ahn, Gregory L Lee, "An Optimal Algorithm for Extreme Scale Job Launching", Trust, Security and Privacy in Computing and Communications (TrustCom), 2013 12th IEEE International Conference on, 2013, 1115--1122,
Taylor Groves, Dorian Arnold, Yihua He, "In-network, Push-based Network Resource Monitoring: Scalable, Responsive Network Management", Proceedings of the Third International Workshop on Network-Aware Data Management, 2013, 8,
Xiao Chen, Jian Shen, Taylor Groves, Wu Jie, "Probability Delegation Forwarding in Delay Tolerant Networks", Computer Communications and Networks, 2009. ICCCN 2009. Proceedings of 18th International Conference on, IEEE, January 1, 2009,
Ryan E. Grant, Taylor Groves, Simon Hammond, K. Scott Hemmert, Michael Levenhagen, Ron Brightwell, "Handbook of Exascale Computing: Network Communications", Chapman and Hall, January 1, 2017, ISBN: 978-1466569003
Taylor Groves, "Networks, Damn Networks and Aries", NERSC CS/Data Seminar, October 6, 2017,
A presentation on the performance of the Cori Aries network, with highlights of the monitoring and analysis efforts underway.