NERSC: Powering Scientific Discovery Since 1974

Taylor Groves

Taylor Groves Ph.D.
HPC Performance Engineer
Advanced Technology Group
National Energy Research Scientific Computing Center
1 Cyclotron Rd
Mailstop: 59R4010A (office 59-3072B)
Berkeley, California 94720 US

Biographical Sketch

Taylor is a staff member in the Advanced Technology Group (ATG). His research focuses on HPC networks, performance analysis, benchmarking, and simulation. Prior to joining NERSC, Taylor was a graduate researcher at the Sandia National Laboratories Center for Computing Research.

He earned his BS in Computer Science from Texas State University. He earned both his MS and PhD at the University of New Mexico as part of the Scalable Systems Laboratory under Prof. Dorian Arnold.

Journal Articles

Taylor Groves, Ryan Grant, Aaron Gonzales, Dorian Arnold, "Unraveling Network-induced Memory Contention: Deeper Insights with Machine Learning", Transactions on Parallel and Distributed Systems, November 21, 2017, doi: 10.1109/TPDS.2017.2773483

Remote Direct Memory Access (RDMA) is expected to be an integral communication mechanism for future exascale systems, enabling asynchronous data transfers so that applications may fully utilize CPU resources while simultaneously sharing data amongst remote nodes. In this work we examine Network-induced Memory Contention (NiMC) on InfiniBand networks. We expose the interactions between RDMA, main memory, and cache when applications and out-of-band services compete for memory resources. We then explore NiMC's resulting impact on application-level performance. For a range of hardware technologies and HPC workloads, we quantify NiMC and show that NiMC's impact grows with scale, resulting in up to 3X performance degradation at scales as small as 8K processes, even in applications that have previously been shown to be performance resilient in the presence of noise. Additionally, this work examines the problem of predicting NiMC's impact on applications by leveraging machine learning and easily accessible performance counters. This approach provides additional insights about the root cause of NiMC and facilitates dynamic selection of potential solutions. Lastly, we evaluate three potential techniques to reduce NiMC's impact, namely hardware offloading, core reservation, and network throttling.

Conference Papers

Kurt Ferreira, Ryan E. Grant, Michael J. Levenhagen, Scott Levy, Taylor Groves, "Hardware MPI Message Matching: Insights into MPI Matching Behavior to Inform Design", ExaMPI in association with SC17, November 12, 2017

Taylor Groves, Yizi Gu, Nicholas J. Wright, "Understanding Performance Variability on the Aries Dragonfly Network", HPCMASPA in association with IEEE Cluster, September 1, 2017

Matthew GF Dosanjh, Taylor Groves, Ryan E Grant, Ron Brightwell, Patrick G Bridges, "RMA-MT: a benchmark suite for assessing MPI multi-threaded RMA performance", Cluster, Cloud and Grid Computing (CCGrid), 2016 16th IEEE/ACM International Symposium on, IEEE, September 1, 2016, 550-559

Taylor Groves, Ryan E Grant, Dorian Arnold, "NiMC: Characterizing and eliminating network-induced memory contention", Parallel and Distributed Processing Symposium, 2016 IEEE International, January 1, 2016, 253-262

Taylor Groves, Ryan E Grant, Scott Hemmert, Simon Hammond, Michael Levenhagen, Dorian C Arnold, "(SAI) Stalled, Active and Idle: Characterizing Power and Performance of Large-Scale Dragonfly Networks", Cluster Computing (CLUSTER), 2016 IEEE International Conference on, January 1, 2016, 50-59

Taylor Groves, Samuel K Gutierrez, Dorian Arnold, "A LogP Extension for Modeling Tree Aggregation Networks", Cluster Computing (CLUSTER), 2015 IEEE International Conference on, 2015, 666-673

Joshua D Goehner, Taylor L Groves, Dorian C Arnold, Dong H Ahn, Gregory L Lee, "An Optimal Algorithm for Extreme Scale Job Launching", Trust, Security and Privacy in Computing and Communications (TrustCom), 2013 12th IEEE International Conference on, 2013, 1115-1122

Taylor Groves, Dorian Arnold, Yihua He, "In-network, Push-based Network Resource Monitoring: Scalable, Responsive Network Management", Proceedings of the Third International Workshop on Network-Aware Data Management, 2013, 8

Xiao Chen, Jian Shen, Taylor Groves, Wu Jie, "Probability Delegation Forwarding in Delay Tolerant Networks", Computer Communications and Networks, 2009. ICCCN 2009. Proceedings of 18th International Conference on, IEEE, January 1, 2009

Book Chapters

Ryan E. Grant, Taylor Groves, Simon Hammond, K. Scott Hemmert, Michael Levenhagen, Ron Brightwell, "Handbook of Exascale Computing: Network Communications", (ISBN:978-1466569003 Chapman and Hall: January 1, 2017)

Presentation/Talks

Taylor Groves, Networks, Damn Networks and Aries, NERSC CS/Data Seminar, October 6, 2017

This talk presents the performance of the Cori Aries network and highlights the monitoring and analysis efforts underway.

Doug Jacobsen, Taylor Groves, Global Aries Counter Collection and Analysis, Cray Quarterly Meeting, July 25, 2017

Taylor Groves, Characterizing Power and Performance in HPC Networks, Future Technologies Group at ORNL, January 10, 2017

Taylor Groves, Characterizing and Improving Power and Performance in HPC Networks, Advanced Technology Group -- NERSC, January 8, 2017

Taylor Groves, Improving Power and Performance in HPC Networks, AMD Research - Austin, June 10, 2016

Reports

Taylor Groves, Ryan Grant, "Power Aware, Dynamic Provisioning of HPC Networks", Sandia National Labs report, 2015

Taylor Groves, Kurt B Ferreira, "Balancing Power and Time of MPI Operations", CCR, 2014

Taylor Groves, Jeff Knockel, Eric Schulte, "BFS vs CFS scheduler comparison", 2009

Posters

Taylor Groves, "Characterizing and Improving Power and Performance in HPC Networks (Doctoral Showcase)", Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, November 1, 2016

Taylor Groves, Ryan Grant, Dorian Arnold, "Network-induced Memory Contention", Salishan Conference on High Speed Computing, Gleneden Beach, OR, April 1, 2016