NERSC Powering Scientific Discovery Since 1974

Woo-Sun Yang

Woo-Sun Yang, Ph.D.
HPC Consultant
Phone: (510) 486-5735
Fax: (510) 486-4316
1 Cyclotron Road
Mail Stop 943-256
Berkeley, CA 94720

Conference Papers

Wendy Hwa-Chun Lin, Yun (Helen) He, and Woo-Sun Yang, "Franklin Job Completion Analysis", Cray User Group 2010 Proceedings, Edinburgh, UK, May 2010.

The NERSC Cray XT4 machine Franklin has been in production for 3000+ users since October 2007, running about 1800 jobs each day. There has been an ongoing effort to better understand how well these jobs run and whether failed jobs are due to application errors or system issues, and to further reduce system-related job failures. In this paper, we discuss the progress we have made in tracking job completion status, identifying the root causes of job failures, and expediting the resolution of job failures, such as hung jobs, that are caused by system issues. In addition, we present some Cray software design enhancements we requested to help us track application progress and identify errors.

 

Presentation/Talks

Richard A. Gerber, Zhengji Zhao, Woo-Sun Yang, Debugging and Optimization Tools, presented at UC Berkeley CS267 class, February 19, 2014.

Woo-Sun Yang, Debugging Tools, February 3, 2014.

Woo-Sun Yang, Debugging and Performance Analysis Tools at NERSC, BOUT++ 2013 Workshop, September 3, 2013.

Yun (Helen) He, Wendy Hwa-Chun Lin, and Woo-Sun Yang, Franklin Job Completion Analysis, Cray User Group Meeting 2010, May 2010.