Application Porting and Performance
Many applications will need code modifications in order to run efficiently on Cori's Intel Xeon Phi "Knights Landing" manycore processors. Applications need to have good thread scalability to take advantage of the 68-core Xeon Phi processor, a data structure layout that can effectively use the 16 GB of onboard MCDRAM fast memory, and loop structures that exploit the 512-bit vector units. In the web pages that follow we document strategies that can help you improve your application's performance. While achieving good performance on Cori may take some work, the good news is that optimizations made for Cori will very likely improve your code's performance on other architectures.
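As a rough illustration of the thread-scalability point above, the following minimal sketch spreads a dot product across threads with an OpenMP reduction. The function name `dot` is hypothetical and not taken from any NERSC code; the sketch assumes a compiler with OpenMP support (without it, the pragma is ignored and the loop simply runs serially).

```c
#include <stddef.h>

/* Hypothetical helper: dot product of x and y, each of length n.
 * The pragma asks OpenMP to split the loop iterations across threads;
 * every thread accumulates a private partial sum, and the partial sums
 * are added together when the loop finishes. */
double dot(size_t n, const double *x, const double *y)
{
    double sum = 0.0;

    #pragma omp parallel for reduction(+:sum)
    for (size_t i = 0; i < n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

Loops like this one, with independent iterations and a simple reduction, tend to be the easiest candidates for threading; loops that carry dependencies between iterations usually need restructuring first.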
The purpose of this page is to get you started thinking about how to optimize your application for the Knights Landing (KNL) architecture that will be on Cori. This page walks you through the high-level steps and gives an example using a real application that runs at NERSC.
How Cori Differs From Edison
There are several important differences between the Cori (Knights Landing) node architecture and the Edison (Ivy Bridge) node architecture that require special attention from application… Read More »
NERSC staff, along with engineers, have worked with NESAP applications to prepare for the Cori Phase 2 system based on the Xeon Phi "Knights Landing" processor. We document several optimization case studies below. Our presentations at the ISC 16 IXPUG Workshop can all be found at https://www.ixpug.org/events/ixpug-isc-2016
Other pages of interest for those wishing to learn optimization strategies for Cori Phase 2 (Knights Landing): Getting Started, Measuring Arithmetic Intensity, Measuring and… Read More »
Enabling your application to take advantage of vectorization is an important component of achieving high performance on today's supercomputers. Vectorization allows you to execute a single instruction on multiple data objects in parallel within a single CPU core, thus improving performance. Read More »
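To make the idea concrete, here is a minimal sketch of a loop that a compiler can typically vectorize. The function name `saxpy` is chosen for illustration only. The `restrict` qualifiers are an assumption that the two arrays never overlap, and `#pragma omp simd` is a hint that requires OpenMP 4.0 or later support in the compiler (otherwise it is ignored and the code still runs correctly). With 512-bit vector units, one vector instruction can operate on 16 single-precision values at a time.

```c
#include <stddef.h>

/* Illustrative kernel: y[i] = a * x[i] + y[i] for i in [0, n).
 * Each iteration is independent of the others, so the same multiply-add
 * can be applied to several consecutive elements with one vector
 * instruction. 'restrict' is a promise (an assumption of this sketch)
 * that x and y do not alias each other. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    #pragma omp simd
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

Whether the loop is actually vectorized depends on the compiler and its flags, so it is worth checking the compiler's optimization report rather than assuming it.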
Deslippe, Jack, Brian Austin, Chris Daley, Woo-Sun Yang. "Lessons Learned from Optimizing Science Kernels for Intel's 'Knights Corner' Architecture." Computing in Science & Engineering, 17(3), pp.30-42. 2015 - http://scitation.aip.org/content/aip/journal/cise/17/3/10.1109/MCSE.2015.28
Zhao, Zhengji, Martijn Marsman, "Estimating the Performance Impact of the MCDRAM on KNL Using Dual-Socket Ivy Bridge nodes on Cray XC30", https://cug.org/, London, UK, May 11, 2016 -… Read More »