Bassi Performance
This page contains performance measurements made on Bassi for a sample of well-known benchmark codes, along with performance comparisons as a function of compiler and run-time parameters. Comparisons are made to other NERSC machines: Seaborg, an IBM SP POWER3 system (375 MHz) that was decommissioned in January 2008; Jacquard, a 2.2 GHz Opteron/InfiniBand cluster; and Franklin, a Cray XT4 with dual-core 2.6 GHz Opteron processors. Some useful performance measurement tools are available on Bassi.
Bassi performance measurements and dependencies
Benchmark Results
NAS Parallel Benchmarks (NPB)
NPB 2.3 CLASS B Serial
Serial benchmarks are measured on a "packed node": on Bassi, 8 simultaneous instances of the benchmark are executed on a single node. The numbers in this table are based on averages over many ongoing measurements.
NPB 2.4 CLASS D Parallel
Parallel benchmarks are run on "packed nodes": on Bassi, 8 tasks are executed on each node. The numbers in this table are based on averages over many ongoing measurements.
Memrate
A facsimile of the STREAM/StarSTREAM benchmark. The Single TRIAD measures memory bandwidth from one processor; the Multi TRIAD measures the bandwidth per processor when all 8 processors run simultaneously.
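As a rough illustration of what a TRIAD measurement computes, the sketch below implements the kernel a[i] = b[i] + q*c[i] and converts a timing into a bandwidth figure. It is not the Memrate source: the function names, the array length, and the 8-byte word size are assumptions made for illustration. It follows the STREAM convention of counting three words of memory traffic per element (two loads and one store). Timings from pure Python reflect interpreter overhead rather than hardware memory bandwidth, so only the arithmetic is meaningful here.

```python
import time
from array import array


def triad(a, b, c, q):
    """TRIAD kernel: a[i] = b[i] + q * c[i] for every element."""
    for i in range(len(a)):
        a[i] = b[i] + q * c[i]


def triad_bandwidth(n, seconds, bytes_per_word=8):
    """Bytes per second moved by one TRIAD pass over arrays of length n.

    Counts two loads and one store per element (STREAM convention).
    bytes_per_word=8 assumes double-precision data.
    """
    return 3 * n * bytes_per_word / seconds


if __name__ == "__main__":
    n = 100_000  # hypothetical array length, far smaller than a real run
    a = array("d", [0.0]) * n
    b = array("d", [1.0]) * n
    c = array("d", [2.0]) * n

    start = time.perf_counter()
    triad(a, b, c, 3.0)
    elapsed = time.perf_counter() - start

    print(f"TRIAD: {triad_bandwidth(n, elapsed) / 1e6:.1f} MB/s")
```

A Multi TRIAD style figure would come from running one such kernel on every processor of the node at the same time and reporting the bandwidth seen by each.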
MPI Test
64 tasks on 8 nodes.
PIORAW Disk Read/Write Bandwidth Test
The NERSC PIORAW benchmark tests read and write bandwidth, using one file per MPI task. The standard Bassi test uses 32 tasks, a buffer size of 2 MB, and individual file sizes of 2.5 GB.
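To make the shape of such a test concrete, here is a minimal sketch of a one-file-per-task write measurement. It is not the PIORAW source. The function names, the file names, and the definition of aggregate bandwidth (total bytes divided by elapsed wall-clock time) are assumptions made for illustration. The "tasks" here run one after another in a single process rather than as parallel MPI tasks, and the sizes are scaled far below the 2 MB buffers and 2.5 GB files of the standard Bassi test.

```python
import os
import tempfile
import time


def write_file(path, file_size, buffer_size):
    """Write file_size bytes to path in buffer_size chunks; return seconds."""
    buffer = b"\0" * buffer_size
    start = time.perf_counter()
    with open(path, "wb") as f:
        remaining = file_size
        while remaining > 0:
            chunk = min(buffer_size, remaining)
            f.write(buffer[:chunk])
            remaining -= chunk
        f.flush()
        os.fsync(f.fileno())  # push data to disk before stopping the clock
    return time.perf_counter() - start


def aggregate_bandwidth(n_tasks, file_size, elapsed_seconds):
    """Total bytes moved by all tasks divided by elapsed wall-clock time."""
    return n_tasks * file_size / elapsed_seconds


if __name__ == "__main__":
    n_tasks = 4               # the Bassi test uses 32 MPI tasks
    buffer_size = 64 * 1024   # the Bassi test uses 2 MB
    file_size = 1024 * 1024   # the Bassi test uses 2.5 GB per file

    with tempfile.TemporaryDirectory() as tmp:
        # Tasks run serially here, so the elapsed time is the sum of all writes.
        times = [
            write_file(os.path.join(tmp, f"task{rank}.dat"), file_size, buffer_size)
            for rank in range(n_tasks)
        ]

    bw = aggregate_bandwidth(n_tasks, file_size, sum(times))
    print(f"aggregate write bandwidth: {bw / 1e6:.1f} MB/s")
```

With the standard parameters, 32 tasks writing 2.5 GB each amounts to 80 GB of data per run.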
Application Benchmarks
Performance dependence on configuration parameters
Being compiled.
Large Page Memory vs. Small Page Memory
HPS Bulk Transfer (RDMA)
POWER5 on HPS has a "bulk transfer" or "RDMA" (Remote Direct Memory Access) mode that improves message-passing bandwidth for large messages. It is enabled in a LoadLeveler script with the keyword:
#@bulkxfer=yes
The graph below shows the point-to-point bandwidth as a function of message size with the default settings of MP_EAGER_LIMIT=32K and MP_BULK_MIN_MSG_SIZE=4096. RDMA is never used for messages of 4 KB or less.
Click on graph for larger PDF version.
MPI Latency
When running parallel codes that do not explicitly create threads (multiple MPI tasks are OK), set the environment variable MP_SINGLE_THREAD to improve HPS latency.
Performance dependence on compiler options
Presentation from June 2006 NUG meeting (PDF Format).
IBM Publications
Page last modified: Wed, 09 Apr 2008 17:26:30 GMT
Page URL: http://www.nersc.gov/nusers/systems/bassi/perf.php
Web contact: webmaster@nersc.gov
Computing questions: consult@nersc.gov