NERSC Bassi update

From: Richard Gerber (ragerber_at_lbl_dot_gov)
Date: 12/21/2007

  • Next message: Jonathan Carter: "Franklin login node down"
    Dear Bassi users,
    
    Holiday Greetings! We hope Bassi continues to be a productive machine for
    you. The system was recently in uninterrupted service for more than three
    months before last week's brief system reboot.
    
    We rebooted the machine on Dec. 13 to clear "stuck" memory segments on many
    of the nodes. Until then, some memory on the affected nodes had been
    unavailable to your applications. While most codes were probably
    unaffected, those with extreme memory bandwidth requirements, or those
    trying to use all available memory, may have had problems. All nodes are
    back to normal, and we are working with IBM to keep this from happening
    again.
    
    This issue was identified by NERSC's performance monitoring efforts, the
    results of which are available online at
    https://www.nersc.gov/nusers/systems/bassi/monitor.php
    Examine the "memrate" results to see how an extremely memory-intensive
    routine was affected by the recent problems.
    
    We have been saving memory "snapshots" of each node every 15 minutes, and
    this data is available for each job from the NERSC completed jobs page at
    https://www.nersc.gov/nusers/status/jobs/index.php
    If your job ran for 30 minutes or more, you can click on the Job ID (Bassi
    jobs only) to see how your job used memory on each node (not each MPI task)
    as a function of time. If you ran using the NERSC IPM performance utility,
    the memory snapshots will appear on the same page as your IPM results.
    We hope these web pages automatically provide you with valuable information
    about your run with little to no effort on your part. If you are unfamiliar
    with IPM, please see
    http://www.nersc.gov/nusers/resources/software/tools/ipm.php
    
    Finally, if you are moving from Seaborg to Bassi, please review the 
    Quick Start Guide for Seaborg Users at
    http://www.nersc.gov/nusers/systems/bassi/quick.php
    
    Regards,
    Richard Gerber
    
    -- 
    Richard Gerber, Ph.D.                      ragerber_at_lbl_dot_gov  
    NERSC                                      phone: 510-486-6820
    Lawrence Berkeley National Lab             fax:   510-486-4316 
    Berkeley, CA 94720
    

    This archive was generated by hypermail 2.1.6 : 12/21/2007 PST