PDSF Users Meeting 9/14/10
Attending: Eric and Jay from PDSF and users Jeff P., Joanna and Marjorie.
Cluster status: Cluster is well utilized, primarily by STAR and ALICE. Discussed ALICE memory requirements of 4GB for now.
Outages: Some jobs, mainly from ALICE, have been using up kernel buffers, which requires a reboot. A fix (a kernel patch) has been identified and is being applied.
Upcoming downtimes: Nothing is scheduled, but the new home and common will need a downtime at some point.
New hardware: New nodes, ATLAS storage, network equipment and ALICE storage are coming in.
SL302 retirement: Scheduled for the end of October.
- There was some misuse of the "other" queue. Users should not use it for bulk computing and we will work on restricting its use.
- Discussed testing of a new batch system (Torque/Maui).
- Marjorie noted that they got some excellent support from ops over the weekend.