PDSF users meeting 2/1/11
Attending: Eric, Katie and Jay from NERSC and users Jeff P. and Andre.
Cluster Status/Utilization: The cluster has been filled to capacity. There was some odd scheduling (icecube had far too many jobs running), which was fixed by doing an SGE reboot.
Outages: No major outages. SRM needed to be restarted and we discussed getting Iwona out of the loop.
Upcoming downtimes: There will be a reboot of each of the interactives on Thursday between noon and two. This is necessary in order to retire sl302.
Procurements/New Hardware: The kamland disk will be in this week, and an all-day maintenance on /eliza5 will be necessary.
- We took a look at the new PDSF pages, in particular the announcements feature.
- cvmfs - not much testing done yet, will do more before the next meeting.
- torque test - not much done yet
- Bad job submissions - STAR users in particular seem to submit a lot of jobs without testing them first, and some of those jobs fail. Jeff P. to talk to the STAR users.
- SGE job limits: 5k jobs/user, 30k jobs total are in place and documented.
- Project accounts: There was a meeting to discuss reviving the effort. Although no decision was made, the discussion seemed positive and the effort will at least be reassessed.
- hardware retirement: Made plans to make eliza12 and eliza13 read-only on 2/15 in preparation for retirement.
- Quark Matter is in May, so the cluster is expected to be very busy until then. Abstracts are due 4/8.