NERSC: Powering Scientific Discovery Since 1974

2014 PDSF User Meeting Minutes

September 2

Attending

Craig, Mike, Daniel, Iwona, Lisa

Outages/Downtime

8/12 (all day): Migration to global homes

Upcoming Downtime

Eliza18 outage 9/7 - 9/8

Other Issues

Old PDSF homes will be retired soon. Any script that references /u/<user_name> or /home/<user_name> will fail once that happens. The SL53 software stack must also be migrated. We don't expect any problems, but we will send an email once it is done.
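To find scripts that would break when the old homes are retired, one option is to search your files for hard-coded /u/<user_name> or /home/<user_name> paths. The Python sketch below is a hypothetical helper, not a NERSC-provided tool; the function name find_old_home_refs and the choice to scan every regular file under a given directory are assumptions.

```python
import re
from pathlib import Path


def find_old_home_refs(root, user):
    """Hypothetical helper: return (file, line_number, line) for every line
    under `root` that mentions /u/<user> or /home/<user> as a path."""
    # Match /u/<user> or /home/<user>, but not a longer name such as /u/<user>x.
    pattern = re.compile(r"/(?:u|home)/" + re.escape(user) + r"(?![\w.-])")
    hits = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            # Ignore undecodable bytes so binary files don't stop the scan.
            text = path.read_text(errors="ignore")
        except OSError:
            # Skip unreadable files (permissions, broken links, etc.).
            continue
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

For example, find_old_home_refs(Path.home(), "alice") would list each matching file, line number, and line for a (hypothetical) user alice. Scanning a large tree can be slow, so pointing it at the directory that holds your job scripts is usually enough.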

Carver has moved to chos. You can control which chos environment you get on Carver with a .chos-carver file.

Slides

The slides shown at the meeting can be found here.

August 5

Attending

Alex, Xu, Jeff, Iwona, Lisa

Outages/Downtimes

July 14 and 16: Global scratch outage (1 hour each)

Upcoming Downtime

August 12: 7:00 am to 11:00 pm, PDSF will be unavailable. During this maintenance, we will be switching home directories to NERSC global homes (the old PDSF home hardware will be retired). Only data from active users will be copied over. Users who already use NERSC global homes will have their data copied into a directory called pdsf_old_home inside their home directory. Users who are not currently using global homes will have their data copied directly into global homes. We're working to make this transition as smooth as possible. Please bear with us while we address any problems.

Other Issues

We will be switching the default chos from Scientific Linux 5.3 to 6.4 on August 12th. If you would like to continue using Scientific Linux 5.3, please put "sl53" into a .chos file in your home directory.
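If you set up your environment from a script, the .chos file can also be written programmatically. This is a minimal Python sketch, assuming chos reads the environment name ("sl53") from the first line of ~/.chos; the function name set_default_chos and its home parameter are hypothetical. Editing the file by hand works just as well.

```python
from pathlib import Path


def set_default_chos(env="sl53", home=None):
    """Hypothetical helper: write `env` as the only line of <home>/.chos
    (defaulting to the current user's home) and return the file's path."""
    home_dir = Path(home) if home is not None else Path.home()
    chos_file = home_dir / ".chos"
    # Overwrite any existing .chos so only the requested environment remains.
    chos_file.write_text(env + "\n")
    return chos_file
```

Calling set_default_chos() with no arguments would keep you on Scientific Linux 5.3 after the default switches to 6.4.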

Slides

The slides shown at the meeting are here.

July 1

Attending

Alex, Mike, Zach, Craig, Jeff, Lisa

Outages/Downtimes

June 2 - 24: Global scratch at 60% of peak

June 28 - ongoing: Project at 80% of peak

June 30 (2 hours): Global scratch outage

Upcoming Downtimes

July 10: pdsfdtn2 offline for disk replacement

Other Issues

SL 6.x Python and Java modules will be upgraded to newer versions on 7/7.

Please let me know if there's any software you'd like installed.

ATLAS requested an update on the status of the new storage.

Jeff inquired about the XRootD upgrade.

SL53 will probably be retired in the not too distant future.

Slides

The slides shown at the meeting can be seen here.

June 10

Attending

Alex, Mike, Jeff, Lisa

Outages/Downtimes

May 13: Maintenance

June 2 - ongoing: Global scratch degraded

Upcoming Downtimes

None

Other Issues

NERSC is moving to the new building at LBNL in Q3 or Q4 of 2015.

Jobs requesting multiple cores are now possible on PDSF.

Slides

The slides from the meeting can be found here.

May 6

Attending

Alex, Mike, Craig, Lisa

Outages/Downtimes

April 22 - May 1: Rolling reboot of Mendel compute nodes

April 23 (1 hour): Rolling reboot of login nodes

Upcoming Downtimes

There will be an all-day maintenance on May 13 from 8:00 am to approximately 5:00 pm. The cluster will begin draining at 5:00 pm the day before.

Other Issues

Please don't ssh directly to pdsf[1-4]. These nodes are used for testing and may be rebooted at any time, so use them at your own risk. To log on to PDSF, please use pdsf.nersc.gov.

Still looking for NERSC global homes testers.

Slides

The slides from the meeting can be found here.

April 1

Attending

Alex, Mike, Zach, Iwona, Lisa

Outages/Downtimes

March 14 (morning): Project outage

March 22 (10 hours): LDAP outage

March 22: Project outage

Upcoming Downtimes

None

Other Issues

The long-running job cron has been revived. Users will get automated emails if jobs run too long. For now, jobs are not killed automatically.

Debug queue is available for testing (add "-l debug=1" to submit command). Please use it!

NERSC global homes are mounted on the new interactives. If you want to use it, please add "-l gscratchio=1" to your submit commands. Also, we are looking for volunteers to test NERSC global homes as their PDSF home directory.

Slides

The slides shown at the meeting can be found here.

March 4

Attending

Mike, Alex, Simon, Lisa

Outages/Downtimes

February 5 (1 hour): Project outage

February 11 (all day): NERSC center wide outage

February 21 (1 hour): Load Balancer outage

Upcoming Downtimes

None

Other Issues

New login nodes were deployed in mid-February. If you encounter any issues, please email consult@nersc.gov. You will be able to access the old interactives by sshing directly to pdsf[1-4].
Global scratch is mounted on the new interactives and the Mendel compute nodes. Remember that global scratch is purged every 12 weeks. It is intended for temporary storage of data. Jobs that access global scratch need to request global scratch IO resources with "-l gscratchio=1".
The global NERSC homes will be mounted shortly so that users can test them. PDSF homes will go out of warranty after the move to the hill, so we will need to decide whether to purchase new homes or use the global NERSC ones. Testers will be appreciated.

Slides

The slides shown at the meeting can be seen here.

February 4

Attending

Mike, Zach, Alex, Brian, Iwona, Lisa

Outages/Downtimes

December 27 - January 10: Eliza2 multiple disk replacements

Upcoming Downtimes

February 11: Center wide NERSC outage, including PDSF. Jobs requesting IO resources will be blocked at 6:00 pm on 2/10. All running jobs will be killed at 8:00 am on 2/11.

Other Issues

New interactives coming online at the end of the week. If you encounter any issues, please email consult@nersc.gov. You will be able to access the old interactives by sshing directly to pdsf[1-4].

Global scratch will be mounted on the new interactives and the Mendel compute nodes after the 2/11 maintenance. Remember that global scratch is purged every 12 weeks. It is intended for temporary storage of data. Jobs that access global scratch need to request global scratch IO resources with "-l gscratchio=1".

The global NERSC homes will be mounted shortly so that users can test them. PDSF homes will go out of warranty after the move to the hill, so we will need to decide whether to purchase new homes or use the global NERSC ones. Testers will be appreciated.

Slides

The slides shown at the meeting can be seen here.

January 7

Attending

Jeff, Iwona, Lisa

Outages/Downtimes

December 10 - 16: Eliza18, several disks replaced

December 20: pdsfdtn2 disks replaced

December 21: /common filled up, 1 hour job submission interruption

December 27 - now: Eliza2 disk failures

Upcoming Downtimes

None

Other Issues

New interactives are open to beta testers. Please ssh to pdsf6, pdsf7, or pdsf8. They will become the new interactives around 2/4/14.

Elizas 3, 8, and 9 have been retired.

Please clean up /common.

The AFS 'mother' cell is now run out of BNL.

Slides

You can find the slides shown at the meeting here.