The general guiding principle of working on PDSF is that the bulk of the workload should be shifted to the compute nodes.
When a user first logs into PDSF, they end up on one of four computers: pdsf1, pdsf2, pdsf3, or pdsf4. These four computers are called the interactive nodes. A job that requires a lot of computational resources can slow an interactive node down for everyone. Whenever possible, jobs should be run on the compute nodes.
Run your job on the compute nodes if:
- It will consume more than 10% of the CPU
- It will take more than 1 hour to finish
If you want to know how much CPU your job is taking, log into the same interactive node in a new terminal window. You can do this by typing ssh followed by the name of that node (pdsf1, pdsf2, pdsf3, or pdsf4). Once you are logged in, you can see how much CPU your job is taking by typing top. If your job is taking too much CPU, please kill it and restart it on the compute nodes.
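As a rough, non-interactive alternative to top, the sketch below lists your own processes and flags any that cross the two thresholds given above (more than 10% CPU, or more than 1 hour of run time). The function names flag_heavy and check_my_jobs are made up for this example, and the etimes column is an assumption: it exists in recent procps-ng versions of ps but may be missing on older systems. Note also that the CPU figure ps reports is an average over the life of the process, not the instantaneous value top shows.

```shell
# Flag processes that exceed the thresholds from the guidelines:
# more than 10% CPU, or more than 1 hour (3600 s) of elapsed time.
# Expects input lines of the form: PID %CPU ELAPSED_SECONDS COMMAND
# Prints the PID and command name of each flagged process.
flag_heavy() {
    awk '$2 > 10 || $3 > 3600 { print $1, $4 }'
}

# List the current user's processes on this node and flag the heavy ones.
# Assumption: this ps supports the "etimes" (elapsed seconds) column.
check_my_jobs() {
    ps -u "$(id -un)" -o pid= -o pcpu= -o etimes= -o comm= | flag_heavy
}
```

Running check_my_jobs on the interactive node where your job is running prints one line per process that should probably move to the compute nodes.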
Running on the Compute Nodes
You run on the compute nodes by submitting an 'interactive' job to the queue (instructions are here). This will give you a terminal that will act just like the interactive nodes, except it will be on a compute node. You can run jobs that require a lot of CPU without slowing things down for everyone else. You could even run jobs longer than 1 hour, but for longer jobs you may want to think about writing a script and submitting it to the batch queue.
Any time you are working with data from the eliza disks, it is better to use one of PDSF's data transfer nodes (their names are pdsfdtn1 and pdsfdtn2). This includes any time you are doing a cp, mv, or rm with data on the eliza disks. It also includes doing a remote copy (via scp) from another system. The data transfer nodes have much faster connections, so your data will get where you want it much faster. Running one of these operations on an interactive node instead will slow the node down for everyone else.
Deleting Your Jobs
Suppose one of your jobs is accidentally taking up too much memory and you need to kill it. You can find its process ID by typing
ps -efl | grep <your_user_name>
This command should be run on the same interactive node where the job is running. You'll see a line that looks like this:
0 S usgweb <pid> 31189 0 80 0 - 6247 poll_s 18:08 pts/15 00:00:00 python
Here <pid> is a number that represents the process ID of the job. You can kill this job by typing:
kill -9 <pid>
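kill -9 sends SIGKILL, which a process cannot catch, so the job gets no chance to clean up temporary files or flush output. A gentler variant, offered here as a sketch rather than as PDSF policy, is to send SIGTERM first and fall back to SIGKILL only if the process is still around after a short grace period. The function name stop_job and the 5 second default are made up for this example.

```shell
# Ask a process to exit with SIGTERM, then fall back to SIGKILL if it
# is still present after a grace period (in seconds, default 5).
# Usage: stop_job <pid> [grace_seconds]
stop_job() {
    pid=$1
    grace=${2:-5}
    # If the signal cannot be delivered (for example, the process has
    # already exited), there is nothing more to do.
    kill -TERM "$pid" 2>/dev/null || return 0
    while [ "$grace" -gt 0 ]; do
        # kill -0 sends no signal; it only checks that the process exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        grace=$((grace - 1))
    done
    # Still present after the grace period: force it to stop.
    kill -KILL "$pid" 2>/dev/null
}
```

For example, stop_job <pid> 10 gives the job up to ten seconds to shut down on its own before it is killed.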