Faucets schedules jobs on a first-come, first-served basis until all nodes are allocated. As nodes become available, the scheduler favors queued jobs belonging to users with no currently running jobs. Research into fairer resource allocation schemes that maintain good utilization is ongoing. Jobs with runtimes longer than 24 hours are limited to one third of the cluster.
A rough approximation of cluster usage can be found at Cluster Viewer.
fsub [program_name] [options] [arguments]

A typical single-processor batch submission line on the architecture cluster would be:
ufsub mysimscript.sh -stdout myoutput.out -time "4:0:0"
program_name is the name of your compiled executable or script; you should NOT provide an mpirun or a charmrun invocation on the command line.
All jobs should be submitted with a predicted completion time, as this facilitates more efficient scheduling of resources. If you do not supply a time, the scheduler assumes 12 hours. Running jobs are terminated when their allotted time expires.
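As a concrete illustration, a batch script such as the mysimscript.sh above might look like the following minimal sketch. The simulation step is a placeholder (there is no real mysim binary here); substitute your actual compiled executable.

```shell
#!/bin/sh
# Minimal sketch of a batch script like mysimscript.sh.
# The middle line is a placeholder -- replace the echo with
# your real simulation command, e.g. ./mysim input.dat
echo "Job started on $(hostname) at $(date)"
echo "simulation running"
echo "Job finished at $(date)"
```

The script itself contains no mpirun or charmrun line; the scheduler handles job launch.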
Options
If a processor range is provided for the job, the number of processors allocated to the job varies between those bounds.
The default email address used is userid@arch-gw.cs.uiuc.edu.
frun [program_name] [options] [arguments]
frun runs the job interactively on the parallel machine (the cool cluster).
program_name is the name of your compiled executable or script; you should NOT provide an mpirun or a charmrun invocation on the command line.
Options
If a processor range is provided for the job, the number of processors allocated to the job varies between those bounds.
The default email address used is userid@cool2.cs.uiuc.edu.
For example:

frun +n2 +ppn2 ./hello -time 1:0:0

runs the Charm++ program hello interactively on 2 nodes and 4 processors for 1 hour.
ufrun /bin/ls

runs /bin/ls on one processor; ufrun should be used for single-processor jobs.
frun +n2 +ppn2 ./hello_mpi -time 1:0:0
runs the MPI program hello_mpi interactively on 2 nodes and 4 processors for 1 hour.
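The -time values in these examples are read as hours:minutes:seconds (so "4:0:0" is 4 hours and "1:0:0" is 1 hour). Under that assumption, a quick shell snippet to sanity-check a predicted runtime in seconds:

```shell
# Convert an H:M:S -time string to total seconds (format assumed
# from the examples above; adjust if your value differs).
t="4:0:0"
IFS=: read h m s <<EOF
$t
EOF
echo $(( h*3600 + m*60 + s ))   # prints 14400
```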
fkill [job_id]

kills the job with the given job id.
The architecture cluster occasionally suffers from transient network faults that make job launching difficult; TSG is studying the problem. While the scheduler filters out the faulty nodes, it may appear unresponsive (i.e., hung). Workarounds to improve scheduler response time under these conditions are under development.