Topics Covered
1. PBS Pro Configuration at ANURAG
a. PBS Pro HA Complex
b. Compute Nodes
c. Queue Configuration
d. Scheduling Policy
e. Access Control List
f. Job submission model
2. How to submit Jobs
3. User and Admin commands
PBS Pro Configuration at ANURAG
Picture 1: PBS Pro complex configuration at ANURAG
PBS Pro HA Complex
PBS Pro is configured in high availability mode.
Failure of one host does not affect users' jobs; the secondary server (passive mode) becomes active if the primary fails.
idc-master1 is primary PBS Pro server.
idc-master2 is secondary PBS Pro server.
The PBS Pro server and scheduler daemons run on whichever of these systems is currently active.
PBS Pro Compute Nodes
Compute nodes from cs1-cn1 to cs1-cn49 are configured as execution hosts.
The PBS MoM daemon runs on these nodes. PBS MoM and the PBS Server communicate
with each other for user program execution, usage reporting, etc.
Users' jobs are assigned one or more of these nodes based on the policy in
force and on the user's resource request.
Note: Nodes from cn1-cn5 have 32 cores and nodes from cn6-cn49 have 16 cores.
Queue Configuration
The following queues are configured
Ilight Queue
The ilight queue is intended for jobs that require very little CPU
resource, so the nodes in this queue are oversubscribed.
cs1-cn6 has 16 physical cores but is configured with 320 slots.
Iheavy Queue
The iheavy queue is intended to submit jobs that require more CPU
resource but not 100% per core. So the nodes in this queue are only 50%
over subscribed i.e., 2 slots / physical core.
Nodes cs1-cn7 to cs1-cn10 are assigned to this queue.
128 slots per node.
Bnormal Queue
This is the default queue.
The rest of the nodes are assigned to this queue.
The nodes in this queue are not oversubscribed: 1 slot / core.
Jobs that require high CPU resources should be submitted to this queue.
Scheduling Policy
Fairshare is configured at ANURAG.
Concept of fairshare
A fair method for ordering the start times of jobs, using resource usage
history
A scheduling tool which allocates certain percentages of the system to
specified users or collections of users
Ensures that jobs are run in the order of how deserving they are
Basic outline of how fairshare works
Scheduler collects usage from the server at every scheduling cycle
The resource whose usage is tracked for fairshare is ncpus*walltime
Scheduler chooses which fairshare entity is most deserving
The job to be run next is selected from the set of jobs belonging to the
most deserving entity, and then the next most deserving entity, and so on
The fairshare is configured to have equal shares for all users at ANURAG.
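As a rough illustration of the ordering described above, the following sketch sums ncpus*walltime per user from a usage list and ranks the users by accumulated usage. With equal shares, the user with the least usage is treated here as the most deserving. The function name rank_by_usage, the input format and the sample data are assumptions made for this illustration; the real scheduler takes further factors into account that are not shown.

```shell
#!/bin/sh
# Hypothetical helper: rank users by accumulated fairshare usage.
# Input lines:  "<user> <ncpus> <walltime_seconds>", one finished job per line.
# Output lines: "<user> <usage>", least usage (most deserving) first.
rank_by_usage() {
    awk '{ usage[$1] += $2 * $3 } END { for (u in usage) print u, usage[u] }' |
        sort -k2,2n -k1,1
}

# Made-up usage history for three users.
printf '%s\n' \
    'alice 16 3600' \
    'bob 4 600' \
    'alice 1 120' \
    'carol 8 1800' | rank_by_usage
```

With this sample history bob is listed first, so under this simplified model his jobs would be considered before those of carol and alice.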
Example of fairshare model:
Access Control List
The queues are configured with group level access control list.
Only users who belong to the VLSI group are authorized to submit jobs to the
ilight, iheavy and bnormal queues.
However this can be reconfigured based on need.
User job submission model
Terminal Servers:
idc-tcserver1
idc-tcserver2
Terminal servers are installed with PBS Pro commands.
Users at ANURAG connect to Terminal servers for job submission.
PBS Pro commands talk to the PBS Server for job submission, job query, job
monitoring, etc.
User Commands
How to Submit Jobs
To use PBS, you create a batch job, usually just called a job, which you then hand off, or
submit, to PBS. A batch job is a set of commands and/or applications you want to run on
one or more execution machines, contained in a file or typed at the command line. You
can include instructions which specify the characteristics such as job name, and
resource requirements such as memory, CPU time, etc., that your job needs. The job file
can be a shell script under UNIX, a cmd batch file under Windows, a Python script, a Perl
script, etc.
For example, here is a simple PBS batch job file which requests one hour of time,
400MB of memory, 4 CPUs, and runs my_application:
#!/bin/sh
#PBS -l walltime=1:00:00
#PBS -l mem=400mb,ncpus=4
./my_application
To submit the job to PBS, you use the qsub command, and give the job script as an
argument to qsub. For example, to submit the script named my_script:
qsub my_script
Command       Purpose
pbsnodes -a   Show status of nodes in cluster
qsub          Command to submit a job
qdel          Command to delete a job
qstat         Show job, queue, & Server status
qstat -B      Briefly show PBS Server status
qstat -Bf     Show detailed PBS Server status
qstat -Q      Briefly list all queues
qstat -Qf     Show detailed queue information
tracejob      Extracts job info from log files
xpbs          PBS graphical user interface
xpbsmon       PBS graphical monitoring tool
qsub:
The qsub command submits jobs to the batch system. It uses the
following syntax:
qsub [options] job_script [ argument ... ]
where job_script represents the path to a script containing the commands to be
run on the system using a shell.
There are two ways to submit a job.
Method 1:
You may add the options directly to the qsub command, like:
qsub -o pbs_out.dat -e pbs_err.dat job_script [ argument ... ]
Method 2:
The qsub options can also be written to the job description file job_script, which is
then passed on to qsub on the command line:
qsub job_script [ argument ... ]
The contents of job_script may look like the following:
#!/bin/bash
#PBS -o pbs_out.dat
#PBS -e pbs_err.dat
./your_commands
Please note that any PBS directives after the first shell command are ignored by the
system. So keep PBS directives at the beginning of your job file.
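To check a job file for misplaced directives before submitting it, something like the following sketch could be used. The helper name late_directives is hypothetical and not part of PBS; it simply reports every #PBS line that follows the first non-comment, non-blank line of a script.

```shell
#!/bin/sh
# Hypothetical helper: list "#PBS" lines that appear after the first shell
# command in a job script, i.e. directives that would be ignored.
late_directives() {
    awk '
        /^#PBS/ { if (seen_cmd) print FILENAME ": line " FNR ": " $0; next }
        /^[[:space:]]*(#|$)/ { next }   # comments, shebang, blank lines
        { seen_cmd = 1 }                # first real shell command
    ' "$1"
}

# Demonstration with a deliberately wrong job file.
demo=$(mktemp)
cat > "$demo" <<'EOF'
#!/bin/bash
#PBS -o pbs_out.dat
cd $PBS_O_WORKDIR
#PBS -e pbs_err.dat
./your_commands
EOF
late_directives "$demo"
rm -f "$demo"
```

In the demonstration file the -e directive on line 4 is reported, because it comes after the cd command.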
Description of the most commonly used options to qsub:
Input/Output
  -o path                    standard output file
  -e path                    standard error file
  -j oe                      join standard error to standard output
Notification
  -M email address           notifications will be sent to this email address
  -m b|e|a|n                 notifications on the following events:
                             begin, end, abort, no mail (default).
                             Do not forget to specify an email address (with
                             -M) if you want to get these notifications.
Resources
  -l walltime=[hours:minutes:]seconds
                             requested real time; the default (maximum)
                             depends on the system and, if applicable, the
                             chosen queue
  -l select=N:ncpus=NCPU     request N times NCPU slots (=CPU cores) for the
                             job (default for NCPU: 1)
  -l select=N:mem=size       request N times size bytes of memory for the job
  -W depend=afterok:job-id   start job only if the job with job id job-id has
                             finished successfully
Other useful options
  -N name                    optional name of the job
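The -W depend=afterok option is typically combined with the job id that qsub prints on submission. A minimal sketch, assuming two existing job scripts; the helper name submit_chain and the script names in the usage comment are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: submit two job scripts so that the second starts
# only after the first has finished successfully.
submit_chain() {
    first_id=$(qsub "$1") || return 1      # qsub prints the id of the new job
    qsub -W depend=afterok:"$first_id" "$2"
}

# Usage (on a terminal server where qsub is available):
#   submit_chain prepare.sh analyse.sh
```

The function only wraps two ordinary qsub calls, so any further options can be kept as #PBS directives inside the two job scripts.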
Sequential jobs
qsub job_script
where the contents of job_script may look like this:
#!/bin/bash
# Redirect output stream to this file.
#PBS -o pbs_out.dat
# Redirect error stream to this file.
#PBS -e pbs_err.dat
# Send status information to this email address.
#PBS -M abc@xyz.com
# Send an e-mail when the job is done.
#PBS -m e
# Change to current working directory (directory where qsub was executed)
cd $PBS_O_WORKDIR
# For example an additional script file to be executed in the current
# working directory. In such a case assure that script.sh has
# execution rights (chmod +x script.sh).
./script.sh
Parallel MPI jobs
qsub job_script
where the contents of job_script may look like this:
#!/bin/bash
# Redirect output stream to this file.
#PBS -o pbs_out.dat
# Join the error stream to the output stream.
#PBS -j oe
# Send status information to this email address.
#PBS -M abc@xyz.com
# Send me an e-mail when the job has finished.
#PBS -m e
# Reserve 16 cores on one node and start 16 MPI processes there
# (mpiprocs sets the number of entries written to $PBS_NODEFILE)
#PBS -l select=1:ncpus=16:mpiprocs=16
# Change to current working directory (directory where qsub was executed)
cd $PBS_O_WORKDIR
# The node file contains one line per MPI process
NSLOTS=$(wc -l < "$PBS_NODEFILE")
mpirun -np $NSLOTS ./your_mpi_executable [extra params]
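The NSLOTS value in the script above is the number of lines in $PBS_NODEFILE. As a hedged illustration of what that count means, the sketch below derives the slot count and the number of distinct hosts from a node file. Outside a PBS job it falls back to a stand-in file whose host names and layout are made up; the helper names are likewise assumptions.

```shell
#!/bin/sh
# Count MPI slots and distinct hosts from a PBS-style node file
# (one host name per line, one line per MPI process).
count_slots() { wc -l < "$1" | tr -d ' '; }
count_hosts() { sort -u "$1" | wc -l | tr -d ' '; }

# Use the real node file inside a job, otherwise a made-up stand-in.
nodefile=${PBS_NODEFILE:-$(mktemp)}
[ -s "$nodefile" ] || printf '%s\n' cs1-cn11 cs1-cn11 cs1-cn12 cs1-cn12 > "$nodefile"

echo "slots: $(count_slots "$nodefile"), hosts: $(count_hosts "$nodefile")"

# Remove the stand-in file again (never the real node file).
[ -n "${PBS_NODEFILE:-}" ] || rm -f "$nodefile"
```

Printing both numbers at the start of a job script is a cheap way to confirm that the resources granted match what the mpirun line expects.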
Parallel OpenMP jobs
qsub job_script
where the contents of job_script may look like this:
#!/bin/bash
# Redirect output stream to this file.
#PBS -o pbs_out.dat
# Join the error stream to the output stream.
#PBS -j oe
# Send status information to this email address.
#PBS -M abc@xyz.com
# Send me an e-mail when the job has finished.
#PBS -m e
# Reserve resources for 16 threads
#PBS -l select=1:ncpus=16
# Change to current working directory (directory where qsub was executed)
cd $PBS_O_WORKDIR
# OMP_NUM_THREADS is automatically set to 16 with the above select statement
./your_openmp_executable
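If the same script is also run outside PBS, for example for a quick test on a terminal server, OMP_NUM_THREADS may not be set. A defensive sketch, assuming that the NCPUS variable is present in the job environment (treat this as an assumption and verify it on the system); the helper name set_omp_threads is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: keep the thread count provided by PBS, otherwise fall
# back to NCPUS, otherwise to 1 (e.g. when testing outside a PBS job).
set_omp_threads() {
    OMP_NUM_THREADS=${OMP_NUM_THREADS:-${NCPUS:-1}}
    export OMP_NUM_THREADS
}

set_omp_threads
echo "Using $OMP_NUM_THREADS OpenMP thread(s)"
```

Calling the helper just before ./your_openmp_executable leaves a value set by PBS untouched and only fills in a default when none exists.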
Submitting interactive jobs
The submission of interactive jobs is useful in situations where a job requires some
sort of direct intervention. This is usually the case for X-Windows applications or in
situations in which further processing depends on your interpretation of immediate
results. A typical example for both of these cases is a graphical debugging session.
Note: Interactive sessions are particularly helpful for getting acquainted with the
system or when building and testing new programs.
Starting interactive sessions within PBS only requires specifying the option -I to
the qsub command, so in the simplest case
qsub -I
This will bring up a further Bash session on the system. To ensure X forwarding to
the X-server display indicated by your actual DISPLAY environment variable, add
the option -v DISPLAY to your qsub command. All qsub options can be specified as
usual on the command line, or they can be provided within an optionfile with
qsub -I optionfile
A valid option file might contain the following lines (only #PBS directives are parsed
within this file):
# Name your job
#PBS -N myname
# Export some of your current environment variables,
#PBS -v var1[=val1],var2[=val2],...
# e.g. your current DISPLAY variable for graphical applications
#PBS -v DISPLAY
# Reserve resources for 8 MPI processes
#PBS -l select=8
# Specify one hour runtime
#PBS -l walltime=1:0:0
Once your session has started, prepare your environment as needed (e.g., by
loading all necessary modules) and then start your programs as usual. For
parallel applications, refer to the job scripts for parallel MPI and parallel
OpenMP batch jobs above.
Note: Make sure to end your interactive sessions as soon as they are no longer
needed!
Observing jobs (qstat)
To get information about running or waiting jobs use
qstat [options]
Useful qstat options:
-u user     prints all jobs of a given user
-f job-id   prints full information of the job with the given job-id. Note that
            information such as resources_used.ncpus refers to resources
            allocated to the job by PBS Pro, not actual resource consumption.
            Use top -u $USER to observe your program's behaviour in real time.
-aw         prints more information in a wider display
-help       prints all possible qstat options
In case of pending jobs, you might get a hint on when your job will be started, by
executing
qstat -awT
In this case the runtime of waiting jobs is replaced with their estimated start time.
Don't be alarmed, though: this is only a worst-case estimate.
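For scripted monitoring, a single attribute can be picked out of the full job information. A minimal sketch, assuming the usual "name = value" layout of the full qstat output; the helper name job_state and the sample excerpt are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: read "qstat -f" output on stdin and print the value
# of the job_state attribute (for example Q = queued, R = running).
job_state() {
    awk -F' = ' '$1 ~ /^[[:space:]]*job_state$/ { print $2; exit }'
}

# Usage on a terminal server:
#   qstat -f job-id | job_state
# Offline demonstration with a trimmed, made-up excerpt:
printf '%s\n' \
    'Job Id: 101.idc-master1' \
    '    Job_Name = myname' \
    '    job_state = R' | job_state
```

The same pattern works for other attributes by changing the name matched in the awk expression.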
Deleting a job (qdel)
To delete a job with the job identifier job-id, execute
qdel job-id
Input/output staging
The two streams stdout and stderr are directed to a special directory
(naming: pbs.$PBS_JOBID.x8z) subordinate to your home directory and copied to
the job's working directory only at the end of the job.
Note: If your application writes large amounts of data to stdout and/or stderr,
please redirect this output directly to a location within your $SCRATCH space.
Otherwise your jobs might be terminated for exceeding your $HOME quota. You can
redirect both the stdout and stderr output of your application my_executable to
a file job_output.dat within your current working directory with:
./my_executable >job_output.dat 2>&1
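To send the output to $SCRATCH instead, and to keep files from different jobs apart, the file name can include the job id. A minimal sketch; the helper name job_output_path and the file naming scheme are assumptions, and the fallbacks only exist so the snippet also runs outside a PBS job:

```shell
#!/bin/sh
# Hypothetical helper: build a per-job output path under $SCRATCH,
# falling back to the current directory when SCRATCH is not set.
job_output_path() {
    printf '%s/job_output.%s.dat\n' "${SCRATCH:-.}" "${PBS_JOBID:-nojob}"
}

# In a job script:
#   ./my_executable > "$(job_output_path)" 2>&1
job_output_path
```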
PBS umask setting
By default, PBS jobs run under the uncommon umask setting 0077 (i.e. all
read/write/execute permissions for group members or others are removed). If you
need another umask setting for your job runs (e.g., if you want to share
information with group members) please specify your desired setting explicitly
within your job scripts. For example
umask 0022
sets the usual Linux bit mask, removing write permissions for group and others.
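The effect of the two settings can be checked on any Linux shell before using them in a job. The sketch below creates an empty file under each umask and prints its permission string; the helper name mode_with_umask is made up for illustration.

```shell
#!/bin/sh
# Show how the umask decides the permissions of newly created files.
mode_with_umask() {
    (
        umask "$1"                      # only affects this subshell
        d=$(mktemp -d) || exit 1
        : > "$d/probe"                  # create an empty file
        ls -l "$d/probe" | cut -c1-10   # permission string only
        rm -rf "$d"
    )
}

echo "umask 0077: $(mode_with_umask 0077)"   # -rw-------
echo "umask 0022: $(mode_with_umask 0022)"   # -rw-r--r--
```

With 0077 only the owner can read the file, while 0022 additionally lets group members and others read it.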
For more information, please refer to the PBS Professional documentation:
Administrator's Guide
User Guide
Reference Guide
These can be downloaded from the User Area.