Using A CPU Farm

This document provides information about CPU farms and how to use them. A CPU farm consists of a front-end machine where users log in, disk storage, and multiple farm nodes that do processing work. Jobs are submitted from the front-end using commands like qsub and can be monitored with qstat. The local disks on nodes should be used for temporary files to reduce network usage. Scheduling many jobs requires scripts and cronjobs to control resource usage.

Uploaded by

irzelindo

Using a CPU farm

Last Time and Today

• Last time we discussed strings in the context of Perl.
  ◦ Other advanced scripting languages have similarly powerful tools.
• Today we talk about CPU farms.
  ◦ To maximise their power we will take advantage of advanced scripting.

What is a CPU farm?

• A CPU farm is a collection of processors that can be used to process many jobs in parallel.
• It is not, strictly speaking, parallel processing, as the jobs could equally be carried out in series.
• CPU farms allow you to carry out much more intensive analysis than is possible if you just have a single CPU.

What does a farm possess?

• A front end
  ◦ This will be the machine that you log onto.
• Disk
  ◦ There should be “a lot” of disk space available that you can access from the front end and the nodes.
• Many farm nodes
  ◦ These are the CPUs that do the work.
  ◦ They will often also have their own local disk.
• A network
  ◦ The nodes will be connected to the front end by a network.
  ◦ The network capacity can be the limiting factor.

The front end

• When using the farm you will spend most of your time on the front end.
• Typically this will have the same OS as the nodes.
  ◦ You can compile code here.
• You submit jobs from the front end to the nodes.
• You manage the disk on the front end.
• You might take a quick look at the output here.
• Remember the front end will have many other users, so try to be as undisruptive as possible.

The nodes

• The nodes are where your CPU time occurs.
• Usually they will have local disk.
  ◦ Using this will cut down on network traffic.
    ▪ Improves farm performance.
  ◦ Be careful about how much space is available.
• On some farms the same box may be several nodes.
  ◦ Dual CPU machines.
  ◦ Hyperthreading.
• They will have high memory, but watch out for programs with very high memory usage; they may not play well together.

Jobs on a farm

• Jobs on a farm may be thought of a bit like files on a file system.
• There are commands that can
  ◦ list them
  ◦ delete them
• They have an owner.
• You cannot
  ◦ copy them
  ◦ do (many) operations on them.

An example

• To illustrate the use of a farm I will use an example.
• The CPU farm at RAL PPD.
• Generally you will be able to get an account here if you need it (hep).
• The commands are one implementation of the grid engine.
  ◦ Most farms use the same commands.
  ◦ All farms have commands that do the same thing.

Submitting a job

• You use the command qsub.
• This will return a report to the screen that includes a unique job id for the job you just submitted.
  ◦ You may need this job id later.
• qsub my_job.scr
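
A master script usually needs to keep hold of that job id. The following is a minimal Python sketch, assuming qsub is on your PATH and that the job id is the first run of digits in the report it prints. The exact report format differs between farms, so treat the pattern as an assumption; parse_job_id and submit are hypothetical helper names.

```python
import re
import subprocess


def parse_job_id(report):
    """Return the first run of digits in qsub's report, or None.

    Assumption: the job id is the first number printed. Check what
    your own farm prints before relying on this.
    """
    match = re.search(r"\d+", report)
    return match.group(0) if match else None


def submit(script):
    """Submit a job script with qsub and return its job id (or None)."""
    result = subprocess.run(
        ["qsub", script], capture_output=True, text=True, check=True
    )
    return parse_job_id(result.stdout)
```

On a farm, submit("my_job.scr") would then hand back the id to store for later.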

Listing the jobs submitted to a farm

• To list the jobs use the command qstat.
• This will tell you
  ◦ The job name
  ◦ The job ID
  ◦ Its status
  ◦ The running time
  ◦ The owner
• Use qstat -u username to see the jobs belonging to a particular user.
• There are other useful switches.
  ◦ See the man page.
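
A script can use the same listing to find out how many jobs you currently have on the farm. This is a rough sketch only: it assumes each job is printed on one line with the owner as a whitespace-separated column, which you should check against your own qstat (some versions truncate long usernames). count_jobs and my_job_count are hypothetical names.

```python
import subprocess


def count_jobs(listing, username):
    """Count the lines of a qstat listing that mention username.

    Assumption: one job per line, with the owner appearing as a
    whitespace-separated column. Header lines do not match.
    """
    return sum(1 for line in listing.splitlines() if username in line.split())


def my_job_count(username):
    """Run `qstat -u username` and count the jobs it lists."""
    result = subprocess.run(
        ["qstat", "-u", username], capture_output=True, text=True, check=True
    )
    return count_jobs(result.stdout, username)
```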

Job Status

• Running
  ◦ A job that is currently running on a node.
  ◦ You will be able to see how long it has been running with qstat.
• Queued
  ◦ A job that is waiting for a free node.
• Terminated
  ◦ A job that is finished. You won’t see these in qstat.
• Suspended/Error
  ◦ Something has happened to the job and it’s in an error state.
    ▪ This is probably your fault.
    ▪ But it could be a system error, so it’s worth restarting these once.

Deleting a job

• You can use the qdel command.
  ◦ qdel jobid
• The job id can be obtained from qsub or qstat.
• You will only be able to delete your own jobs.
  ◦ Unless you have manager privileges.

Different queues

• Some farms (like RAL) have different queues.
• These are to separate resources for different groups (experiments) from the main queue.
• There may also be a fast queue.
  ◦ A queue with few nodes and a short maximum duration.
  ◦ Good for testing.
• The queue is specified in the qsub command.

Maximum duration

• Some queues expect a maximum duration.
  ◦ When it is exceeded the job will be terminated.
• Set using qsub.
  ◦ At RAL: qsub -l cput=24:00:00 myjob.scr
• The duration you set can control which queue you use and the resources available.
• Be careful when setting the maximum duration: try to keep it short, but long enough that your job will finish.
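
If a script submits jobs with different limits, it helps to build the cput string in one place. A small sketch, assuming the HH:MM:SS form of the RAL example above is what your farm expects; cput_limit and qsub_command are hypothetical helper names.

```python
def cput_limit(hours, minutes=0, seconds=0):
    """Format a CPU-time limit as HH:MM:SS for `-l cput=...`."""
    total = hours * 3600 + minutes * 60 + seconds
    h, rest = divmod(total, 3600)
    m, s = divmod(rest, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def qsub_command(script, hours, minutes=0, seconds=0):
    """Build the argument list for qsub with a maximum duration."""
    return ["qsub", "-l", f"cput={cput_limit(hours, minutes, seconds)}", script]
```

The returned list can be passed straight to subprocess.run, which avoids any shell quoting problems.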

The local disk

• Use the local disk whenever possible.
  ◦ Copy data files to it at the beginning.
  ◦ Use it for output and temp files.
  ◦ Copy output to main disk at the end.
• Take care, however, not to fill the disk.
• On many systems the local disk is available as $TMPDIR.

General advice

• Use a shell script to control your job.
  ◦ Don’t directly submit your executable.
  ◦ This gives you more control.
• Use an advanced script to control your queue usage.
  ◦ Write command files.
  ◦ Write job shell scripts.
  ◦ Submit jobs.
• Don’t submit too many jobs at once.
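
The "write job shell scripts" step might be sketched in Python as below. Everything in the template is a placeholder chosen for illustration (the executable my_analysis.exe, the output directory, the job_N.scr and result_N.ntp naming); substitute whatever your own analysis needs.

```python
from pathlib import Path

# Hypothetical template: a tcsh job script that works in $TMPDIR.
# The executable, paths and file names are placeholders.
TEMPLATE = """#!/bin/tcsh
cd $TMPDIR
cp {cmd_file} .
my_analysis.exe -c {cmd_name}
cp {output} ~/output_data/
"""


def job_script(cmd_file, output):
    """Return the text of one job script for one command file."""
    return TEMPLATE.format(
        cmd_file=cmd_file, cmd_name=Path(cmd_file).name, output=output
    )


def write_job_scripts(cmd_files, directory):
    """Write job_N.scr for each command file; return the paths written."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    paths = []
    for number, cmd_file in enumerate(cmd_files):
        path = directory / f"job_{number}.scr"
        path.write_text(job_script(cmd_file, f"result_{number}.ntp"))
        paths.append(path)
    return paths
```

Each path returned can then be handed to qsub, one job per command file.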

A typical job script

#!/bin/tcsh
# Set up my environment
source ~/env_script.csh
# Use the local disk
cd $TMPDIR
# Copy needed data to local disk
cp ~/input_data/my_data .
# Run my analysis code
$SNO_CODE/snoman.exe -c mycmd.cmd
# Copy my results back to the data disk
cp result.ntp ~/output_data/

Job Master Script

• You can put much that we discussed in the last two lectures into action.
  ◦ Writing multiple command files and shell scripts.
  ◦ Running system programs and analysing their output.
  ◦ Examining the output of your analysis programs.
• You can put limits on the number of jobs in two ways.
  ◦ Using the sleep command when too many jobs are submitted.
  ◦ Using a cronjob.
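
The sleep approach can be sketched as a loop that waits whenever too many of your jobs are already on the farm. This is one possible shape, not a prescribed one: the caller supplies submit and count_jobs (for example thin wrappers around qsub and qstat), and the limits max_jobs=20 and wait_seconds=60 are arbitrary illustrative values.

```python
import time


def submit_throttled(scripts, submit, count_jobs, max_jobs=20,
                     wait_seconds=60, sleep=time.sleep):
    """Submit each script, sleeping while max_jobs or more are queued.

    submit(script) and count_jobs() are supplied by the caller.
    sleep is a parameter only so that the loop can be tried out
    without really waiting.
    """
    job_ids = []
    for script in scripts:
        # Wait until the farm holds fewer than max_jobs of our jobs.
        while count_jobs() >= max_jobs:
            sleep(wait_seconds)
        job_ids.append(submit(script))
    return job_ids
```

Keeping the qsub and qstat calls out of the loop itself means the throttling logic can be checked on your desktop before it goes anywhere near the farm.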

Cronjobs

• A cronjob is a job that runs at a scheduled time.
• Your cronjobs are controlled by your crontab.
  ◦ Not allowed on all systems (including RAL).
• To edit your crontab use
  ◦ crontab -e
  ◦ Your $EDITOR variable decides which editor is used.
  ◦ You need to exit the editor for the change to take effect.

A typical crontab

# My program
0 * * * * my_program > /dev/null 2>> /dev/null
# End of crontab.

• The first five fields (here 0 * * * *) give the time to run the job.
• Redirect the output, otherwise you’ll get an email.
• The final comment ends the crontab: a newline is needed at the end of each command.
• Time is specified by five variables.
  ◦ m h d m w: minute, hour, day of month, month, day of week.
  ◦ * is a wild card that means any.
  ◦ When the system time equals this time the job will run.
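
To make the matching rule concrete, here is a simplified Python sketch of how a five-field time could be compared with the clock. It is an illustration, not how cron itself is implemented: it takes the standard field order (minute, hour, day of month, month, day of week), handles only * and single numbers (no ranges, lists or steps), and requires every field to match, which differs from real cron when both day fields are restricted. cron_matches is a hypothetical name.

```python
from datetime import datetime


def cron_matches(spec, when):
    """Return True if the five-field time spec matches the datetime."""
    fields = spec.split()
    if len(fields) != 5:
        raise ValueError("expected five time fields")
    actual = [when.minute, when.hour, when.day, when.month,
              when.isoweekday() % 7]  # cron counts Sunday as 0
    for position, (field, value) in enumerate(zip(fields, actual)):
        if field == "*":
            continue  # wild card: any value is accepted
        wanted = int(field)
        if position == 4:
            wanted %= 7  # cron also accepts 7 for Sunday
        if wanted != value:
            return False
    return True
```

With this rule the line in the crontab above matches at minute 0 of every hour.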

The GRID

• The GRID is an extension of single-site farms.
  ◦ A farm of farms.
• It will be used extensively in LHC and other experiments.
• The use of this is beyond the scope of this course.
  ◦ When the time comes to use it you can talk to your collaborators.

Exercises

• Adapt your multiple command file script to control job submission to a farm.
• Write a program to send you a simple email, and add it to your crontab so it will send it at midnight.
