
Scone user manual



March 2012

Contents
1 What is Scone
  1.1 Software available
  1.2 Things to note
2 Logging in
  2.1 Logging in to Scone from the University Network
    2.1.1 Linux Machines
    2.1.2 Windows machines
  2.2 Logging in to the Scone nodes
3 Getting your files on scone
  3.1 Via scp/sftp
  3.2 Via samba
4 NAG Libraries
  4.1 ifort
  4.2 gfortran
5 Using the condor queue
  5.1 Submitting a job
    5.1.1 Writing the condor file
    5.1.2 Submitting and managing the condor jobs

1 What is Scone

Scone is a cluster of Linux servers designed to fulfill the High Performance Computing needs of the Department of Mathematics. It consists of 30 64-bit machines, all of which run Linux 2.6.28. Eight machines are for general use on the scone system. Twelve machines are for use by the statistics group, and seven of these run under a condor job submission scheme. Nine machines belong to the applied group and do not use job submission software. Seven of these are for the use of the fluids group only. Finally, one machine is for the use of the pure group, and it also does not use job submission software. A summary is shown below.

    Machine    Group       Condor  CPU                    RAM
    node1      General     No      8 x 2.6 GHz Opteron    32GB
    node2      General     No      8 x 2.6 GHz Opteron    32GB
    node3      General     No      8 x 2.6 GHz Opteron    32GB
    node4      General     No      8 x 2.6 GHz Opteron    32GB
    node5      General     No      8 x 2.6 GHz Opteron    64GB
    node6      General     No      12 x 2.6 GHz Xeon      48GB
    node7      General     No      12 x 2.6 GHz Xeon      48GB
    node8      General     No      12 x 2.6 GHz Xeon      48GB
    zeppo      Statistics  Yes     4 x 2.2 GHz Opteron    8GB
    chico      Statistics  Yes     4 x 2.2 GHz Opteron    8GB
    harpo      Statistics  Yes     4 x 2.2 GHz Opteron    8GB
    groucho    Statistics  Yes     4 x 2.2 GHz Opteron    8GB
    barker     Statistics  Yes     4 x 2.2 GHz Opteron    8GB
    morecambe  Statistics  Yes     2 x 2.6 GHz Opteron    8GB
    wise       Statistics  Yes     2 x 2.6 GHz Opteron    8GB
    jake       Statistics  Yes     8 x 2.3 GHz Opteron    16GB
    elwood     Statistics  Yes     8 x 2.3 GHz Opteron    16GB
    suilven    Statistics  Yes     12 x 3 GHz Xeon        48GB
    quinag     Statistics  Yes     12 x 3 GHz Xeon        48GB
    canisp     Statistics  Yes     12 x 3 GHz Xeon        48GB
    kelvin     Fluids      No      4 x 2.6 GHz Opteron    8GB
    reynolds   Fluids      No      4 x 2.6 GHz Opteron    8GB
    riemann    Applied     No      4 x 2.6 GHz Opteron    16GB
    darcy      Fluids      No      4 x 3 GHz Opteron      12GB
    rayleigh   Fluids      No      4 x 3 GHz Opteron      12GB
    hardy      Applied     No      4 x 2.6 GHz Opteron    16GB
    bernoulli  Fluids      No      4 x 3 GHz Opteron      16GB
    taylor     Fluids      No      8 x 2.6 GHz Opteron    32GB
    stokes     Fluids      No      12 x 2.6 GHz Xeon      48GB

1.1 Software available

The software below is available on Scone. Requests can be made via service-desk@bristol.ac.uk.

• GNU C, C++ and gfortran compilers. You can invoke these with the commands gcc, g++ and gfortran. For more details type man gcc, man g++ or man gfortran.
• Matlab (currently R2010a)
• Maple 12
• The R statistical package
• Python (numpy and scipy)
• Gnuplot
• gsview
• Java (Sun JDK 1.6.0_15)
• Mathematica 7.0 (only on node1)
• NAG libraries
• ifort compiler
• g95
• ghc

1.2 Things to note

• Do not run any jobs on the head machine, Scone. You may compile and run very small test programs. Anything else will be killed without notice.
• Any user may submit a job through condor. However, members of the statistics group have a much higher priority.
• Only members of the appropriate group may log on to their machines directly.

2 Logging in

Access to Scone is via ssh. Authentication uses the UOB username and password provided by the University. You must first log in to the head machine, scone.maths.bris.ac.uk, before accessing other machines. To use the graphical user interface (GUI) of programs such as Matlab, enable X forwarding in your ssh client.

2.1 Logging in to Scone from the University Network

2.1.1 Linux Machines

From a Linux machine that you logged in to with your UOB username, you can simply type:

    ssh scone

If you are logged in using a local account, try:

    ssh username@scone

The first time you log in to scone you will see the following message:

    The authenticity of host 'scone (137.222.80.37)' can't be established.
    RSA key fingerprint is de:8b:77:8d:d7:af:07:d2:8f:6e:64:4c:6f:ec:2b:cd.
    Are you sure you want to continue connecting (yes/no)?

Check that the fingerprint matches the one above, then answer "yes".

2.1.2 Windows machines

From Windows, open your ssh client.

2.2 Logging in to the Scone nodes

Once you are logged in to Scone you will see the Linux command prompt. From there you can use ssh to log in to any of the nodes available to your group.

To avoid entering your password every time, you can create an ssh key pair. To do so, run:

    user@scone> ssh-keygen

When prompted for the file location to save the key, just press Enter. You will also be prompted for a passphrase. You can leave this empty for convenience, but then anyone with access to your private key can log in as you. Next, copy your public key into the authorized_keys file. If you don't have such a file in your .ssh directory, the easiest way is to copy it:

    mahfrv@scone> cp .ssh/id_rsa.pub .ssh/authorized_keys

After this you should be able to log in to the node machines using the passphrase you provided.

3 Getting your files on scone

Every user is assigned a home directory on scone. The default location is /home/local/UOB/user_name. This space is intended for user files relevant to HPC work.

3.1 Via scp/sftp

From Linux you can copy files to scone with the commands scp or sftp. See the manual pages for these commands for details on their usage. Windows clients can use the file transfer utility included with the ssh client.

3.2 Via samba

Home directories on Scone are shared on the network via samba. Linux clients in the maths department are configured to mount Scone automatically in the user's home directory, so Scone files should be available in the scone directory within your home. Alternatively, the share can be mounted using a command like:

    mount -t cifs -o user=user_name //scone.maths.bris.ac.uk/homes

On Windows machines, the location \\scone.maths.bris.ac.uk\homes should take you to your home directory.

4 NAG Libraries

4.1 ifort

NAG libraries for the ifort compiler are in the directory /usr/local/nag/fll6i22dcl. Programs compiled with the NAG libraries need a valid license file to run. This file is specified with the variable NAG_KUSARI_FILE. By default it is set to:

    NAG_KUSARI_FILE=/usr/local/nag/fll6i22dcl/license/license.ifort.dat

To use a different version of the NAG libraries, you will need a license for it and must redeclare NAG_KUSARI_FILE. The next example shows how to compile and run the test program c06bafe.f, available at http://www.nag.co.uk/numeric/FL/nagdoc_fl22/examples/source/c06bafe.f

    ifort c06bafe.f /usr/local/nag/fll6i22dcl/lib/libnag_nag.a
    ./a.out
     C06BAF Example Program Results

                               Estimated    Actual
      I    SEQN      RESULT    abs error    error
      1   1.0000    1.0000                  0.18E+00
      2   0.7500    0.7500                  0.72E-01
      3   0.8611    0.8269                  0.45E-02
      4   0.7986    0.8211     0.26E+00     0.14E-02
      5   0.8386    0.8226     0.78E-01     0.12E-03
      6   0.8108    0.8224     0.60E-02     0.33E-04
      7   0.8312    0.8225     0.15E-02     0.35E-05
      8   0.8156    0.8225     0.16E-03     0.85E-06
      9   0.8280    0.8225     0.37E-04     0.10E-06
     10   0.8180    0.8225     0.45E-05     0.23E-07

NOTE: For convenience, a symbolic link /usr/local/lib/libnag-ifort.a points to /usr/local/nag/fll6i22dcl/lib/libnag_nag.a. The compilation line above is therefore equivalent to:

    ifort c06bafe.f -lnag-ifort

Multicore programs with ifort

The NAG libraries can also be used in programs that run on multiple cores. To do so, link against the MKL version of the libraries located in /usr/local/nag/fll6i22dcl/mkl_em64t. The variables below show one way to use these libraries in a Makefile:

    COMPILER_F77  = ifort
    COMPILE_FLAGS = -O3 -i_dynamic -mcmodel=medium
    LIBRARY_FLAGS = /usr/local/nag/fll6i22dcl/lib/libnag_mkl.a -fltconsistency \
                    -L/usr/local/nag/fll6i22dcl/mkl_em64t -lmkl_intel_lp64 \
                    -lmkl_intel_thread -lmkl_core -lmkl_lapack -liomp5 -lpthread

For multithreading you must also set the environment variable OMP_NUM_THREADS to the number of threads to use, for instance:

    export OMP_NUM_THREADS=8

4.2 gfortran

NAG libraries for use with gfortran are installed in the directory /usr/local/nag/fll6a22dfl. For simplicity, a symbolic link /usr/local/lib/libnag-gfortran.a points to /usr/local/nag/fll6a22dfl/lib/libnag_nag.a. To use these libraries you must export the variable NAG_KUSARI_FILE so that it points to the license file /usr/local/nag/fll6a22dfl/license/license.gfortran.dat. To do so, run the command below on the machine before executing your program:

    export NAG_KUSARI_FILE=/usr/local/nag/fll6a22dfl/license/license.gfortran.dat

5 Using the condor queue

Remember: if you are not in the statistics group, you currently have very low priority on the condor queue.

5.1 Submitting a job

You do not need to log in to any of the compute servers to submit a job via condor. Submitting a job has two stages:

1. Write a file that describes the job to be submitted.
2. Submit the job via condor_submit.

5.1.1 Writing the condor file

Note that condor jobs are restricted. chico, harpo, groucho and barker are short-job machines with a two-week computation limit. morecambe, wise and zeppo are long-job machines without this hard limit. If you think your job will run for more than two weeks, you must direct it to a long-job machine in your condor job submission file using:

    Requirements = (Machine == "morecambe.private2.maths.bris.ac.uk") || (Machine == "wise.private2.maths.bris.ac.uk") || (Machine == "zeppo.private2.maths.bris.ac.uk")

IF YOU DO NOT SPECIFY THIS AND YOUR JOB LASTS FOR MORE THAN TWO WEEKS, IT WILL BE KILLED.

Here is an example of a condor file. It runs a command from the current directory:

    ############################
    # Test Condor command file #
    ############################
    executable = ustone6
    Universe   = vanilla
    error      = ustone6.err
    output     = ustone6.out
    log        = ustone6.log
    Queue

This file tells condor to run the executable ustone6. Standard output from the executable goes to the file ustone6.out, and standard error goes to ustone6.err. The file ustone6.log holds messages from the condor system, such as job status and any errors.

5.1.2 Submitting and managing the condor jobs

Once your condor file is ready, run the condor_submit command:

    user@scone> condor_submit ustone6.cmd
    Submitting job(s).
    Logging submit event(s).
    1 job(s) submitted to cluster 28.
    user@scone>

The condor_status command lists the status of the condor cluster:

    marfc@scone> condor_status

    Name          OpSys  Arch   State     Activity LoadAv Mem  ActvtyTime

    vm1@barker.pr LINUX  x86_64 Unclaimed Idle     0.000  2031 0+00:45:17
    vm2@barker.pr LINUX  x86_64 Unclaimed Idle     0.000  2031 0+00:45:14
    vm3@barker.pr LINUX  x86_64 Unclaimed Idle     0.000  2031 0+00:45:11
    vm4@barker.pr LINUX  x86_64 Unclaimed Idle     0.000  2031 0+00:45:08
    vm1@chico.pri LINUX  x86_64 Unclaimed Idle     0.000  2031 0+00:45:17
    vm2@chico.pri LINUX  x86_64 Unclaimed Idle     0.000  2031 0+00:45:14
    ...
    vm3@zeppo.pri LINUX  x86_64 Unclaimed Idle     0.000  2031 0+00:28:04
    vm4@zeppo.pri LINUX  x86_64 Unclaimed Idle     0.000  2031 0+00:28:01

The fields have the following meanings:

• Name: the processor/machine combination. For example, vm1@groucho.private2.maths.bris is the first processor on the machine groucho.
• OpSys: the operating system.
• Arch: the CPU architecture. Currently all nodes run AMD Opterons (x86_64). This may change as more machines are added. In a multi-architecture array, jobs may request a specific architecture.
• State: the current state of the machine from the viewpoint of condor scheduling. It is one of the following:
  - Owner: the owner of the machine (for example a member of the appropriate research group) is using it, and/or it is not available to run condor jobs. A machine is in this state when it first starts up.
  - Unclaimed: the machine is available to run condor jobs but has not been matched to one.
  - Matched: the machine is available to run jobs and has been matched to a specific job, but condor has not yet claimed it. In this state the machine is unavailable for further matches.
  - Claimed: condor has claimed the machine. Condor will not allocate further jobs to this machine until the current job has ended.
  - Preempting: the machine was claimed but is now preempting that claim. This is most likely because someone has logged on to the machine and is running jobs directly.
• Activity: what the machine is actually doing. The details depend on the condor State, but in general they are:
  - Idle: the machine is not doing anything that condor initiated.
  - Busy: the machine is running a job that condor initiated.
  - Suspended: the current job has been suspended. This is most likely because a user has logged on to the machine and is running jobs directly.
  - Killing: the job is being killed.
• LoadAv: the load average on the machine.
• Mem: the memory per CPU on the machine, in megabytes.
• ActvtyTime: how long the slot has been in its current activity, shown as days+hours:minutes:seconds.

So in the example above, the whole cluster is quiet except for two processors on the machine groucho, which are running jobs.

You can also examine the state of the condor queue with condor_q (local to the current machine) or condor_q -global (across all machines):

    user@scone:benchmarks> condor_q

    Submitter: scone.private2.maths.bris.ac.uk : <172.16.80.65:33772>
     ID      OWNER    SUBMITTED    RUN_TIME   ST PRI SIZE CMD
     31.0    marfc    1/13 10:04   0+00:00:20 R  0   0.0  ustone6
     32.0    marfc    1/13 10:04   0+00:00:01 R  0   0.0  ustone6

    2 jobs; 0 idle, 2 running, 0 held

This shows two jobs on scone, numbered 31 and 32, owned by user marfc and currently running.

To delete a job, use the condor_rm command on the machine from which you submitted the job. The full sequence for submitting, listing and removing a job is:

    user@scone:benchmarks> condor_submit ustone6.cmd
    Submitting job(s).
    Logging submit event(s).
    1 job(s) submitted to cluster 33.
    user@scone:benchmarks> condor_q

    Submitter: scone.private2.maths.bris.ac.uk : <172.16.80.65:33772>
     ID      OWNER    SUBMITTED    RUN_TIME   ST PRI SIZE CMD
     33.0    marfc    1/13 10:07   0+00:00:03 R  0   0.0  ustone6

    1 jobs; 0 idle, 1 running, 0 held
    user@scone:benchmarks> condor_rm 33.0
    Job 33.0 marked for removal
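Section 2.2 copies id_rsa.pub over .ssh/authorized_keys. That is only safe when the file does not exist yet, because cp replaces any keys already in it. The minimal Python sketch below appends the key instead, and skips it if it is already listed. It assumes a Python 3 interpreter, which may be newer than the one installed on Scone, and the default key name id_rsa.pub produced by ssh-keygen. The function name install_public_key is hypothetical. The 0600 permission is the usual OpenSSH expectation, not something this manual specifies.

```python
import os
from pathlib import Path


def install_public_key(ssh_dir: Path, pub_name: str = "id_rsa.pub") -> bool:
    """Append ssh_dir/pub_name to ssh_dir/authorized_keys unless already listed.

    Returns True when the key was added and False when it was already there.
    """
    pub_key = (ssh_dir / pub_name).read_text().strip()
    auth_file = ssh_dir / "authorized_keys"
    text = auth_file.read_text() if auth_file.exists() else ""
    if pub_key in (line.strip() for line in text.splitlines()):
        return False
    if text and not text.endswith("\n"):
        text += "\n"  # keep the existing last key on its own line
    auth_file.write_text(text + pub_key + "\n")
    os.chmod(auth_file, 0o600)  # sshd may ignore the file if others can write it
    return True
```

Typical use on scone would be install_public_key(Path.home() / ".ssh"). Because the function can be run twice without creating duplicate entries, it is safe to rerun after generating a new key.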
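Section 3.1 points to scp for copying files onto scone. As a hedged illustration, this helper assembles the scp argument list for copying into your Scone home directory. The host name scone.maths.bris.ac.uk comes from section 2. The function name scp_to_scone and the user ID in the usage line are hypothetical, and the sketch assumes Python 3.

```python
from typing import List

SCONE_HOST = "scone.maths.bris.ac.uk"  # head machine named in section 2


def scp_to_scone(local_path: str, user: str, remote_dir: str = "",
                 recursive: bool = False) -> List[str]:
    """Return an scp argument list that copies local_path to Scone.

    An empty remote_dir copies into the user's home directory, because scp
    resolves a relative remote path against the remote home.
    """
    cmd = ["scp"]
    if recursive:
        cmd.append("-r")  # needed when local_path is a directory
    cmd += [local_path, f"{user}@{SCONE_HOST}:{remote_dir}"]
    return cmd
```

For example, subprocess.run(scp_to_scone("results", "ab12345", recursive=True), check=True) would copy a local results directory into the home directory of the hypothetical user ab12345. Building an argument list rather than a shell string avoids quoting problems with spaces in file names.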
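Sections 4.1 and 4.2 give the NAG library locations and say that NAG_KUSARI_FILE must point at a license file before a program runs. A minimal Python 3 sketch follows that builds the compile command and a run environment for either compiler. The install directories come from section 4. Three things are assumptions to check on the machine (for example with ls): the exact license file names, the gfortran compile line (the manual only shows one for ifort), and the helper names compile_command and run_env, which are hypothetical.

```python
import os
from typing import Dict, List, Optional

# Install directories from section 4.
NAG_DIRS = {
    "ifort": "/usr/local/nag/fll6i22dcl",
    "gfortran": "/usr/local/nag/fll6a22dfl",
}
# Assumed license file names; verify by listing <dir>/license on Scone.
LICENSE_FILES = {
    "ifort": "license/license.ifort.dat",
    "gfortran": "license/license.gfortran.dat",
}


def compile_command(compiler: str, source: str) -> List[str]:
    """Compiler invocation linking `source` against the static NAG library."""
    return [compiler, source, f"{NAG_DIRS[compiler]}/lib/libnag_nag.a"]


def run_env(compiler: str, base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy of `base` (default: os.environ) with NAG_KUSARI_FILE set."""
    env = dict(os.environ if base is None else base)
    env["NAG_KUSARI_FILE"] = f"{NAG_DIRS[compiler]}/{LICENSE_FILES[compiler]}"
    return env
```

On a node, usage would be subprocess.run(compile_command("ifort", "c06bafe.f"), check=True) followed by subprocess.run(["./a.out"], env=run_env("ifort"), check=True). Passing the environment explicitly keeps the license setting local to that run rather than changing your login shell.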
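Section 4.1 notes that multithreaded MKL runs need OMP_NUM_THREADS. This small Python 3 sketch builds such an environment. As a design choice not stated in this manual, it refuses thread counts above the number of cores the machine reports, to avoid oversubscribing a shared node. threaded_env is a hypothetical name.

```python
import os
from typing import Dict, Optional


def threaded_env(n_threads: int, base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy of `base` (default: os.environ) with OMP_NUM_THREADS = n_threads.

    Counts outside 1..cores on this machine are rejected to avoid
    oversubscribing a shared node (a policy choice, not an OpenMP rule).
    """
    cores = os.cpu_count() or 1
    if not 1 <= n_threads <= cores:
        raise ValueError(f"n_threads must be in 1..{cores}, got {n_threads}")
    env = dict(os.environ if base is None else base)
    env["OMP_NUM_THREADS"] = str(n_threads)
    return env
```

On a node with at least eight cores, subprocess.run(["./a.out"], env=threaded_env(8), check=True) has the same effect as export OMP_NUM_THREADS=8 followed by ./a.out, but only for that one run.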
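Section 5.1.1 gives the ustone6 submit file and the Requirements line that sends a job to the long-job machines. The sketch below generates the same vanilla-universe description and can add that Requirements expression, so the two-week restriction is harder to forget. It assumes Python 3. The function name submit_description is hypothetical. The machine names and the private2.maths.bris.ac.uk domain come from section 5.1.1.

```python
from typing import List

# Long-job machines from section 5.1.1 (no two-week limit).
LONG_JOB_MACHINES = ["morecambe", "wise", "zeppo"]
DOMAIN = "private2.maths.bris.ac.uk"


def submit_description(executable: str, long_job: bool = False) -> str:
    """Return a Condor submit description modelled on the ustone6 example."""
    lines: List[str] = [
        f"executable = {executable}",
        "Universe   = vanilla",
        f"error      = {executable}.err",
        f"output     = {executable}.out",
        f"log        = {executable}.log",
    ]
    if long_job:
        # Restrict matching to machines without the two-week limit.
        clauses = [f'(Machine == "{m}.{DOMAIN}")' for m in LONG_JOB_MACHINES]
        lines.append("Requirements = " + " || ".join(clauses))
    lines.append("Queue")  # queues one job with the settings above
    return "\n".join(lines) + "\n"
```

For example, Path("ustone6.cmd").write_text(submit_description("ustone6", long_job=True)) writes the file, and condor_submit ustone6.cmd then submits it exactly as in section 5.1.2.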
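Section 5.1.2 explains the State and Activity columns of condor_status. The Python 3 sketch below parses rows in the layout shown there and counts slots per State/Activity pair, which gives a quick view of how busy the cluster is. That layout is eight whitespace-separated columns with a slot name containing @, and it is an assumption taken from the example; other condor versions may print extra summary lines, which this parser skips. The names Slot, parse_condor_status and summarise are hypothetical.

```python
from collections import Counter
from typing import Dict, List, NamedTuple


class Slot(NamedTuple):
    name: str       # e.g. vm1@barker.pr (condor truncates long host names)
    state: str      # Owner, Unclaimed, Matched, Claimed, Preempting
    activity: str   # Idle, Busy, Suspended, Killing
    load_avg: float
    mem_mb: int     # memory per CPU in megabytes


def parse_condor_status(text: str) -> List[Slot]:
    """Parse slot rows of default condor_status output; skip everything else."""
    slots = []
    for line in text.splitlines():
        fields = line.split()
        # Slot rows have eight columns and a name containing '@'.
        if len(fields) != 8 or "@" not in fields[0]:
            continue
        name, _opsys, _arch, state, activity, load, mem, _time = fields
        slots.append(Slot(name, state, activity, float(load), int(mem)))
    return slots


def summarise(slots: List[Slot]) -> Dict[str, int]:
    """Count slots per State/Activity pair, e.g. {'Unclaimed/Idle': 8}."""
    return dict(Counter(f"{s.state}/{s.activity}" for s in slots))
```

The header line also has eight columns, but its first field contains no @, so it is ignored. That is why the row filter checks both conditions.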
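The condor_q listing and condor_rm step in section 5.1.2 can also be scripted. The Python 3 sketch below extracts the job ID, owner and status from condor_q rows of the form shown there, and builds one condor_rm call per job for a given owner. The row layout is assumed from the example, and the names Job, parse_condor_q and removal_commands are hypothetical.

```python
import re
from typing import List, NamedTuple

JOB_ID = re.compile(r"^\d+\.\d+$")  # cluster.process, e.g. 31.0


class Job(NamedTuple):
    job_id: str
    owner: str
    status: str  # ST column: R = running, I = idle, H = held


def parse_condor_q(text: str) -> List[Job]:
    """Extract ID, OWNER and ST from condor_q rows such as '31.0 marfc ...'."""
    jobs = []
    for line in text.splitlines():
        fields = line.split()
        # ID OWNER DATE TIME RUN_TIME ST PRI SIZE CMD [args...]
        if len(fields) >= 9 and JOB_ID.match(fields[0]):
            jobs.append(Job(fields[0], fields[1], fields[5]))
    return jobs


def removal_commands(jobs: List[Job], owner: str) -> List[List[str]]:
    """One condor_rm argument list per job belonging to `owner`."""
    return [["condor_rm", job.job_id] for job in jobs if job.owner == owner]
```

Job IDs are kept as strings such as "31.0" because condor_rm takes them in that cluster.process form. If every job of a user should go, condor_rm also accepts a user name instead of a job ID (for example condor_rm marfc), which is simpler than removing jobs one by one.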
