GLITE 3.0 USER GUIDE
Contents
1. 1 rows rgma> As shown, the CLI reports the location of the registry, which holds pointers to all the R-GMA producers for all sites and VOs. Queries will collect information from the appropriate producers wherever they are located. The syntax of all the commands available in the R-GMA interface can be obtained by using the help command to get a list of the supported commands, and by typing help <command> to get information on a particular command. A list of the most important commands is as follows:

CERN-LCG-GDEIS-722398 Manuals Series Page 61

Command                                                  Description
help [<command>]                                         Display information (general or about a specific command)
exit | quit | CTRL-D                                     Exit the R-GMA command line shell
show [tables | producers of <table>]                     Show the tables in the schema or the current producers for a given table
describe <table>                                         Show the column names and types for the specified table
select                                                   Query R-GMA (SQL syntax)
set query latest | continuous | history                  Set the type of subsequent queries
insert                                                   Insert a tuple into a primary producer (SQL syntax)
secondaryproducer <table>                                Declare a table to be consumed and republished by a secondary producer
set [secondary]producer latest | continuous | history    Set the supported query type for the primary or secondary producer
set [timeout | maxage] <timeout> S
2. In order to restrict the search, a filter of the form attribute operator value can be used. The operator is one of those defined in the following table:

Operator   Description
=          Entries whose attribute is equal to the value
>=         Entries whose attribute is greater than or equal to the value
<=         Entries whose attribute is less than or equal to the value
=*         Entries that have a value set for that attribute
~=         Entries whose attribute value approximately matches the specified value

Furthermore, complex search filters can be formed by using boolean operators to combine constraints. The boolean operators that can be used are AND (&), OR (|) and NOT (!). The syntax is the following:

( "&" or "|" or "!" (filter1) [(filter2) ...] )

Examples of search filters are:

(& (Name=Smith) (Age>=32))
(! (GlueHostMainMemoryRAMSize<=1000))

In LDAP, a special attribute objectClass is defined for each directory entry. It indicates which object classes are defined for that entry in the LDAP schema. This makes it possible to filter entries that contain a certain object class. The filter for this case would be:

objectclass=<name>

Apart from filtering the search, a list of attribute names can be specified in order to limit the values returned. As shown in the next example, only the value of the specified attrib
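The filter syntax described here follows the standard LDAP string representation, where simple comparisons use =, >=, <=, =* (presence) and ~= (approximate match), and are combined with the prefix operators &, | and !. As a minimal sketch of how such filters compose, the Python helpers below build filter strings; the function names (simple, present, all_of, any_of, negate) are hypothetical and not part of any gLite tool, and values are not escaped, so special characters in values are assumed absent.

```python
# Hypothetical helpers that compose LDAP search filter strings.
# Assumption: attribute names and values contain no characters that
# would need escaping in an LDAP filter (parentheses, '*', '\').

SIMPLE_OPERATORS = {"=", ">=", "<=", "~="}


def simple(attribute, operator, value):
    """Return a single comparison such as (Age>=32)."""
    if operator not in SIMPLE_OPERATORS:
        raise ValueError(f"unsupported operator: {operator!r}")
    return f"({attribute}{operator}{value})"


def present(attribute):
    """Return a presence test: entries that have any value for the attribute."""
    return f"({attribute}=*)"


def all_of(*filters):
    """Combine filters with AND (&)."""
    return "(&" + "".join(filters) + ")"


def any_of(*filters):
    """Combine filters with OR (|)."""
    return "(|" + "".join(filters) + ")"


def negate(ldap_filter):
    """Negate a filter with NOT (!)."""
    return "(!" + ldap_filter + ")"


if __name__ == "__main__":
    print(all_of(simple("Name", "=", "Smith"), simple("Age", ">=", 32)))
    print(negate(simple("GlueHostMainMemoryRAMSize", "<=", 1000)))
    print(simple("objectclass", "=", "GlueCESEBind"))
```

The resulting strings can be passed as the filter argument of an LDAP query; whitespace between sub-filters is omitted here, which the LDAP filter grammar does not require.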
3. 1 srm castorsrm cern ch castor cern ch grid dteam testSRM test_1 rfio 1xfs5614 shift 1xfs5614 data03 cg stage test_1 172962 Waiting for the file to be staged in READY srm castorsrm cern ch castor cern ch grid dteam testSRM test_1 Copying srm castorsrm cern ch castor cern ch grid dteam testSRM test_1 to file tmp srm_gfal_retrieved Source URL srm castorsrm cern ch castor cern ch grid dteam testSRM test_1 File size 2331 Source URL for copy gsiftp castorgrid cern ch 2811 shift 1xfs5614 data03 cg stage test_1 172962 Destination URL file tmp srm_gfal_retrieved streams 1 Transfer took 550 ms

Here a file status of 1 means that the file is already on disk.

APPENDIX G THE GLUE SCHEMA

As explained earlier, the GLUE Schema describes the Grid resource information that is stored by the Information System. This section gives information about the MDS, namely the LDAP implementation of the GLUE Schema, which is currently used in the LCG-2 IS. For information on the abstract GLUE Schema definition, please refer to [27]. First of all, the tree of object class definitions is shown. Then, the attributes for each of the objectclasses (where the data are actually stored) are presented. Some of the attributes may currently be empty, even if they are defined in the schema. Furthermore, some new attributes may be published, although not yet collect
4. Figure 9: The R-GMA Web Interface

5.2.3 The R-GMA CLI

An R-GMA CLI is available on every UI and WN. This interface allows the user to perform queries and also to publish new information. It includes a consumer and can initiate both primary and secondary producers, although it does not provide all the detailed options available in the APIs. The user can interact with the CLI directly from the command line by using the -c option:

rgma -c "select Web from GlueSite where UniqueId='lcgmon01.gridpp.rl.ac.uk'"

If you simply type rgma, an interactive shell is started:

Welcome to the R-GMA virtual database for Virtual Organisations.
Your local R-GMA server is: https://lcgmon01.gridpp.rl.ac.uk:8443/R-GMA
You are connected to the following R-GMA Registry services: https://lcgic01.gridpp.rl.ac.uk:8443/R-GMA/RegistryServlet
You are connected to the following R-GMA Schema service: https://lcgic01.gridpp.rl.ac.uk:8443/R-GMA/SchemaServlet
Type "help" for a list of commands.
rgma> select Web from GlueSite where UniqueId='lcgmon01.gridpp.rl.ac.uk'
Web http://www.gridpp.ac.uk/tierla
5. 14 GlueHostRemoteFileSystem 15 GlueHostStorageDevice 16 GlueHostFile 17 GlueLocation 2 Attributes 1 Attributes for GlueCluster 17 Attributes for GlueLocation 3 MyObjectClass 4 MyAttributes GlueSETop 1 ObjectClass gl GlueSE 2 GlueSEState 3 GlueSEAccessProtocol 4 GlueSEControlProtocol 2 Attributes 1 Attributes for GlueSE 4 Attributes for GlueSEControlProtocol 3 MyObjectClass CERN LCG GDEIS 722398 Manuals Series Page 150 4 MyAttributes 5 GlueSLTop ObjectClass 1 GluesL 2 GlueSLLocalFileSystem 3 GlueSLRemoteFileSystem 4 GlueSLFile 5 GlueSLDirectory 6 GlueSLArchitecture 7 GlueSLPerformance 2 Attributes 1 Attributes for GlueSL 7 Attributes for GlueSLPerformance 3 MyObjectClass 4 MyAttributes 6 GlueSATop ObjectClass 1 GlueSA 2 GlueSAPolicy 3 GlueSAState 4 GlueSAAccessControlBase 2 Attributes 1 Attributes for GlueSA 4 Attributes for GlueSAAccessControlBase CERN LCG GDEIS 722398 Manuals Series Page 151 3 MyObjectClass 4 MyAttributes G 2 GENERAL ATTRIBUTES This group includes some base top object classes
6. 5.3.1 GridICE

The GridICE monitoring service is structured in a five-layer architecture. The resource information is obtained from the gLite 3.0 Information Service, namely MDS. The information model for the retrieved data is an extended GLUE Schema, where some new objects and attributes have been added to the original model. Please refer to the GridICE documentation for details on this.

GridICE not only periodically retrieves the latest information published in MDS, but also collects historical monitoring data in persistent storage. This allows the evolution over time of the published data to be observed. In addition, GridICE will provide performance analysis, usage level and general reports and statistics, as well as the possibility to configure event detection and notification actions, though these two functionalities are still at an early development stage.

NOTE: All the information retrievable using GridICE (including the extensions of the GLUE schema) is also obtainable through R-GMA, by defining the proper archives. This represents an alternative way to get that information.

The GridICE web page that shows the monitoring information for gLite 3.0 is accessible at the following URL (also linked from the GOC web site): http://gridice2.cnaf.infn.it:50080/gridice/site/site.php

In the initial page (site view), a summary of the current status of the computing and storage resources on a per-site basis is presented. This includes the lo
7. Error: UI_NO_NS_CONTACT Unable to contact any Network Server

it means that there are authentication problems between the UI and the network server (check your proxy or have the site administrator check the certificate of the server).

Many options are available to edg-job-submit. If the user's proxy does not have VOMS extensions, the virtual organisation can be specified with the --vo <vo_name> option; otherwise the default VO specified in the standard configuration file ($GLITE_WMS_LOCATION/etc/glite_wmsui_cmd_var.conf for a gLite UI, or $EDG_WL_LOCATION/etc/edg_wl_ui_cmd_var.conf for an LCG UI) is used.

Note: The above-mentioned configuration file can leave the default VO with a value of "unspecified". In that case, if the --vo option is not used with edg-job-submit AND a proxy without VOMS extensions is used, the command will return an error message. In the case of a gLite UI:

Error: UI_NO_VOMS Unable to determine a valid user's VO

The useful -o <file_path> option allows users to specify a file to which the jobId of the submitted job will be appended. This file can be given to other job management commands to perform operations on more than one job with a single command.

The -r <CE_Id> option is used to send a job directly to a particular CE. The drawback is that the match-making functionality (see Section 6.3.3) will not be carried out. That is, the BrokerInfo file, which provides information about the evol
8. Example 7.6.3.3 (Removing files from SRM: CASTOR vs. d-cache)

The current interface of SRM in LCG/gLite does not offer physical file deletion in a way that is homogeneous across the back-ends (CASTOR, d-cache, DPM). The srm-advisory-delete command is currently used for removing files from SRM-based storages. Its behaviour, however, depends on the configuration and settings of the storage itself. By design, this command is not intended for this purpose, but rather for asking the SRM server to prepare for deleting a given file. Users wanting to delete their files on the Grid therefore get different results when issuing the command against different implementations of SRM.

The command works as expected if run against d-cache and DPM storages. It removes the entry from the name space of the storage server, which then takes care of removing the physical entries on its disk pools. If srm-advisory-delete is run against CASTOR, however, the file is kept in the name space and no physical removal is done on the disk pools. Any file is, however, overwritable in CASTOR.

The following examples help to illustrate this difference between the d-cache and CASTOR SRM implementations. The file zz_globus is present in the name space of a CASTOR storage:

edg-gridftp-ls -v gsiftp://castorsrm.cern.ch/castor/cern.ch/grid/lhcb/ gt rw rw r 1 lhcb001 z5 2317 Aug 15 2005 test acsmith 009 dat gt mrw r r
9. Example 7.7.1 (Specifying input data in a job)

If a job requires one or more input files stored in an LCG Storage Element, the InputData JDL attribute list can be used. Files can be specified by either LFN or GUID. An example of JDL specifying input data looks like:

Executable = "/bin/hostname";
StdOutput = "sim.out";
StdError = "sim.err";
DataCatalog = "http://lfc-lhcb-ro.cern.ch:8085";
InputData = {"lfn:/grid/lhcb/test_roberto/preproduction"};
DataAccessProtocol = {"rfio", "gsiftp", "gsidcap"};
OutputSandbox = {"sim.err", "sim.out"};

The InputData field may also be specified through GUIDs. This attribute is used only during the match-making process (to match CEs and SEs). It has nothing to do with the actual access to files that the job performs while running. However, it is obviously reasonable that the files listed in the attribute are really accessed by the job, and vice versa.

The DataAccessProtocol attribute is used to specify the protocols that the application can use to access the file, and is mandatory if InputData is present. Only data in SEs which support one or more of the listed protocols are considered. The Resource Broker will schedule the job to a CE close to the SE holding the largest number of the requested input files. In case several CEs are suitable, they will be ranked according to the ranking expression.

Warning: A Storage Element or a list of Storage Elements is published as "close" to a given CE via the GlueCESEBindGroupSEU
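The rule that DataAccessProtocol is mandatory whenever InputData is present can be checked before a job description is written out. The following Python sketch is only an illustration under stated assumptions: render_jdl and jdl_value are hypothetical helper names, not part of the gLite tools, and the quoting is simplified (string values are assumed not to contain double quotes).

```python
# Hypothetical helper that renders a dictionary as JDL text and enforces
# the rule from the guide: DataAccessProtocol is mandatory if InputData
# is present. Assumption: values are strings or lists of strings without
# embedded double quotes.

def jdl_value(value):
    """Render a string as "..." and a list as {"a", "b"}."""
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(jdl_value(item) for item in value) + "}"
    return '"' + str(value) + '"'


def render_jdl(attributes):
    """Return the JDL text for the given attributes, one per line."""
    if "InputData" in attributes and "DataAccessProtocol" not in attributes:
        raise ValueError("DataAccessProtocol is mandatory when InputData is present")
    return "\n".join(
        f"{name} = {jdl_value(value)};" for name, value in attributes.items()
    )


if __name__ == "__main__":
    job = {
        "Executable": "/bin/hostname",
        "StdOutput": "sim.out",
        "StdError": "sim.err",
        "InputData": ["lfn:/grid/lhcb/test_roberto/preproduction"],
        "DataAccessProtocol": ["rfio", "gsiftp", "gsidcap"],
        "OutputSandbox": ["sim.err", "sim.out"],
    }
    print(render_jdl(job))
```

Failing early on the client side only saves a round trip; the authoritative check remains the one performed by the submission command itself.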
10. TIME LEFT UTILITY: LCG-GETJOBSTATS, LCG_JOBSTATS.PY
INFORMATION SYSTEM READER: LCG-INFO
F DATA MANAGEMENT AND FILE ACCESS THROUGH AN APPLICATION PROGRAMMING INTERFACE
INTRODUCTION
FREEDOM OF CHOICE FOR RESOURCES
G THE GLUE SCHEMA

1 INTRODUCTION

1.1 ACKNOWLEDGMENTS

This work received support from the following institutions:
• Istituto Nazionale di Fisica Nucleare, Roma, Italy
• Ministerio de Educación y Ciencia, Madrid, Spain
• Particle Physics and Astronomy Research Council, UK

1.2 OBJECTIVES OF THIS DOCUMENT

This document gives an overview of the gLite 3.0 middleware. It helps users to understand the building blocks of the Grid and the available interfaces to the Grid services, in order to run jobs and manage data. This document is neither an administration guide nor a developer guide.

1.3 APPLICATION AREA

This guide is addressed to WLCG/EGEE users and site administrators who would like to work with the gLite middleware.

1.4 DOCUMENT EVOLUTION PROCEDURE

The guide reflects the current status of the gLite middle
11. GlueSEControlProtocolLocalID: network endpoint for this protocol
GlueSEControlProtocolCapability: function supported by this control protocol

• Storage Library (objectclass GlueSL)
  GlueSLName: human-readable name of the storage library
  GlueSLUniqueID: unique identifier of the machine providing the storage service
  GlueSLService: unique identifier for the provided storage service

• Local File system (objectclass GlueSLLocalFileSystem)
  GlueSLLocalFileSystemRoot: path name (or other information) defining the root of the file system
  GlueSLLocalFileSystemName: name of the file system
  GlueSLLocalFileSystemType: file system type (e.g. NFS, AFS, etc.)
  GlueSLLocalFileSystemReadOnly: true if the file system is read only
  GlueSLLocalFileSystemSize: total space assigned to this file system
  GlueSLLocalFileSystemAvailableSpace: total free space in this file system
  GlueSLLocalFileSystemClient: unique identifiers of clients allowed to access the file system remotely

• Remote File system (objectclass GlueSLRemoteFileSystem)
  GlueSLRemoteFileSystemRoot: path name (or other information) defining the root of the file system
  GlueSLRemoteFileSystemSize: total space assigned to this file system
  GlueSLRemoteFileSystemAvailableSpace: total free space in this file
12. In particular, the different servers from which the information can be obtained are discussed. These are the local GRISes, the site GIISes/BDIIs and the global (or top) BDIIs. Of these, the BDII is usually the one queried, since it contains all the interesting information for a VO in a single place.

Before the procedure for querying the IS elements directly is described, two higher-level tools, lcg-infosites and lcg-info, are presented. These tools should be enough for most common user needs and will usually avoid the necessity of raw LDAP queries, although these are very useful for more complex or subtle requirements.

As explained in Chapter B, the data in the IS of WLCG/EGEE conforms to the LDAP implementation of the GLUE Schema, although some extra attributes (not initially in the schema) are also being published and actually queried and used by clients of the IS. For a list of the defined object classes and their attributes, as well as for a reference on the Directory Information Tree used to publish those attributes, please check Appendix G.

As usual, the tools to query the IS shown in this section are command-line based. There exist, however, graphical tools that can be used to browse the LDAP catalogs. As an example, the program gq is open source and can be found in some Linux distributions by default. Some comments on this tool are given elsewhere in this guide.

5.1.1 lcg-infosites

The lcg-infosites command can be used as an easy way to retrieve informa
13. WORLDWIDE LHC COMPUTING GRID
GLITE 3.0 USER GUIDE
MANUALS SERIES

Document identifier: CERN-LCG-GDEIS-722398
EDMS id: 722398
Version: 0.1
Date: May 19, 2006
Section: Experiment Integration and Distributed Analysis
Document status: PRIVATE
Author(s): Stephen Burke, Simone Campana, Antonio Delgado Peris, Flavia Donno, Patricia Méndez Lorenzo, Roberto Santinelli, Andrea Sciabà
File: gLite-3-UserGuide

Abstract: This guide is an introduction to the WLCG/EGEE Grid and to the gLite 3.0 middleware from a user's point of view.

Document Change Record
Issue: 20/04/06; Item: v1.0; Reason for Change: First Draft

Files
Software Products: User files
PDF: https://edms.cern.ch/file/722398/1/gLite-3-UserGuide.pdf
PS: https://edms.cern.ch/file/722398/1/gLite-3-UserGuide.ps
HTML: https://edms.cern.ch/file/722398/1/gLite-3-UserGuide.html

CONTENTS
1 INTRODUCTION
1.6.1 Glossary
2 EXECUTIVE SUMMARY
3.1.1 Code Development
3.1.2 Troubleshooting
14. 3.1.3 User and VO utilities
3.2.1 Security
3.2.2 User Interface
3.2.3 Computing Element
3.2.4 Storage Element
3.2.5 Information Service
3.2.6 Data Management
3.3 JOB FLOW
3.3.1 Job Submission
3.3.2 Other operations
4 GETTING STARTED
4.1 OBTAINING A CERTIFICATE
4.1.1 X.509 Certificates
4.2 REGISTERING WITH WLCG/EGEE
4.2.1 The Registration Service
4.3 SETTING UP THE USER ACCOUNT
4.3.1 The User Interface
4.4 PROXY CERTIFICATES
4.4.1 Proxy Certificates
4.4.3 Proxy Renewal
5.1.1 lcg-infosites
5.1.2 lcg-info
15. Figure 8: The LDAP directory of a gLite 3.0 BDII

The BDII can be interrogated using the same base name as in the case of the GRIS (mds-vo-name=local,o=grid), but using the BDII port: 2170. The sub-tree corresponding to a particular site appears under an entry with a
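As a rough sketch of what such a BDII interrogation looks like in practice, the Python function below assembles an argument list for the OpenLDAP ldapsearch client, using the base name and port quoted above. The function name bdii_query_argv and the host bdii.example.org are hypothetical; the flags used (-x for simple authentication, -LLL for plain LDIF output, -H for the server URI, -b for the search base) are standard ldapsearch options. The function only builds the command and does not contact any server.

```python
# Hypothetical helper that builds an ldapsearch command line for a BDII.
# Assumptions: the OpenLDAP "ldapsearch" client is the tool being used,
# and "bdii.example.org" is a placeholder host name.
import shlex


def bdii_query_argv(host, ldap_filter, attributes=(),
                    port=2170, base="mds-vo-name=local,o=grid"):
    """Return the argument list for querying a BDII with ldapsearch."""
    argv = [
        "ldapsearch",
        "-x",                              # simple (anonymous) authentication
        "-LLL",                            # LDIF output without comments
        "-H", f"ldap://{host}:{port}",     # BDII endpoint, default port 2170
        "-b", base,                        # search base
        ldap_filter,
    ]
    argv.extend(attributes)                # optional list of attributes to return
    return argv


if __name__ == "__main__":
    argv = bdii_query_argv(
        "bdii.example.org",
        "objectclass=GlueCESEBind",
        ["GlueCESEBindCEUniqueID", "GlueCESEBindSEUniqueID"],
    )
    # Print a shell-safe version of the command (shlex.join needs Python 3.8+).
    print(shlex.join(argv))
```

Passing the list to subprocess.run would execute the query; building it as a list rather than a single string avoids shell quoting problems with filters that contain parentheses or ampersands.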
16. LLL H ldap 1xn1187 cern ch 2170 b o grid objectclass GlueCESEBind GlueCESEBindCEUniqueID GlueCESEBindSEUniqueID dn GlueCESEBindSEUniqueID castor grid sinica edu tw GlueCESEBindGroupCEUnique ID tb009 grid sinica edu tw 2119 jobmanager lcgpbs atlas mds vo name resource mds vo name Taiwan PPS mds vo name local o grid GlueCESEBindSEUniquelD castor grid sinica edu tw GlueCESEBindCEUniqueID tb009 grid sinica edu tw 2119 jobmanager lcgpbs atlas dn GlueCESEBindSEUniqueID castor grid sinica edu tw GlueCESEBindGroupCEUnique ID tb009 grid sinica edu tw 2119 Jobmanager 1cgpbs dteam mds vo name resource mds vo name Taiwan PPS mds vo name local o grid GlueCESEBindSEUniquelD castor grid sinica edu tw GlueCESEBindCEUniqueID tb009 grid sinica edu tw 2119 jobmanager lcgpbs dteam dn GlueCESEBindSEUniqueID dpm01 grid sinica edu tw GlueCESEBindGroupCEUniquel D tb009 grid sinica edu tw 2119 3jobmanager 1cgpbs biomed mds vo name resource mds vo name Taiwan PPS mds vo name local o grid GlueCESEBindSEUniquelD dpm01 grid sinica edu tw CERN LCG GDEIS 722398 Manuals Series Page 56 GlueCESEBindCEUniqueID tb009 grid sinica edu tw 2119 3jobmanager 1cgpbs biomed dn GlueCESEBindSEUniquelD castor grid sinica edu tw GlueCESEBindGroupCEUnique ID tb009 grid sinica edu tw 2119 Jobmanager 1cgpbs biomed mds vo name resourc e mds vo name Taiwan PPS mds vo name l1ocal o grid GlueCESEBindSEUniquelD castor grid sinica edu
17. SE, to register entries in the File Catalog and replicate files between SEs.

NOTE: Up to LCG release 2.3.0, the edg-replica-manager command (also in its abbreviated edg-rm form) provided the same functionality that the current lcg_utils offer. For performance reasons, edg-rm was retired in favour of the lcg_utils. The current edg-replica-manager command is just a wrapper script around lcg_utils. In this way, it provides the performance and functionality of lcg_utils while maintaining the interface of the old Java CLI. More information about the edg-replica-manager wrapper script can be found in the referenced documentation.

The name and a functionality overview of the available commands are shown in the following table.

Replica Management
lcg-cp    Copies a Grid file to a local destination (download)
lcg-cr    Copies a file to a SE and registers the file in the catalog (LFC or LRC) (upload)
lcg-del   Deletes one file (either one replica or all replicas)
lcg-rep   Copies a file from one SE to another SE and registers it in the catalog (LFC or LRC) (replicate)
lcg-gt    Gets the TURL for a given SURL and transfer protocol
lcg-sd    Sets file status to "Done" for a given SURL in an SRM's request

File Catalog Interaction
lcg-aa    Adds an alias in the catalog (LFC or RMC) for a given GUID
lcg-ra    Removes an alias in the catalog (LFC or RMC) for a given GUID
lcg-rf    Registers in the catalog (LFC or LRC/RMC) a file residing on an SE
lcg-uf    Unregisters in the catalog L
18. The names of the tags relative to the software installed in site is printed together with the corre sponding CE It groups together the information provided by ce and se is If not specified the BDII defined in default by the variable LCG_GFAL_INFOSYS will be queries How ever the user may want to query any other BDII without redefining this environment variable This is possible specifying this argument followed by the name of the BDII which the user wants to query All options admits this argument f Giving as input the name of the site the chosen local service is provided Example 5 1 1 1 Obtaining information about computing resources The way to get the information relative to the computing resources for a certain VO is lcg infosites vo alice ce A typical output is as follows kkkkkkxkxkxkxkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkxkxk These are the related data for alice in terms of queues and CPUs KKEKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK CPU Free Total Jobs Running Waiting ComputingElement CERN LCG GDEIS 722398 Manuals Series Page 44 2 2 0 0 0 globus it uom gr 2119 blah pbs alice 12 12 0 0 0 pprod03 lip pt 2119 jobmanager pbs alice 4 4 0 0 0 zeus76 cyf kr edu pl 2119 blah pbs alice 4 3 0 0 0 1xb2087 cern ch 2119 jobmanager lcgpbs alice 2 2 0 0 0 pps ce fzk gridka de 2119 jobmanager lcgpbs pps 3903 132 0 0 0 1xb2055 cern ch 2
19. Users can get information about the history of a job by querying the LB service.

3.3.2 Other operations

While the Input and Output Sandboxes are a mechanism for transferring small data files needed to start a job or to check its results, large data files should be read from and written to SEs, registered in a File Catalog, and possibly replicated to other SEs. The LCG Data Management client tools are available for performing these tasks. In general, the user should not interact directly with the File Catalog; instead, the LCG tools or the POOL interface should be used (see Section 7.9).

Users can interrogate the information system to retrieve static or dynamic information about the status of WLCG/EGEE resources and services. Although site GIISes/BDIIs, or even GRISes, can be queried directly, it is recommended to query only a central BDII (or R-GMA). Details and examples on how to interrogate GRIS, GIIS and BDII are given in Chapter 5.

4 GETTING STARTED

This section describes the preliminary steps to gain access to the WLCG/EGEE Grid. Before using the WLCG/EGEE Grid, the user must do the following:

a) Obtain a cryptographic X.509 certificate from a Certification Authority (CA) recognized by WLCG/EGEE.
b) Get registered with WLCG/EGEE by joining one of the WLCG/EGEE Virtual Organisations.
c) Obtain an account on a machine which has the WLCG/EGEE User Interface software installed.
20. #!/bin/sh
cat $GLITE_WL_RB_BROKERINFO

in a submitted shell script. Then, the file can be accessed locally with the -f option commented above.

6.3.4 Interactive Jobs

NOTE: Interactive jobs are not supported in WLCG/EGEE, and that functionality is not part of the official distribution of the current release. Interactive jobs in the gLite 3.0 middleware have never been tested and will not be described here. Any site installing or using this functionality does so entirely under its own responsibility.

This section gives an overview of how interactive jobs should work in the LCG Workload Management System.

Interactive jobs are specified by setting the JDL JobType attribute to Interactive. When an interactive job is submitted, the edg-job-submit command starts a Grid console shadow process in the background, which listens on a port for the job standard streams. Moreover, the edg-job-submit command opens a new window to which the incoming job streams are forwarded. The port on which the shadow process listens is assigned by the Operating System (OS), but can be forced through the ListenerPort attribute in the JDL.

As the command in this case opens an X window, the user should make sure that the DISPLAY environment variable is correctly set, that an X server is running on the local machine and, if connected to the UI node from a remote machine (e.g. with ssh), that secure X11 tunnelling is enabled. If this is not possible the
21. etworkServer result START source UserInterface timestamp on May 15 15 14 55 2006 CEST Event Accepted source etworkServer timestamp on May 15 15 15 42 2006 CEST Event Transfer destination etworkServer result OK source UserInterface timestamp on May 15 15 15 43 2006 CEST Event EnQueued result OK source NetworkServer timestamp Mon May 15 15 15 44 2006 CEST Event DeQueued source WorkloadManager timestamp Mon May 15 15 15 44 2006 CEST Event Match CERN LCG GDEIS 722398 Manuals Series Page 78 dest_id cert ce 03 cnaf infn it 2119 blah lsf pps source WorkloadManager timestamp Mon May 15 15 15 45 2006 CEST ites 6 3 3 The Brokerlnto The BrokerInfo file is a mechanism by which the user job can access at execution time certain information concerning the job for example the name of the CE the files specified in the InputData attribute the SEs where they can be found etc The BrokerInfo file is created in the job working directory that is the current directory on the WN for the executable and is named BrokerInfo Its syntax is as in job description files based on Condor ClassAds and the information contained is not easy to read however it is possible to get it by means of a CLL whose description follows NOTE Remember that as explained previously if the option r is used when submitting a job in order to make the j
22. include lt sstream gt for the integer to string conversion include lt unistd h gt for the sleep function include lt fstream gt for the local file access extern C include gfal_api h tinclude lcg_util h using namespace std main int argc char argv Check arguments if arge lt 2 argc gt 2 cerr lt lt Usage lt lt argv 0 lt lt SURL n exit 1 CERN LCG GDEIS 722398 Manuals Series Page 143 Try to get the file stage in int srm_get int nbfiles char surls int nbprotocols char protocols int regid char token struct srm_filestatus filestatuses int timeout struct srm_filestatus xX char surl w char turl int fileid int status int nbreplies number of replies returned int nbfiles 1 number of files char surls array of SURLs int nbprotocols number of bytes of the protocol array char protocols rfio protocols int reqid request ID char token 0 unused struct srm_filestatus filestatuses status of the files int timeout 100 Set the SURL and the nbprotocols surls argv 1 nbprotocols sizeof protocols sizeof char Make the call if nbreplies srm_get nbfiles surls nbprotocols protocols amp reqid 0 amp filestatuses timeout lt 0 perror Error in srm_get exit 1 Show the retrieved information c
23. list of LCG sites is available in the GOC database [16]. As an example, the CERN LXPLUS service can be used as a UI, as described in [28]. This use could be extended to other non-LXPLUS machines mounting AFS.

Once the account has been created, the user certificate must be installed. For that, it is necessary to create a directory named .globus under the user home directory and put the user certificate and key files there, naming them usercert.pem and userkey.pem respectively, with permissions 0444 for the former and 0400 for the latter. A directory listing should give a result similar to this:

ls -l $HOME/.globus
total 13
-r--r--r-- 1 doe xy 4541 Aug 23 2005 usercert.pem
-r-------- 1 doe xy 963 Aug 23 2005 userkey.pem

4.3.2 Checking a Certificate

To verify that a certificate is not corrupted and to print information about it, the Globus command grid-cert-info can be used from the UI. The openssl command can be used instead to verify the validity of a certificate with respect to the certificate of the certification authority that issued it. The command grid-proxy-init can be used to check whether there is a mismatch between the private key and the certificate.

Example 4.3.2.1 (Retrieving information on a user certificate)

With the certificate properly installed in the $HOME/.globus directory of the user's UI account, issue the command:

grid-cert-info

If the certificate is properly formed th
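The permission requirements for the certificate and key files (0444 and 0400) are easy to get wrong, so a quick local check can help before any Grid command is tried. The Python sketch below is only an illustration: check_globus_dir is a hypothetical helper, not a gLite or Globus tool, and it inspects file modes only, not the certificate contents.

```python
# Hypothetical check of the certificate directory described in the guide:
# usercert.pem should have mode 0444 and userkey.pem mode 0400.
# Assumption: a POSIX file system where permission bits are meaningful.
import os
import stat

EXPECTED_MODES = {"usercert.pem": 0o444, "userkey.pem": 0o400}


def check_globus_dir(directory):
    """Return a list of human-readable problems (empty if all is fine)."""
    problems = []
    for name, expected in EXPECTED_MODES.items():
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            problems.append(f"{name}: missing")
            continue
        # Keep only the permission bits of the file mode.
        mode = stat.S_IMODE(os.stat(path).st_mode)
        if mode != expected:
            problems.append(f"{name}: mode {mode:04o}, expected {expected:04o}")
    return problems


if __name__ == "__main__":
    globus_dir = os.path.join(os.path.expanduser("~"), ".globus")
    for problem in check_globus_dir(globus_dir) or ["permissions look correct"]:
        print(problem)
```

A mismatch reported here can be fixed with chmod; whether the certificate itself is valid still has to be verified with the commands described in the text.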
24. vo dteam guid 27523374 6f60 44af b311 baa3d29f84la gt sfn 1xb0710 cern ch flatfiles SE00 dteam generated 2004 07 13 file42ff7086 8063 414d 9000 75c459b71296 edg gridftp exists gsiftp 1xb0710 cern ch flatfiles SE00 dteam generated 2004 07 13 f ile42ff7086 8063 414d 9000 75c459b71296 edg gridftp exists gsiftp 1xb0710 cern ch flatfiles SE00 dteam generated 2004 07 13 my _fake file gt error gsiftp 1xb0710 cern ch flatfiles SE00 dteam generated 2004 07 13 my_fake_file does not exist edg gridftp ls gsiftp 1xb0710 cern ch flatfiles SE00 dteam generated 2004 07 13 gt file42 7086 8063 414d 9000 75c459b71296 Example 7 6 2 2 Copying a file with globus url copy The globus url copy command can be used to copy files between any two Grid resources and from to a non grid resource Its functionality is similar to that of 1cg cp but source and destination must be specified as TURLs globus url copy gsiftp 1xb0710 cern ch flatfiles SE00 dteam generated 2004 07 13 file42 7086 8063 414d 9000 75c459b71296 file pwd my_file CERN LCG GDEIS 722398 Manuals Series Page 113 7 6 3 Low Level Data Management Tools SRM The following commands originally intended for interacting with SRM servers running upon d cache storages offer a general way for interacting with any SRM server on LCG gLite srmcp Copies a file on a SRM SE srm get metadata Gathers information about a file stored into a SRM SE
25. < 0) { perror("gfal_close"); exit(1); }
    cout << "\nClose successful";
    cout << "\n\nDone" << endl;
    } // end of main

The command used to compile and link the previous code (it may be different on your machine) is:

    /opt/gcc-3.2.2/bin/c++-3.2.2 -L /opt/lcg/lib -l gfal -o gfal_example gfal_example.cpp

As temporary file, we may specify one in our local filesystem by using the file:// prefix. In that case we get the following output:

    ./gfal_example file://`pwd`/test.txt
    Creating file file:///afs/cern.ch/user/d/delgadop/gfal/test.txt
    Open successful
    Write successful
    Close successful
    Reading back file:///afs/cern.ch/user/d/delgadop/gfal/test.txt
    Open successful
    Read successful
    Value of readValues[0] = 0
    Value of readValues[1] = 10
    Value of readValues[2] = 20
    Value of readValues[3] = 30
    Value of readValues[4] = 40
    Value of readValues[5] = 50
    Value of readValues[6] = 60
    Value of readValues[7] = 70
    Value of readValues[8] = 80
    Value of readValues[9] = 90
    Close successful
    Done

We may define a JDL file and access a Grid file from a job. Indeed, a Grid file cannot be accessed directly from the UI if insecure RFIO is the protocol used for that access. However, access from a job running at the same site where the file is stored is allowed. The reason for this is that insecure RFIO does not handle Grid certificates s
26. 1 santinel z5 215 Jun 22 2005 lfc_test_pic2
    -rwxrwxr-x 1 santinel z5 319 Oct 03 14:49 orrriginal cern
    -rwxrwxr-x 1 santinel z5 734 Oct 12 11:17 using_project

Example 7.4.1.2 Listing the entries of a LFC directory

The lfc-ls command lists the LFNs of a directory. The command supports several options, among them -l for long format and --comment for showing user-defined metadata information.

Attention: the -R option for recursive listing is also available, but it should not be used. It is a very expensive operation on the catalog.

In the following example, the directory /grid/dteam/MyExample and its subdirectory day1 are listed (after being populated):

    lfc-ls /grid/dteam/MyExample
    /grid/dteam/MyExample:
    day1
    day2
    day3
    day4
    interesting

    lfc-ls /grid/dteam/MyExample/day1
    /grid/dteam/MyExample/day1:
    measure1
    measure2
    measure3
    measure4

Example 7.4.1.3 Creation of symbolic links

The lfc-ln command may be used to create a symbolic link to a file. In this way, two different LFNs will point to the same file. In the following example, we create a symbolic link /grid/lhcb/test_roberto/MyTest/MyLink to the original file /grid/lhcb/test_roberto/lfc_test_pic2:

    lfc-ln -s /grid/lhcb/test_roberto/lfc_test_pic2 /grid/lhcb/test_roberto/MyTest/MyLink

Now we can check that the link was created with a long listin
27. Figure 19: DIT for the storage libraries
28. 41
    username: /O=Grid/O=CERN/OU=cern.ch/CN=John Doe
    owner: /O=Grid/O=CERN/OU=cern.ch/CN=John Doe
    timeleft: 167:59:48 (7.0 days)

Note that the user must have a valid proxy certificate on the UI, created with grid-proxy-init, to successfully interact with his long-term certificate on the Proxy server.

Example 4.4.3.3 Deleting a long-term proxy

A stored long-term proxy is deleted with:

    myproxy-destroy -s <myproxy_server> -d

And the output is:

    Default MyProxy credential for user /O=Grid/O=CERN/OU=cern.ch/CN=John Doe was successfully removed.

Also in this case, a valid proxy certificate must exist for the user on the UI.

5 INFORMATION SERVICE

The architecture of the gLite 3.0 Information Services, both MDS and R-GMA, was described in Chapter 3. In this chapter we take a closer look at the structure of the information published by different elements of those architectures, and we examine the tools that can be used to get information from them.

Remember that although most middleware components from Data and Workload Management rely on MDS, R-GMA is already in use and many applications, especially for accounting and monitoring purposes, depend on it.

Information about the tools currently used for monitoring in gLite 3.0 is also provided.

5.1 THE MDS

In the following sections, examples are given on how to interrogate the MDS Information Service in gLite 3.0.
29. 58 2 45 52 ed 69 59 84 66 0a 8f 22 26 79 c4 ad ad 72 69 7 57 dd dd de 84 ff 8b 75 25 ba 82 f1 6c 62 d9 d8 49 33 7b a9 fb 9c 1e 67 d9 3c 51 53 fb 83 9b 21 cored 0 0 10 0

The grid-cert-info command takes many options. Use -help for a full list. For example, the -subject option returns the certificate subject:

    grid-cert-info -subject
    /O=Grid/O=CERN/OU=cern.ch/CN=John Doe

Or one can check the certificate expiration date:

    grid-cert-info -enddate
    Oct 15 05:37:09 2005 GMT

or find out which CA issued the certificate:

    grid-cert-info -issuer
    /C=CH/O=CERN/OU=GRID/CN=CERN CA

Example 4.3.2.2 Verifying a user certificate

To verify a user certificate, issue the following command from the UI:

    openssl verify -CApath /etc/grid-security/certificates ~/.globus/usercert.pem

and if the certificate is valid and properly signed, the output will be:

    /home/doe/.globus/usercert.pem: OK

If the certificate of the CA that issued the user certificate is not found in -CApath, an error message like this will appear:

    usercert.pem: /O=Grid/O=CERN/OU=cern.ch/CN=John Doe
    error 20 at 0 depth lookup:unable to get local issuer certificate

If the variable X509_CERT_DIR is defined, use its value as the -CApath argument instead of /etc/grid-security/certificates.

Example 4.3.2.3 Verifying the consistency between private key and certificate

If for some reason the user
30. A user can be a member of any number of groups, and a VOMS proxy contains the list of all groups the user belongs to, but when the VOMS proxy is created the user can choose one of these groups as the primary group.

A role is an attribute which typically allows a user to acquire special privileges to perform specific tasks. In principle, groups are associated with privileges that the user always has, while roles are associated with privileges that a user needs only from time to time.

To map groups and roles to specific privileges, what counts is the group/role combination, which is sometimes referred to as an FQAN (short for Fully Qualified Attribute Name). The format is:

    FQAN = <group name>[/Role=<role name>]

for example, /cms/HeavyIons/Role=production. It should be noted that a role is by no means global: it is always associated with a group.

Example 4.4.2.1 Creating a VOMS proxy

The voms-proxy-init command generates a Grid proxy, contacts a VOMS server, retrieves the user attributes and includes them in the proxy. If used without arguments, it works exactly as grid-proxy-init.

To create a basic VOMS proxy, without requiring any special role or primary group:

    voms-proxy-init -voms <vo>

where <vo> is the VO of the user. The output is similar to:

    Your identity: /C=CH/O=CERN/OU=GRID/CN=John Doe
    Enter GRID pass phrase:
    Creating tempora
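To make the FQAN format concrete, here is a small illustrative sketch in Python that splits an FQAN of the form /group/subgroup/Role=name into its group and role parts. The function parse_fqan is a hypothetical helper, not part of VOMS; it assumes the slash-separated form with an optional Role= component, treats Role=NULL as "no role", and ignores any Capability= component.

```python
def parse_fqan(fqan):
    """Split an FQAN into (group, role); role is None when absent or NULL.

    Hypothetical helper for illustration only.
    """
    if not fqan.startswith("/"):
        raise ValueError("an FQAN must start with '/'")
    group_parts = []
    role = None
    for part in fqan.strip("/").split("/"):
        if part.startswith("Role="):
            value = part[len("Role="):]
            role = None if value == "NULL" else value
        elif part.startswith("Capability="):
            # Not needed to identify the group/role combination.
            continue
        else:
            group_parts.append(part)
    return "/" + "/".join(group_parts), role
```

For the example in the text, parse_fqan("/cms/HeavyIons/Role=production") yields the group /cms/HeavyIons and the role production, which shows that the role is always attached to a group.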
31. Grid.
  • The StorageElement attribute is an optional string indicating the SE where the file should be stored, if possible. If unspecified, the WMS automatically chooses an SE defined as close to the CE.
  • The LogicalFileName attribute, also optional, represents an LFN the user wants to associate with the output file.

The following code shows an example of JDL that explicitly requires an OutputData attribute:

    Executable = "test.sh";
    StdOutput = "std.out";
    StdError = "std.err";
    InputSandbox = {"test.sh"};
    OutputSandbox = {"std.out","std.err",".BrokerInfo"};
    OutputData = {
      [
        OutputFile = "my_out";
        LogicalFileName = "lfn:/grid/lhcb/test_roberto/outputdata";
        StorageElement = "castorsrm.pic.es";
      ]
    };

Once the job has terminated and the user retrieves the output, the downloaded Output Sandbox will contain a further file, automatically generated by the JobWrapper, containing the logs of the output upload:

    cat DSUpload_ZboHMYWoBsLVax nUCmtaA.out
    # Autogenerated by JobWrapper!
    # The file contains the results of the upload and registration
    # process in the following format:
    # <outputfile> <lfn|guid|Error>
    my_out guid:2a14e544-1800-4257-afdd-7031a6892ef7

Warning: this mechanism for automatic upload of a specified output by using OutputData is no longer supported by the gLite WMS.

Example 7.7.4 Selecting the file catalog to use for match making

In order f
32. If these characters should be escaped in the shell (for example, if they are part of a file name), they should be preceded by a triple backslash in the JDL, or specified inside quoted strings.

The attributes StdOutput and StdError define the names of the files containing the standard output and standard error of the executable, once the job output is retrieved.

For the standard input, an input file can be similarly specified (though this is not required):

    StdInput = "std.in";

If some files have to be copied from the UI to the execution node, they can be listed in the InputSandbox attribute:

    InputSandbox = {"test.sh","fileA","fileB"};

Only the file specified as Executable will automatically have the execution flag; if other files in the input sandbox have this flag on the UI, they will usually lose it when copied to the WN.

Finally, the files to be transferred back to the UI after the job is finished can be specified using the OutputSandbox attribute:

    OutputSandbox = {"std.out","std.err"};

Wildcards are allowed only in the InputSandbox attribute. The list of files in the InputSandbox is specified relative to the current working directory. Absolute paths cannot be specified in the OutputSandbox attribute. The InputSandbox cannot contain two files with the same name, even if they have different absolute paths, because they would overwrite each other when transferred.

The environmen
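The rule that an input sandbox cannot hold two files with the same name, even under different paths, lends itself to a quick pre-submission check. The sketch below is a hypothetical Python helper (find_sandbox_name_clashes is not a gLite command) that groups candidate paths by base name and reports the ones that would overwrite each other.

```python
import os
from collections import defaultdict


def find_sandbox_name_clashes(paths):
    """Map each clashing file name to the list of paths that share it.

    Hypothetical helper: only base names are compared, as files are
    transferred into a single directory on the execution node.
    """
    by_name = defaultdict(list)
    for path in paths:
        by_name[os.path.basename(path)].append(path)
    return {name: group for name, group in by_name.items() if len(group) > 1}
```

An empty result means the list is safe to use as an input sandbox as far as this rule is concerned; wildcard expansion is outside the scope of this sketch.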
33. MSS without SRM interface, the command obtains the TURL by simple string manipulation of the SURL (obtained from the File Catalog) and the protocol (checking in the Information System whether it is supported by the Storage Element). No direct interaction with the SE is involved. The last two lines of output are always zeroes:

    lcg-gt sfn://lxb0710.cern.ch/flatfiles/SE00/dteam/generated/2004-07-08/file0dcabb46-2214-4db8-9ee8-2930de1a6bef gsiftp
    > gsiftp://lxb0710.cern.ch/flatfiles/SE00/dteam/generated/2004-07-08/file0dcabb46-2214-4db8-9ee8-2930de1a6bef
    > 0
    > 0

Be aware that in the case of an MSS, the file might not be staged on disk but only stored on tape. For this reason, an operation like

    lcg-cp gsiftp://lxb0710.cern.ch/flatfiles/SE00/dteam/generated/2004-07-08/file0dcabb46-2214-4db8-9ee8-2930de1a6bef file:/tmp/somefile.txt

could hang forever, waiting for the file to be staged (a timeout mechanism is not implemented in lcg-utils).

  • In the case of an SRM interface, the TURL is returned to the lcg-gt command by the SRM itself. In the case of an MSS, the file will be staged on disk (if not present already) before a valid TURL is returned. It could take lcg-gt quite a long time to return the TURL, depending on the conditions of the stager, but a subsequent lcg-cp of such a TURL will not hang. This is one of the reasons why an SRM interface is desirable for all MSS. The second and third lines of output represent the requestID and fileID for the
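The "simple string manipulation" used for SEs without an SRM interface can be pictured as swapping the sfn scheme of the SURL for the requested transfer protocol, after checking that the SE supports it. The Python sketch below illustrates that idea only; classic_se_turl is a hypothetical function, the set of supported protocols is passed in by the caller (in reality it would come from the Information System), and this is not the actual lcg-gt implementation.

```python
def classic_se_turl(surl, protocol, supported_protocols):
    """Derive a TURL from an sfn:// SURL by replacing the scheme.

    Hypothetical helper: no contact with the SE is made, mirroring the
    behaviour described in the text for SEs without SRM.
    """
    prefix = "sfn://"
    if not surl.startswith(prefix):
        raise ValueError("expected a SURL starting with sfn://")
    if protocol not in supported_protocols:
        raise ValueError(f"protocol {protocol!r} is not supported by the SE")
    return f"{protocol}://{surl[len(prefix):]}"
```

Because nothing is asked of the SE itself, a TURL built this way says nothing about whether the file is actually on disk, which is why a later copy can block on a tape-backed system.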
34. Remember that the BDII read port is 2170.
  • The endpoint(s) for the catalogs can also be specified (taking precedence over those published in the IS) through environment variables: LRC_ENDPOINT and RMC_ENDPOINT for the RLS, and LFC_HOST for the LFC. If no endpoints are specified, the ones published in the Information System are taken.
  • If the variable LCG_GFAL_VO is set in the environment, then the --vo option is not required for the lcg-* commands, since they take the value of this variable.

With respect to authentication and authorization, lcg-utils manage data transfer securely, through gsiftp. For this reason, the user must have a valid proxy and must appear in the grid-mapfile of the SE in order to use lcg-cr, lcg-cp, lcg-rep and lcg-del.

On the other hand, the information in the LRC and RMC catalogs is not protected (RLS allows insecure access), and no proxy certificate is required for the rest of the lcg-* commands, which do not deal with physical replicas. Although the situation is different for the LFC, currently the information stored in the RLS file catalogs can be altered by anyone.

NOTE: The user will often need to gather information on the existing Grid resources in order to perform DM operations. For instance, in order to specify the destination SE for the upload of a file, the information about the available SEs must be retrieved in advance. There are several ways to retrieve information about the resources on the Grid. The Information Sys
35. SRM [19], a middleware module providing capabilities like transparent file migration from disk to tape, file pinning, space reservation, etc. However, different SEs may support different versions of the SRM protocol, and the capabilities can vary.

Disk-only SEs are normally implemented either as classic SEs, which do not have an SRM interface, or using the Disk Pool Manager (DPM), which is SRM-enabled; classic SEs are going to be phased out soon. Mass Storage Systems (with front-end disks and back-end tape storage), like CASTOR, and large disk arrays (e.g. managed with dCache) always provide an SRM interface.

The most common types of SEs currently present in WLCG/EGEE are summarized in the following table:

    Type         Resources       File transfer   File I/O        SRM
    Classic SE   Disk server     GSIFTP          insecure RFIO   No
    DPM          Disk pool       GSIFTP          secure RFIO     Yes
    dCache       Disk pool/MSS   GSIFTP          gsidcap         Yes
    CASTOR       MSS             GSIFTP          insecure RFIO   Yes

3.2.5 Information Service

The Information Service (IS) provides information about the WLCG/EGEE Grid resources and their status. This information is essential for the operation of the whole Grid, as it is via the IS that resources are discovered. The published information is also used for monitoring and accounting purposes.

Much of the data published to the IS conforms to the GLUE Schema, which defines a common conceptual data model to be used
36. The LDAP schema describes the information that can be stored in each entry of the DIT and defines object classes, which are collections of mandatory and optional attribute names and value types. While a directory entry describes some object, an object class can be seen as a general description of an object, as opposed to the description of a particular instance.

Figure 2 shows the MDS architecture in WLCG/EGEE. Computing and storage resources at a site run a piece of software called an Information Provider, which generates the relevant information about the resource (both static, like the type of SE, and dynamic, like the used space in an SE). This information is published via an LDAP server called a Grid Resource Information Server (GRIS), which normally runs on the resource itself.

At each site another LDAP server, called a Site Grid Index Information Server (GIIS), collects the information from the local GRISes and republishes it. In WLCG/EGEE, the GIIS uses a Berkeley Database Information Index (BDII) to store data, which is more stable than the original Globus GIIS.

Finally, a BDII is also used as the top level of the hierarchy: this BDII queries the GIISes at every site and acts as a cache by storing information about the Grid status in its database. The BDII therefore contains all the available information about the Grid. Nevertheless, it is always possible to get information about specific resources by directly contacting the GIISes or even the GRI
37. a job is submitted, it is possible to see its status and its history, and to retrieve logging information about it. Once the job is finished, the job's output can be retrieved, although it is also possible to cancel it beforehand. The following examples explain how.

Example 6.3.2.1 Retrieving the status of a job

Given a submitted job whose job identifier is <jobId>, the command is:

    edg-job-status <jobId>       (for the LCG RB)
    glite-job-status <jobId>     (for the gLite WMS)

And an example of a possible output from the gLite LB is:

    glite-job-status https://cert-rb-01.cnaf.infn.it:9000/55YfzeDigWeoHbpHxx1BQA

    *************************************************************
    BOOKKEEPING INFORMATION:

    Status info for the Job : https://cert-rb-01.cnaf.infn.it:9000/55YfzeDigWeoHbpHxx1BQA
    Current Status:     Ready
    Status Reason:      unavailable
    Destination:        cert-ce-03.cnaf.infn.it:2119/blah-lsf-pps
    Submitted:          Mon May 15 15:14:55 2006 CEST
    *************************************************************

where the current status of the job is shown, along with the time when that status was reached and the reason for being in that state (which may be especially helpful for the ABORTED state). The possible states in which a job can be found were introduced in Section 3.3.1 and are summarized in Appendix C. Finally, the Destination field contains the ID of the CE to which the job has been submitted.

Much more
38. alice-ALICE-v4-01", other.GlueHostApplicationSoftwareRunTimeEnvironment) && (other.GlueCEPolicyMaxWallClockTime > 86000);

Example 6.2.6 Using the automatic resubmission

It is possible to have the WMS automatically resubmit jobs which, for some reason, are aborted by the Grid. Two kinds of resubmission are available for the gLite 3.0 WMS: the deep resubmission and the shallow resubmission (only the former is available in the LCG-2 WMS). The resubmission is deep when the job fails after it has started running on the WN, and shallow if it fails before the user executable starts.

The user can limit the number of times the WMS should resubmit a job by using the JDL attributes RetryCount and ShallowRetryCount for the deep and shallow resubmission respectively. For example, to disable the deep resubmission and limit the attempts of shallow resubmission to 3:

    RetryCount = 0;
    ShallowRetryCount = 3;

It is advisable to disable the deep resubmission, because in some conditions the WMS can lose track of a job even if it is actually running on the WN, which may cause problems if an identical job is started elsewhere by the system.

The values of the MaxRetryCount and MaxShallowRetryCount parameters in the WMS configuration file represent both the default and the maximum limits for the number of resubmissions.

Example 6.2.7 Using the automatic proxy renewal

The proxy renewal feature of
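The interplay between the per-job JDL limits and the server-side maxima can be summarised in a few lines. The following Python sketch is a toy model under stated assumptions, not the WMS algorithm: effective_limit and may_resubmit are hypothetical names, an unspecified JDL value falls back to the configured maximum (which the text says is also the default), and a requested value above the maximum is capped.

```python
def effective_limit(requested, configured_max):
    """Limit actually applied: the configured value is both default and ceiling."""
    if requested is None:
        return configured_max
    return min(requested, configured_max)


def may_resubmit(failed_after_start, attempts_so_far, retry_count, shallow_retry_count):
    """Decide whether one more resubmission is allowed in this toy model.

    failed_after_start=True models a deep resubmission (job already ran on
    the WN); False models a shallow one (failure before the executable started).
    """
    limit = retry_count if failed_after_start else shallow_retry_count
    return attempts_so_far < limit
```

With RetryCount = 0 and ShallowRetryCount = 3 as in the example above, may_resubmit never allows a deep resubmission and allows at most three shallow ones.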
39. can be useful, for example, to use a WMS that is different from the default one.

Other options present in the glite-job-* and edg-job-* commands are the following: the --log <file> option allows the user to define the log file; the default log file is named <command_name>_<UID>_<PID>_<date_time>.log and it is found in the directory specified in the configuration file. The --noint option skips all interactive questions and prints all warning and error messages to a log file. The --help and --version options are self-explanatory.

Example 6.3.7.1 Changing the default VO

In a gLite 3.0 UI, the user can change his default VO by performing the following steps:

a. Make a copy of the file $GLITE_WMS_LOCATION/etc/glite_wms_ui_cmd_var.conf, for example to $HOME/my_ui.conf.

b. Edit $HOME/my_ui.conf and change this line:

    DefaultVo = "cms";

   (if, for example, he wants to set the CMS VO as default).

c. Define in the shell configuration script ($HOME/.bashrc for bash and $HOME/.cshrc for csh/tcsh) the environment variable:

    setenv GLITE_WMS_UI_CONFIG_VAR $HOME/my_ui.conf     ((t)csh)
    export GLITE_WMS_UI_CONFIG_VAR=$HOME/my_ui.conf     (bash)

Example 6.3.7.2 Using several RBs

Several NSs as well as LBs can be specified in the previously indicated VO-specific configuration file. In this way the submission tool will try to use the first NS specified and will use another one
40. data published from local clients (known as primary producers) and may also collect data from other sites and re-publish it (secondary producers). Generally speaking, primary producers answer Continuous queries and secondary producers answer Latest and History queries; the latter query types are only supported if someone has created a secondary producer for the table(s) concerned, which is normally the case for standard tables, e.g. Glue. The data may be stored either in memory or in a real database, and some queries (notably joins) are only possible if all the required data can be found in a single real database. Such producers are known as archivers.

The local R-GMA servers store all the data and deal with all the client interactions, so in this sense R-GMA is a distributed system. However, there is also a central server known as the Registry, which holds the schema (the definitions of all the tables) and has lists of all consumers and producers to allow them to find each other. At present the Registry is a unique service in the Grid.

Users are free to create and use their own tables. However, at present there is only a single namespace for tables, so users should try to choose distinctive table names, e.g. prefixed with their VO or application name. There is a standard table called userTable which can be used for simple tests.

R-GMA is a secure service, to the extent that you need a valid proxy to use it (or a valid certificate in your web browser). H
41. • the file $GLITE_LOCATION/etc/vomses
  • the file /opt/glite/etc/vomses

The configuration file must contain lines with the following syntax:

    "alias" "host" "port" "subject" "vo"

where the items are, respectively, an alias (usually the name of the VO), the host name of the VOMS server, the port number to contact for a given VO, the DN of the server host certificate, and the name of the VO. For example:

    "dteam" "lcg-voms.cern.ch" "15004" "/C=CH/O=CERN/OU=GRID/CN=host/lcg-voms.cern.ch" "dteam"

Example 4.4.2.2 Printing information on a VOMS proxy

The voms-proxy-info command is used to print information about an existing VOMS proxy. Two useful options are -all, which prints everything, and -fqan, which prints the groups and roles in FQAN format. For example:

    voms-proxy-info -all
    subject   : /C=CH/O=CERN/OU=GRID/CN=John Doe/CN=proxy
    issuer    : /C=CH/O=CERN/OU=GRID/CN=John Doe
    identity  : /C=CH/O=CERN/OU=GRID/CN=John Doe
    type      : proxy
    strength  : 512 bits
    path      : /tmp/x509up_u10585
    timeleft  : 11:59:58
    === VO cms extension information ===
    VO        : cms
    subject   : /C=CH/O=CERN/OU=GRID/CN=John Doe
    issuer    : /C=CH/O=CERN/OU=GRID/CN=host/lcg-voms.cern.ch
    attribute : /cms/Role=NULL/Capability=NULL
    timeleft  : 11:59:58

4.4.3 Proxy Renewal

Proxy certificates created as described in the previous section pose a problem: if the job does not finish before the expiration time of the proxy used
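The five-field vomses syntax described above is easy to parse programmatically. The sketch below is a Python illustration, assuming each line consists of exactly five double-quoted fields in the order alias, host, port, subject, vo as the text states; parse_vomses_line and VomsesEntry are hypothetical names, and real-world files may contain variations that this sketch does not handle.

```python
import shlex
from collections import namedtuple

# Field order follows the syntax given in the text.
VomsesEntry = namedtuple("VomsesEntry", "alias host port subject vo")


def parse_vomses_line(line):
    """Parse one vomses line of five quoted fields (hypothetical helper)."""
    fields = shlex.split(line)  # handles the double-quoted fields
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    alias, host, port, subject, vo = fields
    return VomsesEntry(alias, host, int(port), subject, vo)
```

Applied to the dteam example line, this gives an entry whose host is lcg-voms.cern.ch and whose port is the integer 15004.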
42. example of MPI job.

Example 6.3.6.1 MPI job submission

A very simple MPI application could be the following (MPItest.c):

    #include "mpi.h"
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int numprocs;  /* Number of processors */
        int procnum;   /* Processor number */

        /* Initialize MPI */
        MPI_Init(&argc, &argv);
        /* Find this processor number */
        MPI_Comm_rank(MPI_COMM_WORLD, &procnum);
        /* Find the number of processors */
        MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
        printf("Hello world from processor %d out of %d\n", procnum, numprocs);
        /* Shut down MPI */
        MPI_Finalize();
        return 0;
    }

The JDL file would be:

    Type = "Job";
    JobType = "MPICH";
    NodeNumber = 10;
    Executable = "MPItest.sh";
    Arguments = "MPItest";
    StdOutput = "test.out";
    StdError = "test.err";
    InputSandbox = {"MPItest.sh","MPItest.c"};
    OutputSandbox = {"test.err","test.out","mpiexec.out"};

    Footnote 3: This is so because the job type that is used at the Globus gatekeeper level is multiple. If you bypass the LCG middleware and submit a job using Globus directly, and if you specify mpi as the job type, then Globus calls mpirun directly on the specified executable. This is rather limiting because no pre- or post-MPI activity can be performed.

And the script sent as executable would be the following (MPItest.sh):

    #!/bin/sh -x
    # Binary to execute
    EXE=$1
    echo "**********
43. if this attempt fails (e.g. the NS is not accessible). The syntax to do this can be deduced from the following example:

    NSAddresses = {NS_1, NS_2};
    LBAddresses = {{LB_1a, LB_1b}, {LB_2}};

In this case the first NS to be contacted is NS_1, and either LB_1a or LB_1b (if the first one is not available) will be used as LB server. If NS_1 cannot be contacted, then NS_2 and LB_2 will be used instead.

In general, it is probably more useful to just specify several NSs and then the associated LB (usually in the same RB or on a nearby machine) for each one of them, always using curly brackets as shown in the example.

7 DATA MANAGEMENT

7.1 INTRODUCTION

This chapter describes the client tools that are available to deal with the data in gLite 3.0. An overview of the available Data Management APIs is also given in Appendix F.

7.2 STORAGE ELEMENTS

The Storage Element is the service which allows a user or an application to store data for future retrieval. Even if it is foreseen for the future, currently there is no enforcement of policies for space quota management. All data in an SE must be considered permanent (not scratchable by any automatic mechanism local to the site), and it is the user's responsibility to manage the available space in an SE (removing unnecessary data, moving files to mass storage systems, etc.).

7.2.1 Data Channel Protocols

The data access protocols supported in cur
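The failover order implied by NSAddresses and LBAddresses can be sketched as a pair of nested loops: try each NS in turn, and for the first reachable one pick the first reachable LB from its associated group. The Python below is an illustrative model only; pick_servers and the is_reachable callback are hypothetical, and the choice to move on to the next NS when none of its LBs respond is an assumption that the text does not spell out.

```python
def pick_servers(ns_addresses, lb_addresses, is_reachable):
    """Return the (NS, LB) pair to use, or None if nothing is reachable.

    ns_addresses: list of NS names, in order of preference.
    lb_addresses: list of lists; lb_addresses[i] holds the LBs tied to ns_addresses[i].
    is_reachable: caller-supplied predicate taking a server name.
    """
    for ns, lbs in zip(ns_addresses, lb_addresses):
        if not is_reachable(ns):
            continue  # fall through to the next NS and its own LB group
        for lb in lbs:
            if is_reachable(lb):
                return ns, lb
    return None
```

With the configuration from the example, an unreachable NS_1 leads to the pair (NS_2, LB_2), while an unreachable LB_1a leads to (NS_1, LB_1b).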
44. information is provided if the verbosity level is increased by using -v 1, -v 2 or -v 3 with the command. See for detailed information on each of the fields that are returned then.

Many job identifiers can be given as arguments of the glite-job-status command, i.e.:

    glite-job-status <jobId1> ... <jobIdN>

The option -i <file path> can be used to specify a file with a list of job identifiers (saved previously with the -o option of glite-job-submit). In this case, the command asks the user interactively which job(s) the status should be printed for. Subsets of jobs can be selected (e.g. 1-2,4).

    glite-job-status -i jobs.list
    1 : https://cert-rb-01.cnaf.infn.it:9000/ma46vg cgV2SzakmCI CTw
    2 : https://cert-rb-01.cnaf.infn.it:9000/55YfzeDigWeoHbpHxx1BQA
    a : all
    q : quit
    Choose one or more jobId(s) in the list - [1-2]all:

If the --all option is used instead, the status of all the jobs owned by the user submitting the command is retrieved. As the number of jobs owned by a single user may be large, there are some options that limit that job selection. The --from / --to [MM:DD:]hh:mm[:[CC]YY] options make the command query the LB for jobs that were submitted after/before the specified date and time. The --status <state> (-s) option makes the command retrieve only the jobs that are in the specified state, and the --exclude <state> (-e) option makes it retrieve jobs tha
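The subset notation accepted at the interactive prompt (for example 1-2,4) is a compact list of indices and ranges. As an illustration of how such a string maps to job indices, here is a short Python sketch; parse_selection is a hypothetical function written for this guide excerpt, not the parser used by glite-job-status, and treating a as "all" follows the menu shown above.

```python
def parse_selection(text, total):
    """Expand a selection such as '1-2,4' into a sorted list of 1-based indices."""
    if text.strip() == "a":
        return list(range(1, total + 1))
    chosen = set()
    for chunk in text.split(","):
        chunk = chunk.strip()
        if "-" in chunk:
            first, last = (int(part) for part in chunk.split("-", 1))
        else:
            first = last = int(chunk)
        if not 1 <= first <= last <= total:
            raise ValueError(f"selection {chunk!r} is out of range 1..{total}")
        chosen.update(range(first, last + 1))
    return sorted(chosen)
```

The resulting indices can then be used to pick the corresponding job identifiers from the list read from the file.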
45. is a subset of GridFTP. Please refer to for more information.
    Footnote 5: A CERN hierarchical Storage Manager.

and GID. The secure RFIO, on the other hand, can be used for file access to any remote network storage and also from a UI. There is some more information about RFIO in Appendix F.

The gsidcap protocol is the GSI-secure version of the dCache access protocol, dcap. Being GSI-secure, gsidcap can be used for inter-site remote file access. It is possible that in the future kdcap, the Kerberos-secure version of dcap, will also be supported.

The file protocol was used in the past for local file access to remote network file systems. Currently this option is no longer supported, and the file protocol is only used to specify a file on the local machine (i.e. in a UI or a WN), but not stored in a Grid SE.

7.2.2 The Storage Resource Manager interface

The Storage Resource Manager (SRM) has been designed to be the single interface (through the corresponding SRM protocol) for the management of disk and tape storage resources. Any kind of Storage Element will eventually offer an SRM interface that will hide the complexity of the resources behind it and allow the user to request files, pin them for a specified lifetime, reserve space for new entries, and so on. Behind this interface, the SE will have a site-specific policy defined, according to which files are migrated from disk to tape, users are all
46. is using a certificate (usercert.pem) which does not correspond to the private key (userkey.pem), strange errors may occur. To test if this is the case, run the command:

    grid-proxy-init -verify

In case of mismatch, the output will be:

    Your identity: /C=CH/O=CERN/OU=GRID/CN=John Doe
    Enter GRID pass phrase for this identity:
    Creating proxy ................................. Done
    ERROR: Couldn't verify the authenticity of the user's credential to generate a proxy from.
    Use -debug for further information.

4.4 PROXY CERTIFICATES

4.4.1 Proxy Certificates

At this point, the user is able to generate a proxy certificate. A proxy certificate is a delegated user credential that authenticates the user in every secure interaction and has a limited lifetime: it avoids having to use one's own certificate directly, which could compromise its safety, and avoids having to give the certificate pass phrase each time.

The command to create a proxy certificate is grid-proxy-init, which prompts for the user pass phrase, as in the next example.

Example 4.4.1.1 Creating a proxy certificate

To create a proxy certificate, issue the command:

    grid-proxy-init

If the command is successful, the output will be like:

    Your identity: /O=Grid/O=CERN/OU=cern.ch/CN=John Doe
    Enter GRID pass phrase for this identity:
    Creating proxy ................................. Done
    Your proxy is valid until: Tue Jun 24 23:48:4
47. job-get-chkpt <jobid> command.

6.3.6 MPI Jobs

Message Passing Interface (MPI) applications are run in parallel on several processors.

Note: In WLCG/EGEE there is no native support for MPI jobs; therefore, computing centers supporting MPI jobs should provide a particular farm configuration. In gLite 3.0 there is native support for MPI jobs, but since they have never been tested, they will not be described here.

Some WLCG/EGEE sites support jobs that are run in parallel using the MPI standard. It is not mandatory for WLCG/EGEE clusters to support MPI, so those clusters that do must publish this in the IS. They do so by adding the value MPICH to the GlueHostApplicationSoftwareRunTimeEnvironment GLUE attribute. Users' jobs can then look for this attribute in order to find clusters that support MPI. This is done transparently for the user by the use of a requirements expression in the JDL file, as shown later.

From the user's point of view, jobs to be run as MPI are specified by setting the JDL JobType attribute to MPICH. When that attribute is included, the presence of the NodeNumber attribute in the JDL is mandatory as well. This variable specifies the required number of CPUs for the application. These two attributes could be specified as follows:

    JobType = "MPICH";
    NodeNumber = 4;

The UI automatically requires the MPICH runtime environment installed on the CE and a number of CPUs at least equal to the required number of nodes. This i
48. jobmanager-lcgpbs-dteam 498
    hepgrid2.ph.liv.ac.uk:2119/jobmanager-lcgpbs-lhcb 498
    hepgrid2.ph.liv.ac.uk:2119/jobmanager-lcgpbs-babar 498
    19 rows

In this example, we first set the type of query to continuous. That is, new tuples are received as they are published, and the query will not terminate unless the user aborts or a maximum time for the query is reached. This timeout is then defined as 120 seconds. Finally, we query for the ID and the number of CPUs of all CEs publishing information into R-GMA in the two minutes following the query.

5.2.4 R-GMA APIs

There exist R-GMA APIs in Java, C, C++ and Python. They include methods for creating consumers, as well as primary and secondary producers; setting the types of queries and of producers, retention periods and timeouts; retrieving tuples; and inserting data. The APIs are beyond the scope of this introduction, but detailed documentation exists for all APIs, including example code [21].

5.3 MONITORING

The ability to monitor resource-related parameters is currently considered a necessary functionality in any network. In such a heterogeneous and complex system as the Grid, this necessity becomes fundamental. A proper monitoring system permits the existence of a central point of operational information (i.e. in gLite 3.0, the GOC). The monitoring system should be able to collect data from the resources in the system, in order to analyze usage, behavior and performance of the Grid, d
49. local o grid or ldapsearch x H ldap 1xb2006 cern ch 2135 b mds vo name local o grid And the obtained reply will be CERN LCG GDEIS 722398 Manuals Series Page 49 version 2 filter objectclass requesting ALL 1xb2006 cern ch 2119 jobmanager lcgpbs atlas local grid dn GlueCEUniqueID 1xb2006 cern ch 2119 jobmanager lcgpbs atlas mds vo name lo cal o grid objectClass GlueCETop objectClass GlueCE objectClass GlueSchemaVersion objectClass GlueCEAccessControlBase objectClass GlueCEInfo objectClass GlueCEPolicy objectClass GlueCEState objectClass GlueInformationService objectClass GlueKey GlueCEHostingCluster 1xb2006 cern ch GlueCEName atlas GlueCEUniqueID 1xb2006 cern ch 2119 jobmanager lcgpbs atlas GlueCEInfoGatekeeperPort 2119 GlueCEInfoHostName 1xb2006 cern ch GlueCEInfoLRMSType torque GlueCEInfoLRMSVersion torque_1 0 1p5 GlueCEInfoTotalCPUs 2 GlueCEInfoJobManager lcgpbs GlueCEInfoContactString 1xb2006 cern ch 2119 jobmanager lcgpbs atlas GlueCEInfoApplicationDir opt exp_soft GlueCEInfoDataDir unset GlueCEInfoDefaultSE 1xb2058 cern ch GlueCEStateEstimatedResponseTime 2146660842 GlueCEStateFreeCPUs 2 GlueCEStateRunningJobs 0 GlueCEStateStatus Production GlueCEStateTotalJobs 0 GlueCEStateWaitingJobs 4444 GlueCEStateWorstResponseTime 2146660842 GlueCEStateFreeJobSlots 0 GlueCEPolicyMaxCPUTime 2880 GlueCEPolicyMax
50. << "\nCopying " << filestatuses[0].surl << " to " << destfile << "\n"; /* Copy the file to the local filesystem */ if (lcg_cp(filestatuses[0].surl, destfile, "dteam", 1, 0, 0, 1) < 0) perror("Error in lcg_cp"); free(filestatuses[0].surl); if (filestatuses[0].status == 1) free(filestatuses[0].turl); free(filestatuses); if (numiter > 49) cout << "\nThe file did not reach the READY status. It could not be copied." << endl; /* Cleaning */ delete destfile; /* That was all */ cout << endl; return reqid; /* return the reqid, so that it can be used by the caller */ /* end of main */ The srm_get function is called once to request the staging of the file. In this call, we retrieve the corresponding TURL and some numbers identifying the request. If an LFN was provided, several TURLs from several replicas could be retrieved. In this case, only one TURL will be returned (stored in the first position of the filestatuses array). The second part of the program is a loop that repeatedly calls srm_getstatus in order to get the current status of the previous request, until the status is equal to ready. There is a sleep call to let the program wait some time (increasing with each iteration) for the file staging. Also, a maximum number of iterations is set (50), so that the program does not wait forever, but rather ends finally wit
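The polling strategy described in this excerpt (ask for the status repeatedly, wait a little longer after each attempt, and give up after 50 iterations) can be sketched in Python independently of GFAL. This is only a minimal sketch under stated assumptions: `poll_until_ready`, the `get_status` callable, `base_delay` and the injectable `sleep` parameter are hypothetical names introduced for illustration and are not part of the GFAL or SRM API. Only the iteration cap of 50 and the growing wait come from the text.

```python
import time


def poll_until_ready(get_status, max_iterations=50, base_delay=1.0, sleep=time.sleep):
    """Call get_status() until it returns "ready" or the iteration cap is reached.

    get_status is a hypothetical callable standing in for a status request.
    Returns the number of status calls made on success, or None if the cap
    was reached without seeing "ready".
    """
    for iteration in range(1, max_iterations + 1):
        if get_status() == "ready":
            return iteration
        # Wait longer on each pass, as the text describes for file staging.
        sleep(base_delay * iteration)
    return None
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without real delays; a caller that gets `None` back would report that the file never reached the ready status.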
51. making, which first selects, among all available CEs, those which fulfill the requirements expressed by the user and which are close to specified input Grid files. It then chooses the CE with the highest rank, a quantity derived from the CE status information which expresses the goodness of a CE (typically a function of the numbers of running and queued jobs). The RB locates the Grid input files specified in the job description using a service called the Data Location Interface (DLI), which provides a generic interface to a file catalog. In this way, the Resource Broker can talk to file catalogs other than LFC (provided that they have a DLI interface). The most recent implementation of the WMS from EGEE allows not only the submission of single jobs, but also of collections of jobs (possibly with dependencies between them) in a much more efficient way than the old LCG-2 WMS [25]. Finally, the Logging and Bookkeeping service (LB) tracks jobs managed by the WMS. It collects events from many WMS components and records the status and history of the job. 3.3 JOB FLOW. This section briefly describes what happens when a user submits a job to the WLCG/EGEE Grid to process some data, and explains how the different components interact. 3.3.1 Job Submission. Figure 6 illustrates the process that takes place when a job is submitted to the Grid. It refers to the LCG-2 WMS, but the gLite WMS is similar. The individual steps are as follows:
52. of the CPU ProcessorClockSpeed clock speed of the CPU ProcessorInstructionSet name of the instruction set architecture of the CPU ProcessorOtherProcessorDescription other description for the CPU ProcessorCacheLl size of the unified L1 cache ProcessorCacheL1I size of the instruction L1 cache ProcessorCacheL1D size of the data L1 cache ProcessorCacheL2 size of the unified L2 cache Application software objectclass GlueHostApplicationSoftware ostApplicationSoftwareRunTimeEnvironment list of software installed on this host ost ost ost ost ost ost ost ost ain ain ain ain Benchmark objectc Benc Benc Main memory objectclass GlueHostMainMemory emoryRAMSize physical RAM emoryRAMAvailable unallocated RAM emoryVirtualSize size of the configured virtual memory emoryVirtualAvailable available virtual memory lass GlueHost Benchmark hmarkS100 SpecInt2000 benchmark hmarkSF00 SpecFloat2000 benchmark Network adapter objectclass GlueHostNetworkAdapter etworkAdapterName name of the network card etworkAdapterIPAddress IP address of the network card CERN LCG GDEIS 722398 Manuals Series Page 155 GlueHostNetworkAdapterMTU the MTU size for the LAN to which the network card is attached GlueHostNetworkAdapterOutboundIP permission for outbound connectivity GlueHostNetworkAdapterInboundIP permission for inbound connectivity Processor load objectclass G
53. returns the port number used by <SE> for the data transfer protocol <Protocol>; getVirtualOrganization: returns the name of the VO specified in the VirtualOrganisation JDL attribute. The -v option produces a more verbose output, and the -f <filename> option tells the command to parse the BrokerInfo file specified by <filename>. If the -f option is not used, the command tries to parse the file $GLITE_WL_RB_BROKERINFO. There are basically two ways of parsing elements from a BrokerInfo file. The first one is directly from the job, and therefore from the WN where the job is running. In this case, the $GLITE_WL_RB_BROKERINFO variable is defined as the location of the BrokerInfo file in the working directory of the job, and the command will work without problems. This can be accomplished, for instance, by including a line like the following in a submitted shell script: glite-brokerinfo getCE where the glite-brokerinfo command is called with any desired function as its argument. If, on the contrary, glite-brokerinfo is invoked from the UI, the $GLITE_WL_RB_BROKERINFO variable will usually be undefined and an error will occur. The solution to this is to include an instruction to generate the BrokerInfo file as output of the submitted job, and to retrieve it with the rest of the generated output when the job finishes, by specifying the file in the Output Sandbox. This can be done, for example, by including the following lines
54. submit m myproxy fts cern ch s https fts sc cr cnaf infn it 8443 sc3infn glite data transfer fts services FileTransfer SARA CNAF in p passwd where the input file SARA CNAF in looks like Scat SARA CNAF in srm srm grid sara nl 8443 srm managervl SFN pnfs grid sara nl data lhcb test roberto zz_zz f srm sc cr cnaf infn it 8443 srm managervl SFN castor cnaf infn it grid lcg lhcb test roberto SARA_1 srm srm grid sara nl 8443 srm managervl SFN pnfs grid sara nl data lhcb test roberto zz_zz f srm sc cr cnaf infn it 8443 srm managervl SFN castor cnaf infn it grid lcg lhcb test roberto SARA_2 srm srm grid sara nl 8443 srm managervl SFN pnfs grid sara nl data lhcb test roberto zz_zz f srm sc cr cnaf infn it 8443 srm managervl SFN castor cnaf infn it grid lcg lhcb test roberto SARA_3 The passwd in the example is an environment variable opportunely set to the value of the password to be passed to FTS Attention The transfers handled by FTS within a single job bulk submission must all refer to the same channel otherwise FTS will not process such transfers and will return the message Inconsistent channel Example 7 5 3 2 Querying the status of a job The following example shows a query to FTS for inferring information about the states of a transfer job Sglite transfer status s https fts sc cr cnaf infn it 8443 sc3infn glite data transfer fts CERN LCG GDEIS 722398 Manuals Series Page 103 services FileTr
55. system LRemoteFileSystemReadOnly true is the file system is read only LRemoteFileSystemType file system type e g NFS AFS etc LRemoteFileSystemName name of the file system LRemoteFileSystemServer unique identifier of the server exporting this file system e File Information objectclass GlueSLFile Glues Glues Glues Glues Glues Glues Glues Glues LFileName file name LFileSize file size LFileCreationDate file creation date and time LFileLastModified date and time of the last modification of the file LFileLastAccessed date and time of the last access to the file LFileLatency time needed to access the file LFileLifeTime file lifetime LFilePath file path e Directory Information objectclass GLueSLDirectory GlueS Glues Glues Glues Glues Glues Glues Glues LDirectoryName directory name LDirectorySize directory size LDirectoryCreationDate directory creation date and time LDirectoryLastModified date and time of the last modification of the directory LDirectoryLastAccessed date and time of the last access to the directory LDirectoryLatency time needed to access the directory LDirectoryLifeTime directory lifetime LDirectoryPath directory path e Architecture objectclass GLueSLArchitecture Glues LArchitectureType type of storage hardware i e disk RAID array tape library etc e P
56. the VO she wishes to join and register some personal data with a Registration Service. Once the user registration is complete, she can access WLCG/EGEE. The Grid Security Infrastructure (GSI) in WLCG/EGEE enables secure authentication and communication over an open network [17]. GSI is based on public key encryption, X.509 certificates, and the Secure Sockets Layer (SSL) communication protocol, with extensions for single sign-on and delegation. In order to authenticate herself to Grid resources, a user needs to have a digital X.509 certificate issued by a Certification Authority (CA) trusted by WLCG/EGEE; Grid resources are generally also issued with certificates to allow them to authenticate themselves to users and other services. The user certificate, whose private key is protected by a password, is used to generate and sign a temporary certificate, called a proxy certificate (or simply a proxy), which is used for the actual authentication to Grid services and does not need a password. As possession of a proxy certificate is a proof of identity, the file containing it must be readable only by the user, and a proxy has, by default, a short lifetime (typically 12 hours) to reduce security risks if it should be stolen. The authorisation of a user on a specific Grid resource can be done in two different ways. The first is simpler, and relies on the grid-mapfile mechanism. The Grid resource has a local grid-mapfile which maps user certificates to loca
57. the WMS is automatically enabled, as long as the user has stored a long proxy in the default MyProxy server (usually defined in the MYPROXY_SERVER environment variable). However, it is possible to indicate to the WMS a different MyProxy server in the JDL file: MyProxyServer = "myproxy.cern.ch"; The proxy renewal can be disabled altogether by adding to the JDL: MyProxyServer = ""; Example 6.2.8: Customizing the goodness of a CE. The choice of the CE on which to execute the job, among all the ones satisfying the requirements, is based on the rank of the CE, namely a quantity expressed as a floating point number. The CE with the highest rank is the one selected. By default, the rank is equal to -other.GlueCEStateEstimatedResponseTime, where the estimated response time is an estimate of the time interval between the job submission and the beginning of the job execution. However, the user can redefine the rank with the Rank attribute as a function of the CE attributes. For example: Rank = other.GlueCEStateFreeCPUs; which will rank best the CE with the most free CPUs. The next one is a more complex expression: Rank = ( other.GlueCEStateWaitingJobs == 0 ? other.GlueCEStateFreeCPUs : -other.GlueCEStateWaitingJobs); In this case, the selected CE will be the one with the fewest waiting jobs, or the one with the most free CPUs if there are CEs with no waiting jobs. 6.3 THE COMMAND LINE INTERFACE. In this section, all commands available for the user to mana
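To make the effect of the more complex Rank expression concrete, the following Python sketch evaluates it over a few CE records and picks the highest-ranked one. It is an illustration only, not WMS code: the CE identifiers and the numbers are invented, and `conditional_rank` and `select_ce` are hypothetical helper names. Only the two GLUE attribute names and the selection rule (highest rank wins) are taken from the text.

```python
def conditional_rank(ce):
    """Rank a CE: free CPUs if nothing is waiting, else minus the waiting jobs."""
    waiting = ce["GlueCEStateWaitingJobs"]
    if waiting == 0:
        return float(ce["GlueCEStateFreeCPUs"])
    return -float(waiting)


def select_ce(ces, rank=conditional_rank):
    """Return the CE with the highest rank, as the matchmaking step does."""
    return max(ces, key=rank)


# Invented sample data for illustration only.
SAMPLE_CES = [
    {"id": "ce-a", "GlueCEStateWaitingJobs": 3, "GlueCEStateFreeCPUs": 10},
    {"id": "ce-b", "GlueCEStateWaitingJobs": 0, "GlueCEStateFreeCPUs": 2},
    {"id": "ce-c", "GlueCEStateWaitingJobs": 0, "GlueCEStateFreeCPUs": 7},
    {"id": "ce-d", "GlueCEStateWaitingJobs": 1, "GlueCEStateFreeCPUs": 50},
]
```

With this sample, any CE with an empty queue gets a non-negative rank and therefore beats every CE with waiting jobs, whose rank is negative; among CEs with queues, the shortest queue wins regardless of free CPUs.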
58. the documentation regarding the different batch systems supported. ATTENTION: This command executes heavy processes on the Computing Elements; please do not use it too often (not every minute or every event processed). Rather, limit its usage to something like once every hour, or when a significant percentage of your job is accomplished. The following files should be present in the WN (either already in the release or shipped with the job in a tarball): lcg-getJobStats (small wrapper bash script around the corresponding python script); lcg-getJobTimes (small wrapper bash script around the corresponding python script); lcg-getJobStats.py (python script, executable); lcg-getJobTimes.py (python script, executable); lcg_jobConsumedTimes.py (python module); lcg_jobStats.py (python module). In principle, one should only deal with lcg-getJobStats, or with lcg_jobStats.py for the python API. lcg-getJobTimes (and lcg_jobConsumedTimes.py) provide a way to estimate the used CPU and wall clock time without querying the batch system, by parsing the proc filesystem instead. It is internally called by lcg-getJobStats in the cases where it cannot get the information from the batch system (like when Condor is used). Information on this tool can be found under http://goc.grid.sinica.edu.tw/gocwiki/Time_Left_Utility. D.6 INFORMATION SYSTEM READER (LCG-INFO). This command was already discussed in a previous section. Nevertheless, the more up to date information c
59. the given VO; it is mandatory when querying for attributes which inherently refer to a VO, like AvailableSpace and UsedSpace. Apart from the listing options, the --help option can be specified alone to obtain a detailed description of the command, and the --list-attrs option can be used to get a list of the supported attributes. Example 5.1.2.1: Get the list of supported attributes. To have a list of the supported attributes, give: lcg-info --list-attrs The output is similar to: Attribute name, Glue object class, Glue attribute name: EstRespTime GlueCE GlueCEStateEstimatedResponseTime; WorstRespTime GlueCE GlueCEStateWorstResponseTime; TotalJobs GlueCE GlueCEStateTotalJobs; TotalCPUs GlueCE GlueCEInfoTotalCPUs. For each attribute, the simplified attribute name used by lcg-info, the corresponding object class and the attribute name in the GLUE schema are given. Example 5.1.2.2: List all the Computing Elements in the BDII satisfying given conditions and print the desired attributes. You want to know how many jobs are running and how many free CPUs there are on CEs that have an Athlon CPU and run Scientific Linux: lcg-info --list-ce --query 'Processor=*Athlon*,OS=*SL*' --attrs 'RunningJobs,FreeCPUs' The output could be: CE: lcgce.psn.ru:2119/jobmanager-lcgpbs-biomed, RunningJobs 30, FreeCPUs 0. It must be stressed that lcg-info only supports a l
60. tw GlueCESEBindCEUniquelD tb009 grid sinica edu tw 2119 3jobmanager 1cgpbs biomed Example 5 1 5 2 Listing all the CEs which publish a given tag querying the BDII The attribute GlueHostApplicationSoftwareRunTimeEnvironment can be used to publish experiment specific information tag on a CE for example that a given experiment software is installed To list all the CEs which publish a given tag a query to the BDII can be performed In this example that information is retrieved for all the subclusters ldapsearch h 1xn1187 cern ch p 2170 b o grid x objectclass GlueSubCluster GlueChunkKey GlueHostApplicationSoftwareRunTimeEnvironment Example 5 1 5 3 Listing all the SEs which support a given VO A Storage Element supports a VO if users of that VO are allowed to store files on that SE It is possible to find out which SEs support a VO with a query to the BDIT For example to have the list of all SEs supporting ATLAS to gether with the storage space available in each of them the GlueSAAccessControlBaseRule which specifies a supported VO is used ldapsearch LLL h 1xn1187 cern ch p 2170 b mds vo name local o grid x GlueSAAccessControlBaseRule alice GlueChunkKey GlueSAStateAvailableSpace GlueSAStateUsedSpace And the obtained result will be something like the following dn GlueSALocalID alice GlueSEUniqueID gw38 hep ph ic ac uk mds vo name UKI LT 2 IC HEP PPS mds vo name local o grid GlueSASta
61. users to list all the replicas of a file that have been successfully registered in the File Catalog lcg rep vo dteam lfn my_aliasl CERN LCG GDEIS 722398 Manuals Series Page 108 gt sfn 1xb0707 cern ch flatfiles SE00 dteam generated 2004 07 09 file79aee616 6cd7 4b75 8848 f 09110adel78 gt sfn 1xb0710 cern ch flatfiles SE00 dteam generated 2004 07 08 file0dcabb46 2214 4db8 9ee8 2930delabbef Again LFN GUID or SURL can be used to specify the file for which all replicas must be listed The SURLs of the replicas are returned The 1cg 1g command list GUID returns the GUID associated with a specified LFN or SURL lcg lg vo dteam sfn 1xb0707 cern ch flatfiles SE00 dteam generated 2004 07 09 file7 Yaee616 6cd7 4b75 8848 09110adel78 gt guid db7ddbc5 613e 423 9501 3c0c00a0ae24 The 1cg 1a command list aliases can be used to list the LFNs associated with a particular file which can be identified by its GUID any of its LFNs or the SURL of one of its replicas lcg la vo dteam guid baddb707 0cb5 4d9a 8141 a046659d243b gt lfn my_aliasl The 1fc commands for LFC and the tools edg 1rc and edg rmc for the RLS offer more functionalities for catalog interaction although the ones provided by the 1cg commands should be enough for a normal user Example 7 6 1 4 Copying files out of the Grid The 1cg cp command can be used to copy a Grid file to a non grid storage resource The first argument source fi
62. which have no attributes and thus no actual resource data and some other that include general attributes that are defined in entries of both CEs and SEs These are the version of the schema that is used the URL of the IS server and finally the GlueKey which is used to relate different entries of the tree and in this way overcome OpenLDAP limitations in query flexibility e Base class objectclass GlueTop No attributes Base class for general object classes attributes matching rules etc objectclass GlueGeneralTop No attributes Glu Glu Schema Version Number objectclass GlueSchemaVersion eSchemaVersionMa jor Major Schema version number eSchemaVersionMinor Minor Schema version number e Internal attributes to express object associations objectclass GLueKey Glu eChunkKey Relative DN AttributeType Attribute Value to reference a related entry in the same branch than this DN Glu eForeignKey Relative DN AttributeType Attribute Value to reference a related entry in a dif ferent branch Information for the Information Service objectclass GlueInformationService GlueInformationServiceURL The Information Service URL publishing the related information e Service entity objectclass GlueService Glu Glu Glu Glu Glu Glu Glu Glu Glu eServiceUniquelb unique identifier of the service eServiceName human readable name of the service eServiceType type of service eServiceVer
63. 00 IQM Vq20r9RzgcStUMWdg All the command options work exactly as in glite-job-status and edg-job-status. Note: If the job has not reached the CE yet (i.e. its status is WAITING or READY), the cancellation request could be ignored, and the job may continue running, although a message of successful cancellation is returned to the user. In such cases, just cancel the job again when its status is SCHEDULED or RUNNING. Example 6.3.2.3: Retrieving the output of a job. After the job has finished (it reaches the DONE status), its output can be copied to the UI with the command glite-job-output (for the gLite WMS) or edg-job-get-output (for the LCG RB), which takes a list of jobs as argument. For example: glite-job-output https://cert-rb-01.cnaf.infn.it:9000/55YfzeDigWeoHbpHxx1BQA Retrieving files from host cert-rb-01.cnaf.infn.it for https ********************************* JOB GET OUTPUT OUTCOME Output sandbox files for the job: https://cert-rb-01.cnaf.infn.it:9000/55YfzeDigWeoHbpHxx1BQA have been successfully retrieved and stored in the directory: /tmp/scampana_55YfzeDigWeoHbpHxx1BQA ********************************* By default, the output is stored under /tmp, but it is possible to specify in which directory to save the output using the --dir <path_name> option. All command options work exa
64. 05 Official Name INFN CNAF Bologna Italy Friendly Name INFN GNAF Institution Department Homepage http grid it cnaf inin it Contact email site manage n cnat infn it Contact telephone 39 051 6092745 Emergency telephone 39 051 6092809 CSIRT email site manage m enai_infn it CSIRT telephone 39 051 6092809 Operational hours 09 00 18 00 CET CEST Supported VOs Alce Atlas BaBar Biomed CMS DTeam ESR LHCb Zeus GIIS URL dap g rdit ce 00 1 cnaf infn it 2 135 mds vo name infn cnaf o grid Organisation Grid IT ON T Nodes egee bdi 01 cnaf infn it egee bdi 02 cnaf infn it Currently reserved for ATLAS gridit ce 00 1 cnaf infn it testbed011 cnaf infn it INFN RLS INFO Map testbed0 11 cnaf infn it 2 135 mds vo name infn rs info o grid lefgsrv cnaf infn it testbedO 13 cnaf infn it Only trusts IT RBs at present Gb Z ES G4 D Done Figure 7 The status page of the INFN CNAF site cert ce 03 cnaf infn it resource grid dn GlueClusterUniquelD cert ce 03 cnaf infn it mds vo name resource o grid objectClass GlueClusterTop objectClass GlueCluster objectClass GlueSchemaVersion objectClass GlueInformationService objectClass GlueKey GlueClusterName cert ce 03 cnaf infn it GlueClusterService cert ce 03 cnaf infn it GlueClusterUniquelD cert ce 03 cnaf infn it GlueForeignKey GlueCEUniqueID cert ce 03 cnaf infn it 2119 blah lsf pps GlueForeignKey GlueCEUniqueID ce
65. 1 1hcb001 z5 12553 Feb 20 12 02 zz_globus gt rw rw r 1 lhcb001 25 12553 Nov 04 14 00 zz_zz f The srm get metadata returns the following information srm get metadata srm castorsrm cern ch 8443 castor cern ch grid lhcb zz_globus e SRMClientVl getFileMetaData contacting service httpg castorgrid06 cern ch 8443 srm managervl FileMetaData srm castorsrm cern ch 8443 castor cern ch grid lhcb zz_globus RequestFileStatus SURL srm castorsrm cern ch 8443 castor cern ch grid lhcb zz_globus size 12553 owner lhcb001 group z5 permMode 33188 checksumType null checksumValue null isPinned false isPermanent true isCached false state fileld 0 TURL estSecondsToStart 0 sourceFilename destFilename The srm advisory delete command for deleting the entry zz_globus does not work in this case srm advisory delete srm castorsrm cern ch 8443 castor cern ch grid lhcb zz_globus The file is not removed as srm get metadata is still reporting the same information srm get metadata srm castorsrm cern ch 8443 castor cern ch grid lhcb zz_globus gt wee SRMClientVl getFileMetaData contacting service httpg castorgrid01 cern ch 8443 srm managervl FileMetaData srm castorsrm cern ch 8443 castor cern ch grid lhcb zz_globus RequestFileStatus SURL srm castorsrm cern ch 8443 castor cern ch grid lhcb zz_globus CERN LCG GDEIS 722398 Manuals Series Page 116 size 12553 owner lhcb001 gr
66. 119 jobmanager lcglsf grid_glite 5 5 0 0 0 cclcgceli07 in2p3 fr 2119 jobmanager bqs short 5 5 0 0 0 cclcgceli07 in2p3 fr 2119 jobmanager bgqs medium 34 0 0 0 0 prep ce 02 pd infn it 2119 jobmanager lcglsf alice 4 4 0 0 0 zeus75 cyf kr edu pl 2119 jobmanager lcgpbs alice 1 1 0 0 0 cclcgceli07 in2p3 fr 2119 jobmanager bgs alice_long 2 2 0 0 0 tb009 grid sinica edu tw 2119 jobmanager lcgpbs alice Example 5 1 1 2 Obtaining information about storage resources To know the status of the storage resources lcg infosites vo atlas se kkkkxkkxkkkxkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkxkxkx k These are the related data for atlas in terms of SE KKAXKEKKKXXKKKKRKKKKKKKKKKKAKKKKKKKKK KAR KK RA KK A KK KAAKKAAAKKARRA Avail Space Kb Used Space Kb Type SEs 0 0 n a se02 lip pt 160250000 8860000 n a se0ld lip pt 1000000000000 500000000000 n a castorsrm pic es 745343488 6277966848 n a lcg gridka se fzk de 1000000000000 500000000000 n a castorsrm ific uv es 1000000000000 500000000000 n a castorgrid ific uv es 66810000 6500000 n a cert se 01 cnaf infn it 138020000 10260000 n a prep se 01 pd infn it 4145003580 249174980 n a prod se 01 pd infn it 5650364 2187160 n a zeus73 cyf kr edu pl 1 L n a dpm01 grid sinica edu tw 275585848 35358272 n a gw38 hep ph ic ac uk 3840000000 1360000000 n a grid08 ph gla ac uk 186770000 10090000 n a gridl3 csl ee upatras gr 3700000000 12440000000 n a castor grid sinic
67. 12 -clcerts -nokeys -in my_cert.p12 -out usercert.pem where: my_cert.p12 is the input PKCS12 format file; userkey.pem is the output private key file; usercert.pem is the output PEM certificate file. The first command creates only the private key (due to the -nocerts option), and the second one creates only the user certificate (-nokeys -clcerts options). The grid-change-pass-phrase -file <private_key_file> command changes the pass phrase that protects the private key. This command will work even if the original key is not password protected. It is important to know that if the user loses the pass phrase, the certificate will become unusable and a new certificate will have to be requested. Once in PEM format, the two files, userkey.pem and usercert.pem, should be copied to a User Interface. This will be described later. 4.1.4 Renewing the Certificate. Most CAs issue certificates with a limited duration (usually one year); this implies the need to renew them periodically. The renewal procedure usually requires that the certificate holder sends a request for renewal signed with the old certificate and/or that the request is confirmed by a phone call; the details depend on the policy of each CA. Renewed certificates have the same DN as the old ones; failing to renew the certificate usually implies the loss of the DN and the necessity to request a completely new certificate with a different D
68. 141-a046659d243b Likewise, lcg-uf (unregister file) allows the user to delete a GUID-SURL mapping (respectively the first and second argument of the command) from the catalog: lcg-uf --vo dteam guid:baddb707-0cb5-4d9a-8141-a046659d243b sfn://lxb0710.cern.ch/flatfiles/SE00/dteam/generated/2004-07-12/file04eec6b2-9ce5-4fae-bf62-b6234bf334d6 If the last replica of a file is unregistered, the corresponding GUID-LFN mapping is also removed. WARNING: lcg-uf just removes entries from the catalog; it does not remove any physical replica from the SE. Watch out for consistency. Example 7.6.1.8: Managing aliases. The lcg-aa (add alias) command allows the user to add a new LFN to an existing GUID: lcg-la --vo dteam guid:baddb707-0cb5-4d9a-8141-a046659d243b > lfn:my_alias1 lcg-aa --vo dteam guid:baddb707-0cb5-4d9a-8141-a046659d243b lfn:my_new_alias lcg-la --vo dteam guid:baddb707-0cb5-4d9a-8141-a046659d243b > lfn:my_alias1 > lfn:my_new_alias Correspondingly, the lcg-ra command (remove alias) allows a user to remove an LFN from an existing GUID: lcg-ra --vo dteam guid:baddb707-0cb5-4d9a-8141-a046659d243b lfn:my_alias1 lcg-la --vo dteam guid:baddb707-0cb5-4d9a-8141-a046659d243b > lfn:my_new_alias In order to list the aliases of a file, the lcg-la command, discussed previously, has been used. 7.6.2 Low Level Data Management Tools: GSIFTP. The following low-level u
69. This section gives a brief overview of how checkpointable jobs should work in gLite 3.0. Checkpointable jobs are jobs that can be logically decomposed into several steps. The job can save its state at a particular moment, so that if the job fails, that state can be retrieved and loaded by the job later. In this way, a checkpointable job can start running from a previously saved state, instead of starting from the beginning again. Checkpointable jobs are specified by setting the JDL JobType attribute to Checkpointable. When a checkpointable job is submitted, the user can specify the number (or list) of steps into which the job can be decomposed, and the step to be considered as the initial one. This can be done by setting, respectively, the JDL attributes JobSteps and CurrentStep. The CurrentStep attribute is mandatory and, if not provided by the user, is set automatically to 0 by the UI. When a checkpointable job is submitted to be run from the beginning, it is submitted like any other job, using the edg-job-submit command. If, on the contrary, the job must start from an intermediate state (e.g. after a crash), the --chkpt <state_file> option may be used, where state_file must be a valid JDL file in which the state of a previously submitted job was saved. In this way, the job will first load the given state and then continue running until it finishes. That JDL job state file can be obtained by using the edg
70. 4 2003 and the proxy certificate will be written in /tmp/x509up_u<uid>, where <uid> is the Unix UID of the user, unless the environment variable X509_USER_PROXY is defined, in which case its value is taken as the proxy file name. If the user gives a wrong pass phrase, the output will be: ERROR: Couldn't read user key. This is likely caused by either giving the wrong pass phrase or bad file permissions. key file location: /home/doe/.globus/userkey.pem Use -debug for further information. If the proxy certificate file cannot be created, the output will be: ERROR: The proxy credential could not be written to the output file. Use -debug for further information. If the user certificate files are missing, or the permissions of userkey.pem are not correct, the output is: ERROR: Couldn't find valid credentials to generate a proxy. Use -debug for further information. By default, the proxy has a lifetime of 12 hours. To specify a different lifetime, the -valid H:M option can be used (the proxy is valid for H hours and M minutes; the default is 12:00). The old option -hours is deprecated. Longer lifetimes imply bigger security risks, though. When a proxy certificate has expired, it becomes useless and a new one has to be created with grid-proxy-init. Use the option -help for a full listing of options. It is also possible to print information about an existing proxy certificate, or to destroy it before its expiration, as in the following e
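The rule for where the proxy file lives (the value of X509_USER_PROXY if that variable is defined, otherwise /tmp/x509up_u followed by the Unix UID) can be written down as a small Python sketch. `proxy_file_path` is a hypothetical helper introduced here for illustration and is not part of any Globus or gLite tool. Treating an empty X509_USER_PROXY as undefined is an assumption of this sketch, not something the text states.

```python
import os


def proxy_file_path(environ=None, uid=None):
    """Return the proxy file location following the rule described in the text."""
    environ = os.environ if environ is None else environ
    override = environ.get("X509_USER_PROXY")
    if override:  # assumption: an empty value is treated like an undefined one
        return override
    if uid is None:
        uid = os.getuid()  # Unix UID of the current user
    return "/tmp/x509up_u{}".format(uid)
```

A script that needs to check whether a proxy exists before submitting jobs could call `proxy_file_path()` with no arguments and test the returned path with `os.path.exists`.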
71. B to the WN where the job is executed. An event is logged in the LB and the status of the job is RUNNING. g. While the job runs, Grid files can be directly accessed from an SE using either the secure RFIO or gsidcap protocols, or after copying them to the local filesystem on the WN with the Data Management tools. h. The job can produce new output files which can be uploaded to the Grid and made available for other Grid users to use. This can be achieved using the Data Management tools described later. Uploading a file to the Grid means copying it to a Storage Element and registering it in a file catalog. i. If the job ends without errors, the output (not large data files, but just small output files specified by the user in the so-called Output Sandbox) is transferred back to the RB node. An event is logged in the LB and the status of the job is DONE. j. At this point, the user can retrieve the output of the job to the UI. An event is logged in the LB and the status of the job is CLEARED. k. Queries for the job status can be addressed to the LB from the UI. Also, from the UI it is possible to query the BDII for the status of the resources. l. If the site to which the job is sent is unable to accept or run it, the job may be automatically resubmitted to another CE that satisfies the user requirements. After a maximum allowed number of resubmissions is reached, the job will be marked as aborted
72. DN like the following: Mds-Vo-name=<sitename>,mds-vo-name=local,o=grid In Figure 8, a view of the DIT of the BDII of gLite 3.0 is shown. In the figure, only the sub-tree that corresponds to the CERN site is expanded. The DN for every entry in the DIT is shown. Entries for storage and computing resources, as well as for the bindings between CEs and SEs, can be seen in the figure. Each entry can contain attributes from different object classes. This can be seen in the entry with DN GlueClusterUniqueID=lxn1184.cern.ch,Mds-Vo-name=cernlcg2,mds-vo-name=local,o=grid, which is highlighted in the figure. This entry contains several attributes from the object classes GlueClusterTop, GlueCluster, GlueSchemaVersion, GlueInformationService and GlueKey. In the right-hand side of the window, the DN of the selected entry and the name and value (in the cases where it exists) of the attributes for this entry are shown. Notice how the special objectclass attribute gives information about all the object classes that are applied to this entry. As seen, a graphical tool can be quite useful to examine the structure (and certainly the details also) of the Information Service LDAP directory. In addition, the schema (object classes, attributes) can also be examined. Example 5.1.5.1: Interrogating a BDII. In this example, a query is sent to the BDII in order to retrieve two attributes from the GlueCESEBind object class for all sites: ldapsearch -x
73. EGEE Grid and a description of how to use them. Examples are given of the management of jobs and data, the retrieval of information about resources, and other functionality. An introduction to the gLite 3.0 middleware is presented in Chapter 3. This chapter describes all the middleware components and provides most of the necessary terminology. It also presents the WLCG and the EGEE projects, which developed the gLite 3.0 middleware. In Chapter 4, the preliminary procedures to follow before starting to use the Grid are described: how to get a certificate, join a Virtual Organisation and manage proxy certificates. Details on how to get information about the status of Grid resources are given in Chapter 5, where the different information services and monitoring systems are discussed. An overview of the Workload Management service is given in Chapter 6. This chapter explains the basic commands for job submission and management, as well as those for retrieving information on running and finished jobs. Data Management services are described in Chapter 7. Not only the high-level interfaces are described, but also commands that can be useful in case of problems or for debugging purposes. Finally, the appendices give information about the gLite 3.0 middleware components (Appendix A), the configuration files and environment variables for users (Appendix B), the possible states of a job during submission and execution (Appendix C), user tools for the Gr
74. FC or LRC a file residing on an SE.
lcg-la: Lists the aliases for a given LFN, GUID or SURL.
lcg-lg: Gets the GUID for a given LFN or SURL.
lcg-lr: Lists the replicas for a given LFN, GUID or SURL.
Each command has a different syntax (arguments and options), but the --vo <vo_name> option, which specifies the virtual organization of the user, is present and mandatory in all the commands except lcg-gt, unless the environment variable LCG_GFAL_VO has been set; in that case, the VO for the user is taken from the value of that variable. The --config <file> option (which allows a configuration file to be specified) and the -i option (which allows an insecure connection to the File Catalog) are currently ignored.
Timeouts: The commands lcg-cr, lcg-del, lcg-gt, lcg-rf, lcg-sd and lcg-rep all have timeouts implemented. By using the -t option, the user can specify the number of seconds after which the tool times out. The default is 0 seconds, i.e. no timeout. If a tool times out in the middle of an operation, all actions performed up to that moment are undone, so no broken files are left on an SE and no nonexistent files are registered in the catalogs.
Environment:
• For all lcg-* commands to work, the environment variable LCG_GFAL_INFOSYS must be set to point to the IS provider (the BDII) in the format hostname.domain:port, so that the commands can retrieve the information they need to operate.
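The two environment rules above can be sketched in a few lines of Python. This is only an illustration, not part of the lcg-utils: the function names are hypothetical, the host name in the usage note is made up, and giving an explicit --vo value precedence over LCG_GFAL_VO is an assumption that the guide does not state.

```python
import os


def resolve_vo(cli_vo=None, environ=None):
    """Return the VO to use (hypothetical helper).

    Assumption: an explicit --vo value wins over LCG_GFAL_VO.
    The guide only says that --vo is mandatory unless the variable is set.
    """
    environ = os.environ if environ is None else environ
    vo = cli_vo or environ.get("LCG_GFAL_VO")
    if not vo:
        raise ValueError("no VO given: pass --vo or set LCG_GFAL_VO")
    return vo


def parse_infosys(value):
    """Split an LCG_GFAL_INFOSYS value 'hostname.domain:port' into (host, port)."""
    host, separator, port = value.rpartition(":")
    if not separator or not host or not port.isdigit():
        raise ValueError(f"expected hostname.domain:port, got {value!r}")
    return host, int(port)
```

For instance, `parse_infosys("bdii.example.org:2170")` splits a made-up BDII address into its host and port, while a value without a port is rejected.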
75. File Catalog. However, using low-level tools for both data transfer and catalog entry management could cause inconsistencies between the physical files on the SEs and the catalog entries, which would corrupt Grid files. This is why the use of low-level tools is strongly discouraged unless really necessary.
7.4.1 LFC Commands
In general, the user should interact with the file catalog through the high-level utilities (lcg-utils, see Section 7.6.1). The CLIs and APIs that are available for catalog interaction provide further functionality and more fine-grained control over catalog operations. In some situations, they represent the only possible way to achieve the desired functionality with the LFC. With gLite 3.0, the variable LFC_HOST must be set to hold the hostname of the LFC server.
Attention: For GFAL and the lcg-utils, the LFC_HOST variable is only another way to define the location of the LFC, but for the lfc-* commands the variable is required, since these tools do not use the information published in the IS.
[Figure 11: Architecture of the LFC]
The directory structure of the LFC namespace starts at the /grid directory. Under it, there is a directory for each of the supported VOs. Users of a VO have read and write permissions only under the directory of their VO (e.g. /grid/lhcb for users of the LHCb VO). If such a directory does no
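The namespace layout described above (one directory per VO under the top-level grid directory) can be illustrated with a small Python sketch. The helper names are hypothetical and the check is purely lexical: it does not contact an LFC server and says nothing about the permissions actually enforced there.

```python
import posixpath


def vo_directory(vo):
    """Return the LFC directory assumed to belong to a VO, e.g. /grid/lhcb."""
    return posixpath.join("/grid", vo)


def is_under_vo_directory(path, vo):
    """Lexically check whether an LFC path lies under the VO's directory."""
    root = vo_directory(vo)
    # normpath resolves "." and ".." so that /grid/lhcb/../atlas is not accepted.
    normalized = posixpath.normpath(path)
    return normalized == root or normalized.startswith(root + "/")
```

A path such as `/grid/lhcb/test/file` passes the check for the VO `lhcb`, whereas `/grid/lhcbx/file` does not, because the comparison is done on whole path components.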
76. [Figure 6: Job flow in the WLCG/EGEE Grid]
a. After obtaining a digital certificate from a trusted Certification Authority, registering in a VO and obtaining an account on a User Interface, the user is ready to use the WLCG/EGEE Grid. He logs in to the UI and creates a proxy certificate to authenticate himself in subsequent secure interactions.
b. The user submits a job from the UI to a Resource Broker. In the job description file, one or more files to be copied from the UI to the WN can be specified. This set of files is called the Input Sandbox. An event is logged in the LB and the status of the job is SUBMITTED.
c. The WMS looks for the best available CE to execute the job. To do so, it interrogates the BDII to query the status of computational and storage resources, and the File Catalog to find the location of any required input files. Another event is logged in the LB and the status of the job is WAITING.
d. The RB prepares the job for submission, creating a wrapper script that will be passed, together with other parameters, to the selected CE. An event is logged in the LB and the status of the job is READY.
e. The CE receives the request and sends the job for execution to the local LRMS. An event is logged in the LB and the status of the job is SCHEDULED.
f. The LRMS handles the job execution on the available local farm worker nodes. User files are copied from the R
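The sequence of statuses logged in the LB during the job flow can be summarised as a small state table. The Python sketch below only restates the states named in the job flow description of this guide (SUBMITTED, WAITING, READY, SCHEDULED, RUNNING, DONE, CLEARED, plus aborted after too many resubmissions); the class and function names are hypothetical and this is not an interface of the LB.

```python
from enum import Enum


class JobState(Enum):
    """Job states named in the job flow description."""
    SUBMITTED = "SUBMITTED"
    WAITING = "WAITING"
    READY = "READY"
    SCHEDULED = "SCHEDULED"
    RUNNING = "RUNNING"
    DONE = "DONE"
    CLEARED = "CLEARED"
    ABORTED = "ABORTED"  # reached when the resubmission limit is exhausted


# Order in which the events are logged for a job that ends without errors.
NORMAL_FLOW = [
    JobState.SUBMITTED,
    JobState.WAITING,
    JobState.READY,
    JobState.SCHEDULED,
    JobState.RUNNING,
    JobState.DONE,
    JobState.CLEARED,
]


def next_state(current):
    """Return the state following `current` in the normal flow, or None."""
    if current not in NORMAL_FLOW:
        return None
    position = NORMAL_FLOW.index(current)
    if position + 1 == len(NORMAL_FLOW):
        return None
    return NORMAL_FLOW[position + 1]
```

Resubmission is deliberately left out of the table: the guide only says that a job may be resubmitted to another CE and is marked as aborted once the limit is reached.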
77. HE VO Box
The VO BOX is a type of node, deployed at all sites, where the experiment can run specific agents and services. Access to the VO BOX is restricted to the SOFTWARE GROUP MANAGER (SGM) of the VO. If you are not your VO's SGM, you will not be interested in the VO BOX. The functionality and usage of the VO BOX are described in the Wiki page at http://goc.grid.sinica.edu.tw/gocwiki/VOBOX_HowTo
E.4 EXPERIMENTS SOFTWARE INSTALLATION
Authorized users can install software on the computing resources of LCG-2. The installed software, which we will call Experiment Software, is also published in the Information Service, so that user jobs can run on nodes where the software they need is installed. The Experiment Software Manager (ESM) is the member of the experiment VO entitled to install Application Software at the different sites. The ESM can manage (install, validate, remove) Experiment Software on a site at any time through a normal Grid job, without previous communication with the site administrators. Such a job has, in general, no scheduling priority and is treated like any other job of the same VO in the same queue. There will therefore be a delay in the operation if the queue is busy. The site provides a dedicated space where each supported VO can install or remove software. The amount of available space must be n
78. 3.1 PRELIMINARY MATTERS
3.1.1 Code Development
Many of the services offered by WLCG/EGEE can be accessed either through the user interfaces provided (CLIs or GUIs) or from applications, by making use of various APIs. References to the APIs used for particular services are given later, in the sections describing those services. A totally different matter is the development of software that forms part of the gLite middleware itself. This falls outside the scope of this guide.
3.1.2 Troubleshooting
This document also explains the meaning of the most common error messages and gives some advice on how to avoid some common errors. This guide cannot, however, include all the possible failures a user may encounter while using gLite 3.0. These errors may be caused by user mistakes, misconfiguration of the Grid components, hardware or network failures, or even bugs in the gLite middleware. Subsequent sections of this guide provide references to documents which go into greater detail about the gLite 3.0 components. The Global Grid User Support (GGUS) service provides centralised user support for WLCG/EGEE by answering questions, tracking known problems, maintaining lists of frequently asked questions, providing links to documentation, etc. The GGUS portal is the key entry point for Grid users looking for help. Finally, a user who thinks that there is a security risk in the Grid may directly
79. KCS12 format file to be created, and "My certificate" is an optional name which can be used to select this certificate in the browser after the user has uploaded it, if the user has more than one. Once in PKCS12 format, the certificate can be loaded into the web browser. Instructions on how to do this for some popular browsers are available at http://lcg.web.cern.ch/LCG/users/registration/load-cert.html
4.2.2 Virtual Organisations
A VO is an entity which typically corresponds to a particular organisation or group of people in the real world. Membership of a VO grants specific privileges to the user. For example, a user belonging to the ATLAS VO will be able to read the ATLAS files or to exploit resources reserved for the ATLAS collaboration. Becoming a member of a VO usually requires being a member of the corresponding collaboration; the user must comply with the rules of the VO to gain membership. Of course, it is also possible to be expelled from a VO when the user fails to comply with these rules. Currently, it is only possible to register a certificate with one VO at a time; the only way to belong to more than one VO is to use a different certificate for each one.
4.3 SETTING UP THE USER ACCOUNT
4.3.1 The User Interface
Apart from registering with WLCG/EGEE, a user must also have an account on a WLCG/EGEE User Interface in order to access the Grid. To obtain such an account, a local system administrator must be contacted. The official
80. echo "**************************************************"
echo "Running on: $HOSTNAME"
echo "As:         " `whoami`
echo "**************************************************"
echo "Compiling binary: $EXE"
echo mpicc -o ${EXE} ${EXE}.c
mpicc -o ${EXE} ${EXE}.c
# Under PBS the batch system provides the node file directly
if [ "x$PBS_NODEFILE" != "x" ]; then
    echo "PBS Nodefile: $PBS_NODEFILE"
    HOST_NODEFILE=$PBS_NODEFILE
fi
# Under LSF the host list is written into a node file
if [ "x$LSB_HOSTS" != "x" ]; then
    echo "LSF Hosts: $LSB_HOSTS"
    HOST_NODEFILE=`pwd`/lsf_nodefile
    for host in ${LSB_HOSTS}
    do
        echo $host >> ${HOST_NODEFILE}
    done
fi
if [ "x$HOST_NODEFILE" = "x" ]; then
    echo "No hosts file defined. Exiting..."
    exit
fi
echo "**************************************************"
CPU_NEEDED=`cat $HOST_NODEFILE | wc -l`
echo "Node count: $CPU_NEEDED"
echo "Nodes in $HOST_NODEFILE: "
cat $HOST_NODEFILE
echo "**************************************************"
CPU_NEEDED=`cat $HOST_NODEFILE | wc -l`
echo "Checking ssh for each node:"
NODES=`cat $HOST_NODEFILE`
for host in ${NODES}
do
    echo "Checking $host..."
    ssh $host hostname
done
echo "**************************************************"
echo "Executing $EXE with mpirun"
chmod 755 $EXE
mpirun -np $CPU_NEEDED -machinefile $HOST_NODEFILE `pwd`/$EXE
In
81. N 4.2 REGISTERING WITH WLCG/EGEE
4.2.1 The Registration Service
Before a user can use WLCG/EGEE, registration of some personal data and acceptance of some usage rules are necessary. In the process, the user must also choose a Virtual Organisation (VO). The VO must ensure that all its members have provided the necessary information to the VO's database and have accepted the usage rules. The procedure through which this is accomplished may vary from VO to VO; pointers to all the VOs in WLCG/EGEE can be found at http://cic.in2p3.fr
As an example of a registration service, we describe here the LCG Registrar, which serves the VOs of the LHC experiments. For detailed information, please visit the following URL: http://lcg-registrar.cern.ch
The registration procedure normally requires a web browser with the user certificate loaded, so that the request can be properly authenticated. Browsers, including Internet Explorer and Mozilla, use the PKCS12 certificate format; if the certificate was issued to a user in PEM format, it has to be converted to PKCS12. The following command can be used to perform that conversion:
openssl pkcs12 -export -inkey userkey.pem -in usercert.pem -out my_cert.p12 -name "My certificate"
where userkey.pem is the path to the private key file, usercert.pem is the path to the PEM certificate file, and my_cert.p12 is the path for the output P
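When the conversion has to be scripted, the same openssl invocation can be assembled as an argument list. The following Python sketch only builds and prints the command; the function name is hypothetical, the file names are the ones used in the text, and actually running the command (for example with subprocess.run) would make openssl prompt for the key pass phrase and an export password.

```python
import shlex


def pkcs12_export_command(key_path, cert_path, out_path, friendly_name):
    """Build the argument list for converting a PEM certificate to PKCS12."""
    return [
        "openssl", "pkcs12", "-export",
        "-inkey", key_path,      # private key in PEM format
        "-in", cert_path,        # certificate in PEM format
        "-out", out_path,        # PKCS12 file to be created
        "-name", friendly_name,  # optional name shown by the browser
    ]


command = pkcs12_export_command(
    "userkey.pem", "usercert.pem", "my_cert.p12", "My certificate"
)
# shlex.join quotes the name because it contains a space.
print(shlex.join(command))
```

Passing the arguments as a list rather than as a single string avoids quoting problems with names that contain spaces, such as "My certificate".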
82. Only one Requirements attribute can be specified (if there is more than one, only the last one is considered). If several conditions must be applied to the job, they must all be included in a single Requirements attribute, using a boolean expression. For example, let us suppose that the user wants to run on a CE that uses PBS as batch system and whose WNs have at least two CPUs. He will then write in the job description file:
Requirements = other.GlueCEInfoLRMSType == "PBS" && other.GlueCEInfoTotalCPUs > 1;
The WMS can also be asked to send a job to a particular CE with the following expression:
Requirements = other.GlueCEUniqueID == "lxshare0286.cern.ch:2119/jobmanager-pbs-short";
Note: As explained elsewhere in this guide, the condition that a CE is in production state is normally added automatically to the requirements clause. Thus, CEs that do not correctly publish this will not match. This condition is nevertheless configurable.
If the job must run on a CE where particular experiment software is installed, and this information is published by the CE, something like the following must be written:
Requirements = Member("CMSIM-133", other.GlueHostApplicationSoftwareRunTimeEnvironment);
Note: The Member operator is used to test whether its first argument (a scalar value) is a member of its second argument (a list). In this example, the GlueHostApplicationSoftwareRunTimeEnvironment attribute is a list.
Example 6.2.3: Specifying requirements using wildc
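Because only a single Requirements attribute is honoured, several conditions have to be merged into one boolean expression before they are written to the JDL file. The Python sketch below shows one way to do that; both helper names are hypothetical, the parentheses around each condition are an addition for safety, and `member` merely mimics the described semantics of the Member operator rather than calling any WMS code.

```python
def combine_requirements(*conditions):
    """Join several conditions into a single JDL Requirements line."""
    if not conditions:
        raise ValueError("at least one condition is required")
    # Each condition is parenthesised so that the && joins stay unambiguous.
    joined = " && ".join(f"({condition})" for condition in conditions)
    return f"Requirements = {joined};"


def member(value, values):
    """Mimic the Member operator: is the scalar `value` in the list `values`?"""
    return value in values


print(combine_requirements(
    'other.GlueCEInfoLRMSType == "PBS"',
    "other.GlueCEInfoTotalCPUs > 1",
))
```

With the two conditions from the text, the printed line is equivalent to the single Requirements attribute shown for the PBS example.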
83. RunningJobs: 0
GlueCEPolicyMaxTotalJobs: 0
GlueCEPolicyMaxWallClockTime: 4320
GlueCEPolicyPriority: 1
GlueCEPolicyAssignedJobSlots: 0
GlueCEAccessControlBaseRule: VO:atlas
GlueForeignKey: GlueClusterUniqueID=lxb2006.cern.ch
GlueInformationServiceURL: ldap://lxb2006.cern.ch:2135/mds-vo-name=local,o=grid
GlueSchemaVersionMajor: 1
GlueSchemaVersionMinor: 2

# lxb2006.cern.ch:2119/jobmanager-lcgpbs-alice, local, grid
dn: GlueCEUniqueID=lxb2006.cern.ch:2119/jobmanager-lcgpbs-alice,mds-vo-name=local,o=grid
objectClass: GlueCETop
objectClass: GlueCE
objectClass: GlueSchemaVersion
objectClass: GlueCEAccessControlBase
objectClass: GlueCEInfo
objectClass: GlueCEPolicy
objectClass: GlueCEState
objectClass: GlueInformationService
objectClass: GlueKey
GlueCEHostingCluster: lxb2006.cern.ch
GlueCEName: alice
GlueCEUniqueID: lxb2006.cern.ch:2119/jobmanager-lcgpbs-alice
GlueCEInfoGatekeeperPort: 2119
GlueCEInfoHostName: lxb2006.cern.ch
GlueCEInfoLRMSType: torque
GlueCEInfoLRMSVersion: torque_1.0.1p5
GlueCEInfoTotalCPUs: 2
GlueCEInfoJobManager: lcgpbs
GlueCEInfoContactString: lxb2006.cern.ch:2119/jobmanager-lcgpbs-alice
GlueCEInfoApplicationDir: /opt/exp_soft
GlueCEInfoDataDir: unset
GlueCEInfoDefaultSE: lxb2058.cern.ch
GlueCEStateEstimatedResponseTime: 2146660842
GlueCEStateFreeCPUs: 2
GlueCEStateRunningJobs: 0
GlueCEStateStatus: Production
84. SQL query for a table, and the registry selects the best producers to answer the query through a process called mediation. The consumer then contacts each producer directly, combines the information and returns a set of tuples. The details of this process are hidden from the user, who just receives the tuples in response to a query. An R-GMA system is defined by the registry and the schema: what information will be seen by a consumer depends on what producers are registered with the registry. There is only one registry and one schema in the WLCG/EGEE Grid.
[Figure 4: The virtual database of R-GMA]
There are two types of producers: primary producers, which publish information coming from a user or an Information Provider, and secondary producers, which consume and republish information from primary producers and normally store it in a real database. Producers can also be classified depending on the type of queries accepted:
• Continuous (or stream): information is sent directly to consumers as it is produced.
• Latest: only the latest i
85. Ses
[Figure 1: The Directory Information Tree (DIT)]
[Figure 2: The MDS Information Service in WLCG/EGEE]
The top-level BDII obtains information about the sites in the Grid from the Grid Operations Centre (GOC) database, where site managers can insert the contact address of their GIIS as well as other useful information about the site.
R-GMA
R-GMA is an implementation of the Grid Monitoring Architecture (GMA) proposed by the Global Grid Forum (GGF) [22]. In R-GMA, information is in many ways presented as though it were in a global distributed relational database, although there are some differences (for example, a table may have multiple rows with the same primary key). This model is more powerful than the LDAP-based one, since relational databases support more advanced query operations. It is also much easier to modify the schema in R-GMA, making it more suitable for user information. The architecture consists of three major components (Figure 3):
• The Producers, which provide the information, register themselves with the Registry and describe the type and structure of the information they provide.
• The Consumers, wh
86. T of the site BDII, grouping all the information of the Grid resources in that site. The information of the different site BDIIs is compiled in a global BDII, as described in Section 5.1.5.
[Figure 15: DIT for the core information]
87. TE 3.0
7.4.1 LFC Commands
7.5 FILE TRANSFER SERVICE
7.5.1 Basic Concepts
7.5.2 States
7.5.3 FTS Commands
7.6 FILE AND REPLICA MANAGEMENT CLIENT TOOLS
7.6.1 LCG Data Management Client Tools
7.6.2 Low Level Data Management Tools: GSIFTP
7.6.3 [title illegible]
7.7 JOB SERVICES AND DATA MANAGEMENT
7.8 [title illegible]
7.8.1 VOMS interaction
7.8.2 Posix Access Control Lists
7.8.3 Protection of files
7.9 POOL AND LCG-2
A THE GRID MIDDLEWARE
B CONFIGURATION FILES AND VARIABLES
D USER TOOLS
D.1 INTRODUCTION
JOB MANAGEMENT FRAMEWORK
JOB MONITORING: LCG-JOB-MONITOR
JOB STATUS MONITORING: LCG-JOB-STATUS
88. UI has no more control over the launched listener process, which therefore has to be killed by the user (through the returned process id) when the job is finished.
Example 6.3.4.2: Interacting with the job through a bash script
A simple script (dialog.sh) to interact with the job is presented in this section. It is assumed that the --nolisten option was used when submitting the job. The function of the script is to get the information sent by the interactive job, present it to the user, and send the user's response back to the job. As arguments, the script accepts the names of the three pipes (input, output and error) that the job will use, and the process id (pid) of the listener process. All this information is returned when submitting the job, as can be seen in the answer returned for the submission of the same interactive.jdl and interactive.sh used before:
[Figure 10: X window for an interactive job]
$ edg-job-submit --nolisten interactive.jdl
Selected Virtual Organisation name (from UI conf file): dteam
Connecting to host
89. [Figure 18: DIT for the storage resources]
90. a edu tw 58460000 3160000 n a epbf004 ph bham ac uk CERN LCG GDEIS 722398 Manuals Series Page 45 Example 5 1 1 3 The option closeSE will give an output as follows leg in w ne or ne o w o ne O D ne o w ne o ne OL om om ne o ne ot w o ne ot o ne otr w ne o ne o w w ne o ne or o ct ct CE ct ct ct EE sat ct ct ch cet ck et cr oct ct oct od fosites vo dteam closeSE he CE 1xb2039 cern ch 2119 blah pb he close SE 1xb2058 cern ch Listing the close Storage Elements s dteam he CE grid06 ph gla ac uk 2119 jobmanager lcgpbs dteam he close SE grid08 ph gla ac uk he CE 1xb2090 cern ch 2119 blah 1s he close SE 1xb2058 cern ch Q he CE cert ce 03 cnaf infn it 2119 he close SE cert se 01 cnaf infn he CE prep ce 01 pd infn it 2119 b f grid_tests blah 1sf pps LE lah 1sf cert he close SE prod se 01 pd infn it he CE epbf005 ph bham ac uk 2119 3 obmanager 1cgpbs dteam he close SE epbf004 ph bham ac uk he CE imalaydee hep ph ic ac uk 21 he close SE gw38 hep ph ic ac uk he CE tb023 grid sinica edu tw 211 19 jobmanager 1cgpbs dteam 9 blah pbs dteam he close SE dpm01 grid sinica edu tw he CE tb009 grid sinica edu tw 211 9 jobmanager 1cgpbs dteam he close SE dpm01 grid sinica edu tw he close SE castor grid sinica edu tw Example 5 1 1 4 Listing
91. a machine where the interface is installed, the user can see them just by typing FC<tab> (or FC<Ctrl-D> if that does not work).
APPENDIX A: THE GRID MIDDLEWARE
The operating systems supported by LCG-2 are Red Hat 7.3 and Scientific Linux 3, while the supported architectures are IA32 and IA64. The LCG-2 middleware layer uses components from several Grid projects, including DataTag (EDT), DataGrid (EDG), EGEE, INFN-GRID, Globus and Condor. In some cases, LCG patches are applied to the components, so the final software used is not exactly the same as the one distributed by the original project. The components which are currently used in LCG-2 are listed in Table 1.
Table 1: Software
Provider columns: LCG, EGEE, EDG, EDT, INFN-GRID, Globus, Condor, Other (the per-component check marks are not recoverable)
Basic middleware: Globus 2.4.3; ClassAds 0.9.4
Security: MyProxy
VO management: LDAP-based; VOMS
Workload management: Condor/Condor-G 6.6.5; EDG WMS
Data management: Replica Manager; Replica Location Service; LCG File Catalog; Disk Pool Manager; GFAL; LCG DM tools
Fabric management: LCFG; Quattor; YAIM; LCAS/LCMAPS
Monitoring: GridICE
Information system: MDS; Glue Schema; BDII; R-GMA; LCG Information tools
92. ad of the site network, the number of jobs being run or waiting to be run, and the amount of total and available storage space in the site. If a particular site is selected, information about each of the services present on each node of the site is shown. The nodes are classified as Resource Brokers, CE access nodes or SE access nodes. There are also other types of views: the Geo, Gris and VO views. The Geo view presents a geographical representation of the Grid. The Gris view shows current and historical information about the status (on or off) of every node. Finally, the VO view holds the same information as the site view, but here nodes are classified on a per-VO basis. The user can specify a VO name and get the data about all the nodes that support it. In addition, the job monitoring section of GridICE provides figures about the number of jobs of each VO that are running or queued at each Grid site.
6 WORKLOAD MANAGEMENT
6.1 INTRODUCTION
The Workload Management System (WMS) is the gLite 3.0 component that allows users to submit jobs, and performs all tasks required to execute them, without exposing the user to the complexity of the Grid. It is the responsibility of the user to describe his jobs and their requirements, and to retrieve the output when the jobs are finished. In the WLCG/EGEE Grid, two differen
93. all the resources suitable to execute a given job;
• submit jobs for execution;
• cancel jobs;
• retrieve the output of finished jobs;
• show the status of submitted jobs;
• retrieve the logging and bookkeeping information of jobs;
• copy, replicate and delete files from the Grid;
• retrieve the status of different resources from the Information System.
In addition, the WLCG/EGEE APIs are also available on the UI to allow the development of Grid-enabled applications.
3.2.3 Computing Element
A Computing Element (CE), in Grid terminology, is some set of computing resources localized at a site (i.e. a cluster, a computing farm). A CE includes a Grid Gate (GG) [1], which acts as a generic interface to the cluster; a Local Resource Management System (LRMS), sometimes called batch system; and the cluster itself, a collection of Worker Nodes (WNs), the nodes where the jobs are run. There are two GG implementations in gLite 3.0: the LCG CE, developed by EDG and used in LCG-2 [2], and the gLite CE, developed by EGEE. Sites can choose what to install, and some of them provide both types. The GG is responsible for accepting jobs and dispatching them for execution on the WNs via the LRMS. In gLite 3.0, the supported LRMS types are OpenPBS, LSF, Maui/Torque, BQS and Condor, with work underway to support Sun GridEngine.
[1] For Globus-based CEs, it is called Gatekeeper.
[2] LCG-2 is the former middleware stack used by WLCG/EGEE.
94. an always be obtained from http://goc.grid.sinica.edu.tw/gocwiki/Information System_Reader
APPENDIX E: VO-WIDE UTILITIES
E.1 INTRODUCTION
This section introduces some utilities that are only relevant to certain people in a VO (VO managers, experiment software managers). They are basically administrative tools. The purpose of this section is to introduce the functionality of some of them and to point to other sources of documentation where more detailed information about the utilities can be found. Detailed information on the tools summarised here is provided in the Wiki at the following URL: http://goc.grid.sinica.edu.tw/gocwiki/VoDocs
E.2 FREEDOM OF CHOICE FOR RESOURCES
The Freedom of Choice for Resources (FCR) Pages are a web interface for VO Software Managers. The tool makes it possible to set up selection rules for Computing and Storage Elements, which affect the information published by top-level BDIIs configured accordingly. Resources can be permanently included or excluded, or be used only if the execution of the Sites Functional Tests (SFT) was successful. It is also possible to take VO-specific test results into account on top of the SFT results. The pages are protected by certificate-based authentication. Access has to be requested. Information on this can be found at http://goc.grid.sinica.edu.tw/gocwiki/FCR_Pages
E.3 T
95. ansfer -l c2e2cdb1-a145-11da-954d-944f2354a08b
Pending
Source: srm://srm.grid.sara.nl:8443/srm/managerv1?SFN=/pnfs/grid.sara.nl/data/lhcb/test/roberto
Destination: srm://sc.cr.cnaf.infn.it:8443/srm/managerv1?SFN=/castor/cnaf.infn.it/grid/lcg/lhcb/test
State: Pending
Retries: 0
Reason: null
Duration: 0
Attention: The verbosity level of the status of a given job is set with the -v option; the status of the individual transfers is, however, available through the -l option.
Example 7.5.3.3: Listing ongoing data transfers
The following example queries all ongoing data transfers in a specified intermediate state in a given FTS service. A user who wants to see the ongoing transfer jobs on a channel has to specify the channel with the -c option instead.
$ glite-transfer-list -s https://fts-sc.cr.cnaf.infn.it:8443/sc3infn/glite-data-transfer-fts/services/FileTransfer Pending | grep c2e2cdb1-a145-11da-954d-944f2354a08b
c2e2cdb1-a145-11da-954d-944f2354a08b Pending
Example 7.5.3.4: Canceling a job
The use of the FTS user command for canceling a previously submitted data transfer job is shown here:
$ glite-transfer-cancel -s https://fts-sc.cr.cnaf.infn.it:8443/sc3infn/glite-data-transfer-fts/services/FileTransfer c2e2cdb1-a145-11da-954d-944f2354a08b
7.6 FILE AND REPLICA MANAGEMENT CLIENT TOOLS
While FTS is intended for transferring large amounts of data (e.g. production data) in a reliable and asynchronous way, the average grid us
96. ansfer-channel-signal: Allows the status of all transfers of a given job or of a given channel to be changed.
Man pages are available for all these commands.
Example 7.5.3.1: Submitting a job to FTS
Once a user has successfully registered his proxy with a MyProxy server, he can submit a transfer job to FTS. This can be done either by specifying the source-destination pair on the command line:
$ glite-transfer-submit -m myproxy-fts.cern.ch -s https://fts-sc.cr.cnaf.infn.it:8443/sc3infn/glite-data-transfer-fts/services/FileTransfer srm://srm.grid.sara.nl:8443/srm/managerv1?SFN=/pnfs/grid.sara.nl/data/lhcb/test/roberto/zz_zz.f srm://sc.cr.cnaf.infn.it:8443/srm/managerv1?SFN=/castor/cnaf.infn.it/grid/lcg/lhcb/test/roberto/SARA_1.25354
Enter MyProxy password:
Enter MyProxy password again:
c2e2cdb1-a145-11da-954d-944f2354a08b
or by specifying all source-destination pairs in an input file (bulk submission). The -m option specifies the MyProxy server to use; the -s option specifies the FTS service endpoint to be contacted. If the service starts with http://, https:// or httpg://, it is taken as a direct service endpoint URL; otherwise it is taken as a service instance name and Service Discovery is invoked to look up the endpoints. If not specified, the first available transfer service from the Service Discovery utility will be used. This is true for all subsequent examples.
$ glite-transfer
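The rule for interpreting the service argument can be expressed as a short check. The Python sketch below is illustrative only: the function name and the host names in the usage note are hypothetical, and matching on the full scheme prefix (including "://") is an assumption about how "starts with http, https or httpg" is meant.

```python
# Assumption: the three schemes are matched together with their "://" separator.
DIRECT_ENDPOINT_PREFIXES = ("http://", "https://", "httpg://")


def is_direct_endpoint(service):
    """Return True if the service value is a direct endpoint URL.

    Any other value would be treated as a service instance name
    to be resolved through Service Discovery.
    """
    return service.startswith(DIRECT_ENDPOINT_PREFIXES)


def describe_service(service):
    """Say how a given service value would be interpreted."""
    if is_direct_endpoint(service):
        return "direct endpoint URL"
    return "service instance name (looked up via Service Discovery)"
```

For example, `describe_service("https://fts.example.org:8443/services/FileTransfer")` reports a direct endpoint, while a bare name such as `"my-fts-instance"` is classified as an instance name.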
97. ards. It is also possible to use regular expressions when expressing a requirement. Suppose, for example, that the user wants all his jobs to run on any CE in the domain cern.ch. This can be achieved by putting the following expression in the JDL file:
    Requirements = RegExp("cern.ch", other.GlueCEUniqueID);
The opposite can be required by using:
    Requirements = (!RegExp("cern.ch", other.GlueCEUniqueID));
Example 6.2.4 (Specifying requirements on a close SE)
In order to specify requirements on the SE close to the CE where the job will run, the RB uses a special match-making mechanism called gang matching [34]. For example, to ensure that the job runs on a CE with at least 200 MB of free disk space on a close SE, the following JDL expression can be used (the available space is expressed in kilobytes):
    Requirements = anyMatch(other.storage.CloseSEs, target.GlueSAStateAvailableSpace > 204800);
Example 6.2.5 (A complex requirement used in gLite 3.0)
The following example has actually been used by the Alice experiment in order to find a CE that has some software packages installed (VO-alice-AliEn and VO-alice-ALICE-v4-01-Rev-01) and that allows the job to run for more than 86,000 seconds (i.e. so that the job is not aborted before it has time to finish):
    Requirements = other.GlueHostNetworkAdapterOutboundIP == true &&
        Member("VO-alice-AliEn", other.GlueHostApplicationSoftwareRunTimeEnvironment) &&
        Member("VO
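As a rough illustration of the free-space requirement in Example 6.2.4, the following Python sketch converts a size in megabytes into the kilobyte value used in the expression (200 MB x 1024 = 204800 KB) and assembles the corresponding Requirements line. The helper names are hypothetical and are not part of any gLite tool; the punctuation of the generated JDL expression is an assumption based on the example.

```python
def mb_to_kb(megabytes):
    """Convert megabytes to kilobytes (1 MB = 1024 KB)."""
    return megabytes * 1024


def close_se_requirement(min_free_mb):
    """Build a gang-matching requirement on the free space of a close SE.

    Hypothetical helper: the attribute names are taken from Example 6.2.4,
    everything else is an illustration only.
    """
    return (
        "Requirements = anyMatch(other.storage.CloseSEs, "
        "target.GlueSAStateAvailableSpace > %d);" % mb_to_kb(min_free_mb)
    )


if __name__ == "__main__":
    # Prints the JDL line for a minimum of 200 MB of free space.
    print(close_se_requirement(200))
```

Generating the threshold from a value in megabytes avoids unit mistakes when the requirement is edited by hand.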
98. as been submitted to the system, it is assigned a transfer channel based on the files that it contains and according to the VO, site and network provider policies.
Attention: FTS only deals with physical files and does not provide any facility for dealing with logical files or collection-like entities. This means that a user cannot expect FTS to understand a task like "Put this LFN to RAL and register the new replica in the LFC", which is rather a task for a higher-level service (File Placement Service). Intelligent job submission and reorganization for an optimal use of the network is targeted at higher-level services as well. Each VO currently provides its own FPS-like service.
7.5.2 States
The possible states a job can assume are:
• Submitted: The job has been submitted to FTS but not yet assigned to a channel.
• Pending: The job has been assigned to a channel and its files are waiting to be transferred.
• Active: The transfer of some of the job's files is ongoing.
• Canceling: The job is being canceled.
• Done: All files in the job have been transferred successfully.
• Failed: Some of the file transfers in the job have failed.
• Canceled: The job has been canceled.
• Hold: Jobs in this state require manual intervention because the state of some of the files cannot be resolved automatically (they might have been retried many times before reaching this state).
The job's files can ass
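The job states listed above can be captured in a small lookup, which may be handy when post-processing the output of the status commands. This is only a sketch: the class and function names are hypothetical, and treating Done, Failed and Canceled as final states is an assumption (the text describes no transition out of them), not something stated by the FTS documentation quoted here.

```python
from enum import Enum


class JobState(Enum):
    """FTS job states as listed in section 7.5.2."""
    SUBMITTED = "Submitted"
    PENDING = "Pending"
    ACTIVE = "Active"
    CANCELING = "Canceling"
    DONE = "Done"
    FAILED = "Failed"
    CANCELED = "Canceled"
    HOLD = "Hold"


# Assumption: these states are considered final for polling purposes.
FINISHED_STATES = {JobState.DONE, JobState.FAILED, JobState.CANCELED}


def parse_state(text):
    """Turn a state name printed by a status command into a JobState."""
    return JobState(text.strip())


def is_finished(state):
    """True if no further polling is expected to change the state."""
    return state in FINISHED_STATES


def needs_manual_intervention(state):
    """True for the Hold state, which the text says requires manual action."""
    return state is JobState.HOLD


if __name__ == "__main__":
    state = parse_state("Pending")
    print(state.value, is_finished(state), needs_manual_intervention(state))
```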
99. as created a file which has been registered in the XML catalog of POOL. The question now is how to register this file in the LCG catalog, the RLS.
First, it is necessary to obtain the connection to the RLS catalog. A contact string has to be specified through the environment variable POOL_CATALOG as follows:
    $ export POOL_CATALOG=edgcatalog_http://<host>:<port>/<path>    (bash shell)
    $ setenv POOL_CATALOG edgcatalog_http://<host>:<port>/<path>    (csh shell)
For example, in LCG-2 this environment variable may be set to:
    $ export POOL_CATALOG=edgcatalog_http://rlscert01.cern.ch:7777/$VO/v2.2/edg-local-replica-catalog/services/edg-local-replica-catalog
If the user has specified the file as a SURL in POOL, he can assign it an LFN with POOL as follows:
    $ FCregisterLFN -p <SURL> -l <LFN>
Now the user can run some tests to check whether the file is in the LRC, with the RLS client:
    $ edg-lrc mappingsByPfn <SURL> --endpoint <LRC>
or in the RMC:
    $ edg-rmc mappingsByAlias <LFN> --endpoint <RMC>
Finally, he can check whether the Data Management tools are able to find the file:
    $ lcg-lr --vo <VO> lfn:<LFN>
Note that if the POOL user has defined the SURL entry following a ROOT format, he must use the command FCrenamePFN to create a SURL entry compatible with the RLS catalog.
A complete list of POOL commands can be found in [42]. And in
100. at said, there might be some use cases for the command, the most important one being the deletion of directories by using the -r option. This action should of course be performed only on rare occasions, and probably only by special users within a VO with administrative privileges.
In the next example, the directory trash, which has been previously created, is removed:
    $ lfc-ls -l -d /grid/dteam/MyExample/trash
    > drwxr-xrwx 0 dteam004 cg 0 Jul 06 11:13 /grid/dteam/MyExample/trash
    $ lfc-rm /grid/dteam/MyExample/trash
    > /grid/dteam/MyExample/trash: Is a directory
    $ lfc-rm -r /grid/dteam/MyExample/trash
    $ lfc-ls -l -d /grid/dteam/MyExample/trash
    > /grid/dteam/MyExample/trash: No such file or directory
7.5 FILE TRANSFER SERVICE
FTS (File Transfer Service) is the lowest-level data movement service defined in the gLite architecture and integrated within gLite 3.0. It is responsible for reliably moving sets of files from one site to another, allowing participating sites to control the usage of network and disk resources. FTS is designed for moving physical files point to point. Together with the File Catalog, the data movement protocol (GSIFTP) and the Storage Element service, it represents the fourth pillar that completes the Data Management System picture on LCG/gLite.
7.5.1 Basic Concepts
In order to understand the terminology currently used by the FTS community, the following concepts are defined:
• Job: A tra
101. ata of the Virtual Organization. This can be realized in two different ways:
a. An ACL entry is added to every file or directory to give the role lhcbdataadmin all permissions on them.
b. The role lhcbdataadmin is recognized by all LCG Data Management services as a special role that gets all permissions on files or directories owned by the VO, without using ACLs.
7.9 POOL AND LCG-2
The Pool Of persistent Objects for LHC (POOL) tool is used by most of the LHC experiments as a common persistency framework for the LCG application area. It is through POOL that they store their data.
Objects created by users using POOL are stored in its own File Catalog (XML Catalog). In order to be able to operate on the Grid, it is very important for these experiments to have an interaction between the POOL catalog and the LCG-2 file catalogs.
Currently, there is satisfactory interoperability between the POOL catalogs and the RLS catalogs. That is, there is a way to migrate POOL catalog entries to the RLS catalogs (i.e. register those files within LCG-2), and the LCG-2 Data Management tools can access those files as with any other Grid file. A way to achieve the same kind of interoperability between POOL and the LFC has not been implemented yet.
Example 7.9.1 (Migration from POOL (XML) to LCG (RLS))
We assume that the user has used POOL and as a result h
102. atic job submission.
• get_output: It retrieves and handles the corresponding outputs.
Information on this tool can be found under: http://goc.grid.sinica.edu.tw/gocwiki/Job_Management_Framework
D.3 JOB MONITORING (LCG-JOB-MONITOR)
The lcg-job-monitor command can be used to monitor the progress of a job currently running on a WN. The command is intended to be run on a UI. This tool gives statistics for a given jobId, e.g. memory, virtual memory, real memory, CPU time, DN, etc. The information is retrieved by querying the JobMonitor table (published via R-GMA). The command can return information either for a single job (given the job id), for a user (specifying the DN) or for a whole VO (by name). The standard R-GMA query types are supported: LATEST, HISTORY, CONTINUOUS.
Usage is:
    lcg-job-monitor -j <jobid> -v <VO> -u <DN> -q <query_type>
Information on this tool can be found under: http://goc.grid.sinica.edu.tw/gocwiki/Job_Monitoring
D.4 JOB STATUS MONITORING (LCG-JOB-STATUS)
This tool provides information about the status of a running job. It is intended to be run on a UI. The information is retrieved by querying the JobStatusRaw table (published via R-GMA). The command returns information for a specified job (given the job id).
Usage:
    lcg-job-status.py -j <jobid> -q <type>
where the query
103. by the actual certificate when received from the CA. This should be readable by everyone (i.e. chmod 444 usercert.pem).
Then the userreq.pem file has to be sent (usually by e-mail) to the desired CA.
4.1.3 Getting the Certificate
After a request is generated and sent to a CA, the CA will have to confirm that the user asking for a certificate is who he claims to be. This usually involves meeting in person, or having a phone call with, a Registration Authority (somebody delegated by the CA to verify the legitimacy of a request and, if appropriate, approve it).
After approval, the certificate is generated and delivered to the user. This can be done via e-mail, or by giving the user instructions to download it from a web page.
If the certificate was directly installed in the user's browser, then it must be exported (saved) to disk for Grid use. Details of how to do this depend on the supported browser versions and are described on the CA's website.
The received certificate will usually be in one of two formats: PEM (extension .pem) or PKCS12 (extension .p12). The latter is the most common one for certificates installed in a browser, but it is the former, the PEM format, which must be used in WLCG/EGEE. Certificates can be converted from one format to the other.
If the certificate is in PKCS12 format, it can be converted to PEM using the openssl command:
    $ openssl pkcs12 -nocerts -in my_cert.p12 -out userkey.pem
    $ openssl pkcs
104. components of LCG-2 and projects that contributed to them.
APPENDIX B CONFIGURATION FILES AND VARIABLES
Some of the configuration files and environmental variables that may be of interest for the Grid user are listed in the following tables. Unless explicitly stated, they are all located/defined in the User Interface.
Environmental variables:
Variable                 Notes
$EDG_LOCATION            Base of the installed EDG software.
$EDG_TMP                 Temporary directory.
$EDG_WL_JOBID            Job id (defined for a running job). In a WN.
$EDG_WL_LIBRARY_PATH     Library path for EDG's WMS commands.
$EDG_WL_LOCATION         Base of the EDG's WMS software.
$EDG_WL_PATH             Path for EDG's WMS commands.
$EDG_WL_RB_BROKERINFO    Location of the .BrokerInfo file. In a WN.
$EDG_WL_UI_CONFIG_VAR    May be used to specify a configuration different from $EDG_WL_LOCATION/etc/edg_wl_ui_cmd_var.conf.
$EDG_WL_UI_CONFIG_VO     May be used to specify a configuration different from $EDG_WL_LOCATION/etc/<vo>/edg_wl_ui.conf.
$LCG_CATALOG_TYPE        Type of file catalog used (edg or lfc) for lcg-utils and GFAL.
$LCG_GFAL_INFOSYS        Location of the BDII for lcg-utils and GFAL.
$LCG_GFAL_VO             May be used to tell lcg-utils or GFAL about a user's VO. To set in a UI or ship with a job's JDL.
$LFC_HOST                Location of the LFC catalog (only for catal
105. contact the relevant site administrator if the situation is urgent, as this may be faster than going through GGUS. Information on the local site contacts can be obtained from the Information Service or from the GOC database [16], which is described in Chapter f
3.1.3 User and VO utilities
This guide mainly covers information useful for the average user. Thus, only core WLCG/EGEE middleware is described. Nevertheless, there are several tools which are not part of the middleware but may be very useful to users. Some of these tools are summarised in Appendix D.
Likewise, there are utilities that are only available to certain authorised users of the Grid. An example is the administration of the resources viewed by a VO, or the installation of VO software on WLCG/EGEE nodes. Authorised users can install software on the computing resources of WLCG/EGEE; the installed software is also published in the Information Service, so that users can select sites where the software they need is installed. Information on such topics is given in Appendix E.
3.2 THE WLCG/EGEE ARCHITECTURE
This section provides a quick overview of the WLCG/EGEE architecture and services.
3.2.1 Security
As explained earlier, the WLCG/EGEE user community is grouped into Virtual Organisations. Before WLCG/EGEE resources can be used, a user must read and agree to the WLCG/EGEE usage rules and any further rules for
106. ctly as in glite-job-status and edg-job-status.
NOTE: The output of a job will in principle be removed from the RB after a certain period of time. How long this period is may vary depending on the administrator of the RB, but the currently suggested time is 10 days, so users should always try to retrieve the output of their jobs within one week of job completion (to have a safe margin).
Example 6.3.2.4 (Retrieving logging information about submitted jobs)
The glite-job-logging-info (for the gLite LB) and edg-job-get-logging-info (for the LCG LB) commands query the LB persistent database for logging information about previously submitted jobs. The job's logging information is stored permanently by the LB service and can be retrieved even after the job has terminated its life cycle. This is especially useful in the analysis of job failures.
The argument of this command is a list of one or more job identifiers. The -i and -o options work as in the previous commands. As an example for the gLite LB, consider:
    $ glite-job-logging-info -v 1 https://cert-rb-01.cnaf.infn.it:9000/55YfzeDigWeoHbpHxx1BQA
    **********************************************************************
    LOGGING INFORMATION:
    Printing info for the Job: https://cert-rb-01.cnaf.infn.it:9000/55YfzeDigWeoHbpHxx1BQA
    Event: RegJob
    - source = UserInterface
    - timestamp = Mon May 15 15:14:55 2006 CEST
    Event: Transfer
    - destination
107. d. Create a proxy certificate.
Steps a. to c. need to be executed only once to have access to the Grid. Step d. needs to be executed the first time a request to the Grid is submitted. It generates a proxy valid for a certain period of time, usually 12 hours. When the proxy expires, a new proxy must be created before the Grid services can be used again.
The following sections provide details on the prerequisites.
4.1 OBTAINING A CERTIFICATE
4.1.1 X.509 Certificates
The first requirement the user must fulfill is to be in possession of a valid X.509 certificate issued by a recognized Certification Authority (CA). The role of a CA is to guarantee that a user is who he claims to be and is entitled to own his certificate. It is up to the user to discover which CA he should contact. In general, CAs are organized geographically and by research institute. Each CA has its own procedure for releasing certificates.
The following URL maintains an updated list of recognized CAs, as well as detailed information on how to request certificates from a particular CA:
    http://lcg.web.cern.ch/LCG/users/registration/certificate.html
An important property of a certificate is the subject (or Distinguished Name, DN), a string containing information about the user. A typical example is:
    /O=Grid/O=CERN/OU=cern.ch/CN=John Doe
4.1.2 Requesting the Certificate
Generally speaking, obtaining a certificate involves creating a request to a CA. The request is normally
108. ddleware stack of LCG-2, but that can be very useful for user activities on the Grid. There are potentially tens of such tools and it is impossible to cover them all in this guide. The purpose of this guide is to introduce the functionality of some of them and to point to other sources of documentation where more detailed information about the tools can be found.
This section will probably evolve to include information on new tools as they appear or as we gain knowledge of them.
Detailed information on the different tools summarised here is provided in the Wiki under the following URL:
    http://goc.grid.sinica.edu.tw/gocwiki/User_tools
D.2 JOB MANAGEMENT FRAMEWORK
The submission of large bunches of jobs to LCG resources is a common practice of the VOs during their production phases (software implementation, data production, analysis, etc). Monitoring these bunches of jobs can become a difficult task without adapted tools that are able to handle the submission and the retrieval of the outputs.
Most of the VOs have developed their own tools to perform this job. Here, a framework to automatically submit large bunches of jobs and keep track of their outputs is proposed. Its aim is to assist and guide users or VOs that intend to develop their own tools. They could take parts of this framework and include them in their own applications.
The framework consists mainly of two tools:
• submitter_general: It performs the autom
109. ds. They are site-independent identifiers and do not require any administrator action. A DN is mapped to a virtual uid, and a VOMS group or role to a virtual gid. They are currently used by LFC and DPM. Only the virtual uid and the primary virtual gid (i.e. the ID mapped to the primary group, as defined in the chapter on VOMS) are used to control access to the File Catalog entries or to the files on the Storage Elements. Unless changed with chown, files are owned by the DN of the user who created them; the group ownership depends on the S_ISGID value of the parent directory: if it is set, the file group ownership is the same as that of the parent; if not, the primary group of the user is used.
The LCG Data Management system is also designed to support secondary groups. Secondary groups are useful, as explained by this example. Suppose a given user holding the production role needs to access some collective files whose group ownership is lhcb, and suppose these files are not world readable. In this case, if secondary groups are not supported, the user is not seen as belonging to the lhcb group and, unless access is explicitly granted through an ACL entry for lhcb/Role=production, cannot access these files.
7.8.2 POSIX Access Control Lists
There are two different levels of ACLs: base and extended. The former map directly to the standard Unix permi
110. e " << argv[1] << endl;
    if ((fd = gfal_open (argv[1], O_WRONLY | O_CREAT, 0644)) < 0) {
        perror ("gfal_open");
        exit (1);
    }
    cout << "Open successful";

    // Write into the file (reading the 10 integers at once from the int array)
    if ((rc = gfal_write (fd, original, INTBLOCK)) != INTBLOCK) {
        if (rc < 0)
            perror ("gfal_write");
        else
            cerr << "gfal_write returns " << rc << endl;
        (void) gfal_close (fd);
        exit (1);
    }
    cout << "Write successful";

    // Close the file
    if ((rc = gfal_close (fd)) < 0) {
        perror ("gfal_close");
        exit (1);
    }
    cout << "Close successful" << endl;

    // Reopen the file for reading
    cout << "\nReading back " << argv[1] << endl;
    if ((fd = gfal_open (argv[1], O_RDONLY, 0)) < 0) {
        perror ("gfal_open");
        exit (1);
    }
    cout << "Open successful";

    // Read the file (40 bytes directly into the readValues array)
    if ((rc = gfal_read (fd, readValues, INTBLOCK)) != INTBLOCK) {
        if (rc < 0)
            perror ("gfal_read");
        else
            cerr << "gfal_read returns " << rc << endl;
        (void) gfal_close (fd);
        exit (1);
    }
    cout << "Read successful";

    // Show what has been read
    for (int i = 0; i < 10; i++)
        cout << "\n\tValue of readValues[" << i << "] = " << readValues[i];

    // Close the file
    if ((rc = gfal_close (fd)) <
111. 5.1.4 The Site BDII
5.1.5 The top BDII
5.2 R-GMA
5.2.1 R-GMA Concepts
5.2.2 The R-GMA Browser
5.2.3 The R-GMA CLI
5.2.4 R-GMA APIs
5.3 MON
6 WORKLOAD MANAGEMENT
6.2 JOB DESCRIPTION LANGUAGE
6.3 THE COMMAND LINE INTERFACE
6.3.1 Job Submission
6.3.4 Interactive Jobs
6.3.6 MPI Jobs
7 DATA MANAGEMENT
7.1 INTRODUCTION
7.2 STORAGE ELEMENTS
7.2.1 Data Channel Protocols
7.2.2 The Storage Resource Manager interface
7.2.3 Types of Storage Elements
7.3 FILES NAMING CONVENTION IN GLITE 3.0
7.4 FILE CATALOGS IN GLI
112. e output will be something like:
    Certificate:
        Data:
            Version: 3 (0x2)
            Serial Number: 5 (0x5)
            Signature Algorithm: md5WithRSAEncryption
            Issuer: C=CH, O=CERN, OU=cern.ch, CN=CERN CA
            Validity
                Not Before: Sep 11 11:37:57 2002 GMT
                Not After : Nov 30 12:00:00 2003 GMT
            Subject: O=Grid, O=CERN, OU=cern.ch, CN=John Doe
            Subject Public Key Info:
                Public Key Algorithm: rsaEncryption
                RSA Public Key: (1024 bit)
                    Modulus (1024 bit):
                        00:ab:8d:77:0f:56:d1:00:09:b1:c7:95:3e:ee:5d:
                        c0:af:8d:db:68:ed:5a:c0:17:ea:ef:b8:2f:e7:60:
                        2d:a3:55:e4:87:38:95:b3:4b:36:99:77:06:5d:b5:
                        de:8a:ff:cd:da:e7:34:cd:7a:dd:2a:f2:39:5f:4a:
                        0a:7f:f4:44:b6:a3:ef:2c:09:ed:bd:65:56:70:e2:
                        a7:0b:c2:88:a3:6d:ba:b3:ce:42:3e:a2:2d:25:08:
                        92:b9:5b:b2:df:55:f4:c3:f5:10:af:62:7d:82:f4:
                        0c:63:0b:d6:bb:16:42:9b:46:9d:e2:fa:56:c4:f9:
                        56:c8:0b:2d:98:f6:c8:0c:db
                    Exponent: 65537 (0x10001)
            X509v3 extensions:
                Netscape Base Url: http://home.cern.ch/globus/ca
                Netscape Cert Type: SSL Client, S/MIME, Object Signing
                Netscape Comment: For DataGrid use only
                Netscape Revocation Url: http://home.cern.ch/globus/ca/bc870044.r0
                Netscape CA Policy Url: http://home.cern.ch/globus/ca/CPS.pdf
        Signature Algorithm: md5WithRSAEncryption
            30:a9:d7:82:ad:65:15:bc:36:52:12:66:33:95:b8:77:6f:a6:
            52:87:51:03:15:6a:2b:78:7e:f2:13:a8:66:b4:7f:ea:f6:31:
            aa:2e:6:90:31:9a:e0:02:ab:a8:93:0e:0a:9d:db:3a:89:ff:
            d3:e6:be:41:2e:c8:bf:73:a3:ee:48:35:90:1f:be:9a:3a:b5:
            45:9d
113. eCEInfoLRMSVersion: version of the local batch system
GlueCEInfoGRAMVersion: version of GRAM
GlueCEInfoHostName: fully qualified name of the host where the gatekeeper runs
GlueCEInfoGateKeeperPort: port number for the gatekeeper
GlueCEInfoTotalCPUs: number of CPUs in the cluster associated to the CE
GlueCEInfoContactString: contact string for the service
GlueCEInfoJobManager: job manager used by the gatekeeper
GlueCEInfoApplicationDir: path of the directory for application installation
GlueCEInfoDataDir: path of a shared directory for application data
GlueCEInfoDefaultSE: unique identifier of the default SE
• CE State (objectclass GlueCEState)
GlueCEStateStatus: queue status:
    queueing (jobs are accepted but not run),
    production (jobs are accepted and run),
    closed (jobs are neither accepted nor run),
    draining (jobs are not accepted but those in the queue are run)
GlueCEStateTotalJobs: total number of jobs (running + waiting)
GlueCEStateRunningJobs: number of running jobs
GlueCEStateWaitingJobs: number of jobs not running
GlueCEStateWorstResponseTime: worst possible time between the submission of a job and the start of its execution, in seconds
GlueCEStateEstimatedResponseTime: estimated time between the submission of a job and the start of its execution, in seconds
GlueCEStateFreeCPUs: number of CPUs available to the sched
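The four values of GlueCEStateStatus described above can be summarised as two yes/no properties: whether the queue accepts new jobs, and whether it runs jobs already queued. The Python sketch below encodes exactly that table; the function names are hypothetical, and lower-casing the published value before the lookup is an assumption about how sites capitalise it.

```python
# (accepts new jobs, runs queued jobs) for each GlueCEStateStatus value,
# as described in the attribute list above.
CE_STATUS = {
    "queueing": (True, False),
    "production": (True, True),
    "closed": (False, False),
    "draining": (False, True),
}


def accepts_jobs(status):
    """True if a CE in this state accepts newly submitted jobs."""
    return CE_STATUS[status.strip().lower()][0]


def runs_jobs(status):
    """True if a CE in this state runs the jobs in its queue."""
    return CE_STATUS[status.strip().lower()][1]


if __name__ == "__main__":
    for name in sorted(CE_STATUS):
        print(name, accepts_jobs(name), runs_jobs(name))
```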
114. ecure RFIO does, and while the UID that a user's job is mapped to will be allowed to access a file in an SE, the user's UID on the UI is different and will not be allowed to perform that access.
In contrast to insecure RFIO, the secure version, also called gsirfio, includes all the usual GSI security, and so it can deal with certificates rather than with users' UIDs. For this reason, it can be used without problems to access files from UIs or in remote SEs, just as gsidcap can.
Attention: Some SEs support only insecure RFIO (classic SEs and CASTOR), while others support only secure RFIO (DPM), but they all publish rfio as the supported protocol in the IS. The result is that currently GFAL has to figure out which of the two RFIO versions to use based on an environmental variable. This variable is called LCG_RFIO_TYPE. If its value is dpm, the secure version of RFIO will be used; if its value is castor, or the variable is undefined, then insecure RFIO will be chosen.
Unfortunately, an insecure RFIO client cannot talk to a secure server and vice versa. Therefore, the user must correctly define this variable, depending on the SE he wants to talk to, before using GFAL calls. Otherwise, the calls will not work.
Another important issue is that of the names used to access files.
For classic SEs, both the SURL and TURL names of the files must include a double slash between the hostname of the SE and the path of the file. This is ne
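The selection rule for LCG_RFIO_TYPE described above can be written down as a short Python sketch. It only mirrors the rule as stated in the text (dpm selects secure RFIO; castor or an undefined variable selects insecure RFIO) and is not GFAL code; the function name is hypothetical, and raising an error for any other value is an assumption, since the text does not say what happens in that case.

```python
import os


def rfio_flavour(environ=None):
    """Return 'secure' or 'insecure' according to LCG_RFIO_TYPE.

    Hypothetical helper illustrating the rule described in the text.
    """
    env = os.environ if environ is None else environ
    value = env.get("LCG_RFIO_TYPE")
    if value == "dpm":
        return "secure"
    if value is None or value == "castor":
        return "insecure"
    # Assumption: the behaviour for other values is not documented here.
    raise ValueError("unexpected LCG_RFIO_TYPE value: %r" % value)


if __name__ == "__main__":
    # Passing a dictionary instead of the real environment keeps the
    # example independent of the machine it runs on.
    print(rfio_flavour({"LCG_RFIO_TYPE": "dpm"}))
    print(rfio_flavour({}))
```

A check of this kind before running a job can catch the mismatch between an insecure client and a secure server early, instead of at the first failing GFAL call.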
115. ed in the schema. Finally, the DITs currently used in the IS for the publishing of these attributes are shown.
G.1 THE GLUE SCHEMA LDAP OBJECT CLASSES TREE
Top
  GlueTop 1.3.6.1.4.1.8005.100
    1. GlueGeneralTop
      1. ObjectClass
        1. GlueSchemaVersion
        2. GlueCESEBindGroup
        3. GlueCESEBind
        4. GlueKey
        5. GlueInformationService
        6. GlueService
        7. GlueServiceData
        8. GlueSite
      2. Attributes
        1. Attributes for GlueSchemaVersion
        ...
        8. Attributes for GlueSiteTop
    2. GlueCETop
      1. ObjectClass
        1. GlueCE
        2. GlueCEInfo
        3. GlueCEState
        4. GlueCEPolicy
        5. GlueCEAccessControlBase
        6. GlueCEJob
        7. GlueVOView
      2. Attributes
        1. Attributes for GlueCE
        ...
        7. Attributes for GlueVOView
      3. MyObjectClass
      4. MyAttributes
    3. GlueClusterTop
      1. ObjectClass
        1. GlueCluster
        2. GlueSubCluster
        3. GlueHost
        4. GlueHostArchitecture
        5. GlueHostProcessor
        6. GlueHostApplicationSoftware
        7. GlueHostMainMemory
        8. GlueHostBenchmark
        9. GlueHostNetworkAdapter
        10. GlueHostProcessorLoad
        11. GlueHostSMPLoad
        12. GlueHostOperatingSystem
        13. GlueHostLocalFileSystem
116. ed in the previous section.
The -g option allows specifying a GUID (which is otherwise created automatically):
    $ lcg-cr --vo dteam -d lxb0710.cern.ch -g guid:baddb707-0cb5-4d9a-8141-a046659d243b file://`pwd`/file2
    > guid:baddb707-0cb5-4d9a-8141-a046659d243b
Attention: This option should not be used except by expert users and in very particular cases. Because the specification of an existing GUID is also allowed, a misuse of the tool may end up in a corrupted Grid file in which replicas of the same file are in fact different from each other.
Finally, in this and other commands, the -n <streams> option can be used to specify the number of parallel streams to be used in the transfer (default is one).
Known problem: When multiple streams are requested, the GridFTP protocol establishes that the GridFTP server must open a new connection back to the client (the original connection, and the only one in the case of a single stream, is opened from the client to the server). This may become a problem when a file is requested from a WN and this WN is firewalled to disable inbound connections (which is usually the case). The connection will in this case fail, and the error message returned in the logging information of the job performing the data access will be "425 can't open data connection".
Example 7.6.1.2 (Replicating a file)
Once a file is stored on an SE and registered within the catalog, t
117. eded by GFAL for RFIO. An example of a correct SURL and TURL is:
    sfn://lxb0710.cern.ch//flatfiles/SE00/dteam/my_file
    rfio://lxb0710.cern.ch//flatfiles/SE00/dteam/my_file
These name requirements are imposed by the use of RFIO as access protocol. As seen in previous examples, the lcg-* commands will work with SURLs and TURLs registered in the catalogs even if they do not follow these rules. This is not the case with RFIO. Therefore, it is always better to use LFNs or GUIDs when dealing with files, so as not to have to deal with SURL and TURL naming details.
IMPORTANT: Nevertheless, the entries in the RMC and LRC may contain SURLs which do not comply with the described rules. As a result, when GFAL uses the GUID or LFN to retrieve the SURL of the file, it will get an incorrect one and the call will fail. In those cases, using the correct SURL (which usually means doubling the slash after the hostname) instead of the GUID or LFN is the only way to access the file.
With all this in mind, let us build a JDL file to create and read a Grid file with our C++ program:
    Executable = "gfal_example";
    StdOutput = "std.out";
    StdError = "std.err";
    Arguments = "sfn://lxb0707.cern.ch//flatfiles/SE00/dteam/my_temp_file";
    InputSandbox = {"gfal_example"};
    OutputSandbox = {"std.out", "std.err"};
After submitting the job, the output retrieved in std.out is as follows:
    Creating file sfn://lxb0707.cern.ch//flatfiles/SE00/d
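The naming rule above (a double slash between the SE hostname and the file path) is easy to get wrong by hand. The following Python sketch rewrites a SURL or TURL so that it follows the rule; it works purely on the string, the function name is hypothetical, and it assumes the name has the simple form scheme://host/path with no port or query part.

```python
def with_double_slash(url):
    """Return url with exactly two slashes between hostname and path.

    Hypothetical helper; assumes the form scheme://host/path.
    """
    scheme, separator, rest = url.partition("://")
    if not separator:
        raise ValueError("not a scheme://host/path name: %r" % url)
    host, _, path = rest.partition("/")
    # Drop any leading slashes of the path, then put back exactly two.
    return "%s://%s//%s" % (scheme, host, path.lstrip("/"))


if __name__ == "__main__":
    print(with_double_slash("sfn://lxb0710.cern.ch/flatfiles/SE00/dteam/my_file"))
```

Names that already contain the double slash are returned unchanged, so the helper can be applied to every SURL retrieved from a catalog before it is passed on.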
118. egotiated between the VO and the site.
Information on this can be found under: http://goc.grid.sinica.edu.tw/gocwiki/Experiments_Software_Installation
APPENDIX F DATA MANAGEMENT AND FILE ACCESS THROUGH AN APPLICATION PROGRAMMING INTERFACE
The development of code for jobs submitted to LCG-2 is outside the scope of this guide, and therefore the different APIs for Data Management and Grid File Access will not be covered in full detail. This section just summarizes which APIs exist and gives some examples of lcg_util and GFAL use.
Figure 13: Layered view of the Data Management APIs and CLIs
Figure 13 shows a layered view of the different APIs that are available for Data Management operations in LCG-2. In the figure, the CLIs and APIs whose use is discouraged are shadowed. It also includes the CLIs already described, each of which can usually be related to one of the APIs in the same layer.
On the top, just below the tools developed by the users, we find the lcg_util API. This is a C API that provides the same functionality as the lcg-* commands (lcg-utils). In fact, the commands are just a wrapper around the C calls. This layer should cover most basic needs of user applications. It is abstract in the sense that it is independent of the underlying technology, since it will transparently interact with either the RLS or the LFC catalog a
119. er has to deal with his own, relatively small, data for private analysis. gLite 3.0 offers a variety of Data Management client tools to upload/download files to/from the Grid, replicate data, locate the best replica available, and interact with the file catalogs. Every user should deal with data management through the LCG Data Management tools (usually referred to as lcg-utils or lcg-* commands). They provide a high-level interface (both command line and APIs) to the basic DM functionality, hiding the complexities of catalog and SE interaction. Furthermore, such high-level tools ensure the consistency between Storage Elements and catalog in DM operations and try to minimize the risk of Grid file corruption.
The same functionality is exploited by the edg-replica-manager wrapper. More details on this are given below.
Some lower-level tools (like the edg-gridftp commands, globus-url-copy and SRM-dedicated commands) are also available. These low-level tools are quite helpful in some particular cases (see the examples for more details). Their usage, however, is strongly discouraged for non-expert users, since such tools do not ensure consistency between the physical files in the SE and the entries in the file catalog, and their usage can be very dangerous.
7.6.1 LCG Data Management Client Tools
The LCG Data Management tools (usually called lcg-utils) allow users to copy files between UI, CE, WN and a
120. erformance (objectclass GlueSLPerformance)
GlueSLPerformanceMaxIOCapacity: maximum bandwidth between the service and the network
• Storage Space (objectclass GlueSA)
GlueSARoot: pathname of the directory containing the files of the storage space
GlueSALocalID: local identifier
GlueSAPath: root path of the area
GlueSAType: guarantee on the lifetime for the storage area (permanent, durable, volatile, other)
GlueSAUniqueID: unique identifier
• Policy (objectclass GlueSAPolicy)
GlueSAPolicyMaxFileSize: maximum file size
GlueSAPolicyMinFileSize: minimum file size
GlueSAPolicyMaxData: maximum allowed amount of data that a single job can store
GlueSAPolicyMaxNumFiles: maximum allowed number of files that a single job can store
GlueSAPolicyMaxPinDuration: maximum allowed lifetime for non-permanent files
GlueSAPolicyQuota: total available space
GlueSAPolicyFileLifeTime: lifetime policy for the contained files
• State (objectclass GlueSAState)
GlueSAStateAvailableSpace: total space available in the storage space, in kilobytes
GlueSAStateUsedSpace: used space in the storage space, in kilobytes
• Access Control Base (objectclass GlueSAAccessControlBase)
GlueSAAccessControlBaseRule: list of the access control rules
G.5 ATTRIBUTES FOR THE CE-SE BINDING
The CE-SE binding schema represents a means for advertising relationships betwee
121. ervice URI
  - GlueSEName: human-readable name for the service
  - GlueSEPort: port number that the service listens on
  - GlueSEHostingSL: unique identifier of the storage library hosting the service
  - GlueSESizeTotal: the total size of the storage space managed by the service
  - GlueSESizeFree: the size of the storage capacity that is free for new areas for any VO/user
  - GlueSEArchitecture: underlying architectural system category
  The attribute GlueSEType is deprecated from version 1.2 of the GLUE schema.
• Storage Service State (objectclass GlueSEState)
  - GlueSEStateCurrentIOLoad: system load (for example, number of files in the queue)
• Storage Service Access Protocol (objectclass GlueSEAccessProtocol)
  - GlueSEAccessProtocolType: protocol type to access or transfer files
  - GlueSEAccessProtocolPort: port number for the protocol
  - GlueSEAccessProtocolVersion: protocol version
  - GlueSEAccessProtocolSupportedSecurity: security features supported by the protocol
  - GlueSEAccessProtocolAccessTime: time to access a file using this protocol
  - GlueSEAccessProtocolLocalID: local identifier
  - GlueSEAccessProtocolEndpoint: network endpoint for this protocol
  - GlueSEAccessProtocolCapability: function supported by this control protocol
• Protocol details (objectclass GlueSEControlProtocol)
  - GlueSEControlProtocolType: protocol type (e.g. srmv1)
  - GlueSEControlProtocolVersion: protocol version
  - GlueSEControlProtocolLocalID: local identifier
122. es. If a string itself contains double quotes, they must be escaped with a backslash (e.g. Arguments = "\"hello\" 10";). The character cannot be specified in the JDL. Comments must be preceded by a sharp character (#) or a double slash (//) at the beginning of each line. Multi-line comments must be enclosed between "/*" and "*/".

ATTENTION: The JDL is sensitive to blank characters and tabs. No blank characters or tabs should follow the semicolon at the end of a line.

Example 6.2.1 (Define a simple Hello world job)

To define a job which runs the hostname command on the WN, write a JDL like this:

    Executable = "/bin/hostname";
    StdOutput = "std.out";
    StdError = "std.err";

The Executable attribute specifies the command to be run by the job. If the command is already present on the WN, it must be given as an absolute path; if it is copied from the UI, only the file name must be specified, and the path of the command on the UI should be given in the InputSandbox attribute (see below). For example:

    Executable = "test.sh";
    InputSandbox = {"/home/doe/test.sh"};
    StdOutput = "std.out";
    StdError = "std.err";

The Arguments attribute can contain a string value, which is taken as the argument list for the executable:

    Arguments = "fileA 10";

In the Executable and in the Arguments attributes it may be necessary to use special characters, such as &, >, <
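The quoting rules above can be sketched as a small generator. This is only an illustration in Python, not part of gLite: the helper names `jdl_string` and `render_jdl` are hypothetical, the brace syntax for list values is assumed from the InputSandbox example, and only double quotes are escaped (other special characters are not handled).

```python
def jdl_string(value):
    """Quote a value as a JDL string, escaping embedded double quotes."""
    return '"' + value.replace('"', '\\"') + '"'


def render_jdl(attributes):
    """Render a dict of attribute names to values as JDL text.

    Each attribute ends with a semicolon and nothing follows it on the
    line, since the JDL is sensitive to trailing blanks and tabs.
    """
    lines = []
    for name, value in attributes.items():
        if isinstance(value, (list, tuple)):
            # Assumed list syntax: {"a", "b"}
            rendered = "{" + ", ".join(jdl_string(v) for v in value) + "}"
        else:
            rendered = jdl_string(value)
        lines.append(f"{name} = {rendered};")
    return "\n".join(lines) + "\n"


job = {
    "Executable": "test.sh",
    "Arguments": '"hello" 10',
    "InputSandbox": ["/home/doe/test.sh"],
    "StdOutput": "std.out",
    "StdError": "std.err",
}
print(render_jdl(job), end="")
```

Generating the file from a program rather than by hand makes it harder to leave a stray blank after a semicolon.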
123. es (e.g. GlueSite), as the interface is reasonably intuitive. The browser is read-only, so you can't do any damage.

[Figure: screenshot of the R-GMA browser web page, showing the list of tables (GlueCE, GlueCluster, GlueHost, GlueSA, GlueSE, ...), the query type selection (History, Latest, Continuous) and the list of producers that can be queried.]
124. ese reasons, once the DPM is deployed, it will replace the Classic SE. Old Classic SEs will be converted to DPMs with only one disk in the pool.

7.3 FILES NAMING CONVENTION IN GLITE 3.0

As an extension of what was introduced in Chapter , the different types of names that can be used within the gLite 3.0 file catalogs are summarized below:

• The Grid Unique IDentifier (GUID), which identifies a file uniquely, is of the form:

    guid:<40_bytes_unique_string>
    guid:38ed3f60-c402-11d7-a6b0-f53ee5a37e1d

• The Logical File Name (LFN) or User Alias, which can be used to refer to a file in the place of the GUID (and which should be the normal way for a user to refer to a file), has this format:

    lfn:<anything_you_want>
    lfn:importantResults/Test1240.dat

  In case the LCG File Catalog is used (see Section 7.4), the LFNs are organized in a hierarchical directory-like structure, and they will have the following format:

    lfn:/grid/<MyVO>/<MyDirs>/<MyFile>

• The Storage URL (SURL), also known as Physical File Name (PFN), which identifies a replica in a SE, is of the general form:

    <sfn | srm>://<SE_hostname>/<some_string>

  where the prefix will be sfn for files located in SEs without SRM interface and srm for SRM-managed SEs. In the case of the sfn prefix, the string after the hostname is the path to the location of the file and can be decomposed i
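The prefixes listed above are enough to tell the kinds of name apart. The following is a minimal sketch in Python, not a gLite API; the function name `classify_grid_name` is hypothetical and the check looks only at the prefix, without validating the rest of the name.

```python
def classify_grid_name(name):
    """Return 'GUID', 'LFN', 'SURL' or 'unknown' from the name prefix."""
    if name.startswith("guid:"):
        return "GUID"
    if name.startswith("lfn:"):
        return "LFN"
    if name.startswith(("sfn://", "srm://")):
        return "SURL"
    return "unknown"


for example in (
    "guid:38ed3f60-c402-11d7-a6b0-f53ee5a37e1d",
    "lfn:importantResults/Test1240.dat",
    "srm://se.example.org/some/path",  # hypothetical host and path
):
    print(example, "is a", classify_grid_name(example))
```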
125. et the timeout for queries or the maximum age of tuples to return (<units> is the time unit of <timeout>).

A simple example of how to query the R-GMA virtual database follows.

Example 5.2.3.1 (Querying the R-GMA Information System)

Inside the interface you can easily perform any query using SQL syntax:

    rgma> set query continuous
    Set query type to continuous
    rgma> set timeout 120 seconds
    Set timeout to 120 seconds
    rgma> select UniqueID, TotalCPUs from GlueCE

    UniqueID                                            TotalCPUs
    hepgrid2.ph.liv.ac.uk:2119/jobmanager-lcgpbs-atlas  498
    hepgrid2.ph.liv.ac.uk:2119/jobmanager-lcgpbs-dteam  498
    hepgrid2.ph.liv.ac.uk:2119/jobmanager-lcgpbs-lhcb   498
    hepgrid2.ph.liv.ac.uk:2119/jobmanager-lcgpbs-babar  498
    grid001.fi.infn.it:2119/jobmanager-lcgpbs-lhcb      68
    grid001.fi.infn.it:2119/jobmanager-lcgpbs-cms       68
    grid001.fi.infn.it:2119/jobmanager-lcgpbs-atlas     68
    grid001.fi.infn.it:2119/jobmanager-lcgpbs-lhcb      68
    grid001.fi.infn.it:2119/jobmanager-lcgpbs-cms       68
    grid001.fi.infn.it:2119/jobmanager-lcgpbs-atlas     68
    grid012.ct.infn.it:2119/jobmanager-lcglsf-alice     174
    grid001.fi.infn.it:2119/jobmanager-lcgpbs-lhcb      68
    grid001.fi.infn.it:2119/jobmanager-lcgpbs-cms       68
    grid001.fi.infn.it:2119/jobmanager-lcgpbs-atlas     68
    grid012.ct.infn.it:2119/jobmanager-lcglsf-infinite  174
    hepgrid2.ph.liv.ac.uk:2119/jobmanager-lcgpbs-atlas  498
    hepgrid2.ph.liv.ac.uk:2119
126. etect and notify fault situations, contract violations and user-defined events. The GOC web page contains a whole section with monitoring information for gLite 3.0. Apart from R-GMA, which was explained previously, several different monitoring tools can be used, including general-purpose monitoring tools and Grid-specific systems like GridICE [29].

Also important are the web pages publishing the results of functional tests applied periodically to all the sites registered within gLite 3.0. The results of these tests show whether a site is responding correctly to standard Grid operations; otherwise, an investigation into the cause of the unexpected results is undertaken. Some VOs may even decide to automatically exclude from their BDII the sites that are not passing the functional tests successfully, so that they do not appear in the IS and are not considered for possible use by their applications.

Note: Please do not report problems occurring with a site if this site is marked as ill in the test reports. If that is the case, the site is already aware of the problem and working to solve it.

The results of some sets of functional tests can be checked at the following URLs: https Icg sft cern ch sft l pps lastreport cgi https Icg sft cern ch sft pps lastreport cgi http goc grid sinica edu tw gstat

In the following section, as an example of a monitoring system, the GridICE service is reviewed
127. for Grid resource monitoring and discovery. More details about the GLUE schema can be found in Appendix G. Two IS systems are used in gLite 3.0: the Globus Monitoring and Discovery Service (MDS) [20], used for resource discovery and to publish the resource status, and the Relational Grid Monitoring Architecture (R-GMA) [21], used for accounting, monitoring and publication of user-level information.

MDS

The Globus MDS implements the GLUE Schema using OpenLDAP, an open-source implementation of the Lightweight Directory Access Protocol (LDAP), a specialised database optimised for reading, browsing and searching information. Access to MDS data is insecure, both for reading (clients and users) and for writing (services publishing information).

The LDAP information model is based on entries (objects like a person, a computer, a server, etc.), each with one or more attributes. Each entry has a Distinguished Name (DN) that uniquely identifies it, and each attribute has a type and one or more values. A DN is formed by a sequence of attributes and values and, based on their DNs, entries can be arranged into a hierarchical tree-like structure, called a Directory Information Tree (DIT).

Figure I schematically depicts the Directory Information Tree (DIT) of a site: the root entry identifies the site, and entries for site information, CEs, SEs and the network are defined in the second level. Actual entries published in gLite 3.0 are shown in Appendix G.
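To make the relation between a DN and its position in the DIT concrete, here is a minimal sketch in Python. It is an illustration only: the function names are hypothetical, the example DN uses a made-up hostname, and the parser is deliberately simplified (it does not handle escaped commas or multi-valued components).

```python
def parse_dn(dn):
    """Split a DN into (attribute, value) pairs, most specific first."""
    pairs = []
    for component in dn.split(","):
        attribute, _, value = component.strip().partition("=")
        pairs.append((attribute, value))
    return pairs


def path_from_root(dn):
    """Return the DN components ordered from the DIT root down to the entry."""
    return [f"{attribute}={value}" for attribute, value in reversed(parse_dn(dn))]


# Hypothetical entry; the hostname is an assumption for illustration.
dn = "GlueSEUniqueID=se.example.org, mds-vo-name=local, o=grid"
for depth, node in enumerate(path_from_root(dn)):
    print("  " * depth + node)
```

Reading the DN from right to left gives the path from the root of the tree down to the entry, which is why entries sharing a DN suffix end up in the same subtree.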
128. g

    lfc-ls -l /grid/lhcb/test_roberto/MyTest/MyLink
    lrwxrwxrwx 1 santinel z5 0 Feb 21 16:54 /grid/lhcb/test_roberto/MyTest/MyLink -> /grid/lhcb/test_roberto/lfc_test_pic2

Remember that links created with lfc-ln are soft. If the LFN they are pointing to is removed, the links themselves are not deleted, but keep existing as broken links.

Example 7.4.1.4 (Adding metadata information to LFC entries)

The lfc-setcomment and lfc-delcomment commands allow the user to associate a comment with a catalog entry and delete that comment, respectively. This is the only user-defined metadata that can be associated with catalog entries. The comments for the files may be listed using the --comment option of the lfc-ls command. This is shown in the following example:

    lfc-setcomment /grid/dteam/MyExample/interesting/file1 "Most promising measure"
    lfc-ls --comment /grid/dteam/MyExample/interesting/file1
    > /grid/dteam/MyExample/interesting/file1 Most promising measure

Example 7.4.1.5 (Removing LFNs from the LFC)

As explained before, lfc-rm will only delete catalog entries, but not physical files. In principle, the deletion of replicas (lcg-del) should be used instead. When the last replica is removed, the entry is also removed from the catalog. Indeed, the lfc-rm command will not allow a user to remove an LFN for which there are still SURLs associated (i.e. physical replicas exist). Th
129. g to host egee-rb-01.mi.infn.it, port 7772
    Logging to host egee-rb-01.mi.infn.it, port 9002
    ************************************************************
    JOB SUBMIT OUTCOME
    The job has been successfully submitted to the Network Server.
    Use glite-job-status command to check job current status.
    Your job identifier is:
    https://egee-rb-01.mi.infn.it:9000/LPHg_gAnR1P3XtWvQ1c700
    ************************************************************

In case of failure, an error message will be displayed instead, and an exit status different from zero will be returned.

The command returns to the user the job identifier (jobId), which uniquely defines the job and can be used to perform further operations on the job, like interrogating the system about its status, or canceling it. The format of the jobId is:

    https://Lbserver_address:port/unique_string

where unique_string is guaranteed to be unique and Lbserver_address is the address of the Logging and Bookkeeping server for the job, and usually (but not necessarily) is also the gLite WMS or LCG Resource Broker.

Note: the jobId does NOT identify a web page.

If the command returns the following error:

    Error: API_NATIVE_ERROR
    Error while calling the NSClient multi native api
    AuthenticationException: Failed to establish security context
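Because the jobId follows the layout of a URL, its parts can be separated with a standard URL parser. The sketch below is an illustration in Python and not part of the WMS tools; the function name `parse_job_id` and the example identifier are hypothetical.

```python
from urllib.parse import urlsplit


def parse_job_id(job_id):
    """Split a jobId into the LB server host, its port and the unique string."""
    parts = urlsplit(job_id)
    if parts.scheme != "https" or not parts.hostname:
        raise ValueError(f"not a valid jobId: {job_id!r}")
    return {
        "lb_host": parts.hostname,
        "lb_port": parts.port,  # None when the jobId carries no port
        "unique_string": parts.path.lstrip("/"),
    }


# Hypothetical identifier, used only to show the layout.
info = parse_job_id("https://lb.example.org:9000/AbC_123xyz")
print(info["lb_host"], info["lb_port"], info["unique_string"])
```

The host part is the Logging and Bookkeeping server to ask about the job; as noted above, the identifier is not meant to be opened in a browser.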
130. ge jobs are described. For completeness, both the LCG and gLite CLIs are described, and more information is given whenever the two differ considerably. For more detailed information on all these topics, and on the different commands, please refer to and [24].

6.3.1 Job Submission

To submit a job to the WLCG/EGEE Grid, the user must have a valid proxy certificate on the User Interface machine (as described in Chapter 4) and use the following command:

    edg-job-submit <jdl_file>      (for the LCG WMS)
    glite-job-submit <jdl_file>    (for the gLite WMS)

where <jdl_file> is a file containing the job description, usually with extension .jdl.

Example 6.3.1.1 (Submitting a simple job)

Create a file test.jdl with these contents:

    Executable = "/bin/hostname";
    StdOutput = "std.out";
    StdError = "std.err";
    OutputSandbox = {"std.out","std.err"};

It describes a simple job that will execute /bin/hostname. Standard output and error are redirected to the files std.out and std.err respectively, which are then transferred back to the User Interface after the job is finished, as they are in the Output Sandbox. The job is submitted by issuing:

    edg-job-submit test.jdl        (for the LCG WMS)
    glite-job-submit test.jdl      (for the gLite WMS)

If the submission is successful, the output is similar to:

    Selected Virtual Organisation name (from proxy certificate extension): atlas
    Connectin
131. generated using either a web-based interface or console commands. Details of which type of request a particular CA accepts are described on each CA's website.

For a web-based certificate request, a form must usually be filled in with information such as the name of the user, institute, etc. After submission, a pair of private and public keys is generated, together with a request for the certificate containing the public key and the user data. The request is then sent to the CA.

Note: The user must usually install the CA certificate in their browser first. This is because the CA has to sign the certificate using its own one, and the user's browser must recognize it as a valid one.

For some CAs, the certificate requests are generated using a command line interface. The following discussion describes a common scenario for command-line certificate application using a hypothetical grid-cert-request command. Again, details of the exact command and requirements of each CA will vary and can be found on the CA's website.

The grid-cert-request command would create, for example, the following 3 files:

    userkey.pem   contains the private key associated with the certificate. (This should be set with permissions so that only the owner can read it, i.e. chmod 400 userkey.pem.)
    userreq.pem   contains the request for the user certificate (essentially, the public key).
    usercert.pem  a placeholder, to be replaced
132. gt Global Grid User Support ttp www ggus org GOC Database 2 0 ttps goc grid support ac uk gridsite gocdb2 70 3 lt O lt la O o gt o O Q a a N O O t Y lt 5 gt o n a c Q oma l O ttp www unix globus org security overview htm CERN LCG GDEIS 722398 Manuals Series Page 9 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 LCG 2 User Guide https edms cern ch document 454439 The Storage Resource Manager http sdm lbl gov srm wg MDS 2 2 Features in the Globus Toolkit 2 2 Release http www globus org toolkit mds mds_gt2 R GMA Relational Grid Monitoring Architecture http www r gma org index html B Tierney et al A Grid Monitoring Architecture GGF 2001 revised 2002 http www didc lIbl gov GGF PERF GMA WG papers GWD GP 16 2 pdf S Campana M Litmaath A Sciab LCG 2 Middleware overview https edms cern ch document 498079 F Pacini WMS User s Guide https edms cern ch document 572489 WPI Workload Management Software Administrator and User Guide http www inin it workload grid docs DataGrid 01 TEN 01 18 1 _2 pdf CESNET LB Service User s Guide https edms cern ch document 571273 The GLUE schema http infnforge cnaf infn it glueinfomodel Using Ixplus as an LCG 2 User Interface http grid deployment web cern ch grid deplo
133. h an aborting message. When the file is ready, it is copied using lcg_cp in the same way as seen in a previous example. This or another application should then perform some operation on the file (this is not shown here).

A possible output of this program is the following:

    The status of the file is: 0
    srm://castorsrm.cern.ch/castor/cern.ch/grid/dteam/testSRM/test_1
    Waiting for the file to be staged in...
    ##########
    READY
    srm://castorsrm.cern.ch/castor/cern.ch/grid/dteam/testSRM/test_1
    Copying srm://castorsrm.cern.ch/castor/cern.ch/grid/dteam/testSRM/test_1 to file:/tmp/srm_gfal_retrieved...
    Source URL: srm://castorsrm.cern.ch/castor/cern.ch/grid/dteam/testSRM/test_1
    File size: 2331
    Source URL for copy: gsiftp://castorgrid.cern.ch:2811//shift/lxfs5614/data03/cg/stage/test_1.172962
    Destination URL: file:/tmp/srm_gfal_retrieved
    # streams: 1
    Transfer took 590 ms

Here the 0 file status means that the file exists but lies on tape (not staged yet), the hash marks show the iterations of the loop, and finally READY indicates that the file has been staged in and can be copied (which is done afterwards, as shown by the normal verbose output).

If the same program were run again, passing the same SURL as argument, it would return almost immediately, since the file has already been staged. This is shown in the following output:

    The status of the file is
134. h the Information System or the SRM interface (for SRM-managed SEs). The TURL, therefore, can change with time and should be considered valid only for a relatively small period of time after it has been obtained.

7.4 FILE CATALOGS IN GLITE 3.0

Users and applications need to locate files (or replicas) on the Grid. The File Catalog is a service which fulfills this requirement, maintaining mappings between LFN(s), GUID and SURL(s).

In gLite 3.0, two types of file catalogs are currently deployed: the old Replica Location Server (RLS) and the new LCG File Catalog (LFC). Both of them are deployed as centralized catalogs.

The catalogs publish their endpoints (service URL) in the Information Service, so that the LCG Data Management tools and any other interested services (the RB, for example) can find the way to them (and then to the Grid files information). Be aware that for the RLS there are two different endpoints (one for LRC and one for RMC), while for the LFC, being a single catalog, there is only one.

The user can decide which catalog to use by setting the environment variable LCG_CATALOG_TYPE equal to edg for the RLS or lfc for the LFC. Since gLite 3.0, the default file catalog is the LFC.

Attention: the RLS and the LFC are not mirrored with each other. Entries in the LFC will not appear in the RLS and vice versa. Choose either of the two catalogs, but be consistent with your choice.

The RLS in fact it consi
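A script that wraps the data management tools may want to check which catalog is selected before doing anything else. The following is a minimal sketch in Python under stated assumptions: the function name `selected_catalog` is hypothetical, and treating an unset variable as the LFC default is an assumption based on the statement above, not a documented behaviour of the tools.

```python
import os

# Values of LCG_CATALOG_TYPE named in the text and the catalog each selects.
CATALOGS = {"edg": "RLS", "lfc": "LFC"}


def selected_catalog(environ=None):
    """Return 'RLS' or 'LFC' according to LCG_CATALOG_TYPE.

    An unset variable is treated as 'lfc' (assumed default).
    """
    env = os.environ if environ is None else environ
    value = env.get("LCG_CATALOG_TYPE", "lfc")
    if value not in CATALOGS:
        raise ValueError(f"unsupported LCG_CATALOG_TYPE: {value!r}")
    return CATALOGS[value]


print(selected_catalog({"LCG_CATALOG_TYPE": "edg"}))
print(selected_catalog({}))
```

Failing early on an unexpected value is one way to respect the advice to stay consistent with a single catalog, since the two are not mirrored.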
135. he file can be replicated using the lcg-rep command, as in:

    lcg-rep -v --vo dteam -d lxb0707.cern.ch guid:db7ddbc5-613e-423f-9501-3c0c00a0ae24
    > Source URL: sfn://lxb0710.cern.ch/flatfiles/SE00/dteam/generated/2004-07-08/file0dcabb46-2214-4db8-9ee8-2930de1abbef
    File size: 30
    Destination specified: lxb0707.cern.ch
    Source URL for copy: gsiftp://lxb0710.cern.ch/flatfiles/SE00/dteam/generated/2004-07-08/file0dcabb46-2214-4db8-9ee8-2930de1abbef
    Destination URL for copy: gsiftp://lxb0707.cern.ch/flatfiles/SE00/dteam/generated/2004-07-09/file50c0752c-f61f-4bc3-b48e-af3f22924b57
    # streams: 1
    Transfer took 2040 ms
    Destination URL registered in LRC: sfn://lxb0707.cern.ch/flatfiles/SE00/dteam/generated/2004-07-09/file50c0752c-f61f-4bc3-b48e-af3f22924b57

where the file to be replicated can be specified using an LFN, GUID or even a particular SURL, and the -d option is used to specify the SE where the new replica will be stored. This destination can be either an SE hostname or a complete SURL, and it is expressed in the same format as with lcg-cr. The command also admits the -P option to add a relative path to the destination (as with lcg-cr).

For one GUID, there can be only one replica per SE. If the user tries to use the lcg-rep command with a destination SE that already holds a replica, the command will exit successfully, but no new replica will be created.

Example 7.6.1.3 (Listing replicas, GUIDs and aliases)

The lcg-lr command allows
136. his is considered useful A good example of this is the exposed SRM interface that GFAL provides Some code exploiting this functionality is shown later Finally below GFAL we find some other CLIs and APIs which are technology dependent Their direct use is in CERN LCG GDEIS 722398 Manuals Series Page 136 general discouraged except for the mentioned cases of the LFC client tools and the edg gridftp commands Nonetheless some notes on the RFIO API are given later on Example F 0 1 Using Icg util API to transfer a file The following example copies a file from a SE to the WN where our job is running using the call with timeout The file can be then accessed locally with normal file I O calls The source code follows include lt iostream gt include lt unistd h gt For the unlink function include lt fstream gt For file access inside doSomethingWithFile extern C include lcg_util h using namespace std A function to further process the copied file bool doSomethingWithFile const char pFile2 Ib asta int main Parameters of the lcg_cp call ifstream int lcg_cpt char src_file char dest_file char vo int nbstreams E char conf_ file int insecure int verbose int timeout char src_file lfn my_lcg_cr_example char dest_file new char 200 char vo dteam int nbstreams 1 conf_file 0 currently ignored insecure 0 currently ign
137. ich request the information, can query the Registry to find out what type of information is available and locate Producers that provide such information. Once this information is known, the Consumer can contact the Producer directly to obtain the relevant data.

• The Registry, which mediates the communication between the Producers and the Consumers.

Note that the producers and consumers are processes (servlets) running in a server machine at each site (sometimes known as a MON box). Users interact with these servlets using CLI tools or APIs on the WNs and UIs, and they in turn interact with the Registry, and with consumers and producers at other sites, on the user's behalf.

From the user's point of view, the information and monitoring system appears like a large relational database, and it can be queried as such. Hence, R-GMA uses a subset of SQL as query language. The Producers publish tuples (database rows) with an SQL insert statement and Consumers query them using SQL select statements.

[Figure 3: The R-GMA architecture]

R-GMA presents the information as a single virtual database containing a set of virtual tables. A schema contains the name and structure (column names, types and settings) of each virtual table in the system (Figure 4). The registry contains a list of producers which publish information for each table. A consumer runs an
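The relational view described above can be illustrated with an ordinary SQL engine. The sketch below uses Python's built-in SQLite module purely as an analogy: SQLite is not R-GMA, there is no registry or distribution involved, and the hostnames and the `publish` and `query` helpers are hypothetical. Only the table and column names (GlueCE, UniqueID, TotalCPUs) come from the R-GMA query example earlier in this guide.

```python
import sqlite3

# An in-memory SQLite database stands in for the R-GMA virtual database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE GlueCE (UniqueID TEXT, TotalCPUs INTEGER)")


def publish(unique_id, total_cpus):
    """Producer side: publish one tuple with an SQL INSERT."""
    db.execute(
        "INSERT INTO GlueCE (UniqueID, TotalCPUs) VALUES (?, ?)",
        (unique_id, total_cpus),
    )


def query(min_cpus):
    """Consumer side: read tuples back with an SQL SELECT."""
    cursor = db.execute(
        "SELECT UniqueID, TotalCPUs FROM GlueCE "
        "WHERE TotalCPUs >= ? ORDER BY UniqueID",
        (min_cpus,),
    )
    return cursor.fetchall()


publish("ce1.example.org:2119/jobmanager-lcgpbs-dteam", 68)
publish("ce2.example.org:2119/jobmanager-lcgpbs-dteam", 498)
for unique_id, total_cpus in query(100):
    print(unique_id, total_cpus)
```

In R-GMA the tuples stay with the producers at each site and the registry tells consumers where to find them; the single local table here only mimics how the result looks to the user.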
138. id (Appendix D), VO-wide utilities (Appendix E), APIs for data management and file access (Appendix F), and the GLUE Schema used to describe Grid resources (Appendix G).

3 OVERVIEW

The EGEE project has the main goal of providing researchers with access to a geographically distributed computing Grid infrastructure, available 24 hours a day. It focuses on maintaining the gLite middleware and on operating a large computing infrastructure for the benefit of a vast and diverse research community.

The World-wide LHC Computing Grid Project (WLCG) [3] was created to prepare the computing infrastructure for the simulation, processing and analysis of the data of the Large Hadron Collider (LHC) experiments. The LHC, which is being constructed at the European Laboratory for Particle Physics (CERN), will be the world's largest and most powerful particle accelerator.

The WLCG and the EGEE projects share a large part of their infrastructure and operate it in conjunction. For this reason, we will refer to it as the WLCG/EGEE infrastructure.

The gLite 3.0 middleware comes from a number of Grid projects, like DataGrid [4], DataTag [5], Globus [6], GriPhyN [7], iVDGL [8], EGEE and LCG. This middleware is currently installed in sites participating in WLCG/EGEE.

In WLCG, other Grid infrastructures exist, namely the Open Science Grid (OSG) [9], which uses the middleware distributed by VDT [10], and N
139. ile GFAL will deal with any of the other supported protocols also. Secondly, RFIO does not understand GUIDs, LFNs or SURLs, and it can only operate with RFIO's TURLs. Finally, as was explained previously, insecure RFIO can only be used to access files that are located in the same local area network where the CE holding the job is located. In order to access or move files between different sites, the user should use a different method. Of course, if the only way to remotely access a file is insecure RFIO (as is the case for classic SEs or CASTOR), then GFAL calls will also use insecure RFIO as the protocol for the interaction, and therefore this last limitation will also apply.

Although direct use of RFIO's APIs is discouraged, information on it and its APIs can be found in [39].

Example F.0.3 (Explicit interaction with the SRM using GFAL)

The following example program can be useful for copying a file that is stored in a MSS. It asks for the file to be staged from tape to disk first, and only tries to copy it when the file has been migrated. The program uses both the lcg_util and the GFAL APIs. From lcg_util, just the lcg_cp call is used. From GFAL, srm_get, which requests a file to be staged from tape to disk, and srm_get_status, which checks the status of the previous request, are used.

The source code follows:

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <iostream>
140. ire within less then 00:20 hours.

The advanced proxy management offered by the UI of gLite 3.0 through the renewal feature is available via the myproxy commands. The user must know the host name of a MyProxy server (or it may be the value of the MYPROXY_SERVER environment variable).

For the WMS to know which MyProxy server must be used in the proxy certificate renewal process, the name of the server must be included in an attribute of the job's JDL file (see Chapter 6). If the user does not add it manually, then the name of the default MyProxy server is added automatically when the job is submitted. This default MyProxy server node is VO-dependent and is usually defined in the UI VO's configuration file, stored at $EDG_WL_LOCATION/etc/<vo>/edg_wl_ui.conf for the old LCG job management and at $GLITE_WMS_LOCATION/etc/<vo>/glite_wmsui.conf for the new gLite job management.

Example 4.4.3.1 (Creating a long-term proxy and storing it in a MyProxy Server)

To create and store a long-term proxy certificate, the user must do, for example:

    myproxy-init -s <myproxy_server> -d -n

where -s <myproxy_server> specifies the hostname of the machine where a MyProxy Server runs, the -d option instructs the server to associate the user DN to the proxy, and the -n option avoids the use of a passphrase to access the long-term proxy, so that the WMS can perform the renewal auto
141. is located and how it can be accessed.

[Figure 5: Different filenames in gLite 3.0]

A file can be unambiguously identified by its GUID; this is assigned the first time the file is registered in the Grid, and is based on the UUID standard to guarantee its uniqueness. A GUID is of the form guid:<unique_string> (e.g. guid:93bd772a-b282-4332-a0c5-c79e99fc2e9c). In order to locate a file in the Grid, a user will normally use an LFN. LFNs are usually more intuitive, human-readable strings, since they are allocated by the user. Their form is lfn:<any_alias>. A Grid file can have many LFNs, in the same way as a file in a Unix file system can have many links.

The SURL provides information about the physical location of the file. Currently, SURLs have the following formats: sfn://<SE_hostname>/<path> or srm://<SE_hostname>/<path>, for files residing on a classic SE and on an SRM-enabled SE, respectively.

Finally, the TURL gives the necessary information to retrieve a physical replica, including hostname, path, protocol and port (as for any conventional URL), so that the application can open or copy it. The format is <protocol>://<SE_hostname>:<port>/<path>. There is no guarantee that the path, or even the hostname, in the SURL is the same as in the TURL for the same file. For a given file there can be as many TURLs as there are data access protocols
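Since a TURL has the layout of a conventional URL, a standard URL parser is enough to take it apart. This is a minimal sketch in Python, not a gLite API; the function name `parse_turl` and the example TURL (host, port and path) are hypothetical.

```python
from urllib.parse import urlsplit


def parse_turl(turl):
    """Split a TURL into protocol, SE hostname, port and path."""
    parts = urlsplit(turl)
    if not parts.scheme or not parts.hostname:
        raise ValueError(f"not a valid TURL: {turl!r}")
    return {
        "protocol": parts.scheme,
        "hostname": parts.hostname,
        "port": parts.port,  # None when the TURL carries no port
        "path": parts.path,
    }


# Hypothetical TURL, used only to show the layout.
turl = parse_turl("gsiftp://se.example.org:2811/storage/dteam/file1")
print(turl["protocol"], turl["hostname"], turl["port"], turl["path"])
```

Because the same file may have one TURL per access protocol, an application would typically inspect the protocol field before deciding how to open the replica.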
142. l accounts. When a user's request for a service reaches a host, the certificate subject of the user (contained in the proxy) is checked against what is in the local grid-mapfile to find out to which local account (if any) the user certificate is mapped, and this account is then used to perform the requested operation [17]. The second way relies on the Virtual Organisation Membership Service (VOMS) and the LCAS/LCMAPS mechanism, which allow for a more detailed definition of user privileges, and will be explained in more detail later.

A user needs a valid proxy to submit jobs; those jobs carry their own copies of the proxy to be able to authenticate with Grid services as they run. For long-running jobs, the job proxy may expire before the job has finished, causing the job to fail. To avoid this, there is a proxy renewal mechanism to keep the job proxy valid for as long as needed. The MyProxy server is the component that provides this functionality.

3.2.2 User Interface

The access point to the WLCG/EGEE Grid is the User Interface (UI). This can be any machine where users have a personal account and where their user certificate is installed. From a UI, a user can be authenticated and authorized to use the WLCG/EGEE resources, and can access the functionalities offered by the Information, Workload and Data management systems. It provides CLI tools to perform some basic Grid operations: list
143. le can be an LFN, GUID or one SURL of a valid Grid file; the second argument (destination file) must be a local filename or valid TURL. In the following example, the verbose mode is used and a timeout of 100 seconds is specified:

    lcg-cp --vo dteam -t 100 -v lfn:/grid/dteam/hosts file:/tmp/f2
    Source URL: lfn:/grid/dteam/hosts
    File size: 104857600
    Source URL for copy: gsiftp://lxb2036.cern.ch/storage/dteam/generated/2005-07-17/fileea15c9c9-abcd-4e9b-8724-1ad60chafe5b
    Destination URL: file:/tmp/f2
    # streams: 1
    # set timeout to 100 (seconds)
    85983232 bytes 8396.77 KB/sec avg 9216.11
    Transfer took 12040 ms

Notice that, although this command is designed to copy files from a SE to non-Grid resources, if the proper TURL is used (using the gsiftp protocol), a file could be transferred from one SE to another, or from outside the Grid to a SE. This should not be done, since it has the same effect as using lcg-rep BUT skips the file registration, making this replica invisible to Grid users.

Example 7.6.1.5 (Obtaining a TURL for a replica)

The lcg-gt command returns a TURL given a SURL and a supported protocol. The command behaves very differently depending on whether or not the Storage Element exposes an SRM interface. The command always returns three lines of output: the first is always the TURL of the file; the last two are meaningful only in the case of an SRM interface.

• In case of classic SE or a
144. le whose name is specified as a com mand line argument The program opens the files writes a set of numbers into it and closes it Afterwards the files is opened again and the previously written numbers are read and shown to the user The source code gfal_example cpp follows include lt iostream gt include lt fcntl h gt CERN LCG GDEIS 722398 Manuals Series Page 138 include lt stdio h gt extern C include opt lcg include gfal_api h using namespace std Include the gfal functions are C and not C therefore are extern extern C int gfal_open const char int mode_t int gfal_write int const void size_t int gfal_close int int gfal_read int void size_t RRR RRR KR KK RK RK MAIN EA main int argc char argv int fd file descriptor int rc error codes size_t INTBLOCK 40 how many bytes we will write each time 40 10 int a time Check syntax there must be 2 arguments if argc 2 cerr lt lt Usage lt lt argv 0 lt lt filename n exit 1 Declare and initialize the array of input values to be written in the file int original new int 10 for int i 0 i lt 10 i original 1 1 10 just 0 10 20 30 Declare and give size for the array that will store the values read from the file int readValues new int 10 Create the file for writing with the given name cout lt lt nCreating fil
145. le should be set in all WNs and UIs. The -P option allows the user to specify a relative path name for the file in the SE. The absolute path is built by appending the relative path to a root directory, which is VO and SE specific and is determined through the Information System. If no -P option is given, the relative path is generated automatically following a certain schema. It is also possible to specify the destination as a complete SURL, including the SE hostname, the path, and a chosen filename. The action will only be allowed if the specified path falls under the user's VO directory. The following are examples of the different ways to specify a destination:

    -d lxb0710.cern.ch
    -d sfn://lxb0710.cern.ch/flatfiles/SE00/dteam/my_file
    -d lxb0710.cern.ch -P my_dir/my_file

The option -l <lfn> can be used to specify a LFN:

    lcg-cr --vo dteam -d lxb0710.cern.ch -l lfn:my_alias1 file:/home/antonio/file1
    guid:db7ddbc5-613e-423f-9501-3c0c00a0ae24

REMINDER: If the RLS catalog is used, the LFN takes the form lfn:<someLFN>, where <someLFN> can be any string. If the LFC is used, the LFNs are organized in a hierarchical namespace (like UNIX directory trees), so the LFN takes the form lfn:/grid/<voname>/<dir1>/... Remember that subdirectories in the namespace are not created automatically by lcg-cr; you should manage them yourself through the lfc-mkdir and lfc-rmdir command line tools describ
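Because lcg-cr does not create missing LFC directories, it can help to work out in advance which directories a given LFN needs. The following is a minimal Python sketch, assuming the hierarchical form lfn:/grid/<voname>/<dir1>/... described above; the helper names lfc_lfn and required_dirs are hypothetical and are not part of any gLite API.

```python
import posixpath


def lfc_lfn(vo, *parts):
    """Build an LFC-style LFN of the assumed form lfn:/grid/<vo>/<dir1>/.../<file>."""
    if not vo or "/" in vo:
        raise ValueError("VO name must be a non-empty string without '/'")
    # Strip slashes so that no component can reset the joined path.
    cleaned = [p.strip("/") for p in parts if p.strip("/")]
    return "lfn:" + posixpath.join("/grid", vo, *cleaned)


def required_dirs(lfn):
    """List, top-down, the catalog directories that must exist before registration."""
    if not lfn.startswith("lfn:/"):
        raise ValueError("expected a hierarchical LFN starting with 'lfn:/'")
    current = posixpath.dirname(lfn[len("lfn:"):])
    dirs = []
    while current not in ("/", ""):
        dirs.append(current)
        current = posixpath.dirname(current)
    return list(reversed(dirs))


if __name__ == "__main__":
    name = lfc_lfn("dteam", "my_dir", "my_file")
    print(name)
    for d in required_dirs(name):
        print("needs directory:", d)
```

Each directory printed by the sketch would have to exist (for example, created with lfc-mkdir) before the file is registered under that LFN.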
146. local LFC servers. In order to retrieve the names of the local LFC servers of a certain VO, use the command as follows:

    lcg-infosites --vo atlas lfcLocal
    lxb2038.cern.ch
    pps-lfc.cnaf.infn.it
    ccleglfcli03.in2p3.fr

5.1.2 lcg-info

The lcg-info command can be used to list either CEs or SEs satisfying a given set of conditions on their attributes, and to print, for each of them, the values of a given set of attributes. The information is taken from the BDII specified by the LCG_GFAL_INFOSYS environment variable or on the command line. The general format of the command for listing CE or SE information is:

    lcg-info [--list-ce | --list-se] [--query <some_query>] [--attrs <some_attrs>]

where either --list-ce or --list-se must be used to indicate whether CEs or SEs should be listed, the --query option introduces a filter (conditions to be satisfied) on the elements of the list, and the --attrs option may be used to specify which attributes to print. If --list-ce is specified, then only CE attributes are considered (others are simply ignored), and the reverse is true for --list-se. The supported attributes, which may be included with --attrs or within the --query expression, are only a subset of the attributes present in the GLUE schema: those that are most relevant for a user. The --vo option can be used to restrict the query to CEs and SEs which support
147. lows:

    #!/bin/bash

    # Usage information
    if [ $# -lt 4 ]; then
        echo 'Not enough input arguments'
        echo 'Usage: interaction.sh <input_pipe> <output_pipe> <error_pipe> <listener_pid>'
        exit 1   # some error number
    fi

    # Welcome message
    echo -e "\nInteractive session started\n"

    # Read what the job sends and present it to the user
    cat < "$2" &

    # Get the user reply
    read userInput
    echo "$userInput" > "$1"

    # Clean up (wait two seconds for the pipes to be flushed out)
    sleep 2
    rm "$1" "$2" "$3"   # Remove the pipes
    if [ -n "$4" ]; then
        kill "$4"       # Kill the shadow listener
    fi

    # And we are done
    echo -e '\n'
    echo 'The temporary files have been deleted and the listener process killed'
    echo 'The interactive session ends here'
    exit 0

Note that, before exiting, the script removes the temporary pipe files and kills the listener process. This must be done either inside the script or manually by the user if the --nolisten option is used; otherwise, the X window or text console interfaces created by edg-job-submit will do it automatically.

Now let us see what the result of the interaction is:

    dialog.sh /tmp/listener-IxKsoi817fXbygN56dNwug.in /tmp/listener-IxKsoi817fXbygN56dNwug.out /tmp/listener-IxKsoi817fXbygN56dNwug.err 7335

    Interactive session started

    Welcome! Please tell me your name: A
148. The WNs generally have the same commands and libraries installed as the UI, apart from the job management commands. VO-specific application software may be preinstalled at the sites in a dedicated area, typically on a shared file system accessible from all WNs.

It is worth stressing that, strictly speaking, a CE corresponds to a single queue in the LRMS, following this naming syntax:

    CEName = <gk_hostname>:<port>/jobmanager-<LRMS_type>-<batch_queue_name>

According to this definition, different queues defined in the same cluster are considered different CEs. This is currently used to define different queues for jobs of different lengths or other properties (e.g. RAM size), or for different VOs. Examples of CE names are:

    ce101.cern.ch:2119/jobmanager-lcglsf-grid_alice
    t2-ce-01.mi.infn.it:2119/jobmanager-lcgpbs-short

3.2.4 Storage Element

A Storage Element (SE) provides uniform access to storage resources. The Storage Element may control simple disk servers, large disk arrays or tape-based Mass Storage Systems (MSS). Most WLCG/EGEE sites provide at least one SE.

Storage Elements can support different data access protocols and interfaces, described in detail in Section 7.2. Simply speaking, GSIFTP (a GSI-secure FTP) is the protocol for whole-file transfers, while local and remote file access is performed using RFIO or gsidcap.

Some storage resources are managed by a Storage Resource Manager
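The CE naming syntax can be made concrete with a small parser. This is only an illustrative Python sketch: it assumes the layout <host>:<port>/jobmanager-<LRMS_type>-<batch_queue_name> and that the LRMS type itself contains no hyphen; the class and function names are hypothetical.

```python
import re
from typing import NamedTuple


class CEName(NamedTuple):
    host: str
    port: int
    lrms_type: str
    queue: str


# Assumed layout: <host>:<port>/jobmanager-<LRMS_type>-<batch_queue_name>
_CE_PATTERN = re.compile(
    r"^(?P<host>[^:/]+):(?P<port>\d+)/jobmanager-(?P<lrms>[^-]+)-(?P<queue>.+)$"
)


def parse_ce_name(name):
    """Split a CE name into host, port, LRMS type and batch queue."""
    match = _CE_PATTERN.match(name)
    if match is None:
        raise ValueError("not a CE name: %r" % name)
    return CEName(match["host"], int(match["port"]), match["lrms"], match["queue"])


if __name__ == "__main__":
    for ce in ("t2-ce-01.mi.infn.it:2119/jobmanager-lcgpbs-short",
               "t2-ce-01.mi.infn.it:2119/jobmanager-lcgpbs-long"):
        print(parse_ce_name(ce))
```

The two names in the usage example share a host and port but differ in the queue, so they parse to two distinct CEs, which matches the statement that different queues on the same cluster are different CEs. The second queue name ("long") is made up for the illustration.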
149. lueHostProcessorLoad)
  GlueHostProcessorLoadLast1Min: one-minute average processor availability for a single node
  GlueHostProcessorLoadLast5Min: 5-minute average processor availability for a single node
  GlueHostProcessorLoadLast15Min: 15-minute average processor availability for a single node

- SMP load (objectclass GlueHostSMPLoad)
  GlueHostSMPLoadLast1Min: one-minute average processor availability for a single node
  GlueHostSMPLoadLast5Min: 5-minute average processor availability for a single node
  GlueHostSMPLoadLast15Min: 15-minute average processor availability for a single node

- Operating system (objectclass GlueHostOperatingSystem)
  GlueHostOperatingSystemOSName: OS name
  GlueHostOperatingSystemOSRelease: OS release
  GlueHostOperatingSystemOSVersion: OS or kernel version

- Local file system (objectclass GlueHostLocalFileSystem)
  GlueHostLocalFileSystemRoot: path name or other information defining the root of the file system
  GlueHostLocalFileSystemSize: size of the file system in bytes
  GlueHostLocalFileSystemAvailableSpace: amount of free space in bytes
  GlueHostLocalFileSystemReadOnly: true if the file system is read-only
  GlueHostLocalFileSystemType: file system type
  GlueHostLocalFileSystemName: the name for the file system
  GlueHostLocalFileSystemClient: host unique identifier of clients allowed to remotely access this file s
150. mand line interface or APIs for C, C++, Python and Java, and for queries there is also a web browser interface. Several applications already use R-GMA, especially for accounting and monitoring purposes. This section gives a brief overview of R-GMA; for more information see [21].

5.2.1 R-GMA concepts

From a user's point of view, R-GMA is very similar to a standard relational database. Data are organised in relational tables, and inserted and queried with SQL-style INSERT and SELECT statements (the allowed syntax is a subset of SQL, but reasonably complete for most purposes). However, there are some differences to bear in mind. The most basic is that a standard relational database can only have one row (tuple) with a given primary key value, but R-GMA usually has more than one. Related to this is the fact that R-GMA supports three different query types. Each tuple has a timestamp, and for a given primary key value you can query the most recent tuple (Latest query), a history of all tuples within some defined retention period (History query), or ask for tuples to be streamed to you as they are published (Continuous query). Continuous queries can also return a limited amount of historical (old) data.

There are also some differences depending on how and where the data are stored. Each site has an R-GMA server, which deals with all R-GMA interaction from clients on that site. The servers store
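The difference between the Latest and History query types can be illustrated with a toy in-memory model. This Python sketch is not the R-GMA API: the Row type and the two functions are hypothetical stand-ins for timestamped tuples that share a primary key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    key: str          # primary key value (several rows may share it)
    value: float      # the published measurement
    timestamp: float  # publication time, in seconds


def latest_query(rows):
    """Return, for each primary key, only the row with the newest timestamp."""
    newest = {}
    for row in rows:
        current = newest.get(row.key)
        if current is None or row.timestamp > current.timestamp:
            newest[row.key] = row
    return sorted(newest.values(), key=lambda r: r.key)


def history_query(rows, now, retention):
    """Return every row published within the retention period ending at 'now'."""
    return [row for row in rows if now - row.timestamp <= retention]


if __name__ == "__main__":
    published = [Row("ce1", 0.5, 10.0), Row("ce1", 0.7, 20.0), Row("ce2", 0.1, 15.0)]
    print(latest_query(published))
    print(history_query(published, now=25.0, retention=10.0))
```

A Continuous query has no equivalent in this static model, since it delivers tuples as they are published rather than selecting from stored ones.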
151. matically. The output will be similar to:

    Your identity: /O=Grid/O=CERN/OU=cern.ch/CN=John Doe
    Enter GRID pass phrase for this identity:
    Creating proxy ................................. Done
    Your proxy is valid until: Thu Jul 17 18:57:04 2003
    A proxy valid for 168 hours (7.0 days) for user /O=Grid/O=CERN/OU=cern.ch/CN=John Doe now exists on myproxy.cern.ch.

By default, the long-term proxy lasts for one week and the proxy certificates created from it last 12 hours. These lifetimes can be changed using the -c and the -t options, respectively.

If the -s <myproxy_server> option is missing, the command will try to use the MYPROXY_SERVER environment variable to determine the MyProxy Server.

ATTENTION: If the hostname of the MyProxy Server is wrong, or the service is unavailable, the output will be similar to:

    Your identity: /O=Grid/O=CERN/OU=cern.ch/CN=John Doe
    Enter GRID pass phrase for this identity:
    Creating proxy ................................. Done
    Your proxy is valid until: Wed Sep 17 12:10:22 2003
    Unable to connect to adc0014.cern.ch:7512

where only the last line reveals that an error occurred.

Example 4.4.3.2: Retrieving information about a long-term proxy

To get information about a long-term proxy stored in a Proxy Server, the following command may be used:

    myproxy-info -s <myproxy_server> -d

where the -s and -d options have the same meaning as in the previous example. The output is similar to:
152. n a CE and a SE (or several SEs). This is defined by site administrators and is used when scheduling jobs that must access input files or create output files from or to SEs. In the GLUE Schema, they are defined in the UML diagram for the Computing Element - Storage Service - Bind.

- Associations between a CE and one or more SEs (objectclass GlueCESEBindGroup)
  GlueCESEBindGroupCEUniqueID: unique ID for the CE
  GlueCESEBindGroupSEUniqueID: unique ID for the SE

- Associations between an SE and a CE (objectclass GlueCESEBind)
  GlueCESEBindCEUniqueID: unique ID for the CE
  GlueCESEBindCEAccesspoint: access point in the cluster from which the CE can access a local SE
  GlueCESEBindSEUniqueID: unique ID for the SE
  GlueCESEBindMountInfo: information about the name of the mount directory on the worker nodes of the CE and the exported directory from the SE
  GlueCESEBindWeight: expresses a preference when multiple SEs are bound to a CE

G.6 THE DIT USED BY THE MDS

The DITs used in the local BDIIs and in the GRISes of a CE and a SE are shown in Figures 15, 16, 17, 18 and 19. The GRIS of a CE contains information for computing resources (different entries for the different queues) and also for the computing-storage service relationships. A GRIS located in a SE publishes information for the storage resources.

The DITs of every GRIS in a site are included in the DI
153. n addition, it makes it possible to pre-stage files (migrate them on the stager beforehand, so that they are available on disk to an application at runtime), to pin and unpin files (ensure the persistency of files on disk until they are released), to reserve space in the stager, etc. SRM interfaces to MSSs other than CASTOR are not supported by gLite 3.0, although they are strongly desirable. It is therefore up to the sites to provide an SRM implementation for their specific MSS.

- dCache Disk pool manager: it consists of a dCache server and one or more pool nodes. The server represents the single point of access to the SE and presents files in the pool disks under a single virtual file system tree. Nodes can be dynamically added to the pool. File transfer is managed through GridFTP, while the native gsidcap protocol allows POSIX-like data access. It presents an SRM interface, which overcomes the limitations of the Classic SE.

- LCG Disk pool manager: the LCG lightweight alternative to dCache. It is easy to install and, although not as powerful as dCache, offers all the functionality required by small sites. Disks can be added dynamically to the pool at any time. As in dCache, a virtual file system hides the complexity of the disk pool architecture. The LCG DPM includes a GridFTP server for file transfer and ensures file access through secure RFIO. It also presents an SRM interface. In addition, disk quota allocation per VO is supported. For th
154. n ch project persist tutorial learningPoolByExamples html The edg replica manager wrapper script http grid deployment web cern ch grid deployment eis docs edg rm wrapper pdf R Santinelli F Donno Experiment Software Installation on LCG 2 https edms cern ch document 498080 APPLICABLE DOCUMENTS A1 EDG User s Guide http marianne in2p3 fr datagrid documentation EDG Users Guide 2 0 pdf CERN LCG GDEIS 722398 Manuals Series Page 11 A2 A3 A4 A5 A6 A7 1 6 1 6 1 AFS APT BDU LCG 1 User Guide http grid deployment web cern ch grid deployment eis docs LCG 1 UserGuide htm LDAP Services User Guide http hepunx rl ac uk edg wp3 documentation wp3 Idap_user_guide html LCG 2 User Scenario https edms cern ch file 498081 UserScenario2 pdf LCG 2 Frequently Asked Questions https edms cern ch document 495216 Tank And Spark http grid deployment web cern ch grid deployment eis docs internal chep04 SW_Installation pdf How to Manually configure and test the Experiment Software Installation mechanism on LCG 2 http grid deployment web cern ch grid deployment eis docs configuration_of tankspark TERMINOLOGY Glossary Andrew File System Application Programming Interface Berkeley Database Information Index CASTOR CERN Advanced STORage manager CE Computing Element CERN European Laboratory for Particle Physics ClassAd Classified ad
155. n the SE's access point (path to the storage area of the SE), the relative path to the VO of the file's owner and the relative path to the file:

    sfn://<SE_hostname><SE_Accesspoint><VO_path><filename>
    sfn://tbed0101.cern.ch/flatfiles/SE00/dteam/generated/2004-02-26/file3596e86f-c402-11d7-a6b0-f53ee5a37e1d

In the case of SRM-managed SEs, one cannot assume that the SURL will have any particular format, other than the srm prefix and the hostname. In general, SRM-managed SEs can use virtual file systems, and the name a file receives may have nothing to do with its physical location (which may also vary with time). An example of this kind of SURL follows:

    srm://castorgrid.cern.ch/castor/cern.ch/grid/dteam/generated/2004-09-15/file24e3227a-cb1b-4826-9e5c-07dfb9f257a6

- The Transport URL (TURL), which is a valid URI with the necessary information to access a file in a SE, has the following form:

    <protocol>://<some_string>
    gsiftp://tbed0101.cern.ch/flatfiles/SE00/dteam/generated/2004-02-26/file3596e86f-c402-11d7-a6b0-f53ee5a37e1d

where <protocol> must be a valid protocol (supported by the SE) to access the contents of the file (GSIFTP, RFIO, gsidcap), and the string after the double slash may have any format that can be understood by the SE serving the file. While SURLs are in principle invariable (they are entries in the file catalog, see Section 7.4), TURLs are obtained dynamically from the SURL throug
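The naming forms can be told apart by their prefix alone. The following Python sketch assumes the prefixes guid: and lfn: for catalog names, sfn:// and srm:// for SURLs, and treats any other <protocol>://<some_string> as a TURL; the function name classify_grid_name is hypothetical and the rule is a simplification for illustration.

```python
def classify_grid_name(name):
    """Classify a Grid file name as GUID, LFN, SURL or TURL by its prefix."""
    scheme, sep, rest = name.partition(":")
    if not sep or not scheme or not rest:
        raise ValueError("missing '<prefix>:' in %r" % name)
    scheme = scheme.lower()
    if scheme == "guid":
        return "GUID"
    if scheme == "lfn":
        return "LFN"
    if scheme in ("sfn", "srm"):
        # Storage URLs: classic SE (sfn) or SRM-managed SE (srm).
        return "SURL"
    if rest.startswith("//"):
        # Anything else of the form <protocol>://<some_string>.
        return "TURL"
    raise ValueError("unrecognised file name form: %r" % name)


if __name__ == "__main__":
    for example in (
        "lfn:/grid/dteam/hosts",
        "sfn://tbed0101.cern.ch/flatfiles/SE00/dteam/some_file",
        "srm://castorgrid.cern.ch/castor/cern.ch/grid/dteam/some_file",
        "gsiftp://tbed0101.cern.ch/flatfiles/SE00/dteam/some_file",
    ):
        print(classify_grid_name(example), example)
```

The sketch deliberately inspects nothing beyond the prefix, in line with the remark that an SRM SURL has no guaranteed structure apart from the srm prefix and the hostname. The paths ending in some_file are placeholders.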
156. nd will use the correct protocol (usually GSIFTP) for file transfer.

Apart from the basic calls (lcg_cp, lcg_cr, etc.), there are other calls that extend them with a buffer for complete error messages (lcg_cpx, lcg_crx), with timeouts (lcg_cpt, lcg_crt), or with both (lcg_cpxt, lcg_crxt). Actually, all calls use the most complete version (i.e. lcg_cpxt), with default values for the arguments that were not provided.

Below the lcg_util API, we find the Grid File Access Library (GFAL). GFAL provides calls for catalog interaction, storage management and file access, and can be very handy when an application requires access to some part of a big Grid file but does not want to copy the whole file locally. The library hides the interactions with the LCG-2 catalogs, the SEs and the SRMs, and presents a POSIX interface for the I/O operations on the files. The function names are obtained by prepending gfal_ to the POSIX names, for example gfal_open, gfal_read, gfal_close.

GFAL accepts GUIDs, LFNs, SURLs and TURLs as file names and, in the first two cases, it tries to find the closest replica of the file. Depending on the type of storage where the file's replica resides, GFAL will use one protocol or another to access it. GFAL can deal with GSIFTP, secure and insecure RFIO, or gsidcap in a way that is transparent to the user (unless a TURL is used, in which case the protocol is explicitl
157. nformation the tuple with the most recent timestamp for a given value of the primary key is sent to the consumer.

- History: all tuples within a configurable retention period are stored to allow subsequent retrieval by consumers.

Latest queries correspond most directly to a standard query on a real database. Primary producers are usually of type continuous. Secondary producers (which often use a real database to store the data) must be set up in advance to archive information and be able to reply to latest and/or history queries. Secondary producers are also required for joins to be supported in the consumer queries.

R-GMA is currently used for accounting and for both system-level and user-level monitoring. It also holds the same GLUE schema information as the MDS, although this is not currently used to locate resources for job submission.

3.2.6 Data Management

In a Grid environment, files can have replicas at many different sites. Ideally, the users do not need to know where a file is located, as they use logical names for the files that the Data Management services will use to locate and access them.

Files in the Grid can be referred to by different names: Grid Unique IDentifier (GUID), Logical File Name (LFN), Storage URL (SURL) and Transport URL (TURL). While the GUIDs and LFNs identify a file irrespective of its location, the SURLs and TURLs contain information about where a physical replica
158. ng on Computing Elements and Storage Elements at the different sites report information on the characteristics and status of the services. They give both static and dynamic information.

In order to interrogate the GRIS on a specific Grid Element, the hostname of the Grid Element and the TCP port where the GRIS runs must be specified. This port is always 2135. The following command can be used:

    ldapsearch -x -h <hostname> -p 2135 -b "mds-vo-name=local,o=grid"

where the -x option indicates that simple authentication (instead of LDAP's SASL) should be used; the -h and -p options precede the hostname and port respectively; and the -b option is used to specify the initial entry for the search in the LDAP tree.

For a GRIS, the initial entry of the DIT is always o=grid, and the second one (next level) is mds-vo-name=local. The actual resource information is shown in the entries at the deeper levels. That is why mds-vo-name=local,o=grid is used as the DN of the initial node for the search. For details, please refer to Appendix G.

The same effect can be obtained with:

    ldapsearch -x -H <LDAP_URI> -b "mds-vo-name=local,o=grid"

where the hostname and port are included in the -H <LDAP_URI> option, avoiding the use of -h and -p.

Example 5.1.3.1: Interrogating the GRIS on a Computing Element

The command used to interrogate the GRIS located on host lxb2006 is:

    ldapsearch -x -h lxb2006.cern.ch -p 2135 -b "mds-vo-name=
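When such queries are issued from a script, it can be convenient to assemble the ldapsearch argument list programmatically. The Python sketch below only builds and prints the command line; it uses the -x, -h, -p and -b options described above, and the helper name gris_query, the example filter and the chosen attribute are illustrative. Recent OpenLDAP releases deprecate -h and -p in favour of -H, so treat the exact flags as an assumption about the installed client.

```python
import shlex

GRIS_PORT = 2135
GRIS_BASE = "mds-vo-name=local,o=grid"


def gris_query(host, ldap_filter=None, attrs=(), port=GRIS_PORT, base=GRIS_BASE):
    """Build the ldapsearch argument list for querying the GRIS on 'host'."""
    argv = ["ldapsearch", "-x", "-h", host, "-p", str(port), "-b", base]
    if ldap_filter:
        argv.append(ldap_filter)   # e.g. '(objectclass=GlueCE)'
    argv.extend(attrs)             # optional list of attributes to return
    return argv


if __name__ == "__main__":
    command = gris_query("lxb2006.cern.ch", "(objectclass=GlueCE)", ["GlueCEUniqueID"])
    # shlex.join (Python 3.8+) quotes each argument for safe copy and paste.
    print(shlex.join(command))
```

Passing the list to subprocess.run would execute the query; building it as a list rather than a string avoids shell quoting problems with the base DN and the filter.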
159. nikhef nl CERN LCG GDEIS 722398 Manuals Series Page 88 nodel6 4 farmnet nikhef nl Checking nodel6 44 farmnet nikhef nl nodel6 44 farmnet nikhef nl Checking nodel6 45 farmnet nikhef nl nodel6 45 farmnet nikhef nl Checking nodel6 45 farmnet nikhef nl nodel6 45 farmnet nikhef nl Checking nodel6 46 farmnet nikhef nl nodel6 46 farmnet nikhef nl Checking nodel6 46 farmnet nikhef nl nodel6 46 farmnet nikhef nl Checking nodel6 47 farmnet nikhef nl nodel6 47 farmnet nikhef nl Checking nodel6 47 farmnet nikhef nl nodel6 47 farmnet nikhef nl Checking nodel6 48 farmnet nikhef nl nodel6 48 farmnet nikhef nl Checking nodel6 48 farmnet nikhef nl nodel6 48 farmnet nikhef nl AXKAKKKKKKKKKKK KK KK KK KK KK K KK KK KK KK KK RRA KK KA KK KK KK RAR AA AAA KK KK AAA AAA AAA Executing MPItest with mpirun out of 1 out of 1 of 1 Hello world from processor 2 world from processor 6 world from processor 3 out world from processor 4 out of 1 world from processor 7 out of 1 8 5 1 9 0 world from processor 8 out of 1 world from processor 5 out of 1 world from processor 1 out of 1 world from processor 9 out of 1 world from processor E108 A Oe ee ee eee 000000000 000000000 ou 6 3 7 Advanced Command Options All the glite job and edg job commands read some configuration files which the user can edit if he is not sa
160. niqueID attribute of the GlueCESEBindGroup object class. For more information, see Appendix G.

Warning: Using the InputData attribute may sometimes be a bad idea. If many jobs need to access a single file and only one replica of that file exists in the Grid, all the jobs will land at the site containing the replica. Very soon all the CPUs of the site will be filled and jobs will be put in a waiting state. It would be more efficient to schedule the jobs at a site with free CPUs and replicate the file to the close SE of that site.

Example 7.7.2: Specifying a Storage Element

With the LCG RB, the user can ask for the job to run close to a specific Storage Element, in order to store the output data there, using the attribute OutputSE. For example:

    OutputSE = "castorsrm.cern.ch";

The Resource Broker will abort the job if there is no CE close to the OutputSE specified by the user.

NOTE: The same is not true with the gLite RB. Even if there are CEs close to a given OutputSE specified by the user, no resources get matched when this field is defined in the JDL.

Example 7.7.3: Automatic upload and registration of output files

The OutputData attribute list allows the user to automatically upload and register files produced by the job on the WN. For each file, three attributes can be set:

- The OutputFile attribute is mandatory and specifies the name of the generated file to be uploaded to the
161. nsfer job consists of a set of files to be transferred, in a source/destination pairs format. A job may optionally have parameters (a key/value pair block) used for passing options to the underlying transport layer (gridFTP), and contains a MyProxy password, used together with the client's subject to retrieve the user's proxy used for the transfer.

- File: refers to the source/destination physical name pairs to be transferred. Both source and destination should be defined in a Storage URL format suitable for interaction with an SRM, or in a GridFTP format (i.e. as used by globus-url-copy).

- Job State: the job's state is a function of the states of all the individual files constituting the job.

- File State: the state of an individual file transfer.

- Channel: a specific network pipe used for transferring files. There are two distinct types of channels: production channels, which ensure bulk distribution of files within a production job and are dedicated network pipes between Tier-0, Tier-1s and other major Tier-2 centres; and non-production channels, which are typically assigned to the open network and do not assure production QoS.

The jobs and their constituent files are processed asynchronously. Upon submission, the client is handed a job identifier that can later be used to query the status of the job as it progresses through the system. The job identifier allows the user to query the status of his jobs and to cancel them. Once a job h
162. ntonio
    That is all, Antonio. Bye bye.
    ****************************************
    *       INTERACTIVE JOB FINISHED       *
    ****************************************
    The temporary files have been deleted and the listener process killed
    The interactive session ends here

Until now, several options of the edg-job-submit command used for interactive jobs have been explained, but there is another command that is used for this kind of job: the edg-job-attach command.

Usually, the listener process and the X window are started automatically by edg-job-submit. However, if the interactive session with a job is lost, or if the user needs to follow the job from a different machine (not the UI) or on another port, a new interactive session can be started with the edg-job-attach command. This command starts a listener process on the UI machine that is attached to the standard streams of a previously submitted interactive job and displays them in a dedicated window. The --port <port_number> option specifies the port on which the listener is started.

6.3.5 Checkpointable Jobs

NOTE: Checkpointable jobs are not supported in WLCG/EGEE, and that functionality is not part of the official distribution of the current gLite 3.0 release. Checkpointable jobs in the gLite 3.0 middleware have never been tested and will not be described here. Any site installing or using this functionality does so entirely under its own responsibility.
163. ob end up in a particular CE, the BrokerInfo file will not be created.

The information about the BrokerInfo file, the glite-brokerinfo (or edg-brokerinfo for LCG) WMS CLI, and its respective API can be found in [36].

The glite-brokerinfo command has the following syntax:

    edg-brokerinfo [-v] [-f <filename>] function [parameter] [parameter] ...

where function is one of the following:

- getCE: returns the name of the CE the job is running on;
- getDataAccessProtocol: returns the protocol list specified in the DataAccessProtocol JDL attribute;
- getInputData: returns the file list specified in the InputData JDL attribute;
- getSEs: returns the list of the Storage Elements which contain a copy of at least one file among those specified in InputData;
- getCloseSEs: returns a list of the Storage Elements close to the CE;
- getSEMountPoint <SE>: returns the access point for the specified <SE>, if it is in the list of close SEs of the WN;
- getSEFreeSpace <SE>: returns the free space on <SE> at the moment of match-making;
- getLFN2SFN <LFN>: returns the storage file name of the file specified by <LFN>, where <LFN> is a logical file name or a GUID specified in the InputData attribute;
- getSEProtocols <SE>: returns the list of the protocols available to transfer data in the Storage Element <SE>;
- getSEPort <SE> <Protocol>
164. og type 1 c

    Variable                    Notes
    $LCG_LOCATION               Base of the installed LCG software
    $LCG_RFIO_TYPE              Type of RFIO (secure or insecure) to use by GFAL
    $LCG_TMP                    Temporary directory
    $LRC_ENDPOINT               May be used to define the LRC endpoint for lcg_utils and GFAL (overriding the IS)
    $RMC_ENDPOINT               May be used to define the RMC endpoint for lcg_utils and GFAL (overriding the IS)
    $VO_<VO_name>_DEFAULT_SE    Default SE defined for a CE (in a WN)
    $VO_<VO_name>_SW_DIR        Base directory of the VO's software, or for WNs with no shared file system among WNs (in a WN)
    $X509_USER_PROXY            Location of the user proxy certificate (default is /tmp/x509up_u<uid>)

Configuration files

    Configuration file                               Notes
    $EDG_WL_LOCATION/etc/edg_wl_ui_cmd_var.conf      WMS command line tools settings: retry count, error and output directory, default VO
    $EDG_WL_LOCATION/etc/<VO>/edg_wl_ui.conf         VO-specific WMS settings: Resource Broker, Logging and Bookkeeping server and Proxy Server to use
    $EDG_LOCATION/var/edg-rgma/rgma-defaults         R-GMA default values
    $EDG_LOCATION/etc/edg_wl.conf                    (In the RB) The Resource Broker's configuration file. It may be useful to find out which BDII the RB is using. This file cannot be edited by the user, since it is located in the Resou
165. ogate a site GIIS is usually the same as that of GRISes: 2135. In order to interrogate the GIIS (and not the local GRIS), a different base name must be used instead of mds-vo-name=local,o=grid, and one based on the site name is generally used. For the site BDII the port is different (2170), but the base name has the same format as for site GIISes.

The complete contact string for a site GIIS is published in the GOC page. So if, for example, you look at the following URL:

    https://goc.grid-support.ac.uk/gridsite/db/index.php?siteSelect=INFN-CNAF

you will retrieve the information shown in the Figure. The GIIS URL is:

    ldap://gridit-ce-001.cnaf.infn.it:2135/mds-vo-name=infn-cnaf,o=grid

In this case, the site still has a site GIIS. In order to interrogate it, we can use the command shown in the following example.

Example 5.1.4.1: Interrogating the site BDII

    ldapsearch -x -H ldap://cert-ce-03.cnaf.infn.it:2170 -b "o=grid"
    version: 2
    # filter: (objectclass=*)
    # requesting: ALL

    # grid
    dn: o=grid
    objectClass: GlueTop

    # resource, grid
    dn: mds-vo-name=resource,o=grid
    objectClass: GlueTop

[Figure: screenshot of the GOC Database web page showing the site information for INFN-CNAF (ROC: Italy)]
166. ogical AND of logical expressions, separated by commas, and the only allowed operator is =. In equality comparisons of strings, the * matches any number of characters.

Another useful query is the one to find out which CEs have a particular version of an experiment's software installed. That would be something like:

    lcg-info --vo cms --list-ce --attrs Tag --query 'Tag=*ORCA_8_7_1'

Example 5.1.2.3: List all the Storage Elements in the BDII

Similarly, suppose that you want to know which CEs are close to each SE:

    lcg-info --list-se --vo cms --attrs CloseCEs

The output will be like:

    - SE: castorgrid.cern.ch
      - CloseCE    ce01-slc3.cern.ch:2119/jobmanager-lcglsf-grid
                   ce01-slc3.cern.ch:2119/jobmanager-lcglsf-grid_cms
                   hephygr.oeaw.ac.at:2119/jobmanager-torque-cms
                   ce01-slc3.cern.ch:2119/jobmanager-lcglsf-grid_lhcb
                   ce01-slc3.cern.ch:2119/jobmanager-lcglsf-grid_alice
                   ce01-slc3.cern.ch:2119/jobmanager-lcglsf-grid_atlas
                   ce01-slc3.cern.ch:2119/jobmanager-lcglsf-grid_dteam
                   hephygr.oeaw.ac.at:2119/jobmanager-torque-dteam

The --bdii option can be used to specify a particular BDII (e.g. --bdii exp-bdii.cern.ch:2170), and the --sed option can be used to output the results of the query in a format that is easy to parse in a script, in which values for different attributes are separated by % and values of list attributes are separated by &.

5.1.3 The Local GRIS

The local GRISes runni
167. or the RB to select CEs that are close to the files required by a job (through the InputData attribute), it has to locate the SEs where these files are stored. In order to do this, the RB uses the Data Location Interface (DLI) service, which in turn contacts a file catalog. Since several different file catalogs may exist on the Grid, the user can select which one to use with the JDL attribute DataCatalog. The user specifies the attribute with the endpoint of the catalog as its value, as in this example:

    DataCatalog = "http://lfc-lhcb-ro.cern.ch:8085/";

If no value is specified, then the first catalog supporting the DLI interface that is found in the information system is used (which should be the right choice for the normal user).

NOTE: Only catalogs that present a DLI interface can be contacted by the DLI service. Currently, the RLS catalog does not provide such an interface, so in order to use this catalog the RB is configured to interface directly with it (without using the DLI). To summarize, the RB can be configured either to talk to the RLS catalog or to use the DLI. In the second case, the user has the option to manually specify the endpoint of the catalog he wants to use.

7.8 ACCESSING FILES ON GLITE

7.8.1 VOMS interaction

While the WMS relies on the LCMAPS mechanism for getting a local UID/GID on the basis of the VOMS proxy presented by the user, the DMS implements the concept of Virtual I
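The catalog selection rule described for the DLI case (use the DataCatalog endpoint if the JDL gives one, otherwise the first catalog in the information system that supports DLI) can be sketched as follows. This Python fragment is purely illustrative: the dictionary layout, the supports_dli flag and the function name are assumptions made for the example and do not reflect the RB's actual implementation; the second endpoint is invented.

```python
def choose_catalog(jdl, published_catalogs):
    """Pick the file catalog endpoint the DLI service would contact.

    jdl                -- dict of JDL attributes (hypothetical representation)
    published_catalogs -- ordered list of dicts with 'endpoint' and 'supports_dli'
    """
    explicit = jdl.get("DataCatalog")
    if explicit:
        # An endpoint given by the user takes precedence.
        return explicit
    for catalog in published_catalogs:
        if catalog.get("supports_dli"):
            # First DLI-capable catalog found in the information system.
            return catalog["endpoint"]
    return None


if __name__ == "__main__":
    catalogs = [
        {"endpoint": "rls://rls.example.org", "supports_dli": False},
        {"endpoint": "http://lfc.example.org:8085/", "supports_dli": True},
    ]
    print(choose_catalog({}, catalogs))
    print(choose_catalog({"DataCatalog": "http://lfc-lhcb-ro.cern.ch:8085/"}, catalogs))
```

Returning None when no DLI-capable catalog is published is a choice made for the sketch; the source does not say what the RB does in that situation.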
168. orduGrid [11], which uses the ARC middleware. These are not covered by this guide.

The case of the LHC experiments illustrates well the motivation behind Grid technology. The LHC accelerator will start operation in 2007, and the experiments that will use it (ALICE, ATLAS, CMS and LHCb) will generate enormous amounts of data. The processing of this data will require huge computational and storage resources, and the associated human resources for operation and support. It was not considered feasible to concentrate all the resources at one site, and therefore it was agreed that the LCG computing service would be implemented as a geographically distributed Computational Data Grid. This means that the service will use computing and storage resources installed at a large number of computing sites in many different countries, interconnected by fast networks. The gLite middleware hides much of the complexity of this environment from the user, giving the impression that all of these resources are available in a coherent virtual computer centre.

The users of a Grid infrastructure are divided into Virtual Organisations (VOs) [12], abstract entities grouping users, institutions and resources in the same administrative domain [13].

The WLCG/EGEE VOs correspond to real organisations or projects, such as the four LHC experiments, the community of biomedical researchers, etc. An updated list of all the EGEE VOs can be found at the CIC portal [14].
169. ored
    int verbose = 1;
    int timeout = 180;

    // Form the name of the destination file
    char *my_file = "my_retrieved_file";
    char *pwd = getenv("PWD");
    strcpy(dest_file, "file:");
    strcat(dest_file, pwd);
    strcat(dest_file, "/");
    strcat(dest_file, my_file);

    // The lcg_cp call itself
    if (lcg_cp(src_file, dest_file, vo, nbstreams, 0, 0, verbose, timeout) == 0) {
      cout << "File correctly copied to local filesystem" << endl;
    }
    else {
      perror("Error with lcg_cp");
    }

    // Further processing
    doSomethingWithFile(my_file);

    // Cleaning: delete the temporary file
    unlink(my_file);

    // That was it
    cout << endl;
    return 0;
  } // end of main
The program should be compilable with the following line:
    /opt/gcc-3.2.2/bin/c++-3.2.2 -I$LCG_LOCATION/include \
        -L$LCG_LOCATION/lib -L$GLOBUS_LOCATION/lib \
        -llcg_util -lgfal -lglobus_gass_copy_gcc32 -o lcg_cp_example lcg_cp_example.cpp
Note: The linkage with libglobus_gass_copy_gcc32.so should not be necessary, and the one with libgfal.so should be done transparently when linking liblcg_util.so. Nevertheless, their explicit linkage, as shown in the example, was necessary for the program to compile at the time this guide was written. At the time of reading, it may no longer be necessary. Example F.0.2: Using GFAL to access a file. The following C code uses the libgfal library to access a Grid fi
170. oup z5 permMode 33188 checksumType null checksumValue null isPinned false isPermanent true isCached false state fileld 0 TURL estSecondsToStart 0 sourceFilename destFilename queueOrder 0 edg gridftp ls v gsiftp castorsrm cern ch castor cern ch grid lhcb gt rw rw r 1 lheb001 z5 2317 Aug 15 2005 test acsmith 009 dat gt mrw r r 1 lhcb001 z5 12553 Feb 20 12 02 zz_globus gt rw rw r 1 lhcb001 25 12553 Nov 04 14 00 zz_zz f Attention On CASTOR based storages the space occupied by a file is freed only through the command edg gridftp rm or the native CASTOR command nsrm edg gridftp rm gsiftp castorsrm cern ch castor cern ch grid lhcb zz_globus edg gridftp ls v gsiftp castorsrm cern ch castor cern ch grid lhcb gt rw rw r 1 lheb001 z5 2317 Aug 15 2005 test acsmith 009 dat gt rw rw r 1 lhcb001 25 12553 Nov 04 14 00 zz_zz f By issuing srm advisory delete versus a d cache storage the user can however also remove multiple files in one single operation bulk removal The following example shows the content of a directory on a remote d cache storage edg gridftp ls v gsiftp srm grid sara nl pnfs grid sara nl data lhcb gt drw 512 test gt drw 512 sc3 gt rw 12553 from_CNAF gt drw 512 production gt rw 12553 ZZ_ZZRAL f gt drw 512 disk gt rw 12553 gsi_zz gt rw 12553 ZZ_ZZ gt drw 512 user gt rw 12553 22 7aP IG CERN LCG GDEIS 722398 Manuals Se
171. out lt lt nThe status of the file is lt lt endl cout lt lt endl lt lt filestatuses 0 status lt lt lt lt filestatuses 0 surl free filestatuses 0 surl if filestatuses 0 status 1 cout lt lt lt lt filestatuses 0 turl lt lt lt lt endl free filestatuses 0 turl else cout lt lt endl free filestatuses if filestatuses 0 status 1 cout lt lt endl lt lt Error when trying to stage the file Not waiting lt lt endl exit 1 CERN LCG GDEIS 722398 Manuals Series Page 144 Now watch the status until it gets to STAGED 1 int srm_getstatus int nbfiles char surls int reqid char token E struct srm_filestatus filestatuses int timeout cout lt lt nWaiting for the file to be staged in lt lt endl int numiter 1 int filesleft 1 char destfile new char 200 while numiter lt 50 amp amp filesleft gt 0 sleep longer each iteration sleep numiter cout lt lt just to show we are waiting and not dead cout flush if nbreplies srm_getstatus nbfiles surls reqid NULL amp filestatuses timeout lt 0 perror srm_getstatus exit 1 if filestatuses 0 status 1 cout lt lt nREADY lt lt filestatuses 0 surl lt lt endl filesleft Create a name for the file to be saved strcpy destfile file tmp srm_gfal_retrieved cout
172. owed to read and write, etc. SRMs will also be able to handle requests for transfers from one SRM to another (third-party copy). It is important to note that the SRM protocol is a storage management protocol and not a file access or file transfer protocol. For these tasks, the client application accesses the appropriate file access or transfer server directly. Unfortunately, at the moment not all SEs implement the same version of the SRM interface, and none of these versions offers all of the functionality that the SRM standard [19] defines. The high-level Data Management tools and APIs will in general interact transparently with SRM. 7.2.3 Types of Storage Elements There are different types of possible SEs in gLite 3.0: • Classic SE: it consists of a GridFTP server and an insecure RFIO daemon (rfiod) in front of a single physical disk or disk array. The GridFTP server supports secure data transfers. The rfiod daemon ensures LAN-limited file access through RFIO. If the same Classic SE serves multiple Virtual Organizations, the only way to allocate disk quota per VO is through physical partitioning of the disk, which is an option that site managers in general do not like. In other words, a single VO might fill up the entire SE. Once again, it is the user's responsibility to monitor disk usage in the case of a Classic SE. Furthermore, the Classic SE does NOT support the SRM interface and never will. SA storage Manager developed by the Deu
173. owever, there is currently no authorisation control, so anyone can read and publish to any table. This is expected to be added in future releases. 5.2.2 The R-GMA Browser The R-GMA browser is usually installed on each R-GMA server. It allows the user to easily navigate the schema (to see what tables are available and how they are defined), to see all available producers for a table, and to query the selected producers. All this can be achieved using a web interface; Figure 9 shows this R-GMA browser web interface. It is accessible via the following URL: https://lcgmon01.gridpp.rl.ac.uk:8443/R-GMA/index.html You can replace the hostname with the name of your local server to get a better response. In the left-hand bar you have a list of predefined tables to query; selecting one of them will give a drop-down list of matching items you can select, or you can just hit the Query button to see everything. Alternatively, selecting the Table Sets link gives a complete list of all tables in the schema. Clicking on a table name gives a page where you can query the table definition, enter an SQL query, select the query type, and see and select from a list of all producers for that table. If you simply hit the Query button, you get a Latest query for all data in the table, which corresponds to the intuitive idea of the current content of the table. The best way to get some understanding of R-GMA is to play with the query interface for one of the standard tabl
174. pceisOl cern ch port 7772 Logging to host pceis01l cern ch port 9002 KKKKKKKKKKKKKKK KK KK KKK KK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKKKKKKKKKKKK JOB SUBMIT OUTCOME The job has been successfully submitted to the Network Server Use edg job status command to check job current status Your job identifier edg_jobld is https pceis0l cern ch 9000 IxKsoi81I7f XbygN5 6dNwug The Interactive Streams have been successfully generated CERN LCG GDEIS 722398 Manuals Series Page 82 with the following parameters Host 137 138 228 252 Port 37033 Shadow process Id 7335 Input Stream location tmp listener 1xKso1817fXbygN56dNwug in Output Stream location tmp listener 1xKsoi817fXbygN56dNwug out Error Stream location tmp listener 1xKso1817fXbygN56dNwug err KKEKKKKKKKKKKKK KK KK KK KARA KK KK KK RRA AAA AAA KA KK KARA ARA KA A KR RRA AAA KK KK KARA Once the job has been submitted the dialog sh script can be invoked passing the four arguments as de scribed earlier The code of the script is quite simple as it just reads from the output pipe and waits for the user s input which in this case will be just one string This string the user s name is the only thing that our job interactive sh needs to complete its work A more general tool should keep waiting for further input in a loop until the user instructs it to exit Of course some error checking should be also added The code of dialog sh fol
175. r success, 1 for a general error, 2 for an existing file that cannot be overwritten, and 3 for a user permission error. Example 7.6.3.2: Retrieving information about a file. srm-get-metadata is a useful command for Grid users to get the metadata of data stored in an SRM. The following example shows how to retrieve such information:
    $ srm-get-metadata srm://srm.grid.sara.nl:8443/pnfs/grid.sara.nl/data/lhcb/zz_zz
    > lic SRMClientV1: getFileMetaData, contacting service httpg://srm.grid.sara.nl:8443/srm/managerv1
    FileMetaData(srm://srm.grid.sara.nl:8443/pnfs/grid.sara.nl/data/lhcb/zz_zz)
    RequestFileStatus
      SURL: srm://srm.grid.sara.nl:8443/pnfs/grid.sara.nl/data/lhcb/zz_zz
      size: 12553
      owner: 1781
      group: 1164
      permMode: 420
      checksumType: adler32
      checksumValue: 65aa6b8f
      isPinned: false
      isPermanent: true
      isCached: true
      state:
      fileId: 0
      TURL:
      estSecondsToStart: 0
      sourceFilename:
      destFilename:
      queueOrder: 0
Attention: On dCache, the srmcp command by default grants read and write permissions to any member of the group the user belongs to. Nobody else is granted access to the file. globus-url-copy has the same behavior. On CASTOR-based storage, srmcp sets read/write access for the owner and for the members of the same group, and read access for the others. globus-url-copy by default sets read/write access for the owner and read-only access for both the members of the group and the others
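The permMode values in the metadata output look like ordinary Unix mode bits printed in decimal. This reading is an assumption, since the listing itself does not say so. Under that assumption, 420 is octal 0644 (read/write for the owner, read-only for group and others). The Python sketch below decodes such a value; the function name describe_perm_mode is illustrative only and is not part of any SRM client.

```python
import stat


def describe_perm_mode(perm_mode: int) -> str:
    """Render a decimal permMode as an ls-style string for a regular file.

    Assumption: permMode carries standard Unix mode bits in decimal.
    """
    # Keep only the permission bits, dropping any file-type bits.
    permission_bits = stat.S_IMODE(perm_mode)
    # Add the regular-file type so the leading character is '-'.
    return stat.filemode(stat.S_IFREG | permission_bits)


if __name__ == "__main__":
    for value in (420, 33188):
        print(value, oct(stat.S_IMODE(value)), describe_perm_mode(value))
```

Both 420 and 33188 (the latter includes the regular-file type bits) reduce to the same permission bits, 0644.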
176. rce Broker, but may be retrieved via GridFTP to look at its contents. APPENDIX C: JOB STATUS DEFINITION As already mentioned in Chapter 6, a job can find itself in one of several possible states. Also, only some transitions between states are allowed. These transitions are depicted in Figure 12.
[Figure 12: Possible job states in the LCG-2]
The definition of the different states is given in this table:
    Status      Definition
    SUBMITTED   The job has been submitted by the user but not yet processed by the Network Server
    WAITING     The job has been accepted by the Network Server but not yet processed by the Workload Manager
    READY       The job has been assigned to a Computing Element but not yet transferred to it
    SCHEDULED   The job is waiting in the Computing Element's queue
    RUNNING     The job is running
    DONE        The job has finished
    ABORTED     The job has been aborted by the WMS (e.g. because it was too long, or the proxy certificate expired, etc.)
    CANCELED    The job has been canceled by the user
    CLEARED     The Output Sandbox has been transferred to the User Interface
APPENDIX D: USER TOOLS D.1 INTRODUCTION This section introduces some tools that are not really part of the mi
177. rent gLite 3.0 are summarized in the next table:
    Protocol                Type           GSI secure   Description          Optional
    GSIFTP                  File Transfer  Yes          FTP-like             No
    gsidcap                 File I/O       Yes          Remote file access   Yes
    insecure RFIO           File I/O       No           Remote file access   Yes
    secure RFIO (gsirfio)   File I/O       Yes          Remote file access   Yes
In the current gLite 3.0 release, every SE must have a GSIFTP server [38]. The GSIFTP protocol offers basically the functionality of FTP, i.e. the transfer of files, but enhanced to support GSI security. It is responsible for secure, fast and efficient file transfers to and from Storage Elements. It provides third-party control of data transfer as well as parallel-stream data transfer. GSIFTP is currently supported by every Grid SE and is thus the main file transfer protocol in gLite 3.0. However, for remote access to (and not only copying of) files stored in the SEs, the protocols currently supported by gLite are the Remote File Input/Output protocol (RFIO) and the GSI dCache Access Protocol (gsidcap). RFIO was developed to access tape archiving systems such as CASTOR (CERN Advanced STORage manager). CASTOR does not yet implement a GSI-enabled version of RFIO (as DPM does), and therefore RFIO can only be used to access data from any WN within the Local Area Network (LAN); the only authentication is through the UID. In the literature, the terms GridFTP and GSIFTP are sometimes used interchangeably. Strictly speaking, GSIFTP
178.
    > drw 512 generated
    > rw 12553 iy ay dy
    > rw 12553 ZZ_22Z2
Invoking srm-advisory-delete for two of these entries:
    srm-advisory-delete srm://srm.grid.sara.nl:8443/pnfs/grid.sara.nl/data/lhcb/gsi_zz srm://srm.grid.sara.nl:8443/pnfs/grid.sara.nl/data/lhcb/zz_zz
the files are removed successfully, as shown in the output of the srm-get-metadata command for both of these entries:
    srm-get-metadata srm://srm.grid.sara.nl:8443/pnfs/grid.sara.nl/data/lhcb/gsi_zz srm://srm.grid.sara.nl:8443/pnfs/grid.sara.nl/data/lhcb/zz_zz
    > SRMClientV1: getFileMetaData failed to parse SURL: java.lang.RuntimeException: could not get storage
    > CacheException rc 666 msg Pnfs error: can't get pnfsId (not a pnfsfile)
    > SRMClientV1: copy: try again
Example 7.6.3.4: Retrieving info about a Storage Element. The command srm-storage-element-info, provided with a valid endpoint (WSDL URL), allows the user to retrieve information about the status of a storage system managed by SRM.
    $ srm-storage-element-info httpg://srm.grid.sara.nl:8443/srm/infoProvider1_0.wsdl
    StorageElementInfo:
      totalSpace     = 5046586572800 (4928307200 KB)
      usedSpace      = 1933512144967 (1888195454 KB)
      availableSpace = 3476820574537 (3395332592 KB)
7.7 JOB SERVICES AND DATA MANAGEMENT With both the LCG and gLite WMS, some specific JDL attributes allow the user to specify requirements on the input data
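The KB figures printed next to each byte count in the StorageElementInfo output appear to be the byte count divided by 1024 and rounded down. This is inferred from the three values shown, not taken from the SRM documentation. A minimal Python check of that assumption (the helper name bytes_to_kb is hypothetical):

```python
def bytes_to_kb(n_bytes: int) -> int:
    """Convert bytes to KB, assuming 1 KB = 1024 bytes and rounding down."""
    return n_bytes // 1024


# (bytes, KB) pairs copied from the StorageElementInfo output above.
REPORTED = {
    "totalSpace": (5046586572800, 4928307200),
    "usedSpace": (1933512144967, 1888195454),
    "availableSpace": (3476820574537, 3395332592),
}

if __name__ == "__main__":
    for name, (n_bytes, reported_kb) in REPORTED.items():
        computed = bytes_to_kb(n_bytes)
        print(f"{name}: computed {computed} KB, reported {reported_kb} KB")
```

All three reported KB values agree with this conversion, so the figures in parentheses add no information beyond the byte counts.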
179. rt-ce-03.cnaf.infn.it:2119/blah-lsf-pps
    GlueInformationServiceURL: ldap://cert-ce-03.cnaf.infn.it:2170/mds-vo-name=resource,o=grid
    GlueSchemaVersionMajor: 1
    GlueSchemaVersionMinor: 2

    # cert-ce-03.cnaf.infn.it:2119/blah-lsf-pps, resource, grid
    dn: GlueCEUniqueID=cert-ce-03.cnaf.infn.it:2119/blah-lsf-pps,mds-vo-name=resource,o=grid
    objectClass: GlueCETop
    objectClass: GlueCE
    objectClass: GlueSchemaVersion
    objectClass: GlueCEAccessControlBase
    objectClass: GlueCEInfo
    objectClass: GlueCEPolicy
5.1.5 The top BDII A top BDII collects all information coming from site GIISes/BDIIs and stores it in a permanent database. The top BDII can be configured to get published information from resources in all sites, or only from some of them. In order to find out the location of a top BDII in a site, you can consult the GOC page of this site. The BDII will be listed with the rest of the nodes of the site (refer to Figure 5).
[Figure: directory tree view of a BDII, showing entries such as o=grid, mds-vo-name=local, GlueClusterUniqueID and GlueSEUniqueID; caption not recoverable]
180. ry proxy ........................................ Done
    Contacting lcg-voms.cern.ch:15002 [/C=CH/O=CERN/OU=GRID/CN=host/lcg-voms.cern.ch] "cms" Done
    Creating proxy .................................................. Done
    Your proxy is valid until Thu Mar 30 06:17:27 2006
One clear advantage of VOMS proxies over normal proxies is that the middleware can find out from the proxy which VO the user belongs to, while with a normal proxy the VO has to be explicitly specified by other means. To create a proxy with a given role (e.g. production) and primary group (e.g. /cms/HeavyIons), the syntax is:
    voms-proxy-init --voms <alias>:<group name>[/Role=<role name>]
where alias specifies the server to be contacted and the VO (see below), and is usually the name of the VO. For example:
    voms-proxy-init --voms cms:/cms/HeavyIons/Role=production
voms-proxy-init uses a configuration file, whose path can be specified in several ways; if the path is a directory, the files inside it are concatenated and taken as the actual configuration file. A user-level configuration file, which must be owned by the user, is looked for in these locations: • the argument of the --userconf option; • the file $HOME/.glite/vomses. If it is not found, a system-wide configuration file, which must be owned by root, is looked for in these locations: • the argument of the --confile option
181. s done by adding an expression like the following:
    other.GlueCEInfoTotalCPUs >= <NodeNumber> && Member("MPICH", other.GlueHostApplicationSoftwareRunTimeEnvironment)
to the JDL requirements expression (remember that this addition is automatically performed by the UI, so you do not have to do it yourself). Attention: The executable that is specified in the JDL must not be the MPI application directly, but a wrapper script that invokes this MPI application by calling mpirun. This allows the user to perform some pre- and post-execution steps in the script. This usually includes the compilation of the MPI application, since the resulting binary may differ depending on the MPI version and configuration. It is good practice to specify the list of machines that mpirun will use, so as not to risk the use of a default (possibly stale) list. This cannot happen if mpiexec is used, but that command is only available on Torque or PBS systems. One more thing to say regarding MPI jobs is that some of them require a shared filesystem among the WNs to run. In order to know whether a CE offers a shared filesystem, the variable VO_<name_of_VO>_SW_DIR defined in each WN can be checked. The variable will contain the name of a directory for CEs with shared file systems, while it will just hold a "." if there is no shared file system. The following example shows a complete
182. [Figure 16: DIT for the computing resources] [Figure 17: DIT for the worker nodes]
183. ser tools are intended for being used against GSIFTP servers on SEs:
    edg-gridftp-exists URL                   Checks the existence of a file or directory on a SE
    edg-gridftp-ls URL                       Lists a directory on a SE
    edg-gridftp-mkdir URL                    Creates a directory on a SE
    edg-gridftp-rename sourceURL destURL     Renames a file on a SE
    edg-gridftp-rm URL                       Removes a file from a SE
    edg-gridftp-rmdir URL                    Removes a directory on a SE
    globus-url-copy sourceURL destURL        Copies files between SEs
The commands edg-gridftp-rename, edg-gridftp-rm and edg-gridftp-rmdir should be used with extreme care and only in case of serious problems. These commands do not interact with any of the catalogs, and therefore they can compromise the consistency of the information contained in the Grid. globus-url-copy is also dangerous, since it allows a file to be copied into the Grid without enforcing its registration. All the edg-gridftp commands accept gsiftp as the only valid protocol for the TURL. Some usage examples are shown below; they are by no means exhaustive. To obtain help on these commands, use the option --usage or --help. General information on GridFTP is available in [38]. Example 7.6.2.1: Listing and checking the existence of Grid files. The edg-gridftp-exists and edg-gridftp-ls commands can be useful to check whether a file is physically in a SE, regardless of its presence in the Grid catalogs. lcg-lr
184. sion: version of the service (major.minor.patch)
    GlueServiceEndpoint: network endpoint for the service
    GlueServiceStatus: status of the service (OK, Warning, Critical, Unknown, Other)
    GlueServiceStatusInfo: textual explanation of the status
    GlueServiceWSDL: URI of the service WSDL
    GlueServiceSemantics: URL of a detailed description of the service
    GlueServiceStartTime: time of last service start
    GlueServiceOwner: owner of the service (e.g. the VO)
The attributes GlueServicePrimaryOwnerName, GlueServicePrimaryOwnerContact, GlueServiceHostingOrganization, GlueServiceMajorVersion, GlueServiceMinorVersion, GlueServicePatchVersion, GlueServiceAccessControlRule and GlueServiceInformationServiceURL are deprecated from version 1.2 of the GLUE schema. G.3 ATTRIBUTES FOR THE COMPUTING ELEMENT These are attributes that give information about a CE and its composing WNs. In the GLUE Schema, they are defined in the UML diagram for the Computing Element.
• Base class for the CE information (objectclass GlueCETop): no attributes
• Base class for the cluster information (objectclass GlueClusterTop): no attributes
• Computing Element (objectclass GlueCE)
    GlueCEUniqueID: unique identifier for the CE
    GlueCEName: human-readable name of the service
• General info for the queue associated to the CE (objectclass GlueCEInfo)
    GlueCEInfoLRMSType: name of the local batch system
    Glu
185. srm-advisory-delete: Remove a file from an SRM storage
    srm-storage-element-info: Retrieves information about an SRM-managed storage
    srm-get-request-status: Retrieves the status of a given request ID
None of these commands comes with a man page, although basic functionality can be displayed using the option -h. Examples of their use are described below. Example 7.6.3.1: Copy a file to an SRM storage. The srmcp command allows copying files from a local disk to an SRM storage, or between two SRM storage elements (two-party or third-party transfers). It can be invoked either by specifying the SRM URLs (source/destination pair) or by passing a list of URL pairs in a file through the -copyjobfile option. It can also accept gsiftp TURLs. The following example shows how to correctly invoke the command for copying a local file to a remote SRM storage:
    $ srmcp file:///`pwd`/zz_zz.f srm://srm.grid.sara.nl/pnfs/grid.sara.nl/data/lhcb/zz_zz
Please note that there are four slashes between the file protocol and the input file expressed as an absolute path. The output looks like:
    execution of CopyJob, source file:////afs/cern.ch/user/s/santinel/zz_zz.f destination gsiftp://srm.grid.sara.nl:2811/pnfs/grid.sara.nl/data/lhcb/zz_zz completed
    setting file request 2147196960 status to Done
    doneAddingJobs is true
    copy_jobs is empty
    stopping copier
which clearly indicates that the file has been copied successfully. The command returns 0 fo
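To illustrate the four-slash rule for local files in the srmcp example: the file protocol prefix ends in three slashes, and an absolute path contributes the fourth. The Python sketch below builds such a source URL. The helper name local_file_url and its cwd parameter are hypothetical, and the "file:///" prefix is an assumption drawn from the rule stated in the text.

```python
import posixpath


def local_file_url(path: str, cwd: str = "/") -> str:
    """Build a file URL for srmcp from a local POSIX path.

    Assumption: the URL is 'file:///' followed by the absolute path, so an
    absolute path such as /tmp/x yields four consecutive slashes.
    """
    # Resolve a relative path against the given working directory.
    absolute = posixpath.normpath(posixpath.join(cwd, path))
    return "file:///" + absolute


if __name__ == "__main__":
    print(local_file_url("zz_zz.f", cwd="/afs/cern.ch/user/s/santinel"))
```

Passing a relative name together with the current directory plays the role of the `pwd` substitution in the shell command.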
186. srm_put request (hidden from the user), which will remain open unless explicitly closed (at least in SRM v1, currently deployed). It is important to know that some SRM Storage Elements limit the maximum number of open requests; further requests will fail once this limit has been reached. It is therefore good practice to close the request once the TURL is no longer needed. This can be done with the lcg-sd command, which needs as arguments the TURL of the file, the requestID and the fileID.
    lcg-gt srm://castorsrm.cern.ch/castor/cern.ch/grid/dteam/generated/2005-04-12/filefad1e7fb-9d83-4050-af51-4c9af7bb095c gsiftp
    > gsiftp://castorgrid.cern.ch:2811/shift/lxfsrk4705/data02/cg/stage/filefad1e7fb-9d83-4050-af51-4c9af7bb095c.43309
    > 337722383
    > 0
    [ ... do something with the TURL ... ]
    lcg-sd gsiftp://castorgrid.cern.ch:2811/shift/lxfsrk4705/data02/cg/stage/filefad1e7fb-9d83-4050-af51-4c9af7bb095c.43309 337722383 0
Example 7.6.1.6: Deleting replicas. A file that is stored on a Storage Element and registered with a catalog can be deleted using the lcg-del command. If a SURL is provided as argument, then that particular replica will be deleted. If an LFN or GUID is given instead, then the -s <SE> option must be used to indicate which one of the replicas must be erased, unless the -a option is used, in which case all replicas of the file will be deleted and unregi
187. ssful. Otherwise, the location of the generated log file containing error messages is printed on the standard output. This option has been provided to make it easier to use the glite-job-submit command inside scripts, as an alternative to the -o option. Example 6.3.1.2: Listing Computing Elements that match a job description. It is possible to see which CEs are eligible to run a job specified by a given JDL using glite-job-list-match (for the gLite WMS) or edg-job-list-match (for the LCG RB). The --rank option can be used to display the ranking value of each matching resource.
    glite-job-list-match --rank Hello.jdl
    Selected Virtual Organisation name (from proxy certificate extension): atlas
    Connecting to host cert-rb-01.cnaf.infn.it, port 7772
    ***************************************************************************
                         COMPUTING ELEMENT IDs LIST
    The following CE(s) matching your job requirements have been found:
    CEId                                            Rank
    ce03.pic.es:2119/jobmanager-lcgpbs-pps          58
    prep-ce-01.pd.infn.it:2119/blah-lsf-atlas       3810
    cg01.ific.uv.es:2119/blah-pbs-atlas             9850
    ce02.pic.es:2119/blah-pbs-pps                   999999
    ***************************************************************************
The -o <file path> option can be used to store the CE list in a file, which can later be used with the -i <file path> option of glite-job-submit. 6.3.2 Job Operations After
188. ssion (owner, group owner, others); the latter correspond to lists of supplementary users/groups. The existence of supplementary users/groups also requires an ACL mask to be defined. There are two different types of extended ACLs, access and default: • access ACLs can be set on both files and directories and are used to control access; • default ACLs can be set on directories and are inherited as access ACLs by every file or subdirectory underneath, unless explicitly specified otherwise. The default ACLs are also inherited as default ACLs by every subdirectory. LFC and any SE HPSS with an SRM v2 interface support POSIX ACLs. 7.8.3 Protection of files Permission to delete a file depends on the S_ISVTX bit in the parent directory: one always needs to have write permission on the parent directory. Furthermore, if the S_ISVTX bit in the parent directory is set, one also needs to be the owner of the file or to have write permission on the file. Hint: In the case of user analysis files, in order to guarantee privacy, it is suggested to set the S_ISVTX bit on the parent directory and the mode of the file to 0644 (or even to 0600 if the file must only be seen by the owner DN). Conversely, in the case of production files, the bit of the parent directory must be set and the permission mode set to 0644 (the file must be accessible at least by the members of the group). A special VOMS role, lhcbdataadmin, can finally be defined to grant users with this role accessing all d
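The deletion rule from section 7.8.3 can be written as a small predicate. The following Python sketch only restates the rule as described in the text (write permission on the parent directory is always required; with S_ISVTX set, ownership of the file or write permission on the file is required as well). The function name and its boolean parameters are hypothetical, and no real catalog or SE is consulted.

```python
import stat


def may_delete(parent_mode: int, can_write_parent: bool,
               is_file_owner: bool, can_write_file: bool) -> bool:
    """Return True if deletion is allowed under the rule described above."""
    # Write permission on the parent directory is always required.
    if not can_write_parent:
        return False
    # With the S_ISVTX (sticky) bit set on the parent, the caller must also
    # own the file or hold write permission on it.
    if parent_mode & stat.S_ISVTX:
        return is_file_owner or can_write_file
    return True


if __name__ == "__main__":
    # Sticky directory (mode 1775): a non-owner without write access to the
    # file cannot delete it, even with write access to the directory.
    print(may_delete(0o1775, True, False, False))
    # Same directory, but the caller owns the file.
    print(may_delete(0o1775, True, True, False))
```

This is why the hint recommends setting S_ISVTX on the parent directory: other members of the group who can write to the directory still cannot remove files they neither own nor can write to.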
189. stFileLifeTime: time for which the file will stay on the storage device
    GlueHostFileOwner: name of the owner of the file
• Location (objectclass GlueLocation)
    GlueLocationLocalID: local ID for the location
    GlueLocationName: name
    GlueLocationPath: path
    GlueLocationVersion: version
• VO View (objectclass GlueVOView)
    GlueVOViewLocalID: local ID for this VO view
G.4 ATTRIBUTES FOR THE STORAGE ELEMENT These are attributes that give information about a SE and the corresponding storage space. In the GLUE Schema, they are defined in the UML diagram for the Storage Element. It is worth noting that the GlueSE object class, which maps to the StorageService element in the GLUE Schema, publishes information about the manager service of a SE; the GlueSL object class, which maps to the StorageLibrary element, publishes information related to the access node for the SE; and the GlueSA object class, which maps to the StorageSpace element, gives information about the available space in the SE.
• Base class for the storage service (objectclass GlueSETop): no attributes
• Base class for the storage library (objectclass GlueSLTop): no attributes
• Base class for the storage space (objectclass GlueSATop): no attributes
• Storage Service (objectclass GlueSE)
    GlueSEUniqueID: unique identifier of the storage s
190. stered on a best-effort basis. If all the replicas of a file are removed, the corresponding GUID-LFN mappings are removed as well.
    lcg-lr --vo dteam guid:91b89dfe-ff95-4614-bad2-c538bfa28fac
    > sfn://lxb0707.cern.ch/flatfiles/SE00/dteam/generated/2004-07-12/file78ef5a13-166f-4701-8059-e70e397dd2ca
    > sfn://lxb0710.cern.ch/flatfiles/SE00/dteam/generated/2004-07-12/file21658bfb-6eac-409b-9177-88c07bb1a57c
    lcg-del --vo dteam -s lxb0707.cern.ch guid:91b89dfe-ff95-4614-bad2-c538bfa28fac
    lcg-lr --vo dteam guid:91b89dfe-ff95-4614-bad2-c538bfa28fac
    > sfn://lxb0710.cern.ch/flatfiles/SE00/dteam/generated/2004-07-12/file21658bfb-6eac-409b-9177-88c07bb1a57c
    lcg-del --vo dteam -a guid:91b89dfe-ff95-4614-bad2-c538bfa28fac
    lcg-lr --vo dteam guid:91b89dfe-ff95-4614-bad2-c538bfa28fac
    > lcg_lr: No such file or directory
The last error indicates that the GUID is no longer registered within the catalogs of LCG-2, as the last replica was deleted. Example 7.6.1.7: Registering and unregistering Grid files. The lcg-rf (register file) command registers a file physically present in a SE, creating a GUID-SURL mapping in the catalog. The -g <GUID> option allows a GUID to be specified (otherwise one is created automatically).
    lcg-rf -v --vo dteam -g guid:baddb707-0cb5-4d9a-8141-a046659d243b sfn://lxb0710.cern.ch/flatfiles/SE00/dteam/generated/2004-07-08/file0dcabb46-2214-4db8-9ee8-2930de1abbef
    > guid:baddb707-0cb5-4d9a-8
191. sts of two catalogs: the Local Replica Catalog (LRC) and the Replica Metadata Catalog (RMC). The LRC keeps mappings between GUIDs and SURLs, while the RMC keeps mappings between GUIDs and LFNs. Both RMC and LRC support the use of metadata. User metadata should all be confined to the RMC, while the LRC should contain only system metadata (file size, creation date, checksum, etc.). Please refer to the LCG-2 User Guide for further reading about RLS structure and usage. The LFC was developed to overcome some serious performance and security problems of the old RLS catalogs; it also adds some new functionality such as transactions, roll-backs, sessions, bulk queries and a hierarchical name space for LFNs. It consists of a single catalog, where the LFN is the main key (see Figure 11). Further LFNs can be added as symlinks to the main LFN. System metadata is supported, while for user metadata only a single string entry is available (more a description field than real metadata). Migrations from RLS to LFC, made possible through specific tools, have allowed all VOs to gradually adopt LFC as the only File Catalog in LCG and to abandon RLS for good. IMPORTANT: A file is considered to be a Grid file if it is both physically present in a SE and registered in the file catalog. In this chapter several tools will be described. In general, high-level tools like lcg_utils (see Sec. 7.6.1) will ensure consistency between files in the SEs and entries in the
192. supported by the SE. Figure 5 shows the relationship between the different names of a file. The mappings between LFNs, GUIDs and SURLs are kept in a service called a File Catalog, while the files themselves are stored in Storage Elements. The only file catalog officially supported in WLCG/EGEE is the LCG File Catalog (LFC). The Data Management client tools are described in detail in Chapter 7. They allow a user to move data in and out of the Grid, replicate files between Storage Elements, interact with the File Catalog and more. LCG high-level data management tools shield the user from the complexities of Storage Element and catalog implementations, as well as transport and access protocols. Low-level tools are also available, but should be needed only by expert users. 3.2.7 Workload Management The purpose of the Workload Management System (WMS) is to accept user jobs, to assign them to the most appropriate Computing Element, to record their status and to retrieve their output. The Resource Broker (RB) is the machine where the WMS services run [24]. Jobs to be submitted are described using the Job Description Language (JDL), which specifies, for example, which executable to run and its parameters, files to be moved to and from the worker node, input Grid files needed, and any requirements on the CE and the worker node. The choice of CE to which the job is sent is done in a process called match
193. t are not in the specified state. These last two options are mutually exclusive, although they can be used together with --from and --to.

In the following examples, the first command retrieves all jobs of the user that are in the state DONE or RUNNING, and the second retrieves all jobs that were submitted before 17:35 of the current day and that were not in the CLEARED state.

    glite-job-status --all -s DONE -s RUNNING
    glite-job-status --all -e CLEARED --to 17:35

NOTE: for the --all option to work, it is necessary that an index by owner is created in the LB server; otherwise the command will fail, since it will not be possible for the LB server to identify the user's jobs. Such an index can only be created by the LB server administrator, as explained in section 5.2.2 of [25].

With the option -o <file path> the command output can be written to a file.

Example 6.3.2.2 Canceling a job

A job can be canceled before it ends using the command glite-job-cancel (for the gLite WMS) or edg-job-cancel (for the LCG RB). This command requires as arguments one or more job identifiers. For example:

    glite-job-cancel https://cert-rb-01.cnaf.infn.it:9000/IQM Vq20r9Rzgc5tUMWdg

    Are you sure you want to remove specified job(s)? [y/n]n : y

    glite-job-cancel Success
    The cancellation request has been successfully submitted for the following job(s):
    https://cert-rb-01.cnaf.infn.it:90
194. t exist, then this means that this LFC server does not support that VO. Once the correct environment has been set, the following commands can be used:

    lfc-chmod          Change access mode of an LFC file/directory.
    lfc-chown          Change owner and group of an LFC file/directory.
    lfc-delcomment     Delete the comment associated with a file/directory.
    lfc-getacl         Get file/directory access control lists.
    lfc-ln             Make a symbolic link to a file/directory.
    lfc-ls             List file/directory entries in a directory.
    lfc-mkdir          Create directory.
    lfc-rename         Rename a file/directory.
    lfc-rm             Remove a file/directory.
    lfc-setacl         Set file/directory access control lists.
    lfc-setcomment     Add/replace a comment.
    lfc-entergrpmap    Defines a new group entry in the Virtual ID table.
    lfc-enterusrmap    Defines a new user entry in the Virtual ID table.
    lfc-modifygrpmap   Modifies a group entry corresponding to a given virtual gid.
    lfc-modifyusrmap   Modifies a user entry corresponding to a given virtual uid.
    lfc-rmgrpmap       Suppresses the group entry corresponding to a given virtual gid or group name.
    lfc-rmusrmap       Suppresses the user entry corresponding to a given virtual uid or user name.

Many of the commands provide administrative functionality. Their names indicate what their functions are. Man pages are available for all the commands. Most of the commands work in a very similar way to their Unix equivalents, but operating on direc
195. t of the job can be modified using the Environment attribute. For example:

    Environment = {"CMS_PATH=$HOME/cms", "CMS_DB=$CMS_PATH/cmdb"};

The VirtualOrganisation attribute can be used to explicitly specify the VO of the user:

    VirtualOrganisation = "cms";

but it is superseded by the VO contained in the user proxy if a VOMS proxy is used. For normal proxies, the VO can be specified in the JDL, in the UI configuration files, or as an argument to the job submission command (see section 6.3.1).

Note: A common error is to write VirtualOrganization. It will not work.

To summarise, a typical JDL for a simple Grid job will look like:

    Executable = "test.sh";
    Arguments = "fileA fileB";
    StdOutput = "std.out";
    StdError = "std.err";
    InputSandbox = {"test.sh", "fileA", "fileB"};
    OutputSandbox = {"std.out", "std.err"};

where test.sh could be:

    #!/bin/sh
    echo "First file:"
    cat $1
    echo "Second file:"
    cat $2

Section 6.3.1 explains how to submit such a job.

Example 6.2.2 Specifying requirements on the CE

The Requirements attribute can be used to express constraints on the resources where the job should run. Its value is a Boolean expression that must evaluate to true for a job to run on that specific CE. For that purpose all the GLUE attributes of the IS can be used, by prepending the "other." string to the attribute name. For a list of GLUE attributes, see Appendix G. Note
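The JDL layout shown in this excerpt (one attribute = expression; statement per line, strings in double quotes, lists in braces) can be generated mechanically. The following is a minimal Python sketch under stated assumptions: the helper names jdl_value and render_jdl are hypothetical and are not part of any gLite tool, and only plain strings and lists of strings are handled (no Boolean Requirements expressions, no escaping of embedded quotes).

```python
def jdl_value(value):
    """Render a Python value as a JDL expression.

    Assumption: only strings and lists/tuples of strings are supported,
    and strings contain no double quotes that would need escaping.
    """
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(jdl_value(item) for item in value) + "}"
    return '"' + str(value) + '"'


def render_jdl(attributes):
    """Return JDL text with one 'Attribute = expression;' line per entry."""
    return "\n".join(
        f"{name} = {jdl_value(value)};" for name, value in attributes.items()
    )


# The simple job from the excerpt, expressed as a dictionary.
job = {
    "Executable": "test.sh",
    "Arguments": "fileA fileB",
    "StdOutput": "std.out",
    "StdError": "std.err",
    "InputSandbox": ["test.sh", "fileA", "fileB"],
    "OutputSandbox": ["std.out", "std.err"],
}

print(render_jdl(job))
```

Keeping the job description in a dictionary makes it easy to vary a single attribute (for example Arguments) when preparing many similar jobs, while the rendering rule stays in one place.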
196. t workload management systems are deployed: the legacy LCG-2 system, derived from the EDG project, and the new system from the EGEE project, which is an evolution of the former and therefore has more functionality.

In the following sections we describe the basic concepts of the language used to describe a job, the basic command line interface to submit and manage user jobs, the more advanced command line interface that allows groups of jobs to be submitted, some useful tools, and how to use advanced job types.

6.2 JOB DESCRIPTION LANGUAGE

The Job Description Language (JDL) is a high-level language based on the Classified Advertisement (ClassAd) language [30], used to describe jobs and aggregates of jobs with arbitrary dependency relations. The JDL is used in WLCG/EGEE to specify the desired job characteristics and constraints, which are used by the match-making process to select the best resource to execute the job.

The fundamentals of the JDL are given in this section. A complete description of the JDL syntax is out of the scope of this guide, and can be found in [32]. The description of the JDL attributes for the EDG WMS is in [33], and for the gLite WMS in [33][35].

A job description is a file (called JDL file) consisting of lines having the format:

    attribute = expression;

Each statement is terminated by a semicolon. Expressions can span several lines, but only the last one must be terminated by a semicolon. Literal strings are enclosed in double quot
197. teAvailableSpace: 275474688
    GlueSAStateUsedSpace: 35469432
    GlueChunkKey: GlueSEUniqueID=gw38.hep.ph.ic.ac.uk

    dn: GlueSALocalID=alice,GlueSEUniqueID=grid08.ph.gla.ac.uk,mds-vo-name=UKI-ScotGrid-Gla-PPS,mds-vo-name=local,o=grid
    GlueSAStateAvailableSpace: 3840000000
    GlueSAStateUsedSpace: 1360000000
    GlueChunkKey: GlueSEUniqueID=grid08.ph.gla.ac.uk

    dn: GlueSALocalID=alice,GlueSEUniqueID=grid13.csl.ee.upatras.gr,mds-vo-name=PreGR-02-UPATRAS,mds-vo-name=local,o=grid
    GlueSAStateAvailableSpace: 186770000
    GlueSAStateUsedSpace: 10090000
    GlueChunkKey: GlueSEUniqueID=grid13.csl.ee.upatras.gr

    dn: GlueSALocalID=alice alice,GlueSEUniqueID=castor.grid.sinica.edu.tw,mds-vo-name=Taiwan-PPS,mds-vo-name=local,o=grid
    GlueSAStateAvailableSpace: 3700000000
    GlueSAStateUsedSpace: 12440000000
    GlueChunkKey: GlueSEUniqueID=castor.grid.sinica.edu.tw

    dn: GlueSALocalID=alice,GlueSEUniqueID=epbf004.ph.bham.ac.uk,mds-vo-name=UKI-SOUTHGRID-BHAM-PPS,mds-vo-name=local,o=grid
    GlueSAStateAvailableSpace: 58460000
    GlueSAStateUsedSpace: 3160000
    GlueChunkKey: GlueSEUniqueID=epbf004.ph.bham.ac.uk

5.2 R-GMA

As explained in section 3.2.5, R-GMA is an alternative information system to MDS. The standard Glue information is published in R-GMA, together with various monitoring data, and the system is also available for users to publish their own data. The system can be used via a com
198. team/my_temp_file
    Open successful
    Write successful
    Close successful
    Reading back sfn://lxb0707.cern.ch/flatfiles/SE00/dteam/my_temp_file
    Open successful
    Read successful
    Value of readValues[0] = 0
    Value of readValues[1] = 10
    Value of readValues[2] = 20
    Value of readValues[3] = 30
    Value of readValues[4] = 40
    Value of readValues[5] = 50
    Value of readValues[6] = 60
    Value of readValues[7] = 70
    Value of readValues[8] = 80
    Value of readValues[9] = 90
    Close successful
    Done

It is important to notice that the creation of a new file using GFAL does not imply the registration of that file. This means that if the created file has to be used as a Grid file, it should be manually registered using lcg-rf. Otherwise, the file should be deleted using edg-gridftp-rm.

As seen, by using GFAL an application can access a file remotely almost as if it were local (substituting POSIX calls by those of GFAL). For more information on GFAL, refer to the man pages of the library (gfal) and of the different calls (gfal_open, gfal_write).

In addition to GFAL, there is also the possibility to use the RFIO C and C++ APIs, which also give applications the possibility to open and read a file remotely. Nevertheless, RFIO presents several limitations in comparison to GFAL. First of all, it can only be used to access those SEs or SRMs that support the RFIO protocol, wh
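The output in this excerpt comes from a program that writes ten integers to a file, closes it, reopens it and reads the values back. The sketch below reproduces the same open/write/close/open/read/close sequence with plain POSIX-style calls from Python's os module on a local temporary file. It does not use GFAL: it only illustrates the call pattern that, according to the text, a GFAL application follows with the gfal_ counterparts of the POSIX calls. The file location and helper names are assumptions made for the illustration.

```python
import os
import struct
import tempfile


def write_values(path, values):
    """Open, write and close, mirroring the first half of the sample output."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        # Pack the integers in native binary form, as a C program would.
        os.write(fd, struct.pack(f"{len(values)}i", *values))
    finally:
        os.close(fd)


def read_values(path, count):
    """Open, read and close, mirroring the 'Reading back' half."""
    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.read(fd, count * struct.calcsize("i"))
    finally:
        os.close(fd)
    return list(struct.unpack(f"{count}i", data))


# A local temporary file stands in for the remote SE file of the excerpt.
path = os.path.join(tempfile.mkdtemp(), "my_temp_file")
write_values(path, [i * 10 for i in range(10)])
read_back = read_values(path, 10)
for index, value in enumerate(read_back):
    print(f"Value of readValues[{index}] = {value}")
```

As the text points out, a file created this way through GFAL is not registered in the catalog automatically; registration or deletion remains a separate step.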
199. tem may be queried directly through the ldapsearch command or via the lcg-info wrapper, the lcg-infosites wrapper may be used, or a monitoring tool (e.g. a web page displaying information on Grid resources) can be checked. All these methods are described in Chapter 5.

In what follows, some usage examples are given. Most commands can run in verbose mode (-v or --verbose option). For details on the options of each command, please use the man pages of the commands.

Example 7.6.1.1 Uploading a file to the Grid

    Footnote 7: This variable can also be used by GFAL when resolving LFNs and GUIDs, as described in Appendix F.
    Footnote 8: As discussed, this is not true for the LFC interaction commands (lfc-*), which require that the LFC_HOST variable is defined explicitly.

In order to upload a file to the Grid, i.e. to transfer it from the local machine to a Storage Element and register it in the catalog, the lcg-cr command (which stands for copy and register) can be used:

    lcg-cr --vo dteam -d lxb0710.cern.ch file:/home/antonio/file1
    guid:6ac491ea-684c-11d8-8f12-9c97cebf582a

where the only argument is the local file to be uploaded (a fully qualified URI) and the -d <destination> option indicates the SE used as the destination for the file. The command returns the unique GUID. If no destination is given, the SE specified by the VO_<VO-name>_DEFAULT_SE environment variable is taken. Such variab
200. the script, the MPI application is first compiled, then the list of hosts where to run is created by reading from the appropriate batch system information. The variable CPU_NEEDED stores the number of nodes that are available. The script also checks that ssh works for all listed nodes. This step should not be required, but it is a good safety measure to detect misconfigurations in the site and avoid future problems. Finally, mpirun is called with the -np and -machinefile options specified.

The retrieved output of a job execution follows:

    ***********************************************************************
    Running on: node16-4.farmnet.nikhef.nl
    As: dteam005
    ***********************************************************************
    Compiling binary: MPItest
    mpicc -o MPItest MPItest.c
    PBS Nodefile: /var/spool/pbs/aux/625203.tbn20.nikhef.nl
    ***********************************************************************
    Node count: 10
    Nodes in /var/spool/pbs/aux/625203.tbn20.nikhef.nl:
    node16-4.farmnet.nikhef.nl
    node16-44.farmnet.nikhef.nl
    node16-45.farmnet.nikhef.nl
    node16-45.farmnet.nikhef.nl
    node16-46.farmnet.nikhef.nl
    node16-46.farmnet.nikhef.nl
    node16-47.farmnet.nikhef.nl
    node16-47.farmnet.nikhef.nl
    node16-48.farmnet.nikhef.nl
    node16-48.farmnet.nikhef.nl
    ***********************************************************************
    Checking ssh for each node:
    Checking node16-4.farmnet
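The two steps the excerpt describes (count the nodes listed by the batch system, then call mpirun with the -np and -machinefile options) can be sketched as follows. This is a Python illustration under assumptions, not the guide's wrapper script: the function names are hypothetical, the node file is a temporary stand-in with made-up host names, and the command line is only assembled and printed, not executed.

```python
import shlex
import tempfile


def read_nodefile(path):
    """Return one entry per non-empty line; a host listed twice counts twice."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def mpirun_command(nodefile, binary):
    """Build the mpirun argument list using the -np and -machinefile options."""
    nodes = read_nodefile(nodefile)
    return ["mpirun", "-np", str(len(nodes)), "-machinefile", nodefile, binary]


# Stand-in for the batch system's node file (host names are made up).
with tempfile.NamedTemporaryFile("w", suffix=".nodes", delete=False) as tmp:
    tmp.write("node-a.example.org\nnode-a.example.org\nnode-b.example.org\n")

cmd = mpirun_command(tmp.name, "./MPItest")
print(" ".join(shlex.quote(part) for part in cmd))
```

Counting lines rather than distinct hosts matches the sample output, where the node count is 10 although several host names appear twice.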
201. tion on Grid resources for the most common use cases.

USAGE:

    lcg-infosites --vo <vo name> options -v <verbose level> -f <Name of the site> --is <BDII to query>

Description of the Attributes:

    vo         The name of the user VO (mandatory).
    options    The tool admits the following options: ce, se, closeSE, lrc, rmc, lfc, lfcLocal, rb, dli, vobox, fts, sitenames, tag, all.

    ce              The information related to number of CPUs, running jobs, waiting jobs and names of the CEs is provided. All these data group all VOs together.
                    -v 1: only the names of the queues will be printed.
                    -v 2: the RAM memory, together with the operating system and its version and the processor included in each CE, are printed.
    se              The names of the SEs supported by the user's VO, together with the kind of Storage System and the used and available space, will be printed.
                    -v 1: only the names of the SEs will be printed.
    closeSE         The names of the CEs where the user's VO is allowed to run, together with their corresponding closest SEs, are provided.
    lrc (rmc)       The name of the lrc (rmc) corresponding to the user's VO.
    lfc (lfcLocal)  The name of the (local) machine hosting the LFC catalog is printed.
    rb              Names of the RBs available for each VO.
    dli             Names of the dlis for each VO.
    vobox           Names of the VOBOXes available for each VO.
    fts             Names of the FTS endpoints per VO.
    sitenames       Names of the LCG sites.

NOTE: rb, dli, vobox and fts support the -f option
202. tisfied with the default ones. The main configuration file is located under $GLITE_WMS_LOCATION/etc/glite_wmsui_cmd_var.conf (in a gLite 3.0 UI) and $EDG_WL_LOCATION/etc/edg_wl_ui_cmd_var.conf (in a LCG UI). This file sets, among other things, the default VO, the default location for job outputs and command log files, and the default values of mandatory JDL attributes. It also adds by default the requirement

    other.GlueCEStateStatus == "Production"

so that CEs that are not in a condition to accept jobs (e.g. Closed) do not match.

It is possible to point to a different configuration file by setting the value of the environment variable $GLITE_WMS_UI_CONFIG_VAR (in a gLite 3.0 UI) or $EDG_WL_UI_CONFIG_VAR (in a LCG UI) to the file path, or by specifying the file in the --config <file> option of the glite-job-* or edg-job-* commands (which takes precedence).

In addition, VO-specific configurations are defined by default in the file $EDG_WL_LOCATION/etc/<vo>/edg_wl_ui.conf (in a LCG UI) or $GLITE_WMS_LOCATION/etc/<vo>/glite_wmsui.conf (in a gLite 3.0 UI), consisting essentially of the list of Network Servers, Proxy Servers and LB servers to be used when submitting jobs. A different file can be specified using the variable $EDG_WL_UI_CONFIG_VO (in a LCG UI) or $GLITE_WMS_UI_CONFIG_VO (in a gLite 3.0 UI), or the --config-vo <file> option of the glite-job-* / edg-job-* commands. This
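The lookup order described in this excerpt (the command-line option takes precedence, then the environment variable, then the default file under the installation directory) can be expressed as a small function. The sketch below is a Python illustration only: resolve_ui_config and its flavour parameter are hypothetical names, the variable names and relative paths are the ones quoted in the excerpt, and the real commands may apply further rules that are not covered here.

```python
import posixpath

# Environment variable that overrides the main configuration file, per UI type.
OVERRIDE_VARS = {"glite": "GLITE_WMS_UI_CONFIG_VAR", "lcg": "EDG_WL_UI_CONFIG_VAR"}

# Installation-root variable and default relative path, per UI type.
DEFAULTS = {
    "glite": ("GLITE_WMS_LOCATION", "etc/glite_wmsui_cmd_var.conf"),
    "lcg": ("EDG_WL_LOCATION", "etc/edg_wl_ui_cmd_var.conf"),
}


def resolve_ui_config(cli_config=None, environ=None, flavour="glite"):
    """Pick the configuration file: option first, then variable, then default."""
    environ = {} if environ is None else environ
    if cli_config:  # value given on the command line takes precedence
        return cli_config
    override = environ.get(OVERRIDE_VARS[flavour])
    if override:
        return override
    root_var, relative = DEFAULTS[flavour]
    # Raises KeyError if the installation root is not defined at all.
    return posixpath.join(environ[root_var], relative)


print(resolve_ui_config(None, {"GLITE_WMS_LOCATION": "/opt/glite"}))
```

Passing the environment as a plain dictionary keeps the function easy to exercise; in a real script os.environ would be passed instead.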
203. to submit it is aborted. This can easily happen, for example, if the job takes very long to execute or if it stays in the queue for a long time. The easiest solution to the problem is to use very long-lived proxies, but at the expense of an increased security risk. Moreover, the duration of a VOMS proxy is limited by the VOMS server and cannot be made arbitrarily long.

To overcome this limitation, a proxy credential repository system is used, which allows the user to create and store a long-term proxy certificate on a dedicated server (MyProxy server). The WMS will then be able to use this long-term proxy to periodically renew the proxy for a submitted job before it expires, and until the job ends (or the long-term proxy expires).

To see if a WLCG/EGEE site has a MyProxy Server, the GOC database should be consulted; MyProxy servers have PROX as node type.

As the renewal process starts 30 minutes before the old proxy expires, it is necessary to generate an initial proxy that is long enough, or the renewal may be triggered too late, after the job has failed with the following error:

    Status Reason: Got a job held event, reason: Globus error 131: the user proxy expired (job is still running)

The minimum recommended time for the initial proxy is 30 minutes, and job submission is forbidden for proxies with a remaining lifetime of less than 20 minutes; an error message like the following will be produced:

    Error: UI_PROXY_DURATION
    Proxy certificate will exp
204. tories and files of the catalog name space. Where the path of a file/directory is required, an absolute path can be specified (starting with /); otherwise, the path is prefixed by the contents of the $LFC_HOME environment variable.

Users should use these commands carefully, keeping in mind that the operations they are performing affect the catalog, but not the physical files that the entries represent. This is especially true in the case of the lfc-rm command, since it can be used to remove the LFN for a file, but it does not affect the physical files. If all the LFNs pointing to a physical Grid file are removed, then the file is no longer visible to gLite users (although it may still be accessed by using its SURL directly).

Example 7.4.1.1 Creating directories in the LFC

Users cannot implicitly create directories in the LFN name space when registering new files with LCG Data Management tools (see for example the lcg-cr command in Section 7.6.1). Directories must be created in advance using the lfc-mkdir command. This is currently the only main use of the lfc-* commands for the average user. For other tasks, the lcg_utils should normally be used.

    lfc-mkdir /grid/lhcb/test_roberto/MyTest
    lfc-ls -l /grid/lhcb/test_roberto
    drwxrwxr-x   0 santinel z5      0 Feb 21 16:50 MyTest
    -rwxrwxr-x   1 santinel z5   1183 Oct 14 11:33 gfal_test
    -rwxrwxr-x   1 santinel z5    270 Jan 09 10:18 input_data
    drwxr-xr-x  50 santinel z5      0 Jun 09 2005 insert
    -rwxrwxr-x
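The path rule stated in this excerpt (an absolute path starts with /, any other path is prefixed by the contents of LFC_HOME) amounts to the following. This is a minimal Python sketch; resolve_lfc_path is a hypothetical helper written for illustration and is not part of the LFC client tools.

```python
import posixpath


def resolve_lfc_path(path, lfc_home=None):
    """Return the absolute catalog path the lfc-* commands would act on.

    Assumption: relative paths are simply joined onto LFC_HOME; a relative
    path with no LFC_HOME defined is treated as an error here.
    """
    if path.startswith("/"):
        return posixpath.normpath(path)
    if not lfc_home:
        raise ValueError("relative path given but LFC_HOME is not set")
    return posixpath.normpath(posixpath.join(lfc_home, path))


# With LFC_HOME=/grid/lhcb/test_roberto, "MyTest" names the directory
# created in the example above.
print(resolve_lfc_path("MyTest", "/grid/lhcb/test_roberto"))
```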
205. tsches Elektronen Synchrotron (DESY) and the Fermi National Accelerator Laboratory (FERMI). More information in

- Mass Storage System: it consists of a Hierarchical Storage Management (HSM) system for files that may be migrated between front-end disk and back-end tape storage hierarchies. The migration of files between disk and tape storage is managed by a stager process. The stager manages one or more disk pools, or groups of one or more UNIX file systems, residing on one or more disk servers. The stager is responsible for space allocation and for file migration between the disk pool and tape storage. The MSS exposes to the user a virtual file system which hides the complexity of the internal details.

A classic interface to an MSS consists of a GridFTP front-end (a load-sharing/balancing solution has also been deployed), which ensures file transfer capabilities. The file access protocol depends instead on the type of Mass Storage System: insecure RFIO for CASTOR, gsidcap for a dCache disk pool manager front-end, etc. It is the responsibility of the application to figure out which type of protocol is supported (for instance by querying the information system) and to access files accordingly. Nevertheless, the GFAL API does this transparently (see Appendix F).

The CASTOR MSS can also expose an SRM interface. This is a desirable solution, since it hides internal complexities inherent to access and transfer protocols. I
206. tweight Directory Access Protocol

    LFC     LCG File Catalog
    LFN     Logical File Name
    LHC     Large Hadron Collider
    LCG     LHC Computing Grid
    LRC     Local Replica Catalog
    LRMS    Local Resource Management System
    LSF     Load Sharing Facility
    MDS     Monitoring and Discovery Service
    MPI     Message Passing Interface
    MSS     Mass Storage System
    OS      Operating System
    PBS     Portable Batch System
    PFN     Physical File Name
    PID     Process IDentifier
    POOL    Pool of Persistent Objects for LHC
    RAL     Rutherford Appleton Laboratory
    RB      Resource Broker
    RFIO    Remote File Input/Output
    R-GMA   Relational Grid Monitoring Architecture
    RLI     Replica Location Index
    RLS     Replica Location Service
    RM      Replica Manager
    RMC     Replica Metadata Catalog
    RMS     Replica Management System
    ROS     Replica Optimization Service
    SASL    Simple Authentication & Security Layer (LDAP)
    SE      Storage Element
    SFT     Site Functional Tests
    SMP     Symmetric Multi Processor
    SRM     Storage Resource Manager
    SURL    Storage URL
    TURL    Transport URL
    UI      User Interface
    URI     Uniform Resource Identifier
    URL     Uniform Resource Locator
    UUID    Universal Unique ID
    VDT     Virtual Data Toolkit
    VO      Virtual Organization
    WLCG    Worldwide LHC Computing Grid
    WMS     Workload Management System
    WN      Worker Node
    WPn     Work Package n

2 EXECUTIVE SUMMARY

This user guide is intended for users of the gLite 3.0 middleware. In these pages the user will find an introduction to the services provided by the WLCG
207. type can be either LATEST, CONTINUOUS or HISTORY (these are standard R-GMA query types). For LATEST and HISTORY queries, the tool waits some seconds after the query is made. After that, the returned results, or a "no events" message if there are none, are printed. In the case of CONTINUOUS queries, the status is checked every 5 seconds until the program is exited via Ctrl-C or the Done status is reached.

Information on this tool can be found under: http://goc.grid.sinica.edu.tw/gocwiki/Job_Status_Monitoring

D.5 TIME LEFT UTILITY: LCG-GETJOBSTATS, LCG_JOBSTATS.PY

The tools described in this section can be invoked by a running job to determine some statistics about itself. Right now the parameters that can be retrieved are:

- How much CPU or wall clock time it has consumed
- How long it can still run (before reaching the CPU or wall clock time limits)

There are scripts (CLI) and Python modules (API) that can be used for these purposes. They are described in detail later. They work by querying the CE's batch system, in different ways depending on the batch system that the given CE uses.

The results returned by the tools are not always reliable, since some sites do not set time limits and some LRMSs cannot provide resource usage information to the tool. The underlying idea of these CLI and API is that the experiment decides how to use them and when to trust their results. Please read
208. ue identifier for the cluster
    GlueClusterName: human-readable name of the cluster

The attribute GlueClusterService is deprecated from version 1.2 of the GLUE schema.

- Subcluster (objectclass GlueSubCluster)
    GlueSubClusterUniqueID: unique identifier for the subcluster
    GlueSubClusterName: human-readable name of the subcluster
    GlueSubClusterTmpDir: path of temporary directory shared among worker nodes
    GlueSubClusterWNTmpDir: path of temporary directory local to the worker nodes
    GlueSubClusterPhysicalCPUs: total number of real CPUs in the subcluster
    GlueSubClusterLogicalCPUs: total number of logical CPUs (e.g. with hyperthreading)

- Host (objectclass GlueHost)
    GlueHostUniqueId: unique identifier for the host
    GlueHostName: human-readable name of the host

- Architecture (objectclass GlueHostArchitecture)
    GlueHostArchitecturePlatformType: platform description
    GlueHostArchitectureSMPSize: number of CPUs in an SMP node

- Processor (objectclass GlueHostProcessor)
    GlueHostProcessorVendor: name of the CPU vendor
    GlueHostProcessorModel: name of the CPU model
    GlueHostProcessorVersion: version
209. uler
    GlueCEStateFreeJobSlots: number of jobs that could start, given the current number of jobs submitted

- CE Policy (objectclass GlueCEPolicy)
    GlueCEPolicyMaxWallClockTime: maximum wall clock time available to jobs submitted to the CE, in minutes
    GlueCEPolicyMaxCPUTime: maximum CPU time available to jobs submitted to the CE, in minutes
    GlueCEPolicyMaxTotalJobs: maximum allowed total number of jobs in the queue
    GlueCEPolicyMaxRunningJobs: maximum allowed number of running jobs in the queue
    GlueCEPolicyPriority: information about the service priority
    GlueCEPolicyAssignedJobSlots: maximum number of single-processor jobs that can be running at a given time

- Access control (objectclass GlueCEAccessControlBase)
    GlueCEAccessControlBaseRule: a rule defining any access restrictions to the CE. Current semantic: VO:<a VO name>, DENY:<an X.509 user subject>

- Job (currently not filled; the Logging and Bookkeeping service can provide this information) (objectclass GlueCEJob)
    GlueCEJobLocalOwner: local user name of the job's owner
    GlueCEJobGlobalOwner: GSI subject of the real job's owner
    GlueCEJobLocalID: local job identifier
    GlueCEJobGlobalId: global job identifier
    GlueCEJobStatus: job status: SUBMITTED, WAITING, READY, SCHEDULED, RUNNING, ABORTED, DONE, CLEARED, CHECKPOINTED
    GlueCEJobSchedulerSpecific: any scheduler-specific information

- Cluster (objectclass GlueCluster)
    GlueClusterUniqueID: uniq
210. ume the states of the job they belong to, with the exception of the wait state, which indicates that the file transfer has failed once and is a candidate for another try.

7.5.3 FTS Commands

Before submitting a job, the user is expected to upload an appropriate credential (the grid certificate) into the MyProxy server used by FTS. The upload has to be made using the "DN as username" mode, since the DN of the client who submits the job is later used by FTS to retrieve the credential from MyProxy. This is achieved with the following options of the myproxy-init command:

    myproxy-init -s myproxy-fts.cern.ch -d

The same password is passed to FTS at job submission time.

The following user-level commands for submitting, querying and canceling jobs are described here:

    glite-transfer-submit    Submits a data transfer job.
    glite-transfer-status    Displays the status of an ongoing data transfer job.
    glite-transfer-list      Lists all submitted transfer jobs owned by that user.
    glite-transfer-cancel    Cancels a file transfer job.

For the sake of completeness, the following administrative commands are also briefly described. Only a few people within a VO (explicitly configured on the FTS server) are eligible to run them.

    glite-transfer-channel-add     Creates a new channel with defined parameters on FTS.
    glite-transfer-channel-list    Displays details of a given channel defined on FTS.
    glite-transfer-channel-set     Allows administrators to set a channel Active or Inactive.
    glite-tr
211. user can specify the --nogui option, which makes the command provide a simple, standard, non-graphical interaction with the running job.

Example 6.3.4.1 Simple interactive job

The following interactive.jdl file contains the description of a very simple interactive job. Please note that the OutputSandbox is not necessary, since the output will be sent to the interactive window (it could be used for further output, though).

    JobType = "Interactive";
    Executable = "interactive.sh";
    InputSandbox = {"interactive.sh"};

The executable specified in this JDL is the interactive.sh script, which follows:

    #!/bin/sh
    echo "Welcome"
    echo -n "Please tell me your name: "
    read name
    echo "That is all, $name"
    echo "Bye bye"
    exit 0

The interactive.sh script just presents a welcome message to the user, and then asks and waits for an input. After the user has entered a name, this is shown back just to check that the input was received correctly. Figure 10 shows the result of the program (after the user has entered his name) in the generated X window.

Another option that is reserved for interactive jobs is --nolisten: it makes the command forward the job standard streams coming from the WN to named pipes on the UI machine, whose names are returned to the user together with the OS id of the listener process. This allows the user to interact with the job through his own tools. It is important to note that when this option is specified, the
212. [Figure: screenshot of an LDAP browser showing GLUE entries, among them GlueServiceURI entries for the edg-local-replica-catalog and edg-replica-metadata-catalog services of the alice, atlas, cms, dteam and lhcb VOs at cern.ch, and the entries GlueSEUniqueID=castorgrid.cern.ch and GlueSLUniqueID=castorgrid.cern.ch]
213. utes will be returned. A description of all objectclasses and their attributes, to optimize the LDAP search command, can be found in Appendix

Example 5.1.3.2 Getting information about the site name from the GRIS on a CE

    ldapsearch -x -h lxb2006.cern.ch -p 2135 -b "mds-vo-name=local,o=grid" 'objectclass=GlueTop' GlueSiteDescription
    version: 2
    #
    # filter: objectclass=GlueTop
    # requesting: GlueSiteDescription
    #
    # CERN_PPS, local, grid
    dn: GlueSiteUniqueID=CERN_PPS,mds-vo-name=local,o=grid
    GlueSiteDescription: LCG Site
    # search result
    search: 2
    result: 0 Success
    # numResponses: 2
    # numEntries: 1

By adding the -LLL option, we can avoid the comments and the version information in the reply.

    ldapsearch -LLL -x -h lxn1181.cern.ch -p 2135 -b "mds-vo-name=local,o=grid" 'objectclass=GlueTop' GlueSiteDescription
    dn: GlueSiteUniqueID=CERN_PPS,mds-vo-name=local,o=grid
    GlueSiteDescription: LCG Site

5.1.4 The Site GIIS/BDII

At each site, a site GIIS or BDII collects information about all resources present at the site (i.e. data from all GRISes of the site). Site BDIIs are preferred to site GIISes and are the default in gLite 3.0 releases. In this section we explain how to query a site GIIS/BDII.

For a list of all sites and all resources present, please refer to the GOC database. Usually a site GIIS/BDII runs on a Computing Element. The port used to interr
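The reply of an ldapsearch query, as in the site-name example of this excerpt, is a sequence of entries, each opened by a dn line and followed by attribute: value lines. A script that needs the values can split the text along those lines. The following Python sketch is deliberately simplified and makes explicit assumptions: comment lines start with #, long lines are not folded, and values are not base64-encoded (the "::" form), so it is not a complete LDIF parser. The function name parse_ldif is hypothetical.

```python
def parse_ldif(text):
    """Split simple ldapsearch output into a list of entry dictionaries.

    Each entry maps "dn" to a string and every other attribute to a list
    of values. Lines before the first dn (e.g. "version: 2") are ignored.
    """
    entries = []
    current = None
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank separator or comment line
        name, separator, value = line.partition(":")
        if not separator:
            continue  # not an "attribute: value" line
        name, value = name.strip(), value.strip()
        if name == "dn":
            current = {"dn": value}
            entries.append(current)
        elif current is not None:
            current.setdefault(name, []).append(value)
    return entries


reply = """dn: GlueSiteUniqueID=CERN_PPS,mds-vo-name=local,o=grid
GlueSiteDescription: LCG Site
"""
for entry in parse_ldif(reply):
    print(entry["dn"], entry["GlueSiteDescription"])
```

Using the -LLL option when producing the input keeps comments and version information out of the reply, which makes this kind of post-processing simpler.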
214. ution of the job will not be created.

The CE is identified by <CE_Id>, which is a string with the following format:

    <full_hostname>:<port_number>/jobmanager-<service>-<queue_name>    (for a LCG CE)
    <full_hostname>:<port_number>/blah-<service>-<queue_name>          (for a gLite CE)

where <full_hostname> and <port_number> are the hostname of the machine and the port where the Globus Gatekeeper (for LCG CE) or CondorC BLAH (gLite CE) is running (the Grid Gate), <queue_name> is the name of one of the available job queues in that CE, and <service> could refer to the LRMS, such as lsf, pbs or condor, but can also be a different string, as it is freely set by the site administrator when the queue is set up.

Examples of CE Ids are:

    adc0015.cern.ch:2119/jobmanager-lcgpbs-infinite    (for LCG CE)
    prep-ce-01.pd.infn.it:2119/blah-lsf-atlas          (for gLite CE)

Similarly, the -i <file_path> option allows users to specify a list of CEs from which the user will have to choose a target CE interactively.

Note: The LCG Resource Broker is able to submit jobs ONLY to LCG Computing Elements. The gLite WMS, instead, is capable of submitting jobs to both LCG and gLite CEs.

Lastly, the --nomsg option makes the command display neither messages nor errors on the standard output. Only the jobId assigned to the job is printed to the user if the command was succe
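A CE identifier of the form described in this excerpt (host name, port, then jobmanager or blah followed by the service and the queue name) can be split into its parts with a regular expression. The Python sketch below assumes the separators are ":" after the host, "/" after the port and "-" between the remaining fields, and that the service string itself contains no hyphen; since the text says the service name is freely chosen by the site administrator, that last assumption may not hold everywhere. The names CE_ID and parse_ce_id are hypothetical.

```python
import re

# host:port/(jobmanager|blah)-service-queue ; the queue may contain hyphens.
CE_ID = re.compile(
    r"^(?P<host>[^:/]+):(?P<port>\d+)/"
    r"(?P<kind>jobmanager|blah)-(?P<service>[^-]+)-(?P<queue>.+)$"
)


def parse_ce_id(ce_id):
    """Return the components of a CE identifier as a dictionary."""
    match = CE_ID.match(ce_id)
    if match is None:
        raise ValueError(f"not a recognised CE Id: {ce_id!r}")
    fields = match.groupdict()
    fields["port"] = int(fields["port"])
    return fields


print(parse_ce_id("adc0015.cern.ch:2119/jobmanager-lcgpbs-infinite"))
```

The kind field distinguishes the two CE flavours: jobmanager for an LCG CE and blah for a gLite CE.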
215. vertisement (Condor)

    CLI       Command Line Interface
    CNAF      INFN's National Center for Telematics and Informatics
    dcap      dCache Access Protocol
    DIT       Directory Information Tree
    DLI       Data Location Interface
    DN        Distinguished Name
    EDG       European DataGrid
    EDT       European DataTAG
    EGEE      Enabling Grids for E-sciencE
    ESM       Experiment Software Manager
    FCR       Freedom of Choice for Resources
    FNAL      Fermi National Accelerator Laboratory
    GFAL      Grid File Access Library
    GGF       Global Grid Forum
    GGUS      Global Grid User Support
    GIIS      Grid Index Information Server
    GLUE      Grid Laboratory for a Uniform Environment
    GMA       Grid Monitoring Architecture
    GOC       Grid Operations Centre
    GRAM      Globus Resource Allocation Manager
    GRIS      Grid Resource Information Service
    GSI       Grid Security Infrastructure
    gsidcap   GSI-enabled version of the dCache Access Protocol
    gsirfio   GSI-enabled version of the Remote File Input/Output protocol
    GUI       Graphical User Interface
    GUID      Grid Unique ID
    HSM       Hierarchical Storage Manager
    ID        Identifier
    INFN      Istituto Nazionale di Fisica Nucleare
    IS        Information Service
    JDL       Job Description Language
    kdcap     Kerberos-enabled version of the dCache Access Protocol
    LAN       Local Area Network
    LB        Logging and Bookkeeping Service
    LDAP      Ligh
216. ware and will be modified as new gLite releases are produced. In some parts of the document, references to the foreseeable future of the gLite software are made.

1.5 REFERENCE AND APPLICABLE DOCUMENTS

REFERENCES

    [1]  EGEE: Enabling Grids for E-sciencE. http://eu-egee.org
    [2]  gLite: Lightweight Middleware for Grid Computing. http://cern.ch/glite
    [3]  Worldwide LHC Computing Grid. http://cern.ch/LCG
    [4]  The DataGrid Project. http://www.edg.org
    [5]  DataTAG: Research & technological development for a Data TransAtlantic Grid. http://cern.ch/datatag
    [6]  The Globus Alliance. http://www.globus.org
    [7]  GriPhyN: Grid Physics Network. http://www.griphyn.org
    [8]  iVDgL: International Virtual Data Grid Laboratory. http://www.ivdgl.org
    [9]  Open Science Grid. http://www.opensciencegrid.org
    [10] The Virtual Data Toolkit. http://vdt.cs.wisc.edu
    [11] NorduGrid. http://www.nordugrid.org
    [12] Ian Foster, Carl Kesselman, Steven Tuecke: The Anatomy of the Grid: Enabling Scalable Virtual Organizations. http://www.globus.org/alliance/publications/papers/anatomy.pdf
    [13] M. Dimou: LCG User Registration and VO Management. https://edms.cern.ch/document/428034
    [14] EGEE CIC Operations Portal. http://cic.in2p3.fr
217. xamples

Example 4.4.1.2 (Printing information on a proxy certificate)

To print information about a proxy certificate, for example the subject or the time left before expiration, give the command:

    grid-proxy-info

The output, if a valid proxy exists, will be similar to:

    subject  : /O=Grid/O=CERN/OU=cern.ch/CN=John Doe/CN=proxy
    issuer   : /O=Grid/O=CERN/OU=cern.ch/CN=John Doe
    type     : full
    strength : 512 bits
    path     : /tmp/x509up_u7026
    timeleft : 11:59:56

If a proxy certificate does not exist, the output is:

    ERROR: Couldn't find a valid proxy.
    Use -debug for further information.

Example 4.4.1.3 (Destroying a proxy certificate)

To destroy an existing proxy certificate before its expiration, it is enough to do:

    grid-proxy-destroy

If no proxy certificate exists, the result will be:

    ERROR: Proxy file doesn't exist or has bad permissions
    Use -debug for further information.

4.4.2 Virtual Organisation Membership Service

The Virtual Organisation Membership Service (VOMS) is a system that complements a proxy certificate with extensions containing information about the VO, the VO groups the user belongs to, and the role the user has. In VOMS terminology, a group is a subset of the VO containing members who share some responsibilities or privileges in the project. Groups are organized hierarchically like a directory tree, starting from a VO-wide root group
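A script that needs the fields reported by grid-proxy-info (for instance, to check the remaining lifetime) can read them from the command output. The following Python sketch parses text of the form shown in Example 4.4.1.2, assuming each line is a `name : value` pair and that `timeleft` is given as hh:mm:ss. The function names `parse_proxy_info` and `timeleft_seconds` are hypothetical helpers written for this illustration, not part of any grid tool, and the sample text is embedded rather than obtained by running the command.

```python
# Sample text copied from the example output; a real script would capture
# the output of the command instead (not done here).
SAMPLE_OUTPUT = """\
subject  : /O=Grid/O=CERN/OU=cern.ch/CN=John Doe/CN=proxy
issuer   : /O=Grid/O=CERN/OU=cern.ch/CN=John Doe
type     : full
strength : 512 bits
path     : /tmp/x509up_u7026
timeleft : 11:59:56
"""


def parse_proxy_info(text):
    """Return a dict mapping each field name to its value (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        # Split on the first " : " only, since the timeleft value
        # itself contains colons (without surrounding spaces).
        key, sep, value = line.partition(" : ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def timeleft_seconds(fields):
    """Convert the hh:mm:ss 'timeleft' field into a number of seconds."""
    hours, minutes, seconds = (int(p) for p in fields["timeleft"].split(":"))
    return hours * 3600 + minutes * 60 + seconds


info = parse_proxy_info(SAMPLE_OUTPUT)
print(info["path"], timeleft_seconds(info))
```

Lines that do not match the assumed layout, such as the ERROR messages, are simply skipped, so an empty dictionary can be read as "no valid proxy was reported".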
218. y indicated.

NOTE: In the case where LFNs or GUIDs are used, the library needs to contact the file catalogs to get the corresponding TURL. Since the catalogs are VO-dependent, and since the calls do not include any argument to specify the VO, GFAL requires the LCG_GFAL_VO environment variable to be set, along with the pointer for the Information Service: LCG_GFAL_INFOSYS. Alternatively, the endpoints of the catalogs may be specified directly by setting LFC_HOST (LFC) or RMC_ENDPOINT and LRC_ENDPOINT (RLS).

[Figure 14: Flow diagram of a GFAL call]

This behaviour is illustrated in Figure 14, which shows the flow diagram of a gfal_open call. This call locates a Grid file and returns a remote file descriptor, so that the caller can read or write the file remotely as it would do for a local file. As shown in the figure, first, if a GUID is provided, GFAL contacts a file catalog to retrieve the corresponding SURL. Then it accesses the SRM interface of the SE that the SURL indicates; it gets a valid TURL and also pins the file so that it is there for the subsequent access. Finally, with the TURL and using the appropriate protocol, GFAL opens the file and returns a file handle to the caller.

Nevertheless, GFAL sometimes exposes functionality applicable only to a concrete underlying technology or protocol if t
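The three steps of the gfal_open flow described above (catalog lookup, SRM request with pinning, protocol-level open) can be sketched in Python as follows. This is only a model of the control flow under stated assumptions: the dictionaries stand in for the file catalog and the SRM interface, the GUID, SURL and TURL values are invented examples, and none of the function names belong to the GFAL API.

```python
# Hypothetical in-memory stand-ins for the file catalog and the SRM service.
CATALOG = {"guid:example-0001": "srm://se.example.org/data/file1"}   # GUID -> SURL
SRM = {"srm://se.example.org/data/file1":
       "rfio://se.example.org//data/file1"}                          # SURL -> TURL
PINNED = set()  # SURLs pinned for a subsequent access


def resolve_to_surl(name):
    """Step 1: look a GUID up in the file catalog; any other name is used as is."""
    if name.startswith("guid:"):
        return CATALOG[name]
    return name


def get_turl(surl):
    """Step 2: ask the SRM stand-in for a TURL and record the file as pinned."""
    PINNED.add(surl)
    return SRM[surl]


def open_grid_file(name):
    """Step 3: 'open' the file; the handle here is just (protocol, TURL)."""
    surl = resolve_to_surl(name)
    turl = get_turl(surl)
    protocol = turl.split("://", 1)[0]
    return (protocol, turl)


print(open_grid_file("guid:example-0001"))
```

The point of the layering is that the caller supplies a single name and never handles the SURL or TURL directly; each intermediate name is obtained from the service responsible for it.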
219. yment documentation UI lxplus
[29] GridICE: a monitoring service for the Grid. http://www.infn.it/gridice
[30] Condor Classified Advertisements. http://www.cs.wisc.edu/condor/classad
[31] The Condor Project. http://www.cs.wisc.edu/condor
[32] F. Pacini. Job Description Language HowTo. http://www.infn.it/workload-grid/docs/DataGrid-01-TEN-0102-0_2-Document.pdf
[33] F. Pacini. JDL Attributes. http://www.infn.it/workload-grid/docs/DataGrid-01-TEN-0142-0_2.pdf
[34] F. Pacini. Job Attributes Specification. https://edms.cern.ch/document/555796/1
[35] F. Pacini. Job Description Language (JDL) Attributes Specification. https://edms.cern.ch/document/590869/1
[36] The EDG-Brokerinfo User Guide. http://www.infn.it/workload-grid/docs/edg-brokerinfo-user-guide-v2_2.pdf
[37] Workload Management Software: GUI User Guide. http://www.infn.it/workload-grid/docs/DataGrid-01-TEN-0143-0_0.pdf
[38] GSIFTP Tools for the Data Grid. http://www.globus.org/toolkit/docs/2.4/datagrid/deliverables/gsiftp-tools.html
[39] RFIO: Remote File Input/Output. http://doc.in2p3.fr/doc/public/products/rfio/rfio.html
[40] CASTOR. http://cern.ch/castor
[41] dCache. http://www.dCache.org
[42] POOL Persistency Framework: Pool Of persistent Objects for LHC. http://lcgapp.cern.ch/project/persist
[43] Learning POOL by examples, a mini tutorial. http://lcgapp.cer
220. ystem

• Remote file system (objectclass GlueHostRemoteFileSystem)
  GlueHostRemoteFileSystemRoot: path name or other information defining the root of the file system
  GlueHostRemoteFileSystemSize: size of the file system in bytes
  GlueHostRemoteFileSystemAvailableSpace: amount of free space in bytes
  GlueHostRemoteFileSystemReadOnly: true if the file system is read only
  GlueHostRemoteFileSystemType: file system type
  GlueHostRemoteFileSystemName: the name for the file system
  GlueHostRemoteFileSystemServer: host unique id of the server which provides access to the file system

• Storage device (objectclass GlueHostStorageDevice)
  GlueHostStorageDeviceName: name of the storage device
  GlueHostStorageDeviceType: storage device type
  GlueHostStorageDeviceTransferRate: maximum transfer rate for the device
  GlueHostStorageDeviceSize: size of the device
  GlueHostStorageDeviceAvailableSpace: amount of free space

• File (objectclass GlueHostFile)
  GlueHostFileName: name for the file
  GlueHostFileSize: file size in bytes
  GlueHostFileCreationDate: file creation date and time
  GlueHostFileLastModified: date and time of the last modification of the file
  GlueHostFileLastAccessed: date and time of the last access to the file
  GlueHostFileLatency: time taken to access the file in seconds
  GlueHo
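Attributes such as these are typically used in LDAP search filters of the form (attribute operator value), combined with an objectclass test through the AND operator, as described in the section on search filters. The Python sketch below only assembles such a filter string; `ldap_filter` is a hypothetical helper, the threshold value is an arbitrary example, and no escaping of special characters in values is attempted.

```python
def ldap_filter(object_class, *conditions):
    """Build an AND filter from an objectclass and (attribute, operator, value) triples.

    Hypothetical helper for illustration; values are inserted verbatim,
    so characters that are special in LDAP filters are not escaped.
    """
    parts = ["(objectclass={})".format(object_class)]
    for attribute, operator, value in conditions:
        parts.append("({}{}{})".format(attribute, operator, value))
    return "(&" + "".join(parts) + ")"


# Remote file systems reporting at least 10**9 bytes of free space
# (the threshold is an arbitrary example value).
query = ldap_filter(
    "GlueHostRemoteFileSystem",
    ("GlueHostRemoteFileSystemAvailableSpace", ">=", 10**9),
)
print(query)
```

The resulting string can be passed as the filter argument of an LDAP search against the information system; which attributes are actually published depends on the site.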