
Contents

4.5 Tuning the environment to build another one
    4.5.1 Customizing authentication parameters
    4.5.2 Creating a new environment from a customized environment
    4.5.3 Recording this environment for future deployments
4.6 Adding software to an environment
4.7 Scripting a deployment
A  OAR reference manual
    A.1 oarstat
    A.2 oarsub
    A.3 oardel
    A.4 oarnodes
    A.5 oarresume
B  Kadeploy reference manual
    B.1 kaaddnode
    B.2 kadelnode
    B.3 kadeploy
    B.4 kareboot
    B.5 kaconsole
    B.6 karecordenv
    B.7 kaarchive
    B.8 kacreateenv

Chapter 1  Introduction

From the XtreemOS description of work document:
initrd.img  vmlinuz

The user can also check that the filebase file is the expected one, by checking its name and its checksum:

yjegou@parasol-dev $ md5sum grid5000/images/debian4all-x86_64-2.tgz
6e0f8e8154da9b7d9f11ce56460b1ec1  grid5000/images/debian4all-x86_64-2.tgz
yjegou@parasol-dev $

A post-install script adapts an environment to the site it is deployed on. In the same way as for environments, a user should be able to find a description of the post-install script for each site. For instance, see figure 4.1 for the post-install description on the Grid'5000 Rennes site.

4.3 Making a reservation on a deployable node

For this example of the tutorial, the reservation made is interactive (-I), on the deploy queue (-q deploy) and on only one machine (-l nodes=1). The reservation is limited to 4 hours with -l walltime=4:

yjegou@parasol-dev $ oarsub -I -q deploy -l nodes=1 -l walltime=4
Host:Port = parasol-dev:32788
IdJob = 29702
Interactive mode: waiting...

Message from oar@parasol-dev on none at 14:48:
Your Kadeploy reservation is beginning.
oar job id: 29702
You can deploy your image on nodes:
parasol32.rennes.grid5000.fr
Kadeploy partitions are: sda3
The escape sequence for kaconsole is ...
yjegou@parasol-dev $

For a reservation on the deploy queue, a new shell is opened on the frontend node, and not on the first machine of the reservation.
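The checksum comparison above is easy to get wrong by eye, so it can be scripted and a bad image caught before deployment. The following is a minimal sketch; the function name and file paths are illustrative, not part of the Grid'5000 tools:

```shell
#!/bin/sh
# Sketch of the integrity check described above.  The image path and
# the expected checksum passed by the caller are placeholders.
verify_image() {
    image="$1"
    expected="$2"
    actual=$(md5sum "$image" | awk '{ print $1 }')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $image matches the published checksum"
    else
        echo "MISMATCH: $image" >&2
        return 1
    fi
}
```

A deployment script could then call, for instance, `verify_image grid5000/images/debian4all-x86_64-2.tgz <published-sum>` and stop on a non-zero exit status.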
Figure 3.1: Grid'5000 user portal

Figure 3.2: Grid'5000 status

[Ganglia screenshot, https://helpdesk.grid5000.fr/ganglia: per-site load and memory graphs for the last hour (Nancy: 58 CPUs total; Rennes: 452 CPUs total), with hosts up/unknown/down counters]
"A Grid testbed will be set up to experiment with XtreemOS software and test cases."

The objective of WP4.3 is to make available a grid experimentation platform to XtreemOS partners. Large-scale distributed systems like grids are too complex to be evaluated using theoretical models and simulators. Such systems must be experimented on real-size, real-life experimental platforms. In order to prove the effectiveness of the results, the experiments must be reproducible. The purpose of WP4.3 is to set up an experimentation platform for XtreemOS. Because XtreemOS is an operating system, this platform must allow experimentation on the whole software stack. In order to evaluate the scalability of XtreemOS, this platform must offer thousands of computation nodes.

All partners involved in the development of an XtreemOS component, or of software related to an XtreemOS use case, will use for initial tests resources already available in their institution, or experimental grid platforms they have access to. The WP4.3 testbed is not a development platform: each XtreemOS partner will develop and run initial tests of his software on his own platform. During the first half of the project, the French Grid'5000 infrastructure will be used by all partners as the XtreemOS grid testbed.

Grid'5000 [2] will be the first testbed used in XtreemOS. It is an appropriate environment for experimenting with XtreemOS. The main purpose of the Grid'5000
creates and registers the environment according to the parameters.

Bibliography

[1] Geant2 web site. <http://www.geant2.net>
[2] Grid'5000 web site. <http://www.grid5000.fr>
[3] Kadeploy web site. <http://www-id.imag.fr/Logiciels/kadeploy/index.html>
[4] OAR web site. <http://oar.imag.fr>
[5] Renater web site. <http://www.renater.fr>
[6] Taktuk web site. <http://taktuk.gforge.inria.fr>
[7] Nicolas Capit, Georges Da Costa, Yiannis Georgiou, Guillaume Huard, Cyrille Martin, Grégory Mounié, Pierre Neyron, and Olivier Richard. A batch scheduler with high level components. In Cluster Computing and Grid 2005 (CCGrid05), 2005.
[8] Franck Cappello, Eddy Caron, Michel Dayde, Frederic Desprez, Emmanuel Jeannot, Yvon Jégou, Stephane Lanteri, Julien Leduc, Nouredine Melab, Guillaume Mornet, Raymond Namyst, Pascale Primet, and Olivier Richard. Grid'5000: a large scale, reconfigurable, controlable and monitorable Grid platform. In Grid 2005 Workshop, Seattle, USA, November 13-14, 2005. IEEE/ACM.
and several monitoring tools. Local observation of processor, memory, disk and network is difficult at the hardware level. Since the users can use their own software configuration, there is no way to provide a built-in and trustworthy monitoring system for CPU, memory and disk. So it is the responsibility of the users to properly install, configure and manage the software observation tools they need for their experiments.

2.2 Architecture

The Grid'5000 architecture implements the principles described in the previous section. Figure 2.1 presents an overview of Grid'5000. All sites are connected by high-speed access links (10 Gbit/s by the end of 2005) to the RENATER national research and education network (NREN).

Figure 2.1: Overview of Grid'5000. Numbers in the figure give the number of CPUs expected in 2007 for every site.

2/3 of the nodes are dual-CPU 1U racks equipped with 64-bit x86 processors (AMD Opteron or Intel EM64T). Each site hosts one or more clusters. Each cluster is equipped with one special node, the cluster head. A cluster head supports the management tools for the cluster: reservation tools, deployment tools, monitoring tools. The cluster head has the same configuration, hardware and software, as the cluster nodes. This node can be used as a development node (deployment of experiments, compilation). All nodes are interconnected through 1-gigabit
[Gantt chart screenshot, continued: per-node reservation rows for the paravent cluster at Rennes]

Figure 3.5: Gantt chart, reservation history

OAR features:

- Walltime
- Matching of resources (job/node properties)
- Hold and resume jobs
- Multi-scheduler support (simple fifo, and fifo with matching)
- Multi-queues with priority
- Best-effort queues (for exploiting idle resources)
- Check of compute nodes before launching
- Epilogue/Prologue scripts
- Activity visualization tools (Monika)
- No daemon on compute nodes
- rsh and ssh as remote execution protocols (managed by Taktuk)
- Dynamic insertion/deletion of compute nodes
- Logging
- Backfilling
- First-Fit scheduler with resource matching
- Advance reservation
- Environment
a private network. Except for some limited protocols (ethernet broadcast, for instance), there is no limitation on inter-site communications.

Hosting domains

Grid'5000 sites are hosted in public research laboratories in France. Each site has limited connectivity with its hosting domain:

- ssh from any Grid'5000 node to the domain portal;
- ssh from any domain node to local Grid'5000 nodes (can be restricted to the Grid'5000 portal);
- local Grid'5000 servers (not accessible to users) can use some internet services from the hosting domain (dns, ntp, http, https, smtp);
- web servers on the local Grid'5000 site can be accessed from the hosting domain.

Users authenticated on the information systems of these laboratories can log in their local Grid'5000 site using ssh.

Satellite sites

It is possible to open connections with sites outside Grid'5000 for experimentation purposes. A satellite site must have a security policy equivalent to the Grid'5000 policy. Non-privileged ports can be opened in both directions. Communication through privileged ports is limited to ssh, http and https. Satellite sites must be aware that they can be subject to attacks from malicious Grid'5000 users. Declaring a production network as a satellite site is a bad idea.
a script traitement.ash in the archive's root directory. This archive is sent to the nodes, decompressed in a ramdisk, and then the post-install script is executed on every node. The script name is defined in the configuration file as post_install_script.

System modifications:

- /etc/fstab: a site-dependent /etc/fstab base file should be copied in the postinstall archive;
- /tmp should have its rights modified.

Administrative aspects: many other basic files can be handled by putting them in the archive and replacing the existing ones. Only the most important ones are listed here:

- /root/.ssh/authorized_keys should contain the administrator's public key and the public key of the user, to allow him to get a root shell on every node. In order to be stored on the node images, the authorized_keys file has to be built and put in the postinstallation archive's root directory;
- /etc/hosts, /etc/hosts.allow and /etc/hosts.deny should be set to fit the cluster's configuration and ensure network connection within the cluster's network.

Various modifications can be applied by the postinstallation script, from authentication server setup to tailored modifications depending on the node's IP. It is a good idea to modify the rc scripts to prevent the first-boot hard disk drive verification, because it is just a waste of time, and to avoid all the manual intervention that could occur on system boot.
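The archive layout described above (traitement.ash at the root, plus the site files to drop onto each node) can be assembled with tar. This is a hedged sketch: only the traitement.ash entry point comes from the text, while the source-directory convention and the function name are illustrative.

```shell
#!/bin/sh
# Sketch: assemble a postinstall archive with traitement.ash at the
# archive root, plus site files (fstab base, authorized_keys, hosts
# files) laid out under the source directory.  The source layout is
# an assumption of this sketch, not a Kadeploy rule.
build_postinstall() {
    srcdir="$1"   # directory holding traitement.ash, etc/, root/ ...
    archive="$2"  # resulting .tgz
    if [ ! -f "$srcdir/traitement.ash" ]; then
        echo "error: $srcdir has no traitement.ash at its root" >&2
        return 1
    fi
    tar -czf "$archive" -C "$srcdir" .
}
```

The resulting .tgz can then be registered as the post-installation file (the -ft parameter of karecordenv).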
deployment, and Grid'5000 sites will be connected to it one by one, as soon as it becomes available on the local POPs. Six Grid'5000 sites are currently (October 2006) connected to their local POP with the dark fibre infrastructure: three of them at 10 Gbit/s (Rennes, Nancy, Sophia) and three others at 1 Gbit/s. With the dark fibre infrastructure, Grid'5000 sites will eventually see each other inside the same 10 Gbit/s VLAN. Figure 2.4 shows the current state of the architecture, along with the maximum number of lambdas that can be allocated on each link. Inside the RENATER POPs, Grid'5000 sites are directly connected to the switches (see figure 2.5), bypassing the routers, which are used for standard IP traffic and for Grid'5000 sites that are still using the initial EoMPLS solution.

Figure 2.4: Grid'5000 network, dark fibres

[Figure legend: RENATER POP, 10 Gbit/s lambdas; ALCATEL 1696MS Metro Span and ALCATEL 1626LM Light Manager equipment; IP routers; GE connections and leased lines to regional networks; GE/10GE connections for projects using the dark fibre infrastructure]

Figure 2.5: Traffic separation in RENATER POPs

Chapter 3  User tools

3.1 Grid'5000 user portal

The official Grid'5000 web site (see figure 3.1) is accessible from
queue.

[Gantt chart screenshot, https://helpdesk.grid5000.fr/oar/Rennes/DrawOARGantt: 3-day reservation view of the paravent cluster at Rennes, origin 2006-10-22]
networking protocols (improving point-to-point and multipoint protocols in the Grid context, etc.), operating system mechanisms (virtual machines, single system image, etc.), Grid middleware, application runtimes (object oriented, desktop oriented, etc.), applications in many disciplines (life science, physics, engineering, etc.) and problem-solving environments. Research in these layers concerns scalability (up to thousands of CPUs), performance, fault tolerance, QoS and security.

For researchers involved in protocol, OS and Grid middleware research, the software settings for their experiments often require specific OSes. Some researchers need Linux while others are interested in Solaris 10 or Windows. For networking research, FreeBSD is preferred, because network emulators like Dummynet and Modelnet run only on FreeBSD. Some research on virtualization, process checkpoint and migration needs the installation of specific OS versions or OS patches that may not be compatible between different versions of the OS. Even for experiments over the OS layers, researchers have some preferences: for example, some prefer Linux kernel 2.4 or 2.6 because of scheduler differences. Researcher needs are quite different in Grid middleware: some require Globus in different versions (3, or the 2.4 DataGrid version), while others need Unicore, Desktop Grid or P2P middleware. Some other researchers need to make experiments without any Grid middleware, and test applications and
reserved nodes. It is the user's duty to connect to the reserved nodes and perform his tasks.

Figure 3.6: GridPrems

3.6 Getting an account

To get a user account on the Grid'5000 platform, XtreemOS members should send a mail to <mailto:xtreemos-projectleader@irisa.fr> with the following information:

- first name, last name;
- ssh public key (in a file attachment);
- home fixed IP (single IP or address range);
- entry points on Grid'5000 (all nodes, or only cluster heads);
- requested protocols (ssh, http, https, non-privileged);
- in the case where connectivity to all nodes or non-privileged protocols has been requested, the security policy on the home site must be described.

Chapter 4  Grid'5000 deployment tutorial

This short tutorial describes, step by step, how a Grid'5000 user connects to a Grid'5000 site, locates a standard deployable operating system image, tunes this image to his needs, registers this new environment and deploys it on Grid'5000 nodes.

4.1 Connection to Grid'5000

The user must first connect to the frontend node of the cluster he is granted access to. Outside
sequence, for instance. It needs the appropriate command to be defined in the configuration file in order to be able to open a remote console on the given node (see the configuration file section for further details).

Options:

-m, --machine hostname : specifies a host. This option cannot be used more than once.

Configuration file: /etc/kadeploy/deploy_cmd.conf. For each node, it is possible to define the command to open a remote console on the specified node of the cluster. Here is the description of the expected syntax:

host_name console command

Example for idpot1 of the idpot cluster:

idpot1 console ssh -t localuser@ldap-idpot kermit -l /dev/ttyUSB0 -b 38400 -c

In this example, "kaconsole -m idpot1" means: open a serial connection on the idpot1 node, via the kermit software, from ldap-idpot.

Example: "kaconsole -m node" opens a console on node, if appropriately equipped.

B.6 karecordenv

karecordenv
  -n name          registration name
  -v version       version (default is 1)
  -d description   description
  -a author        author email
  -fb filebase     environment image path
  -ft filesite     post-installation file path
  -s size          size (MB)
  -i initrdpath    initrd path (default is none)
  -k kernelpath    kernel path
  -p param         kernel parameters
  -fd fdisktype    fdisk type (default is 83)
  -fs filesystem   file system (default is ext2)

karecordenv registers an environment, in order to be able to use
4 GHz; memory: 2 GB; network: 2 x 1 Gb Ethernet; storage: 37 GB SCSI (paraci cluster)

parasol cluster:
  model: Sun Fire V20z
  Nb nodes: 64
  CPU: AMD Opteron 248, 2.2 GHz
  memory: 2 GB
  network: 2 x 1 Gb Ethernet
  storage: 73 GB SCSI

paravent cluster:
  model: HP Proliant DL145G2
  Nb nodes: 99
  CPU: AMD Opteron 246, 2.0 GHz
  memory: 2 GB
  network: 2 x 1 Gb Ethernet; 33 nodes with Myrinet 10G, 66 nodes with Infiniband
  storage: 80 GB SATA

tartopom cluster:
  model: Apple Xserve G5
  Nb nodes: 32
  CPU: PowerPC, 2.0 GHz
  memory: 1 GB
  network: 2 x 1 Gb Ethernet
  storage: 80 GB IDE

The parasol and paravent clusters are equipped with remote management capacity: users can deploy their own operating system distribution on these two clusters.

The Cisco 6509 switch of the platform is in charge of separating IP traffic:

- traffic to other Grid'5000 sites is routed to the 10G fiber;
- all other allowed traffic is routed to the laboratory (IRISA) router, which supports Internet connectivity;
- the rest of the packets are dropped.

In order to detect network misconfigurations, the source of incoming traffic is also checked: only traffic from Grid'5000 sites is accepted from the 10G fiber, and packets from Grid'5000 sites received from the Internet (laboratory fiber) are dropped.

2.4 Grid'5000 interconnection

RENATER is the French National Telecommunication Network for Technology
[Ganglia screenshot, continued: load and memory graphs for the last hour (Sophia: 388 CPUs total; Toulouse: 100 CPUs total), with hosts up/unknown/down counters]

Figure 3.3: Ganglia platform monitoring

[Monika screenshot, https://helpdesk.grid5000.fr/oar/Rennes/monika: OAR node status for the paravent cluster (free/busy/total nodes and CPUs) and the list of current reservations]
Table 2.1: Distribution of nodes on Grid'5000 sites

Site        AMD      AMD      AMD      AMD      AMD      Intel     Intel   PowerPC   Site
            Opteron  Opteron  Opteron  Opteron  Opteron  Itanium2  Xeon              total
            246      248      250      252      275                IA32
Bordeaux             96                                                               96
Grenoble                                                 206       64                270
Lille                106               40                                            146
Lyon        112               140                                                    252
Nancy        94                                                                       94
Orsay       432               252                                                    684
Rennes      198      128                                           128      64       518
Sophia      210                                 224                                  434
Toulouse             116                                                             116
Total      1046      446      392      40       224      206       192      64      2610

Table 2.2: Processor types on Grid'5000 sites

[Network diagram of the Rennes site: the Cisco 6509 switch (IP forwarding, Gigabit Ethernet, 336 ports) connects the parasol, paravent, paraci and tartopom clusters and their servers, the 10G dark optical fibre to RENATER, and a gigabit fibre to the IRISA router]
XtreemOS
Information Society Technologies
Enabling Linux for the Grid

Project no. IST-033576

XtreemOS Integrated Project
BUILDING AND PROMOTING A LINUX-BASED OPERATING SYSTEM TO SUPPORT VIRTUAL ORGANIZATIONS FOR NEXT GENERATION GRIDS

First version of XtreemOS testbed user manual
D4.3.1

Due date of deliverable: November 30, 2006
Actual submission date: December 19, 2006

Start date of project: June 1, 2006
Type: Deliverable
WP number: WP4.3
Responsible institution: INRIA
Editor and editor's address: Yvon Jégou, IRISA/INRIA, Campus de Beaulieu, Rennes Cedex, FRANCE

Version 1.1 / Last edited by Yvon Jégou, Dec 19, 2006

Project co-funded by the European Commission within the Sixth Framework Programme
Dissemination level:
  PU  Public  (X)
  PP  Restricted to other programme participants (including the Commission Services)
  RE  Restricted to a group specified by the consortium (including the Commission Services)
  CO  Confidential, only for members of the consortium (including the Commission Services)

Revision history:

Version  Date      Authors     Institution  Section affected, comments
0.1      06/10/27  Yvon Jégou  INRIA        Initial document
0.9      06/11/24  Yvon Jégou  INRIA        first review
1.0      06/11/27  Yvon Jégou  INRIA        cleanup
1.1      06/12/19  Yvon Jégou  INRIA        final comments

Abstract

Operating system developments must be validated on real hardware. The b
as for standard reservations. When this shell exits, the reservation ends. The shell is populated with OAR_x environment variables:

yjegou@parasol-dev $ printenv | grep OAR
OAR_JOBID=29702
OARDIR=/opt/oar
OAR_NODENUM=1
OAR_USER=yjegou
OAR_WORKDIR=/home/rennes/yjegou
OARUSER=oar
OAR_NB_NODES=1
SUDO_COMMAND=/bin/sh -c /opt/oar/oarexecuser.sh /tmp/OAR_29702 1 29702 yjegou /bin/bash /home/rennes/yjegou I
OAR_FILE_NODES=/tmp/OAR_29702
OAR_NODEFILE=/tmp/OAR_29702
OAR_NODECOUNT=1
OAR_O_WORKDIR=/home/rennes/yjegou

As usual, if the reservation is successful, the reserved machine names are stored in the file $OAR_FILE_NODES:

yjegou@parasol-dev $ cat $OAR_FILE_NODES
parasol32.rennes.grid5000.fr
parasol32.rennes.grid5000.fr
yjegou@parasol-dev $

4.4 Deploying an environment

First, the user needs to select a deployable partition. To this end, each site describes the partition scheme it uses in its message of the day, displayed upon successful login:

yjegou@parasol-dev $ cat /etc/motd
This cluster is parasol: 64 dual-Opteron nodes, 2.2 GHz
Node names: parasol-[01-64].rennes.grid5000.fr
Operating System: Ubuntu 5.04 (32/64bit)
Deploy Partitions: sda3  <--- NEW 21/09/2006
Local scratch dir: /local (sda5)
NFS scratch dir: /site/data0
Disk quotas active
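Note that $OAR_FILE_NODES holds one line per reserved CPU, which is why parasol32 appears twice above (once per processor). Scripts that iterate over the reservation usually want each host once. A small sketch (the helper names are ours, not OAR commands):

```shell
#!/bin/sh
# Sketch: helpers around the OAR node file.  The file lists one line
# per reserved CPU, so a dual-CPU node appears twice.
unique_nodes() {
    # each reserved host, once
    sort -u "$1"
}
cpu_count() {
    # total number of reserved CPUs
    wc -l < "$1"
}
```

For example, `for h in $(unique_nodes "$OAR_FILE_NODES"); do ssh "$h" uname -m; done` would run a command once per reserved machine.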
platform is to serve as an experimental testbed for research in Grid computing. It allows experiments in all the software layers between the network protocols up to the applications, including the OS layer. Further, it is already operational and already gives access to one thousand nodes.

The extension of the initial XtreemOS testbed to DAS-3, a grid for research in the Netherlands, and to the China National Grid (CNGrid) will be studied during the next months. The issue of mobile device testbeds will be addressed in future documents, once the requirements and specifications for XOS-MD are defined.

Chapter 2 of this document describes the Grid'5000 hardware platform, and chapter 3 the user tools. Chapter 4 is a small user tutorial. Appendices A and B contain the reference manuals of the Grid'5000 OAR and Kadeploy user software.

Chapter 2  Grid'5000 platform

2.1 Motivations

The design of Grid'5000 [8] derives from the combination of (1) the limitations observed in simulators, emulators and real platforms, and (2) an investigation of the research topics that the Grid community is conducting. These two elements led to the proposal of a large-scale experimental tool with deep reconfiguration capability, a controlled level of heterogeneity, and a strong control and monitoring infrastructure. The experiment diversity nearly covers all layers of the software stack used in Grid computing:
bian4all_jegou \
  -d "customized debian4all by Yvon Jegou" \
  -a Yvon.Jegou@irisa.fr \
  -fb file://home/rennes/yjegou/archive.tgz \
  -ft file://usr/local/kadeploy/Scripts/postinstall/postinstall4all_gen.tgz \
  -i /boot/initrd.img-2.6.12-1-amd64-k8-smp \
  -k /boot/vmlinuz-2.6.12-1-amd64-k8-smp \
  -fd 83 -fs ext3
Checking variable definition
Environment registration successfully completed

When customizing an already existing environment, it could be easier to restart from the original record, since karecordenv also accepts a file as a single parameter, following the output format of kaenvironments. For instance:

yjegou@parasol-dev $ kaenvironments -e debian4all > myImage.env
Checking variable definition
user selected: yjegou
yjegou@parasol-dev $

The user may then edit the content of the myImage.env file:

1. update the name, description, author and filebase lines,
2. delete the id, size, version and user lines,
3. save,
4. and then record the environment, from the frontend, with "karecordenv -fe myImage.env".

4.6 Adding software to an environment

Grid'5000 machines cannot access the outside public internet in general. But, for practical reasons, some holes are poked, to be able, for instance, to update an environment with the latest packages from the distributor, or to add useful software
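The four edit steps above can be scripted instead of done by hand. A minimal sketch, assuming "key : value" lines in the kaenvironments output; the exact field separator may differ on a real frontend, and the function name is ours:

```shell
#!/bin/sh
# Sketch of the edit step: strip the lines that karecordenv
# regenerates (id, size, version, user) and substitute a new name.
# The "key : value" line layout is an assumption of this sketch.
prepare_env_file() {
    src="$1"; dst="$2"; newname="$3"
    sed -e '/^id[ :]/d' -e '/^size[ :]/d' \
        -e '/^version[ :]/d' -e '/^user[ :]/d' \
        -e "s/^name .*/name : $newname/" "$src" > "$dst"
}
```

The resulting file would still need its description, author and filebase lines updated before being passed to "karecordenv -fe".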
[Figure content: postinstall summary for Rennes; accounts: not configured; home dir mounted: no; scratch dir mounted: no; swap: yes; /etc/resolv.conf: to irisa.fr; /etc/hostname: yes; /etc/mailname: yes]

Figure 4.1: Rennes postinstall wiki page

Debian4RMS: Debian environment with several RMS (aka batch scheduler) systems installed:
- OAR
- Torque + MAUI
- SGE (Sun One Grid Engine)

Debian4kexec: Debian environment with the kexec tool, for testing purposes.

Globus4: the Globus 4 Toolkit has been deployed on the debian4all image. It is shipped with scripts to automate configuration.

Ubuntu4all: Ubuntu environment that should comply with all basic needs and all sites. This environment is to be taken as the base to build more complex Ubuntu environments. Available for x86_64 and also for i386.

Fedora4all: Fedora environment that should comply with all basic needs and all sites. This environment is to be taken as the base to build more complex Fedora environments.

Rocks4all: Rocks environment that should comply with all basic needs and all sites. This environment is to be taken as the base to build more complex Rocks environments.

Each environment is described on a wiki page: reference file, valid platforms and sites. Figure 4.2 shows the wiki page of Debian4all.

To deploy an environment, the user must know its name, as registered in the Kadeploy database. It is the first information on the environment description page. This tutorial uses the debian4all environment
ction on a precise server, as a special user;
- deploy boot is a call to the deployment system setup_pxe.pl script, with appropriate arguments.

Examples

No reboot:
  kareboot -n -m node1 -e debian : sets up pxe, grub and tftp stuff for rebooting node1 on the debian environment, if already installed.
  kareboot -n -m node7 -p hda5 : sets up pxe, grub and tftp stuff for rebooting node7 on partition hda5, if an environment is found.

Simple reboot:
  kareboot -s -m node1 : executes a software reboot of node1.
  kareboot -h -m node7 -m node6 : executes a hardware reboot of nodes 7 and 6.
  kareboot -m node7 : executes a software reboot of node7.

Deploy reboot:
  kareboot -d -m node2 : soft reboots node2 on the deployment kernel.
  kareboot -d -h -m node2 : hard reboots node2 on the deployment kernel.

Environment reboot:
  kareboot -e debian -m node3 : looks for environment debian on node3 and soft reboots on it, if it exists.
  kareboot -h -e debian -m node3 : looks for environment debian on node3 and hard reboots on it, if it exists.

Partition reboot:
  kareboot -s -p hda6 -m node4 : checks the environment installed on partition hda6 of node4 and soft reboots on it, if appropriate.

B.5 kaconsole

kaconsole -m|--machine hostname

kaconsole opens a console on a remote node. This tool is designed to help the user that would try to find out the reason of a deployment failure, or to follow a boot
mechanisms in a multi-site, multi-cluster environment, before evaluating the middleware overhead.

According to this inquiry on researchers' needs, Grid'5000 provides a deep reconfiguration mechanism, allowing researchers to deploy, install, boot and run their specific software images, possibly including all the layers of the software stack. In a typical experiment sequence, a researcher reserves a partition of Grid'5000, deploys his software image on it, reboots all the machines of this partition, runs the experiment, collects results and frees the machines. This reconfiguration capability allows all researchers to run their experiments in the software environment exactly corresponding to their needs.

As discussed earlier, the Grid'5000 architecture provides an isolated domain, where communication between sites is not restricted and communication with the outside world is not possible. Mechanisms based on state-of-the-art technology, like the public key infrastructures and X.509 certificates produced by the Grid community to secure all accessed resources, are not suitable for Grid'5000: the GSI high-level security approach imposes a heavy overhead and impacts the performances, biasing the results of studies not directly related to security. Therefore, a private dedicated network (PN) or a virtual private network (VPN) are the only solutions to compose a secure grid backbone and to build such a confined infrastructure
and the job walltime will be smaller.

Best-effort jobs: a best-effort job will be launched only if there are enough resources, and will be deleted if another job wants its resources. Example:

oarsub -t besteffort path_to_prog

3.4 Kadeploy

Kadeploy is the deployment system used by Grid'5000. It is a fast and scalable deployment system for cluster and grid computing. It provides a set of tools for cloning, configuring (post-installation) and managing a set of nodes. Currently, it successfully deploys Linux, BSD, Windows and Solaris on x86 and 64-bit computers.

The deployment system is composed of several tools, or commands, whose aim is the execution of all the needed actions to carry out deployments and other administrative tasks.

The environment to be deployed can either already exist and be registered in the database, already exist but not be registered, or neither exist nor be registered. The first case is the simplest, since nothing has to be done prior to the deployment itself. In the other cases, kacreateenv (kaarchive, karecordenv) are used to create and register (create, register) an environment in the database.

The deployment itself, i.e. the deployment of a given environment on target partitions of a set of cluster nodes, is done using the kadeploy tool. A complete deployment is composed of the following steps: reboot on the deployment kernel via the pxe protocol, pre-installation
e e and tmp for a few temporary files.

Example:

kadeploy -e debian -m node7 -p hda3

deploys the debian environment on partition hda3 of node7.

kadeploy -e debian -m node1 -m node2 -p hdb6

deploys the debian environment on partition hdb6 of node1 and node2.

B.4 kareboot

kareboot [-n|--noreboot] [-s|--soft] [-h|--hard] [-d|--deploy] [-m|--machine hostname] [-e|--environment name] [-p|--partition partition]

kareboot can execute software or hardware reboots on given nodes. It can execute five types of query: no reboot, simple reboot, deploy reboot, environment reboot and partition reboot. For all of them, the user can specify one or several hosts. Here is a brief description of the effects of each reboot type: no reboot sets up the PXE, GRUB and TFTP files for the given nodes without rebooting them; simple reboot executes a normal (classic) reboot of the given nodes; deploy reboot reboots the given nodes on the deployment kernel; environment reboot looks for the specified environment on the given nodes and reboots on it (the environment should thus already be installed); partition reboot reboots the given nodes on the requested partition, if an appropriate environment is found. Consult the example section below for more explicit usage descriptions. kareboot needs the appropriate command (softboot, hardboot, deployboot) to be defined in the configuration file in order to be able to execute the reque
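The five reboot types can be illustrated by assembling the corresponding command lines. Since kareboot exists only on a cluster frontend, this sketch just echoes the commands instead of running them; the node, environment and partition names are invented examples, and the option letters are a reconstruction of the synopsis above:

```shell
# Invented example values; replace with your own reservation's names.
NODE="parasol32.rennes.grid5000.fr"
ENV="debian4all"
PART="sda3"

# One command line per reboot type (echoed only; kareboot is not
# available outside a Grid 5000 frontend).
echo "kareboot -n -m $NODE"       # no reboot: only set up PXE/GRUB/TFTP
echo "kareboot -s -m $NODE"       # simple (soft) reboot
echo "kareboot -d -m $NODE"       # reboot on the deployment kernel
echo "kareboot -e $ENV -m $NODE"  # reboot on an already installed environment
echo "kareboot -p $PART -m $NODE" # reboot on the requested partition
```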
e hostname: specifies a host. This option can be used more than once to specify several hosts; by default, kadeploy tries to deploy on all the nodes.

-p, --partition partition: the target partition. This option can't be used more than once: the partition is the same for all the nodes.

Configuration files

/etc/kadeploy/deploy.conf contains variables that should be defined prior to the command execution. These variables are grouped into several sections. A very short description is available for each variable, and sometimes example values are given in comments. The file is read once at the beginning of the execution, prior to any other action. The syntax is the following:

# section name
# short description
var_name1 = var_value1
# short description
var_name2 = var_value2   # example value

An example of this file can be found in the tools/cookbook folder.

/etc/kadeploy/deploy_cmd.conf contains entries describing the reboot commands needed for deployment. Please see the kareboot man page for further details.

kadeploy needs read and write access to several folders and files to achieve a deployment. These rights should be granted to the deploy user. Here are the files and folders: $KADEPLOYDIR/boot contains utilities (binaries, etc.) for the GRUB/PXE reboot steps; the tftp folder, where the GRUB and PXE files will be written, should be defined in the /etc/kadeploy/deploy.conf configuration fil
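To make that syntax concrete, here is a hypothetical fragment in the described format. The section and variable names are invented for illustration only; the real names and their descriptions are listed in the example file shipped in the tools/cookbook folder:

```
# network section
# folder where grub and pxe files are written (invented name)
tftp_folder = /var/lib/tftpboot
# timeout for node checks, in seconds (invented name)
check_timeout = 200
```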
e on all sites. They are shared on any given cluster through NFS, but distribution to another remote site is done by the user through classical file transfer tools (rsync, scp, sftp, etc.). Data transfers outside of Grid 5000 are restricted to secure tools to prevent identity spoofing, and public-key authentication is used to prevent brute-force attacks.

2.2.2 Hardware

2.2.3 Programming environment

Maintaining a uniform programming environment across all Grid 5000 sites would be a heavy task. Moreover, selecting a common programming environment for all users is impossible. Grid 5000 users can define and deploy their own system image. This possibility relaxes the dependency of user experimentations on site configurations. Grid 5000 sites manage their clusters independently. Each site provides a basic Linux installation on its clusters. This basic installation provides a standard programming environment (Fortran and C/C++ compilers, gcc, Java). Various Linux distributions are present on Grid 5000: Ubuntu at Rennes, Fedora Core at Toulouse, Debian at Grenoble.

2.2.4 Accessibility policy

As mentioned earlier, communication between Grid 5000 nodes and the outside world is very restricted. The accessibility policy considers Grid 5000 sites, hosting domains, satellite sites, internet servers and external user sites.

Grid 5000 sites. The Grid 5000 sites communicate through
29. eep sh Host Port parasol dev 33147 IdJob 29767 Reservation mode waiting validation Reservation valid gt OK yjegoukparasol dev Once his reservation has started the user can request for an interactive shell on the first node of the reservation parasol dev oarsub I c 29767 Connect to OAR reservation 29767 via the node parasol29 rennes grid5000 fr yjegou parasol29 S Getting job status parasol dev oarstat j 29767 Job Id 29767 queueName default reservation Scheduled XtreemOS Integrated Project 22 53 D4 3 1 IST 033576 submissionTime 2006 11 24 15 15 31 accounted NO bpid State Running user yjegou weight 2 startTime 2006 11 24 15 17 54 nbNodes 2 command bin sleep 3600 checkpoint 0 jobType PASSIVE message properties maxTime 00 59 06 idFile autoCheckpointed NO stopTime 0000 00 00 00 00 00 launchingDirectory home rennes yjegou infoType parasol dev 33147 parasol dev Deleting a job parasol dev oardel 29767 Deleting the job 29767 REGISTERED The job s 29767 will be deleted in a near futur parasol dev Submitting a moldable job It is possible to use several 1 oarsub option one for each moldable description By default the OAR scheduler will launch the moldable job which will end first The scheduler can decide to start a job later because they will have more free resources an
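As an illustration of a moldable submission with several -l options, the command below offers the scheduler two shapes of the same job. It is only assembled and echoed here, since oarsub exists only on a cluster frontend, and the node counts, walltimes and script name are arbitrary examples:

```shell
# Two alternative shapes for one job: 8 nodes for 1 hour, or 4 nodes
# for 2 hours. OAR picks the variant that should finish first.
CMD='oarsub -l nodes=8,walltime=1:00:00 -l nodes=4,walltime=2:00:00 ./bench.sh'
echo "$CMD"
```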
ehavior of a grid operating system depends on so many parameters (the number of nodes, their heterogeneity in memory, CPU and devices, the structure of the grid with small or large clusters, the interconnection network structure with its latencies and bottlenecks, the dynamicity of the grid, the stability of the grid under node failures, the efficiency of grid services and their bottlenecks, etc.) that there is no way to evaluate it through simulation. Grid operating systems such as XtreemOS must be validated on a realistic grid platform. The grid testbed must provide a significant number of computation nodes for scalability evaluation. The testbed nodes must allow full reconfiguration of the software stack, from the low-level communication layers up to the grid services. The Grid 5000 platform is distributed over 9 sites in France, provides a significant number of computation nodes (more than 1000 dual nodes at the end of 2006, with 2500 dual/quad nodes expected at the end of 2007) and allows full reconfiguration of the node software, from the operating system kernel to the grid middleware. Grid 5000 users can reserve nodes on Grid 5000 sites, define new operating system environments (kernel and distribution) and deploy these environments on the nodes. Grid 5000 will be the initial testbed for the XtreemOS project. This deliverable describes the Grid 5000 platform and the tools made available to users for the construction, registration, deployment and debugging of their operating
31. ent Being able to reproduce the experiments that are done is a desirable feature It is possible to check the exact version of the environment present on the target cluster using kaenvironments on the cluster frontent XtreemOS Integrated Project 28 53 D4 3 1 IST 033576 Debian4all version 2 Maintainer Julien Leduc Based on Debian version testing unstable for pure 64 bits Reference file home grenoble jleduc Images image_debian4all tgz at Orsay md5sum is 6e0f8e8154da9b7d9f11ce56460b1ec1 Valid on HP DL145G2 IBM e325 Sun v20z IBM e326 a Available on bordeaux idpot grenoble lille Lyon grillon nancy gdx orsay parasol rennes sophia toulouse Authentication Remote console on ttySO at 38400 bps Services Idap no nfs no Accounts root grid5000 Applications a ssh has X11 forwarding enabled Text Editors echo sed vi vim p source list configured for all grid5000 sites Figure 4 2 Debian4all wiki page yjegou parasol dev S kaenvironments environment debian4all debian4all v2 name debian4all id 108 version 2 description Image Debian 3 1 minimale generique author julien leduc lri fr filebase file grid5000 images debian4all x86_64 2 tgz filesite file grid5000 images generic postinstall v3 tgz size 1000 initrdpath kernelpath kernelparam fdisktype 131 filesystem ext2 siteid 1 optsupport 0 user kadeploy yjegoukparasol dev
ent troubleshooting guide.

When playing with unstable environments, it is possible to lose services such as sshd or rshd, and then to become unable to get a shell on the deployed machine. An alternative to the connection through ssh is to open a machine console using the kaconsole command: for the OS it is seen as physical access (thanks to remote console capabilities), provided nobody else has already opened a write access to the remote console and that the post-install script has the remote console properly configured.

yjegou@parasol-dev$ kaconsole -m parasol22.rennes.grid5000.fr
Checking variable definition
Checking command definition
This is ttyS0 on parasol22
Type ^A to terminate the session
root
Password:
Last login: Thu Mar 2 12:27:00 2006 from devgdx002.orsay.grid5000.fr on pts/0
Linux none 2.6.12-1-amd64-k8-smp #1 SMP Wed Sep 28 02:57:49 CEST 2005 x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
none:~#

kaconsole shows the console screen as if the user was locally connected to the machine. The remote console should keep working all along the boot process, which is useful to debug an environment (see also e
33. es Performance evaluation in grid computing is a complex issue Speedup is hard to evaluate in hetero geneous hardware In addition hardware diversity increases the complexity of the deployment reboot and control subsystem Moreover multiplying the hardware configurations directly increases day to day management and maintenance costs Considering these 3 parameters it has been decided that 2 3 of the total machines should be homogeneous However as grid are heterogeneous by definition a reasonable amount of the machines one third are not homogeneous in Grid 5000 Grid 5000 will be used for Grid software evaluation and fair comparisons of alternative algorithms soft ware protocols etc This implies two elements first users should be able to conduct their experiments in a reproducible way and second they should be able to access probes providing precise measurements during the experiments The reproducibility of experiment steering includes the capability to 1 reserve the same set of nodes 2 deploy and run the same piece of software on the same nodes 3 synchronize the experiment execution on the all the involved machines 4 if needed repeat sequence of operations in a timely and synchronous way 5 inject the same experimental conditions synthetic or trace based fault injection packet loss latency increase bandwidth reduction Grid 5000 software set provides a reservation tool OAR 7 4 a deployment tool Kadeploy 3
34. ethernet Other network equipments can be present on some clusters for instance Myrinet 10g for 33 machines and Infiniband for 66 machines in Rennes Grid 5000 users can access all cluster heads without restrictions Before using a cluster node a user must reserve this node with OAR section 3 3 or GridPrems section 3 5 for the duration of his experiment 2 2 1 User view and data management As previously mentioned communications are done with minimal authentication between Grid 5000 machines The logical consequence is that a user has a single account across the whole platform How ever each Grid 5000 site manages its own user accounts Reliability of the authentication system is also critical A local network outage should not break the authentication process on other sites These two requirements have been fulfilled by the installation of an LDAP directory Every site runs an LDAP server containing the same tree under a common root a branch is defined for each site On a given site the local administrator has read write access to the branch and can manage his user accounts The other branches are periodically synchronized from remote servers and are read only From the user point of view this design is transparent once the account is created the user can access any of the Grid 5000 sites or services monitoring tools wiki deployment etc His data however are local to every site also the pathname of his home dir is the sam
f demand support
- Ka tools integration
- grid integration with the Cigri system
- simple desktop computing mode

The main OAR commands are:

oarsub: submit a job to the scheduler
oarstat: get job state
oarnodes: get information about cluster nodes
oardel: delete a job

OAR manages interactive and batch jobs. The standard output of a batch job is returned in the file OAR.BaseName.JobId.stdout in the user home directory. Errors are in the file OAR.BaseName.JobId.stderr.

3.3.1 OAR examples

Allocation of 1 node in interactive mode:

yjegou@parasol-dev$ oarsub -l nodes=1 -I
Host:Port = parasol-dev:33022
IdJob = 29739
Interactive mode : waiting...
yjegou@parasol08$

On success of an interactive reservation request, the user is automatically logged on the first allocated node (parasol08 in this example). The reservation ends when the user exits this initial shell.

Running /bin/date in batch mode on a dynamically allocated node:

yjegou@parasol-dev$ oarsub -l nodes=1 /bin/date
IdJob = 29741
yjegou@parasol-dev$

Result of the job execution, in OAR.BaseName.JobId.stdout:

yjegou@parasol-dev$ cat OAR.date.29741.stdout
Fri Nov 24 11:29:10 CET 2006
yjegou@parasol-dev$

Job submission in the future and interactive connection to the job once it is running:

yjegou@parasol-dev$ oarsub -I -l nodes=2,walltime=1:00:00 -r "2006-11-24 15:17:00" /home/rennes/yjegou/sl
36. g fs usr local bin DKsentinelle lroot crsh c timeout 10000 v m131 254 202 132 umount dev sda3 Done No kernel parameter taking default ones defined in the configuration file generating pxe VERBOSE fork process for the node 131 254 202 132 VERBOSE Execute command rsh 1 root 131 254 202 132 reboot detach VERBOSE job 11652 forked VERBOSE Child process 11652 ended exit value 0 signal num 0 dumped_core 0 131 254 202 132 GOOD 0 0 0 rebooting the nodes Transfert 21 nmapCmd usr bin nmap You want this node parasol32 rennes grid5000 fr aiting for all the nodes to reboot during 300 seconds lt BootEnv 0 gt there on check 131 254 202 132 lt BootEnv 1 gt Deployment finished lt Completed gt Last Reboot 121 Deploy State 14742 terminated Node State Error Description if any XtreemOS Integrated Project 32 53 D4 3 1 IST 033576 parasol32 rennes grid5000 fr deployed Sumary first reboot and check 128 preinstall 0 transfert 21 last reboot and check 121 yjegoukparasol dev Node discovery can be automated yJegoulparasol dev kadeploy e debian4all Y m head 1 OAR FILE NODES p sda3 Once kadeploy has run successfully the allocated node is deployed under the requested debian4all environment On most environments there is at least one configured account for login through ss
37. gy Education and Research More information can be found on RENATER s web site 5 RENATER offers about 30 POPs Points Of Presence in France at least one POP for each region on which metropolitan and regional networks are connected More than 600 sites universities research centers are interconnected through RENATER The network in its current phase RENATER 4 was completely deployed in November 2005 The standard architecture is based on 2 5Gbit s leased lines and provides IP transit connectivity intercon nection with GEANT 2 1 overseas territories and the SFINX Global Internet exchange figure 2 3 XtreemOS Integrated Project 14 53 D4 3 1 IST 033576 IP transit 2x2 5Gbit s SFINX 2x10gbit s Martinique Guadeloupe Guyane Figure 2 3 Renater network The initial design of Grid 5000 sites interconnection has been adressed within the RENATER backbone using a Ethernet Over MPLS EoMPLS solution It is a full mesh topology based on MPLS tunnels LSPs established between the RENATER POPS on which are connected the Grid 5000 sites In practice sites are interconnected through 1 Gbit s VLANs RENATER 4 also introduced a dark fibre infrastructure allowing to allocate dedicated 10Gbit s lamb das for specific research projects It also provides interconnection with GEANT 2 with increased ca pacity compared to GEANT 1 and dedicated interconnection for projects This new infrastructure is still under
h. kadeploy checks that sshd is running before declaring a deployment successful. This account's parameters (user, password) can be found in the environment description page.

yjegou@parasol-dev$ ssh root@parasol32
Password:
Last login: Thu Nov 23 15:22:24 2006 from parasol-dev.irisa.fr
Linux none 2.6.12-1-amd64-k8-smp #1 SMP Wed Sep 28 02:57:49 CEST 2005 x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Thu Nov 23 15:22:51 2006 from parasol-dev.irisa.fr on pts/0
none:~# uname -a
Linux none 2.6.12-1-amd64-k8-smp #1 SMP Wed Sep 28 02:57:49 CEST 2005 x86_64 GNU/Linux
none:~# cat /etc/issue
Debian GNU/Linux testing/unstable \n \l
none:~#

4.5 Tuning the environment to build another one: customizing authentication parameters

Once a standard environment has been successfully deployed, it is possible to tune (modify) it and to record the result in a new environment. It is possible to log in to the deployed environment through ssh and the default account (see the example above), or through a machine console. In case of connection difficulty, there is a troubleshooting guide on the wiki page: Sidebar > User Docs > Kadeploy > Kadeploy documentation > Deploym
he first process of the job: number of seconds before the walltime.

--signal signal_name: specify the signal to use when checkpointing.

-t, --type name: specify a specific type (deploy, besteffort, cosystem, checkpoint).

-d, --directory path: specify the directory in which to launch the command (default is the current directory).

-n, --name job_name: specify an arbitrary name for the job.

-a job_id: anterior job that must be terminated for this new one to start.

--notify method: specify a notification method (mail or command), e.g. --notify "mail:name@domain.com" or --notify "exec:/path/to/script args".

--stdout file_name: specify the name of the standard output file.

--stderr file_name: specify the name of the error output file.

--resubmit job_id: resubmit the given job as a new one.

--force-cpuset-name cpuset_name: specify the cpuset name instead of using the default (the job id). WARNING: if several jobs have the same cpuset name, then processes of one job could be killed when another one finishes on the same computer.

Examples:

oarsub -l nodes=4 test.sh

This example requests the reservation of 4 nodes in the default queue with the default walltime, and then the execution of the test.sh shell script on the first node of the allocated node list.

oarsub -q default -l walltime=50:30:00,nodes=10 -l cpu=3,walltime=2:15:00 -p "switch='sw1'" /home/u
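Several of the options above can be combined. The following command line, only assembled and echoed here, sketches a submission that asks for a checkpoint warning 60 seconds before the walltime, delivered as SIGUSR2, together with a mail notification; the mail address, walltime and script name are placeholders:

```shell
# Checkpoint warning 60 s before the walltime, sent as SIGUSR2, plus a
# mail notification on job state changes (placeholder address).
CMD="oarsub -l nodes=2,walltime=0:30:00 --checkpoint 60 --signal SIGUSR2 --notify mail:name@domain.com ./solver.sh"
echo "$CMD"
```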
is account again each time he deploys it. The first step to create an environment is to create an archive of the customized node. Because of the various implementations of the /dev filesystem tree, this can be a more or less complex operation. The state-of-the-art command for archive creation can be found on the wiki: Sidebar > User Docs > Kadeploy > Kadeploy documentation > Main archive creation.

yjegou@parasol-dev$ ssh root@parasol22.rennes.grid5000.fr "tar --posix --numeric-owner --one-file-system -zcf - /" > archive.tgz
Password:
tar: Removing leading '/' from member names
tar: Removing leading '/' from hard link targets
tar: /dev/log: socket ignored
yjegou@parasol-dev$

If /dev is mounted using udev (it is probably the case if a /dev/.udevdb directory exists), the directions to follow can be found on the wiki at Sidebar > User Docs > Kadeploy > Environments requirements > udev and devfs.

Warning: Kadeploy insists that archives be named with a .tgz extension, and not .tar.gz. Moreover, the system user deploy should have read access to the archive file; this also means execute access on each subdirectory from the user home directory to that file, in order to deploy an environment based on this archive.

4.5.3 Recording this environment for future deployments

This is done using the karecordenv command. It allows t
41. ite myLogin myPath auto_deploy sh To follow progress of the job frontend oarstat j jobId or frontend tail f OAR auto deploy sh jobId stdout To see the associated error log frontend less OAR auto deploy sh jobId stderr 39 53 XtreemOS Integrated Project Appendix A OAR reference manual This manual section has been extracted from the official OAR web site at 4 More information can be found on this site All OAR user commands are installed on cluster head nodes oarstat This command prints jobs in execution mode on the terminal Options f prints each job in full details job id prints the specified job id information even if it is finished g dl d2 prints history of jobs and state of resources between two dates D formats outputs in Perl Dumper X formats outputs in XML Y formats outputs in YAML Examples oarstat oarstat j 42 f A 2 oarnodes This command prints information about cluster resources state which jobs on which resources resource properties Options 40 53 D4 3 1 IST 033576 a shows all resources with their properties SE shows only properties of a resource s shows only resource states 1 shows only resource list D formats outputs in Perl Dumper X formats outputs in XML Y formats outputs in YAML Examples oarnodes oarnodes s A 3 oarsub The user can submit a job with the oarsub command So what is a job in our conte
n deb or rpm forms. One of these holes is the apt-cacher, available on most of the sites at apt.site.grid5000.fr (Nancy, Orsay or Rennes, for example). Therefore the debian4all environment has the following lines in the /etc/apt/sources.list file:

deb http://apt.orsay.grid5000.fr/apt-cacher/ftp.de.debian.org/debian sid main

These lines enable the apt-get commands to work as expected. It is therefore possible to update the environment: to add missing libraries, to remove unnecessary packages (which reduces the image size and speeds up the deployment process), etc., using:

- apt-get update
- apt-get upgrade
- apt-get install <list of desired packages and libraries>
- apt-get --purge remove <list of unwanted packages>
- apt-get clean

See the apt-get man pages for more information.

4.7 Scripting a deployment

It is often necessary to script most of the work to avoid too many interactive steps. The main points to note are the following:

- a non-interactive reservation on the deploy queue will see the given script running on the node from which deployments are run (usually the OAR or frontend node);
- kadeploy accepts the -f parameter to read the list of machines to deploy on from a file;
- in non-interactive mode, scripts started by OAR write their output in files whose names are OAR.scriptName.jobId.stderr and OAR.scriptName.jobId.stdout.

The following shell sc
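As a minimal sketch of these points, the fragment below normalizes a node file before handing it to kadeploy -f. The node names are invented, the kadeploy call itself is only echoed (the command exists only on a frontend), and on a real run OAR would provide the file through $OAR_FILE_NODES:

```shell
# Use the OAR-provided node file when present; otherwise create a small
# sample file so the sketch can run anywhere (invented node names).
NODEFILE="${OAR_FILE_NODES:-nodes.txt}"
if [ ! -f "$NODEFILE" ]; then
    printf 'node-a\nnode-a\nnode-b\n' > "$NODEFILE"
fi

# The node file may repeat a host once per reserved CPU, so deduplicate
# it before deploying.
sort -u "$NODEFILE" > unique_nodes.txt
NB_NODES=$(wc -l < unique_nodes.txt)

# Echoed only: kadeploy is available on the cluster frontend.
echo "kadeploy -e debian4all -f unique_nodes.txt -p sda3  # $NB_NODES node(s)"
```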
43. nfert OK Use of uninitialized value in string eq at opt kadeploy bin kadeploy line 376 Command Mcat opt kadeploy bin mcat_rsh pl p 14053 sc opt kadeploy sbin kasetup printpreinstallconf dc cat gt mnt rambin preinstalldisk conf tranfert OK Executing preinstall usr local bin DKsentinelle lroot crsh c timeout 10000 v m131 254 202 132 mnt rambin init ash fdisk EE x 31 53 XtreemOS Integrated Project IST 033576 D4 3 1 Done Preinstall 0 lt Transfert gt filebase grid5000 images debian4all x86_64 2 tgz Formatting destination partition dev sda3 on the nodes usr local bin DKsentinelle lroot crsh c timeout 10000 v m131 254 202 132 mkfs t ext2 dev sda3 usr local bin DKsentinelle lroot crsh c timeout 10000 v m131 254 202 132 mount dev sda3 mnt dest Done lt tar Transfert gt Sending Computing environment to the nodes Command Mcat opt kadeploy bin mcat_rsh pl p 14054 sc cat grid5000 images debian4all x86_64 2 tgz dc cat gt dest pipe tranfert OK Done Transfert 15 lt PostInstall gt Executing postinstall Command Mcat opt kadeploy bin mcat rsh pl p 14055 sc cat grid5000 images generic postinstall v3 tgz N dc cat gt post pipe tranfert OK usr local bin DKsentinelle lroot crsh c timeout 10000 v m131 254 202 132 rambin traitement ash rambin Done Postinstall 10 Umountin
44. nsole ttyS0 38400n8 ETH DRV tg3 fd 83 fs ext2 XtreemOS Integrated Project 50 53 D4 3 1 IST 033576 registers a debian image whose features are explicitly given in the parameter list B 7 kaarchive kaarchive X xclude from file z gzip N output directory output directory i image image_name root directory root directory kaarchive creates a environment image from a running one Options X exclude from file excludes files named into file from the generated environment z gz p filters the archive through gzip output directory output directory sets the output directory for the generated environment default is cur rent directory i image image name sets the output image name default is output image root directory root directory sets the root directory for the environment creation B S kacreateenv kacreateenv e environment environment name v version version X xclude from file z gzip output directory output directory root directory root directory author email address description description file base fil fil site fil size size kernel path kernel path fdisk type fdisk type number file system file system typ NFK EE ENG TO AE kacreateenv creates and registers a new environment from a running one Options environment environment name the registration and output image name fo
[Screenshot of the OAR web status page: filters on the OAR properties (besteffort, desktopcomputing, gridPrems, infiniBand, myri, deploy) and a job details table with columns Job Id, User, State, Queue, NbNodes, Weight, Type, Properties, Reservation, Walltime, Submission Time, Start Time and Scheduled Start.]

Figure 3.4: OAR
o create a pointer in the Kadeploy database to the archive and to associate to it a name, which can be recalled by the kadeploy command, as well as a post-install archive to finely adjust some system files and make the system bootable on each different deployed node. An environment thus consists of this association between an image archive, a name and a post-install archive, along with some other minor parameters. To find all the parameters of this command, one method is to read the output of karecordenv -h:

Usage: karecordenv
 -n  [--name] environment name
 -v  [--version] version (default is 1)
 -d  [--description] description
 -a  [--author] author's email
 -fb [--filebase] environment image path
 -ft [--filesite] post-installation file path
 -s  [--size] size (MB)
 -i  [--initrdpath] initrd path
 -k  [--kernelpath] kernel path
 -p  kernel parameters
 -fd [--fdisktype] fdisk type (default is 83)
 -fs [--filesystem] file system (default is ext2)
 -o  [--optsupport] optimisation support (default is 0)
 -f  [--file] environment description file
 -h  get this help message

Here the -i and -k parameters are pointers to system files within the user environment, needed to build the correct boot loader (Kadeploy uses GRUB). The files in the following example correspond to the default kernel installed in the debian4all environment, which are still valid in the customized environment unless the user has compiled another kernel.

yjegou@parasol-dev$ karecordenv -n de
47. om Internet at lt https www grid5000 fr gt This site gives access to the real time status of the platform see figure 3 2 the monitoring tools the job reservations on each site the reservation history and the software manuals 3 2 Grid 5000 monitoring tools Various views of the platform state can be obtained from the Grid 5000 web site global view figure 3 2 or ganglia view figure 3 3 User deployed nodes are not monitored by ganglia in general unless a ganglia configuration has been provided in the deployed environment only nodes running the basic Linux environment are monitored 3 3 OAR ER oar is the resource manager or batch scheduler used to manage Grid 5000 platform s resources OAR allows cluster users to submit or reserve nodes either in an interactive or a batch mode Adminis trators can tune OAR to force an exploitation policy See figures 3 4 and 3 5 for two screenshots of OAR visualization tools This batch system is built on a database MySql a script language Perl and an optional scalable administrative tool component of Taktuk framework 6 It is composed of several modules which interact only with the database and are executed as independent programs So formally there is no API the system is completely defined by the database schema This approach eases the development of specific modules Indeed each module such as schedulers may be developed in any language having a database access librar
on home, check your status with quota). The deploy partition should be selected from the deploy partitions list: in general hda3 or sda3, according to the most recent universal partition scheme that has been adopted on Grid 5000.

yjegou@parasol-dev$ kadeploy -e debian4all \
  -m parasol32.rennes.grid5000.fr -p sda3
Checking variable definition
Checking database access rights
WARNING : there is no environment named debian4all with user yjegou specified
env debian4all does not exist for user yjegou
searching for env debian4all with user kadeploy
Checking user deployment rights
Invalidating deployments older than 1000
Warning : 0 deployment have been corrected automatically
Checking command definition
Child
Working child 10771
Checking variable definition
OK
kernel duke vmlinuz x86_64 initrd from 84 to 84
nmapCmd = /usr/bin/nmap
WARNING : there is no environment named debian4all with user yjegou
Waiting for all the nodes to reboot during 200 seconds
<BootInit 0> there on check : 131.254.202.132
<BootInit 1>
All nodes are ready
First Check : 128
/usr/local/bin/DKsentinelle -lroot -crsh -c timeout=10000 -v -m131.254.202.132 umount /mnt/dest
<PreInstall>
Retrieving preinstall
Command : Mcat /opt/kadeploy/bin/mcat_rsh.pl -p 14052 sc "cat /opt/kadeploy/scripts/pre_install.tgz" dc "cat > pre_pipe" tra
or example, by default many distributions ask for the root password before checking a filesystem at boot time. Two remote management tools are available to diagnose and, if possible, take control of cluster nodes that are in an unstable or undefined state after a deployment failure. The kaconsole tool opens a console on a remote node; it needs special hardware equipment and a special command configuration. The kareboot tool performs various types of reboot on cluster nodes. The possibilities are numerous: reboot on a given, already installed environment, on a given partition, or even on the deployment kernel, for instance. It can also hard-reboot appropriately equipped nodes. kaaddnode and kadelnode add and remove nodes from the deployment system if the cluster composition changes.

3.5 GridPrems

GridPrems is a resource reservation middleware used on the clusters in Rennes. GridPrems is a web application implemented using PHP and MySQL. When making a reservation through GridPrems, one can specify:

- which nodes to use: nodes can be chosen by the user or automatically selected;
- when to reserve: reservations may be repeated daily or weekly;
- whether the reserved machines can be shared: exclusive reservations are typically used for benchmarking, whereas shareable reservations are used for testing and debugging programs.

GridPrems is not a batch scheduler. Therefore it does not automatically run jobs on the
All Kadeploy user commands are installed on cluster head nodes.

B.1 kaaddnode

kaaddnode <info_file>

kaaddnode registers nodes and their disk and partition features for the deployment system. It can be used at the very beginning to describe a whole cluster (after having completed the installation, for instance) or later, when new nodes are added to a cluster already managed by the deployment system.

Parameter: info_file is the name of the file that describes all the information the system needs. It contains two main parts and has to be formatted as described below.

First part: node set description

  node_name1  mac_address1  ip_address1
  node_name2  mac_address2  ip_address2
  node_name3  mac_address3  ip_address3

N.B.: each node is described on a separate line; fields on each line are separated by tabs or spaces.

Second part: descriptive information about the type of disk and the partitioning

  device_name  device_size
  partition_number1  partition_size1  deployed_env1
  partition_number2  partition_size2  deployed_env2
  partition_number3  partition_size3  deployed_env3

N.B.: each partition is described on a separate line; fields on each line are separated by tabs or spaces. All sizes are given in Mbytes. The specified deployed_env should already exist in the deployment system, otherwise the default value "undefined" will be switched to the declared one during the execution.
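The per-line field structure of the node-description part can be checked mechanically before running kaaddnode; a sketch using the format described above (file contents are example values, not a real inventory):

```shell
#!/bin/sh
# Sanity-check the node-description part of a kaaddnode info file:
# each node line must carry exactly three whitespace-separated fields
# (node name, MAC address, IP address).
nodes=/tmp/nodes_part.txt
printf 'node1 00:02:73:49:9C:8D 192.168.10.1\nnode2 00:02:73:49:9C:8E 192.168.10.2\n' > "$nodes"

awk 'NF != 3 { bad++ } END { print (bad ? "malformed" : "ok") }' "$nodes"
```

A similar check (two fields for the device line, three for partition lines) could be applied to the second part before feeding the file to the deployment system.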
[Figure 2.2: Rennes platform — clusters of Sun V20z, HP DL145G2, Dell PowerEdge and Apple Xserve nodes (paravent01-67, paraci, ...) interconnected through Ethernet switches (HP 2848, CentreCom 3612TR), Myrinet 2G/10G switches and an Infiniband switch (ISR 9024), plus a service processor (SP) network.]

Internet servers. Internet connectivity is helpful during software development and installation. For instance, direct access to Linux distribution mirror sites facilitates system configuration. Grid 5000 site administrators can provide indirect, temporary access to such sites through proxies.

External user sites. Grid 5000 can provide access to external users such as XtreemOS members. It is possible to open limited connectivity (ssh, http, https) between a Grid 5000 cluster head and an Internet IP address or small network.

2.3 Rennes Grid 5000 site

All XtreemOS members from INRIA work in the PARIS research group in Rennes. The PARIS research group is also in charge of the Grid 5000 platform in Rennes. The Rennes platform will be the access point to the Grid 5000 testbed for XtreemOS partners. The Rennes Grid 5000 platform provides 4 clusters:

  model: Dell PowerEdge 1750; nodes: 64; CPU: Intel Xeon IA32 2
for the environment

  -v | --version version : the version number of the environment, if needed (default is 1)
  -d | --description description : a brief description of the environment
  -a | --author author_email : the author's email address
  -fb | --filebase environment_image_path : the complete path to the environment image
  -ft | --filesite post_installation_file_path : the complete path to the post-installation file
  -s | --size size_Mb : the size of the environment, in order to perform a partition size check before a deployment
  -k | --kernelpath kernel_path : the complete kernel path in the environment
  -fd | --fdisktype fdisk_type_number : the fdisk type (default is 83)
  -fs | --filesystem file_system_type : the file system type (default is ext2)
  --rootdirectory root_directory : sets the root directory for the environment creation
  -X | --exclude file : excludes the files named in file from the generated environment
  -z | --gzip : filters the archive through gzip
  --outputdirectory output_directory : sets the output directory for the generated environment (default is the current directory)

Example:

  kacreateenv -e toto_env --description "toto's special image" --author toto@fai.fr
    -fb file://home/images_folder/toto_env.tgz -ft file://home/images_folder/toto_spec_file.tgz
    --size 650 -k /boot/vmlinuz --rootdirectory /home/toto_img
    -X /path/to/file_to_exclude.txt -z
Because researchers are able to boot and run their specific software stack on Grid 5000 sites and machines, it is not possible to make any assumption on the correct configuration of the security mechanisms. As a consequence, Grid 5000 machines should be considered unprotected. Two other constraints increase the complexity of the security issue: (1) all the sites hosting the machines are connected by the Internet; (2) inter-site communication should basically not suffer any platform security restriction or overhead during experiments.

From this set of constraints, a two-level security design is implemented with the following rules: (a) Grid 5000 sites are not directly connected to the Internet — no communication packet from the Internet is able to directly enter Grid 5000, and no communication packet from Grid 5000 is able to directly exit to the Internet; and (b) there is no limitation on the communication between Grid 5000 sites. The first rule ensures that Grid 5000 will resist hacker attacks and will not be used as a basis for attacks (i.e. massive DoS or other more targeted attacks). These design rules lead to building a large-scale confined cluster of clusters.

Users connect to Grid 5000 from the lab where the machines are hosted. First, a strong authentication and authorization check is done to enter the lab. A second authentication and authorization check is done when logging from the lab into the Grid 5000 nodes
script can be adapted for automatic deployment:

#!/bin/sh
# written by David Margery for the 2006 Grid 5000 spring school
# feel free to copy and adapt
echo "Script running on `hostname`"

# The $OAR_FILE_NODES file may contain some nodes more than once.
# We will create a second file with each node listed only once.
OAR_FILE_NODES_UNIQUE=${OAR_FILE_NODES}.unique
cat $OAR_FILE_NODES | sort -u > $OAR_FILE_NODES_UNIQUE

# deploy our test environment
kadeploy -e ubuntu4all.`whoami` -f $OAR_FILE_NODES_UNIQUE -p sda2

# do stuff with this deployed environment
for i in `cat $OAR_FILE_NODES | sort -u`
do
  echo "attempt to get information from $i"
  ssh root@$i -o StrictHostKeyChecking=no cat /etc/hostname
  ssh root@$i -o StrictHostKeyChecking=no uname -a
done

The latest version of this script is available in the subversion repository of Grid 5000 on InriaGforge. To get a copy of this script, as well as other shared material, from outside:

  svn checkout svn://scm.gforge.inria.fr/svn/grid5000/tools Grid5000_tools

To use it:
1. a user version of this script can be saved in the user's home directory, e.g. /home/mySite/myLogin/myPath/auto_deploy.sh;
2. the environment name must be updated in the script;
3. the script can then be submitted to OAR from the frontend:

  oarsub -l nodes=2 -q deploy /home/mySite/myLogin/myPath/auto_deploy.sh
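When adapting the script, it can help to guard the post-deployment ssh loop on the deployment having succeeded. A hedged sketch of that check — the deploy function below is a stand-in so the control flow runs anywhere, and it assumes kadeploy's exit status reflects failure, which should be verified for the installed version:

```shell
#!/bin/sh
# Only run the experiment loop if the deployment step succeeded.
deploy() {
    # On a frontend, replace the body with the real invocation, e.g.:
    #   kadeploy -e ubuntu4all.`whoami` -f $OAR_FILE_NODES_UNIQUE -p sda2
    true
}

if deploy; then
    echo "deployment ok: running the experiment"
else
    echo "deployment failed: aborting" >&2
    exit 1
fi
```

This avoids ssh-ing into nodes that may still be running whatever environment they had before the failed deployment.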
use it with the deployment system.

Options:
  -n | --name registration_name : the registration name for the environment
  -v | --version version : the version number of the environment, if needed (default is 1)
  -d | --description description : a brief description of the environment
  -a | --author author_email : the author's email address
  -fb | --filebase environment_image_path : the complete path to the environment image
  -ft | --filesite post_installation_file_path : the complete path to the post-installation file
  -s | --size size_Mb : the size of the environment, in order to perform a partition size check before a deployment
  -i | --initrdpath : the complete initrd path in the environment, including the initrd name
  -k | --kernelpath kernel_path : the complete kernel path in the environment, including the kernel name
  -p : arguments passed to the kernel at boot; the parameter is set automatically if left empty (default kernel parameters can be configured)
  -fd | --fdisktype fdisk_type : the fdisk type (default is 82)
  -fs | --filesystem file_system : the file system type (default is ext2)

Name, kernel path, environment image path and post-installation file path must be defined.

Example:

  karecordenv -n debian -v 2 -d "debian ready for installation" -a katools@imag.fr
    -fb file://home/nis/jleduc/ImagesDistrib/image_Debian_current.tgz
    -ft file://home/nis/jleduc/Boulot/postinstall_traitement.tgz --size 650
    -k /boot/vmlinuz -i /initrd -p co
The /home/users/toto/prog script will be run on the first of the 10 nodes (with 3 cpus each, so a total of 30 cpus) in the default queue, with a walltime of 2:15:00. Moreover, the -p option restricts the resources to those on switch sw1.

  oarsub -r "2004-04-27 11:00:00" -l nodes=12,cpu=2

A reservation will begin at 2004-04-27 11:00:00 on 12 dual-cpu nodes.

  oarsub -C 42

Connects to job 42 on the first node and sets all OAR environment variables.

  oarsub -I

Gives a shell on a resource.

A.4 oardel

This command is used to delete or checkpoint job(s), designated by their identifiers.

Option:
  -c job_id : send the checkpoint signal to the job (the signal that was defined with the --signal option in oarsub)

Examples:

  oardel 14 42

Deletes jobs 14 and 42.

  oardel -c 42

Sends the checkpoint signal to job 42.

A.5 oarhold

This command is used to remove a job from the scheduling queue if it is in the Waiting state. Moreover, if its state is Running, oarhold can suspend the execution and enable other jobs to use its resources. In that case, a SIGINT signal is sent to every process.

A.6 oarresume

This command resumes jobs in the Hold or Suspended states.

Appendix B  Kadeploy reference manual

This manual section has been extracted from the official Kadeploy web site [3]. More information can be found on this site.
ssh yjegou@parasol-dev.rennes.grid5000.fr
yjegou@parasol-dev$

If needed, the user may then internally jump onto the frontend of another cluster, following his deployment needs:

yjegou@parasol-dev$ ssh oar.another-site.grid5000.fr

4.2 Finding and deploying existing images

From the cluster frontend, the user can locate registered environments, find deployable partitions, make reservations on deployable nodes and deploy his own operating system image using kadeploy.

Image localization. Each site maintains an environment library in the grid5000/images directory of the OAR node. Image descriptions can be located from the Grid 5000 wiki at Sidebar > User Docs > Kadeploy > Kadeploy environments directory. The following environments are currently available:

Debian4all: a Debian environment that should comply with all basic needs and all sites. This environment is to be taken as the base to build more complex Debian environments.

Postinstall generic rennes. This page is an example of the standard description of a post-install script.

General information:
  Script's version: 1 (version 2 also exists)
  Script's name: generic post-install
  Script's site: Rennes
  Reference file: file://grid5000/images/generic_postinstall.tgz
  md5sum: 01dd5faeb120e33ff30e11b432abf118
  Pre-requisites: none
Effect:
  root ssh keys installed: yes
  machine's usual ssh key installed: yes
  ldap ac
requested reboot on the given node (see the configuration file section for further details).

Options:
  -n | --noreboot : only sets up the pxe, grub and tftp stuff
  --soft : requests a software reboot (the default reboot type)
  -h | --hard : requests a hardware reboot
  -d | --deploy : specify to reboot on the deployment kernel
  -m | --machine hostname : specifies a host; this option can be used more than once to specify several hosts
  -e | --environment : gives an already installed environment to boot; only one environment can be specified
  -p | --partition : gives a partition to boot on; only one partition can be specified

Configuration file: /etc/kadeploy/deploy_cmd.conf

For each node, it is possible to define the reboot commands for the soft, hard and deploy reboot types in the /etc/kadeploy/deploy_cmd.conf configuration file. Here is the description of the expected syntax:

  host_name  reboot_type  command

where reboot_type can be either softboot, hardboot or deployboot. Example for node idpot1 of the idpot cluster:

  idpot1 softboot ssh root@idpot1 -o ConnectTimeout=2 /sbin/reboot
  idpot1 hardboot ssh localuser@ldap-idpot commande_modules_logiciel dio_reset 0x81 0
  idpot1 deployboot /boot/setup_pxe.pl -A 192.168.10.1 label=deploy_tg3

In this example:
- the soft reboot is a simple reboot command done via an ssh connection as root;
- the hard reboot is a hard signal sent via an ssh connection
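Since a typo in deploy_cmd.conf only shows up at reboot time, the file can be checked mechanically beforehand; a minimal sketch over the three-field syntax above (the sample lines mirror the idpot1 example and are illustrative):

```shell
#!/bin/sh
# Check a deploy_cmd.conf fragment: every line must read
# "host_name reboot_type command ..." with a known reboot type in field 2.
conf=/tmp/deploy_cmd.conf.demo
cat > "$conf" <<'EOF'
idpot1 softboot ssh root@idpot1 -o ConnectTimeout=2 /sbin/reboot
idpot1 deployboot /boot/setup_pxe.pl -A 192.168.10.1
EOF

awk '$2 !~ /^(softboot|hardboot|deployboot)$/ { bad++ }
     END { print (bad ? bad : 0) " invalid line(s)" }' "$conf"
```

Running this against the real /etc/kadeploy/deploy_cmd.conf on a head node would flag misspelled reboot types before kareboot ever consults the file.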
The first and second parts are separated by a line starting with a specific character sequence. Empty lines and comment lines are ignored. Before doing any modification in the deployment system database, kaaddnode tries to check that the expected description syntax is respected in both parts.

Example: kaaddnode description.txt, with description.txt as below:

  Nodes
  node1 00:02:73:49:9C:8D 192.168.10.1
  node2 00:02:73:49:9C:8E 192.168.10.2
  Disk & Parts
  hda 80000
  1 5000 swap
  2 20000 debian

B.2 kadelnode

kadelnode -m|--machine hostname

kadelnode unregisters nodes and their disk and partition features from the deployment system.

Options:
  -m | --machine hostname : specifies a host; this option can be used more than once to specify several hosts

Example:

  kadelnode -m node1

Removes all information relative to node1 from the deployment system database.

B.3 kadeploy

kadeploy -e|--environment environment_name -m|--machine hostname -p|--partition partition

kadeploy deploys the specified environment on the requested nodes. When it is successfully over, the nodes are available under the new, freshly installed environment. It needs critical variables to be defined in the /etc/kadeploy/deploy.conf configuration file (see the configuration file section for further details).

Options:
  -e | --environment environment_name : the name of the environment to deploy
  -m | --machine hostname : specifies a host; this option can be used more than once to specify several hosts
Contents

1 Introduction  5
2 Grid 5000 platform  7
  2.1 Motivations  7
  2.2 Architecture  8
    2.2.1 User view and data management  9
    2.2.2 Hardware  10
    2.2.3 Programming environment  10
    2.2.4 Accessibility policy  10
  2.3 Rennes Grid 5000 site  13
  2.4 Grid 5000 interconnection  14
3 User tools  17
  3.1 Grid 5000 user portal  17
  3.2 Grid 5000 monitoring tools  17
  3.3 OAR  17
    3.3.1 OAR examples  22
  3.4 Kadeploy  23
  3.5 GridPrems  25
  3.6 Getting an account  26
4 Grid 5000 deployment tutorial  27
  4.1 Connection to Grid 5000  27
  4.2 Finding and deploying existing images  27
  4.3 Making a reservation on a deployable node  30
  4.4 Deploying an environment
account, e.g. the generic user g5k, and move all experiment data in at the beginning and back at the end of an experiment, using scp or rsync. A more elaborate approach is to locally recreate the user's Grid 5000 account, with the same uid/gid, on the deployed environment. This second approach can simplify file rights management if temporary data must be stored on shared volumes. The user's Grid 5000 account parameters can be obtained from the frontend using id:

yjegou@parasol-dev$ id
uid=19000(yjegou) gid=19000(rennes) groups=19000(rennes),19001(staff35)
yjegou@parasol-dev$

Then, as root, this account can be created on the deployed machine:

none:~# addgroup --gid 19000 rennes
Adding group rennes (19000)...
Done.
none:~# adduser --uid 19000 --ingroup rennes yjegou
Adding user yjegou...
Adding new user yjegou (19000) with group rennes.
Creating home directory /home/yjegou.
Copying files from /etc/skel
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
none:~#

Finally, as root, the user can place his ssh key and switch to his new account:

none:~# cp /root/.ssh/authorized_keys2 /home/yjegou/.ssh/
none:~# chown yjegou:rennes /home/yjegou/.ssh/authorized_keys2
none:~# su yjegou
yjegou@none:~$

4.5.2 Creating a new environment from a customized environment

The user can now save this customized environment in order to use it again later.
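The account-recreation steps shown above can be scripted by parsing the id output captured on the frontend. A sketch — the id string is the example from the transcript, and the addgroup/adduser calls are echoed rather than executed so the logic can be exercised anywhere (drop the echoes when running as root on the deployed node):

```shell
#!/bin/sh
# Derive addgroup/adduser invocations from the frontend's id(1) output.
id_output="uid=19000(yjegou) gid=19000(rennes) groups=19000(rennes),19001(staff35)"

uid=$(echo "$id_output"   | sed 's/^uid=\([0-9]*\).*/\1/')
login=$(echo "$id_output" | sed 's/^uid=[0-9]*(\([^)]*\)).*/\1/')
gid=$(echo "$id_output"   | sed 's/.* gid=\([0-9]*\).*/\1/')
group=$(echo "$id_output" | sed 's/.* gid=[0-9]*(\([^)]*\)).*/\1/')

echo "addgroup --gid $gid $group"
echo "adduser --uid $uid --ingroup $group $login"
```

With the real output of `id` piped in, the same parsing reproduces the exact uid/gid pairing needed for consistent file ownership on shared volumes.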
installation actions (partitioning and/or file system building if needed, etc.);
- environment copy, post-installation script sending and execution;
- reboot on the freshly installed environment.

If the deployment fails on some nodes, these are rebooted on a default environment. Kadeploy allows the customization of each node by two means:
- a preinstallation script, executed before sending the system image;
- a postinstallation script, executed after having sent the system image.

Originally these two scripts are written in ash (a lightweight bash), but the way the scripts are designed could allow any script language to be added.

Preinstallation script. This script is common to all environments. Its goal is to prepare the system for the hardware specification and the target hard disk drive of the deployment. It can load a specific IDE controller driver, improve deployment performance, or make any kind of check needed. This script is defined in the configuration file as pre_install_script, and the associated archive as pre_install_archive. The preinstallation archive is a gzipped tar archive containing the pre-install script in its root directory. The directory structure allows the tasks to be customized to the user's needs.

Postinstallation script. This script is associated with the environment to deploy. Its goal is to adapt the raw system image into a bootable system. It is composed of a gzipped tar archive that contains all the site files and
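Building a preinstallation archive of the shape described above is a one-liner around tar; a minimal sketch, with illustrative names and script body:

```shell
#!/bin/sh
# Build a minimal preinstall archive: a gzipped tar with the script
# in its root directory, as Kadeploy expects.
workdir=$(mktemp -d)
cat > "$workdir/pre_install.ash" <<'EOF'
#!/bin/ash
# hardware preparation would go here (driver loading, disk checks, ...)
echo "preinstall running"
EOF
chmod +x "$workdir/pre_install.ash"

tar -C "$workdir" -czf /tmp/pre_install.tgz pre_install.ash
tar -tzf /tmp/pre_install.tgz      # confirm the script sits at the archive root
```

The `-C` flag keeps the script at the root of the archive rather than under the temporary directory's path, which matters since the deployment system looks for the script in the archive's root directory.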
Expert-level practices. Typing the proper escape sequence gets out of this remote console without having to kill the shell or terminal window.

4.5.1 Customization

For automated deployment of experiments, the easiest approach is to enable password-less connections to the machines. Therefore this example shows how to create an environment where the user's ssh key is stored in the authorized_keys file of the root account. Note that post-install scripts sometimes change this file; in this case it is possible to use authorized_keys2 instead. The user can:
1. generate a passphrase-less ssh key on the frontend (because oar submission does not transfer ssh authentication through agent forwarding, a passphrase-less ssh key is necessary);
2. connect as root to the machine where the environment is deployed;
3. edit /root/.ssh/authorized_keys2 and add the new public ssh key to it;
4. check that connection is now possible from his account on the frontend node, with no interaction.

Using the root account for all experiments is possible, but in general it is better to create user accounts for all users of this new environment. However, if the number of users is greater than 2 or 3, it is faster to configure an LDAP client by tuning the post-install script or using a fat version of this script (beyond the scope of this example).

There are two ways of doing things. The simplest is to create a dedicated account
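The key-installation step (item 3 above) is worth making idempotent, since the post-install may already have populated the file. A sketch — the key string is a truncated placeholder, not a real key, and on the node the target file is /root/.ssh/authorized_keys2:

```shell
#!/bin/sh
# Append a public key to authorized_keys2 without duplicating it.
pubkey="ssh-rsa AAAAB3...demo yjegou@frontend"
authfile=/tmp/authorized_keys2.demo
: > "$authfile"                      # simulate the node-side file

append_key() {
    grep -qxF "$pubkey" "$authfile" || echo "$pubkey" >> "$authfile"
}
append_key
append_key                           # second call is a no-op
grep -c . "$authfile"                # the key is stored exactly once
```

The `grep -qxF` guard matches the key line literally and whole, so re-running the customization (or re-deploying an environment that already contains the key) never accumulates duplicates.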
context. A job is defined by the needed resources and a script/program to run. The user must specify how many resources, and of what kind, are needed by his application. The OAR system will then grant them (or not) and control the execution. When a job is launched, OAR executes the user program only on the first reserved node. This program can access some environment variables to learn about its environment:

  $OAR_NODEFILE : contains the name of a file which lists all nodes reserved for this job
  $OAR_JOBID : contains the OAR job identifier
  $OAR_RESOURCE_PROPERTIES_FILE : contains the name of a file which lists all resources and their properties
  $OAR_NB_NODES : contains the number of reserved nodes

Options:
  -q queue_name : specify the queue for this job
  -I : turn on INTERACTIVE mode (OAR gives the user a shell instead of executing a script)
  -l resource_description : defines the resource list requested for this job; the different parameters are resource properties registered in the OAR database (see examples below); walltime requests a maximum time, in the format hour:mn:sec, hour:mn or hour — after this elapsed time, the job will be killed
  -p properties : adds constraints for the job (the format is a WHERE clause from the SQL syntax)
  -r "2007-05-11 23:32:03" : asks for a reservation job to begin at the date given in argument
  -C job_id : connects to a reservation in the Running state
  -k duration : asks OAR to send the checkpoint signal to the job
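The $OAR_NODEFILE mechanism described above is easy to exercise in a job script; a minimal sketch — the nodefile is faked here so the logic runs off-cluster, whereas inside a real job OAR sets the variable itself:

```shell
#!/bin/sh
# Minimal OAR job-script sketch: inspect the reserved nodes.
OAR_NODEFILE=/tmp/demo_nodefile
printf 'node1\nnode1\nnode2\n' > "$OAR_NODEFILE"   # one line per reserved slot

total=$(wc -l < "$OAR_NODEFILE" | tr -d ' ')
unique=$(sort -u "$OAR_NODEFILE" | wc -l | tr -d ' ')
echo "reserved slots: $total on $unique distinct nodes"
```

Since the nodefile repeats a host once per reserved cpu, the `sort -u` pass is the standard way to obtain the distinct host list before, for example, launching one ssh per machine.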
The main features of OAR are:
- batch and interactive jobs;
- admission rules;

[Screenshot: the Grid 5000 public web portal (www.grid5000.fr), a MediaWiki site with public, users and committees sections (account creation, user charter, reports, platform status, Bugzilla support, documentation) and the "Grid 5000 at a glance" overview: 9 sites geographically distributed in France, about 5000 CPUs, funded through the French ACI Grid incentive.]
