Tůma's text with my annotations

1. Chapter 4 Device Management
   [Table: SCSI INQUIRY data layout. Bytes 32-35: Product Revision Level. Bytes 36-55: Vendor Specific. Bytes 56-95: Reserved. Bytes 96-n: Additional Vendor Specific.]
   > cat /proc/scsi/scsi
   Attached devices:
   Host: scsi0 Channel: 00 Id: 00 Lun: 00
     Vendor: QUANTUM Model: ATLAS10K2-TY184L Rev: DA40
     Type: Direct-Access ANSI SCSI revision: 03
   Host: scsi1 Channel: 00 Id: 05 Lun: 00
     Vendor: NEC Model: CD-ROM DRIVE:466 Rev: 1.06
     Type: CD-ROM ANSI SCSI revision: 02
   Host: scsi2 Channel: 00 Id: 00 Lun: 00
     Vendor: PLEXTOR Model: DVDR PX-708A Rev: 1.02
     Type: CD-ROM ANSI SCSI revision: 02
   References
   1. SCSI-1 Standard.
   2. SCSI-2 Standard.
   3. SCSI-3 Standard.
   4. Heiko Eißfeldt: The Linux SCSI Programming HowTo.
   Example: PCI
   The PCI bus provides configuration in the form of a configuration space, which is separate from the memory space and the port space. Apart from the usual port read, port write, memory read and memory write commands, the C/BE signals can also issue configuration read (1010b) and configuration write (1011b) commands. Because the address bus cannot be used to address devices whose address is not yet known, each slot has a separate IDSEL signal, which acts as a CS (chip select) signal for configuration read and configuration write commands. Each device can have up to 8 independent functions, each function can have up to 64 configuration registers, the first 64 bytes are standardized. T
2. Chapter 2 Process Management
   Annotation: 1 thread per token; outside critical section, inside critical section; 1 critical section. For readers and writers: n tokens, a reader takes and returns 1 token, a writer must take all n tokens and return them.
   Figure 2-31. Mutual Exclusion Petri Net
   Rendez-Vous
   Rendez-Vous models a scenario where several processes must reach a given state simultaneously.
   [Figure labels: before rendez-vous, rendez-vous, after rendez-vous]
   Figure 2-32. Rendez-Vous Petri Net
   Producer And Consumer
   Annotation: the bounded buffer problem (the problem of a finite buffer).
   Producer And Consumer models a scenario where several processes produce items and several processes consume items. The items are stored in a buffer of a limited size. The synchronization problem requires that the buffer neither underflows nor overflows, or, in other words, that no producer attempts to put an item into a full buffer and that no consumer attempts to get an item from an empty buffer.
   Annotation: 2 places, FULL and EMPTY; the number of tokens is the number of full and empty slots. The producer consumes an empty slot and creates a full one; the consumer consumes a full slot and creates an empty one.
   Readers And Writers
   Readers And Writers models a scenario where several processes write shared data and several processes read shared data. The synchronization problem requires that no two writers write the data simultaneously and that no reader reads the data w
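The FULL and EMPTY places from the annotation above can be sketched as two token counters. This is only a single-threaded model of the Petri net marking, not a thread-safe buffer; the names `buffer_net`, `fire_produce` and `fire_consume` are hypothetical and do not come from the original text.

```c
#include <assert.h>
#include <stdbool.h>

/* Marking of the net: tokens in EMPTY count free slots,
 * tokens in FULL count occupied slots (hypothetical names). */
struct buffer_net {
    unsigned capacity;
    unsigned empty;
    unsigned full;
};

static struct buffer_net net_init(unsigned capacity) {
    struct buffer_net net = { capacity, capacity, 0 };
    return net;
}

/* Producer transition: enabled only when EMPTY holds a token.
 * Firing moves one token from EMPTY to FULL. */
static bool fire_produce(struct buffer_net *net) {
    if (net->empty == 0) return false;   /* buffer full, producer must wait */
    net->empty--;
    net->full++;
    assert(net->empty + net->full == net->capacity);
    return true;
}

/* Consumer transition: enabled only when FULL holds a token.
 * Firing moves one token from FULL to EMPTY. */
static bool fire_consume(struct buffer_net *net) {
    if (net->full == 0) return false;    /* buffer empty, consumer must wait */
    net->full--;
    net->empty++;
    assert(net->empty + net->full == net->capacity);
    return true;
}
```

In a real implementation the two counters would typically be semaphores, so that a disabled transition blocks the caller instead of returning false.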
3. // Code of critical section comes here
   bIWantToEnter = false;
   Other variants of the two algorithms exist, supporting various numbers of processes and providing various fairness guarantees. When the only means for synchronization is a shared memory that supports atomic reads and writes, any fair deterministic solution of the mutual exclusion problem for N processes has been proven to need at least N shared variables.
   From a practical point of view, our assumption that shared memory can only be atomically read and written is broadly correct but often too stringent. Many processors offer atomic operations such as test-and-set or compare-and-swap, which test whether a shared variable meets a condition and set its value only if it does. The utility of these operations is illustrated by fixing the naive solution to the mutual exclusion problem, which is made safe by using the AtomicSwap operation. The operation sets a new value of a shared variable and returns the previous value.
   while (AtomicSwap (bCriticalSectionBusy, true))
   {
     // Active waiting cycle until the value of the bCriticalSectionBusy
     // variable has changed from false to true
   }
   // Code of critical section comes here
   bCriticalSectionBusy = false;
   Annotation: an alternative is Test&Set, an atomic operation that sets the lock to true and returns its original value. If it returns true, the lock was already locked and we keep spinning. If it returns false, I have just locked the lock for myself.
   When the only means fo
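The AtomicSwap loop above maps closely onto the C11 `atomic_exchange` operation. The following is a minimal sketch under that assumption; the function names `spin_try_lock`, `spin_lock` and `spin_unlock` are hypothetical, and C11 atomics are a substitute for whatever primitive the original text had in mind.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* One attempt: atomically store true and look at the previous value.
 * A previous value of false means the caller has just taken the lock. */
static bool spin_try_lock(atomic_bool *busy) {
    return !atomic_exchange(busy, true);
}

/* Active waiting: repeat the swap until the previous value was false. */
static void spin_lock(atomic_bool *busy) {
    while (atomic_exchange(busy, true)) {
        /* the lock was already held by someone else, keep spinning */
    }
}

/* Leaving the critical section: only the holder may call this. */
static void spin_unlock(atomic_bool *busy) {
    assert(atomic_load(busy));
    atomic_store(busy, false);
}
```

The swap both tests and sets the flag in one indivisible step, which is what the naive read-then-write solution lacks.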
4. References
   1. Agner Fog: Software Optimization Resources. http://www.agner.org/optimize
   Annotation: multiple register sets appear from the outside as multiple independent processors.
   Chapter 1 Introduction
   Advances In Memory Architecture
   Virtual Memory
   Virtual memory makes it possible to create a virtual view of memory by defining a mapping between virtual and physical addresses. Instructions access memory using virtual addresses that the hardware either translates to physical addresses or recognizes as untranslatable and requests the software to supply the translation.
   Instruction And Data Caching
   Annotation: the motivation is that the processor is faster than RAM. The larger the memory, the farther the signals have to travel, and the slower the memory is. A cache is therefore faster simply because it is smaller. Memory is addressed by an address, a cache by a key.
   Memory accesses used to fetch instructions and their operands and results exhibit locality of reference. A processor can keep a copy of the recently accessed instructions and data in a cache that can be accessed faster than other memory.
   A cache is limited in size by factors such as access speed and chip area. A cache may be fully associative or combine a limited degree of associativity with hashing. Multiple levels of caches with different sizes and speeds are typically present to accommodate various memory access patterns.
   The copy of th
5. # Forward from local machines
   -A FORWARD_FROM_LOCAL -j ACCEPT
   # Forward from world machines
   -A FORWARD_FROM_WORLD -m state --state ESTABLISHED,RELATED -j ACCEPT
   -A FORWARD_FROM_WORLD -j REJECT
   COMMIT
   *nat
   :PREROUTING ACCEPT [0:0]
   :POSTROUTING ACCEPT [0:0]
   :OUTPUT ACCEPT [0:0]
   -A PREROUTING -s 192.168.0.128/25 -p tcp --dport http -j REDIRECT --to-ports 3128
   -A PREROUTING -s 192.168.0.128/25 -p tcp --dport smtp -j REDIRECT --to-ports 25
   -A POSTROUTING -o ppp0 -s 192.168.0.128/25 -j MASQUERADE
   COMMIT
   Use iptables -L -v to list the current rules.
   Packet Scheduling
   Annotation: when I have too many packets, how do I decide which one gets precedence?
   Given that neither the network capacity nor the queue capacity is infinite, it is possible to overload the network or the queues with packets. To prevent that, packet policing is used to discard input packets and packet scheduling is used to time output packets.
   Chapter 6 Network Subsystem
   Stochastic Fair Queuing
   The stochastic fair queuing algorithm is used when many flows need to compete for bandwidth. The algorithm approximates having a queue for each flow and sending data from the queues in a round robin fashion. Rather than having as many queues as flows, however, the algorithm hashes a potentially large number of flows to a relatively small number of queues. To compe
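The two ingredients of stochastic fair queuing described above, hashing flows to a small set of queues and serving the queues round robin, can be sketched as follows. The flow key fields, the queue count of 128 and the FNV-style mixing are illustrative assumptions; this is not the hash used by any particular kernel.

```c
#include <assert.h>
#include <stdint.h>

#define SFQ_QUEUES 128u   /* assumed number of queues, far fewer than flows */

/* Hypothetical flow identification: addresses, ports and protocol. */
struct flow_key {
    uint32_t src_addr, dst_addr;
    uint16_t src_port, dst_port;
    uint8_t protocol;
};

/* One step of a simple FNV-1a style mix over 32-bit words. */
static uint32_t mix(uint32_t hash, uint32_t value) {
    return (hash ^ value) * 16777619u;
}

/* Maps a flow to a queue. Distinct flows may collide in one queue,
 * which is the price paid for the small number of queues. */
static unsigned sfq_queue_index(const struct flow_key *key) {
    uint32_t hash = 2166136261u;
    hash = mix(hash, key->src_addr);
    hash = mix(hash, key->dst_addr);
    hash = mix(hash, ((uint32_t) key->src_port << 16) | key->dst_port);
    hash = mix(hash, key->protocol);
    return hash % SFQ_QUEUES;
}

/* Round robin: returns the next non-empty queue after `last`,
 * or -1 when all queues are empty. backlog[i] is the queue length. */
static int sfq_next_queue(const unsigned backlog[SFQ_QUEUES], unsigned last) {
    assert(last < SFQ_QUEUES);
    for (unsigned step = 1; step <= SFQ_QUEUES; step++) {
        unsigned candidate = (last + step) % SFQ_QUEUES;
        if (backlog[candidate] > 0) return (int) candidate;
    }
    return -1;
}
```

Packets of one flow always land in the same queue, so a single heavy flow can delay only the flows that happen to share its queue.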
6. Reads and writes are causally ordered.
   Other memory ordering models include the strong ordering model, where all reads and writes are carried out in the program order, or the write back ordering model, where writes to the same cache line can be combined. The memory ordering models can be set on a per page and a per region basis.
   The LFENCE, SFENCE and MFENCE instructions can be used to force ordering. The LFENCE instruction forms an ordering barrier for all load instructions, the SFENCE instruction forms an ordering barrier for all store instructions, and the MFENCE instruction does both LFENCE and SFENCE.
   The PAUSE instruction can be used inside an active waiting cycle to reduce the potential for collisions in instruction execution and to avoid executing the active waiting cycle to the detriment of other work on hyperthreading processors.
   References
   1. Intel: Intel 64 and IA-32 Architectures Software Developer's Manual.
   Example: Memory Model On MIPS32 Processors
   The MIPS32 processors may implement a cache coherency protocol. One of five memory coherency models can be set on a per page basis.
   Uncached: the data are never cached, reads and writes operate directly on memory.
   Noncoherent: the data are cached, reads and writes operate on cache, no coherency is guaranteed.
   Sharable: the data are cached, reads and writes operate on cache, write by one processor invalidates caches of other processor
7. 10. Explain the role of a disk partition table.
   11. List the features that a network interface hardware device typically provides.
   Rehearsal Exercises
   1. Design an interface between a disk driver, which offers functions for reading and writing a block of sectors and is able to process multiple requests concurrently, and the higher layers of the operating system. For the interface you designed, describe the architecture of a disk driver that is able to handle both interrupts and calls from the higher layers of the operating system, including the algorithm of the interrupt handler and the algorithms of the functions for reading and writing a block of sectors.
   Notes
   1. Still a sketch.
   2. Understanding is essential.
   3. Understanding is essential.
   4. Understanding is optional.
   5. Understanding is optional.
   6. Understanding is optional.
   7. Understanding is essential.
   8. Understanding is recommended.
   9. Just a curiosity.
   10. Understanding is recommended.
   11. Understanding is recommended.
   12. Just a curiosity.
   13. Understanding is optional.
   14. Just a curiosity.
   15. Understanding is recommended.
   16. Just a curiosity.
   17. Understanding is recommended.
   18. Understanding is recommended.
   19. Understanding is recommended.
   20. Just a curiosity.
   21. Understanding is recommended.
   22. Understanding is optional.
   23. Understan
8. CTS: clear to send.
   Annotation: approx. 11 kB/s, an interrupt on every byte; start bit.
   Uses of clocks: calendar, process scheduling in preemptive multitasking, accounting of machine time, alarms for user processes, watchdogs for the system, profiling, control.
   Principles of clock hardware: the clock is derived from the mains or from a crystal oscillator.
   Possible functions of the hardware: a plain counter, a one-time counter, a periodic counter, a calendar.
   Using the clock hardware for the various uses of clocks: for the calendar; for scheduling, which wants equidistant ticks and an interrupt; for events, where setting a one-time counter works best, although other ways are possible; as a watchdog timer; for profiling, either by statistically sampling where the program is or by measuring precisely.
   Example: PC Clock
   First source: Intel 8253 or Intel 8254 counter. Its input clock of about 1.19 MHz roughly corresponds to the original PC processor clock of 4.77 MHz divided by 4. With the default divisor setting of 65536, the counter yields an 18.2 Hz interrupt.
   Second source: Motorola 146818 real time clock with CMOS RAM, with a 32.768 kHz clock yielding a 1024 Hz interrupt.
   Other sources.
   Keyboard
   A classic example of a character device. Keyboard without a controller. Keyboard with a controller: key code translation, type-ahead buffer, focus switching (mention X Windows on slow computers).
   Example: code of a keyboard handler.
   Annotation: ready to send. A mouse, originally on the serial port, sends the computer dx and dy, for example 2, 7.
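The interrupt rates quoted in the PC clock example follow from dividing the source clock by the divisor. The arithmetic can be checked with a minimal sketch. The nominal input frequency of 1193182 Hz is an assumption not stated in the original text (it is 4.77 MHz divided by 4), and the divider of 32 for the real time clock is likewise only inferred from 32768 / 1024.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed nominal input clock of the 8253/8254 counter in Hz. */
#define PIT_INPUT_HZ 1193182.0

/* Crystal frequency of the real time clock in Hz (32.768 kHz). */
#define RTC_CRYSTAL_HZ 32768u

/* Interrupt rate of the periodic counter for a given divisor. */
static double pit_interrupt_hz(uint32_t divisor) {
    assert(divisor >= 1 && divisor <= 65536);
    return PIT_INPUT_HZ / divisor;
}

/* Interrupt rate of the real time clock for a given divider. */
static uint32_t rtc_interrupt_hz(uint32_t divider) {
    assert(divider != 0);
    return RTC_CRYSTAL_HZ / divider;
}
```

With the divisor 65536 the first function gives roughly 18.2 Hz, and dividing the crystal frequency by 32 gives the 1024 Hz rate mentioned for the second source.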
9. Chapter 6 Network Subsystem
   Rehearsal Questions
   1. Describe the socket as an abstraction of the operating system interface for network access according to Berkeley sockets. List the basic functions of this interface, including their main arguments and semantics.
   2. Explain the purpose of the select function in the operating system interface for network access according to Berkeley sockets.
   3. Explain what sockets in the PF_UNIX domain are used for.
   4. Explain what sockets in the PF_NETLINK domain are used for.
   5. Describe how the remote procedure call mechanism works and sketch the usual architecture of its implementation.
   Network Subsystem Internals
   Queuing Architecture
   Annotation: shortcuts can be added so that every packet does not have to pass through n layers. The data need not be copied all the time, it is enough to share them cleverly in memory.
   The architecture of the network subsystem typically follows the architecture of the protocols used by the network subsystem. At the lowest level, the device drivers provide access to the network interfaces. At the highest level, the socket module implements the Berkeley socket interface. In between, the protocol modules implement the ARP, IP, UDP, TCP and other protocols. The modules typically communicate through queues of packets.
   Annotation: filters are simple rules that say how to treat what.
   As described, the architecture has two pitfalls, both related to a potential loss of efficiency when a large number of modules processes pac
10. [Figure 2-7. ELF Relocations Example: relocation listing with the columns TYPE and VALUE, entries such as __gmon_start__, environ, freeaddrinfo and mbrlen.]
   The program header table lists all the segments of the file. Each segment has a type, a position and length in the file, an address and length in memory, and flags. The content of a segment is made up of sections. Examples of important types include:
   loadable: a segment that will be loaded into memory. Data within a loadable segment must be aligned to page size so that the segment can be mapped rather than loaded.
   dynamic: a segment that contains the dynamic linking information, including relocations, symbol tables, dynamic libraries, and initialization and termination functions.
   interpreter: a segment that identifies the program interpreter. When specified, the program interpreter is loaded into memory instead of the program. The program interpreter is responsible for executing the program.
   The dynamic segment is used by a dynamic loader that is specified in the interpreter segment.
   > objdump -x /bin/bash
   Program Header:
     PHDR off 0x00000034 vaddr
          filesz 0x00000100 memsz
     INTERP off 0x00000134 vaddr
          filesz 0x00000013 memsz
     LOAD off 0x00000000 vaddr
          filesz 0x000904a0 memsz
     STACK off 0x00000000 vaddr
          filesz 0x00000000 memsz
   Dynamic Section:
     NEEDED libtermcap.so.2
     NEEDED li
11. The SYSENTER instruction sets the stack pointer and instruction pointer registers to values specified by the operating system in advance to point to the operating system code executing at the most privileged level. The SYSEXIT instruction sets the stack pointer and instruction pointer registers to values specified by the operating system in registers ECX and EDX to point to the application code executing at the least privileged level.
   Note that the SYSENTER and SYSEXIT instructions do not form a complementary pair that would take care of saving and restoring the stack pointer and instruction pointer registers the way the CALL and RET instructions do. It is up to the code using the SYSENTER and SYSEXIT instructions to do that.
   Example: Linux System Call API On Intel 80x86
   The libraries wrapping the system call interface are called in the same way as any other libraries.
   ssize_t read (int fd, void *buf, size_t count);

   int hFile;
   ssize_t iCount;
   char abBuffer [1024];
   iCount = read (hFile, &abBuffer, sizeof (abBuffer));

   pushl $1024          ; sizeof (abBuffer)
   pushl $abBuffer      ; &abBuffer
   pushl hFile          ; hFile
   call read@plt        ; call the library
   addl $12,%esp        ; remove arguments from stack
   movl %eax,iCount     ; save result
   Figure 2-11. Library System Call Example
   The system call interface uses either the interrupt vector 80h, which is configured to lead to the kernel through a trap gate, or the SYSENTER
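The difference between calling the library wrapper and invoking the system call by number can be illustrated with the glibc `syscall` function. This is a sketch assuming Linux with glibc; the helper names `read_via_library` and `read_via_syscall` are hypothetical. Which entry mechanism (interrupt vector or SYSENTER) is ultimately used is decided inside the library and the kernel, not by this code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* The usual way: the C library wrapper is called like any other function. */
static ssize_t read_via_library(int fd, void *buf, size_t count) {
    return read(fd, buf, count);
}

/* The explicit way: the generic syscall() helper takes the system call
 * number and passes the arguments to the kernel on behalf of the caller. */
static ssize_t read_via_syscall(int fd, void *buf, size_t count) {
    assert(buf != NULL);
    return (ssize_t) syscall(SYS_read, fd, buf, count);
}
```

Both functions end up in the same kernel code, the wrapper merely hides the system call number and the error reporting convention.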
12. that it was able to return the sent unique number U1 encrypted with the client's secret key KC.
   The client sends the server a message that contains the ticket T.
   The server sends the client a message, encrypted with the key KR, that contains a unique number U2.
   Chapter 7 Security Subsystem
   The client sends the server a message, encrypted with the key KR, that contains an agreed transformation of the unique number U2.
   The server verifies the authenticity of the client by the fact that the client was able to carry out the transformation of the unique number U2 with the knowledge of the key KR.
   The remaining weakness of this protocol is the possibility of impersonating the client in a situation where the real client, for example, crashes and the key KR remains in memory. Kerberos solves this problem by adding a time stamp, so that the ticket T, and therefore the key KR, can only be used for a limited time, after which the client asks the authority for a renewal.
   References
   1. Coulouris G., Dollimore J., Kindberg T.: Distributed Systems Concepts And Design.
   Rehearsal Questions
   1. Explain the term authentication.
   Authorization
   Annotation: what the person who has authenticated is allowed to do.
   Authorization is the problem of deciding whether a given activity (process, user) is entitled to perform some action on some resource (file, device). Activities do Actions on Resources.
   The verification of rights is popularly modeled by defining a set of activities, a set of resources and a set of actions, and then
13. www.radium.ncsc.mil/tpep/epl/epl-by-class.html, the following secure systems exist in the year 2000:
   A1: no operating system, no application, two routers from Boeing and Gemini Computers.
   B3: operating systems XTS-200 and XTS-300 from Wang Federal, binary compatible with UNIX System V on Intel platforms, but requiring special hardware to qualify for B3; they use security and integrity levels in the manner of Bell-LaPadula and Biba. No application, no router.
   B2: operating systems Trusted XENIX 3.0 and 4.0 from Trusted Information Systems, binary compatible with IBM XENIX. No application. Router DiamondLAN from Cryptek Secure Communications.
   B1: operating systems UTS/MLS from Amdahl Corporation, CA-ACF2 MVS from Computer Associates, SEVMS VAX 6 from DEC, ULTRIX MLS from DEC, CX/SX 6 from Harris Computer Systems, HP-UX BLS 9 from HP, Trusted IRIX/B from SGI, OS 1100/2200 from Unisys. Applications INFORMIX Secure 5 from Informix, Trusted Oracle 7 from Oracle, Secure SQL 11 from Sybase. Routers.
   C2: operating systems AOS/VS 2 from Data General, OpenVMS VAX 6 from DEC, OS/400 on AS/400 from IBM, Windows NT 4 from Microsoft, Guardian 90 from Tandem. Applications Microsoft SQL 2000 8.
   C1 is no longer evaluated.
   The list contains only commercially available systems. Moreover, it has not been used since roughly the year 2000, but it is still well known and therefore deserves a mention.
   Example: NIST CCEVS
   Currently in use is the Common Criteria Evaluatio
14. Annotation: 1/10, that is 1 out of 10, i.e. in total equally long but more often; after the time quantum expires I divide all of them by two; a lower number means a higher priority; the Earliest Deadline First algorithm; i.e. when a process waits for something, it rises, and when it keeps running, it drops; the time quantum; runs rarely.
   Example: Archetypal UNIX Scheduler
   First, the old scheduler from UNIX: dynamic priorities, nonpreemptive kernel. The priority is a number from 0 to 127, a lower number means a higher priority, the default is 50. Every process has a current priority 0-127 (0-49 is reserved for the kernel), a user priority 0-127, a processor usage counter 0-127 and a nice factor 0-39, which is influenced by the nice command.
   The current priority in an application is equal to the user priority. In the kernel, it is set according to what the process waits for. For example, after waiting for a disk it is set to 20, which guarantees that the process quickly leaves the kernel once the waiting ends.
   The processor usage counter is incremented with every quantum consumed by the process. It is decreased by a magic formula once in a while, for example halved every second, or adjusted according to the load average, so that on a loaded processor the processor usage counter does not drop too quickly. The load is the average number of runnable processes in the system over some time.
   The user priority is then calculated as default priority + processor usage counter / 4 + nice factor * 2.
   Disadvantages: First, it does not scale well. Pokud b h
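The priority calculation described above can be written out as a small sketch. The operators in the formula were lost in the source, so the form default + usage / 4 + nice * 2 is a reconstruction; clamping the result to 127 and the function names are additional assumptions made only for this illustration.

```c
#include <assert.h>

enum { PRIORITY_MAX = 127, DEFAULT_PRIORITY = 50, USAGE_MAX = 127 };

/* User priority: a lower number means a higher priority.
 * The clamp to PRIORITY_MAX is an assumption of this sketch. */
static int user_priority(int usage_counter, int nice_factor) {
    assert(usage_counter >= 0 && usage_counter <= USAGE_MAX);
    assert(nice_factor >= 0 && nice_factor <= 39);
    int priority = DEFAULT_PRIORITY + usage_counter / 4 + nice_factor * 2;
    return priority > PRIORITY_MAX ? PRIORITY_MAX : priority;
}

/* Called for every quantum the process has consumed. */
static int charge_quantum(int usage_counter) {
    return usage_counter < USAGE_MAX ? usage_counter + 1 : USAGE_MAX;
}

/* Called once in a while, here the "halve every second" variant. */
static int decay_usage(int usage_counter) {
    return usage_counter / 2;
}
```

A process that keeps running accumulates usage and its priority number grows, while a process that mostly waits sees its usage decay and drifts back towards the default.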
15. 443 0xffff flowi 0 0 flowid 1:2
   tc qdisc add dev ppp0 parent 1:3 tbf rate 128kbit buffer 100000 latency 100s
   tc filter add dev ppp0 parent 1: protocol ip prio 1 u32 match ip dport 25 0xffff flowid
   tc filter add dev ppp0 parent 1: protocol ip prio 1 u32 match ip dport 465 0xffff flowid
   The example first attaches a priority queuing discipline to ppp0. The queuing discipline distinguishes three priority bands and schedules higher priority bands before lower priority bands.
   Next, the example attaches a shared fair queuing discipline as a child of the priority queuing discipline with priority 1. The queuing discipline schedules packets from multiple streams in a round robin manner. A series of filters then tells that ICMP (IP protocol 1), SSH (port 22), DNS (port 53) and outgoing web replies (port 80) packets belong to class 1:1, which is this queuing discipline.
   Next, the example attaches a shared fair queuing discipline as a child of the priority queuing discipline with priority 2. A filter then tells that all packets belong to class 1:2, which is this queuing discipline. The filter has a priority 9 as opposed to priority 1 of the other filters, which makes it the last filter matched.
   Next, the example attaches a token bucket discipline as a child of the priority queuing discipline with priority 3. The queuing discipline schedules packets with a bandwidth limit. A pair of filters then tells that outgoin
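The bandwidth limit enforced by a token bucket discipline can be sketched as a counter of byte tokens that refills at the configured rate and is capped by the bucket size. The structure and function names are hypothetical, time is passed in explicitly in milliseconds, and the sketch drops fractional tokens on each refill, so it is an illustration of the principle rather than the behavior of the tbf discipline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct token_bucket {
    uint64_t rate_bytes_per_s;  /* sustained rate */
    uint64_t burst_bytes;       /* bucket size, limits bursts */
    uint64_t tokens;            /* bytes that may be sent right now */
    uint64_t last_ms;           /* time of the last refill */
};

static void tb_init(struct token_bucket *tb, uint64_t rate_bytes_per_s,
                    uint64_t burst_bytes, uint64_t now_ms) {
    tb->rate_bytes_per_s = rate_bytes_per_s;
    tb->burst_bytes = burst_bytes;
    tb->tokens = burst_bytes;   /* the bucket starts full */
    tb->last_ms = now_ms;
}

/* Adds the tokens earned since the last refill, capped by the bucket size. */
static void tb_refill(struct token_bucket *tb, uint64_t now_ms) {
    assert(now_ms >= tb->last_ms);
    uint64_t earned = (now_ms - tb->last_ms) * tb->rate_bytes_per_s / 1000;
    tb->tokens = tb->tokens + earned > tb->burst_bytes
               ? tb->burst_bytes : tb->tokens + earned;
    tb->last_ms = now_ms;
}

/* Returns true and spends tokens when the packet may be sent now,
 * returns false when the packet has to wait for more tokens. */
static bool tb_try_send(struct token_bucket *tb, uint64_t now_ms,
                        uint64_t packet_bytes) {
    tb_refill(tb, now_ms);
    if (tb->tokens < packet_bytes) return false;
    tb->tokens -= packet_bytes;
    return true;
}
```

A rate of 128 kbit per second corresponds to 16000 bytes per second in this representation.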
16. 7 bits are a pointer into the pointer table, 5 or 6 bits are a pointer into the page table, and the rest is a pointer into the page, which has 8 or 4 KB depending on the P bit of the TCR register. It is also possible to use the four TTR registers to delimit four blocks of virtual addresses that are not translated.
   Otherwise, the processor can do essentially everything that Intel can, with one extra feature: besides a normal page descriptor, a directory table can also contain an indirect page descriptor, which points to the actual descriptor stored somewhere else in memory. This is useful when one page is shared at multiple virtual addresses, because the page can then still have only one dirty bit.
   A figure of the translation is in the MC68060 User's Manual, Section 4, Memory Management Unit. Figure 4-1 is of little interest, it only shows that the address and data caches are separate; the TLBs are called ATCs and have 64 entries each, with associativity 4. Figure 4-4 shows the format of the translation control register. Figure 4-5 shows the format of the transparent translation registers. Figure 4-7 shows the division of the virtual address. Figures 4-10 and 4-11 show the format of the paging tables: G is global, U is accessed, M is dirty, W is read only, CM is cache, PDT and UDT are entry types whose main function is to distinguish present entries. Figure 4-12 shows an example of address translation. Figure 4-13 shows an example of address translation of a shared page. Figure 4-14 shows an example of address translation of a copy on write page. Figure 4-19 explains s
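The division of a 32-bit virtual address into table indices can be sketched as follows. The text above is cut off before the first field, so the 7-bit root index is an assumption based on the remaining widths adding up to 32 bits; the structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct va_fields {
    unsigned root_index;     /* assumed 7 bits, index into the root table */
    unsigned pointer_index;  /* 7 bits, index into the pointer table */
    unsigned page_index;     /* 5 or 6 bits, index into the page table */
    uint32_t offset;         /* 13 or 12 bits, offset within the page */
};

/* Splits a virtual address for 8 KB pages (pages_8k true)
 * or 4 KB pages (pages_8k false). */
static struct va_fields split_address(uint32_t va, bool pages_8k) {
    unsigned offset_bits = pages_8k ? 13 : 12;
    unsigned page_bits = pages_8k ? 5 : 6;
    assert(7 + 7 + page_bits + offset_bits == 32);

    struct va_fields fields;
    fields.offset = va & ((1u << offset_bits) - 1);
    fields.page_index = (va >> offset_bits) & ((1u << page_bits) - 1);
    fields.pointer_index = (va >> 18) & 0x7f;
    fields.root_index = va >> 25;
    return fields;
}
```

The two upper indices do not depend on the page size, only the boundary between the page index and the offset moves by one bit.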
17. a class may remain unregulated if it is not over limit, or if it has an ancestor at level i that is not over limit and the tree contains no unsatisfied classes at levels lower than i.
   A minor shortcoming of the algorithm is that the regulation condition is too complex. Therefore, Ancestor Only Sharing is defined, in which a class remains unregulated if it is not over limit, or if it has an ancestor that is under limit. The disadvantage of this approach, understandably, is that it will restrict over limit classes even when they currently do not bother anyone.
   Another variant is Top Level Sharing, which defines the maximum level from which classes may still borrow bandwidth. A class may then remain unregulated if it is not over limit, or if it has an ancestor up to the given level that is under limit. The algorithm can then be tuned by adjusting the maximum level. For an infinite level, it is the same as Ancestor Only Sharing. For a level equal to the lowest level of an unsatisfied class, it is almost the same as Formal Sharing. For level 1, the algorithm regulates all over limit classes and thus empties the queues.
   A heuristic is usually used to set the maximum level for Top Level Sharing. One of the possible heuristics works as follows:
   Annotation: RED, for example.
   Whenever a packet arrives for a class that is not over limit, the maximum is 1, i.e. the heuristic tries to guaran
18. possibly also the inode number, which the standard however calls the file serial number.
   int scandir (const char *dir, struct dirent ***namelist,
                int (*select) (const struct dirent *),
                int (*compar) (const struct dirent **, const struct dirent **));
   The scandir function searches a directory and returns a list of entries. The select function tells which entries to consider, the compare function tells how to sort the considered entries.
   int stat (char *path, struct stat *buf);

   struct stat {
     dev_t st_dev;           // File device
     ino_t st_ino;           // File inode
     mode_t st_mode;         // Access rights
     nlink_t st_nlink;
     uid_t st_uid;           // Owner UID
     gid_t st_gid;           // Owner GID
     dev_t st_rdev;          // Device ID for special files
     off_t st_size;          // Size in bytes
     blksize_t st_blksize;   // Block size
     blkcnt_t st_blocks;     // Size in blocks
     time_t st_atime;        // Last access time
     time_t st_mtime;        // Last modification time
     time_t st_ctime;        // Last status change time
   };
   The stat system call provides information about a single directory entry.
   Example: Windows Directory Operations
   HANDLE FindFirstFile (LPCTSTR lpFileName, LPWIN32_FIND_DATA lpFindFileData);
   BOOL FindNextFile (HANDLE hFindFile, LPWIN32_FIND_DATA lpFindFileData);

   typedef struct _WIN32_FIND_DATA {
     DWORD dwFileAttributes;
     FILETIME ftCreationTime;
     FILETIME ftLastAccessTime;
     FILETIME ftLastWriteTime;
     DWORD nFileSizeHigh;
     DWORD nFileSizeLow
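A short usage sketch of the stat call shown above: it queries a single path and reads a few of the listed fields. The helper names `file_size`, `is_directory` and `print_entry_info` are hypothetical and serve only this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Returns the size of the entry in bytes, or -1 when stat fails,
 * for example because the path does not exist. */
static long long file_size(const char *path) {
    struct stat info;
    assert(path != NULL);
    if (stat(path, &info) != 0) return -1;
    return (long long) info.st_size;
}

/* Returns nonzero when the path exists and names a directory. */
static int is_directory(const char *path) {
    struct stat info;
    return stat(path, &info) == 0 && S_ISDIR(info.st_mode);
}

/* Prints the owner, the size and the inode number of one entry. */
static void print_entry_info(const char *path) {
    struct stat info;
    if (stat(path, &info) != 0) {
        perror(path);
        return;
    }
    printf("%s: uid %ld, %lld bytes, inode %llu\n", path,
           (long) info.st_uid, (long long) info.st_size,
           (unsigned long long) info.st_ino);
}
```

The type of the entry is encoded in st_mode together with the access rights, which is why the S_ISDIR macro is applied to that field.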
19. for reads and 512 for writes; those are not skipped. Thus, programs that work intensively with the beginning of the disk block programs that work elsewhere.
   Linux 2.4.2, drivers/block/ll_rw_blk.c and elevator.c: The newer Linux has improved. It first tries to merge a new request into a sequence with the existing requests, subject to a limit on the maximum length of the sequence. It then sorts the request in according to the sector number, but neither at the beginning of the queue nor before requests that have been waiting for a long time. The result is a one direction sweep with aging.
   Linux 2.6.x: The kernel makes it possible to associate a queueing discipline with a block device by providing modular request schedulers. The three schedulers implemented by the kernel are anticipatory, deadline driven and complete fairness queueing.
   The Anticipatory scheduler implements a modified version of the Unidirectional Sweep strategy, which permits processing of requests that are close to the current position of the disk head but in the direction opposite to the selected one. Additionally, the scheduler enforces an upper limit on the time a request can starve.
   The scheduler handles read and write requests separately and inserts delays between read requests when it judges that the process that made the last request is likely to submit another one soon. Note that this implies sending the read requests to the disk one by one and therefore giving up the option of queueing read requests in hardware.
   The De
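The basic unidirectional sweep that the schedulers above build upon can be sketched as a selection function: serve the nearest request at or ahead of the head position, and wrap around to the lowest sector when nothing is ahead. The function name `next_request` is hypothetical, and the sketch deliberately leaves out aging, merging and anticipation.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index of the request to serve next, or -1 when the queue
 * is empty. sectors[i] is the sector number of the i-th queued request,
 * head is the current position of the disk head. */
static int next_request(const unsigned long sectors[], size_t count,
                        unsigned long head) {
    int ahead = -1;    /* nearest request at or after the head */
    int lowest = -1;   /* request with the lowest sector number */
    assert(sectors != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (lowest < 0 || sectors[i] < sectors[lowest]) lowest = (int) i;
        if (sectors[i] >= head &&
            (ahead < 0 || sectors[i] < sectors[ahead])) ahead = (int) i;
    }
    /* Nothing ahead means the sweep is over and restarts from the lowest. */
    return ahead >= 0 ? ahead : lowest;
}
```

Without aging, a steady stream of requests just ahead of the head could starve the rest of the queue, which is the problem the time limits mentioned above address.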
20. www.mosix.org
   2. openMosix. http://www.openmosix.org
   Network Global Memory
   There are other things that can be done with a network. For example, it is possible to swap to the network, which has the advantage of low latency. Also distributed shared memory.
   Single System Image
   Example: Amoeba
   A brief description of Amoeba: a distributed system by Mr. Tanenbaum, which uses RPC generated from AIL for communication and assumes plenty of CPU and plenty of memory. The two main features of the file system are the separation of names from files and immutable files.
   Naming separation: Names are the responsibility of a directory server, which is not tied to the rest of the file system and binds names to capabilities.
   Immutable files: The only operations allowed on a file are CREATE, READ, DELETE and SIZE (create a file from data, read a file, delete a file, return the size of a file). This has many advantages, for example caching and replication do not need to care about consistency. Because there is plenty of memory, it always works.
   When Amoeba started to turn into a usable system, it was admitted that there may not always be enough memory. Files were then divided into committed and uncommitted. Committed files are as described above, uncommitted files are in the process of being created and can be appended to before they are committed. Another small concession is the possibility of reading in parts.
   The file system consists of the Bullet Server (as in fast) and the Directory Server (as in directory). The Bullet Server takes care of files and has the operations CREATE, with a parameter saying whether c
21. Annotation: a directory is opened like a file and read; writing is done by special functions; scandir is used for searching. In Windows: FindFirstFile, FindNextFile. A directory cannot really be locked while it is being worked with; there are functions for finding out whether the directory has changed.
   Chapter 5 File Subsystem
   Incidentally, the tree structure of directories was invented at AT&T Bell Labs in 1970. A modern concept today appears to be a complete separation of the directory structure from files. Files are objects that contain data, and programs operate with references. When needed, such a reference can then be bound to a name in a directory.
   A directory entry usually contains the name of a file and attributes such as access rights and the times of creation and modification. Some systems allow specifying arbitrary attributes as named values.
   The basic operations on directories are opening and closing, and reading or searching the content. Writing the content is done by special functions that create, rename and delete directories and files and set their attributes, so that applications cannot damage the structure of a directory.
   Example: Linux Directory Operations
   DIR *opendir (const char *name);
   int closedir (DIR *dir);
   struct dirent *readdir (DIR *dir);
   These functions make it possible to read a directory. The interesting part, of course, is the dirent structure. The POSIX standard, however, only says that it will contain a zero terminated name of the file and p
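The three directory functions above are typically combined into an open, read in a loop, close pattern. The following minimal sketch counts the entries of a directory and relies only on the d_name field that POSIX guarantees; the function name `count_entries` is hypothetical.

```c
#include <assert.h>
#include <dirent.h>
#include <string.h>

/* Returns the number of entries in the directory, not counting "." and
 * "..", or -1 when the directory cannot be opened. */
static int count_entries(const char *path) {
    assert(path != NULL);
    DIR *dir = opendir(path);
    if (dir == NULL) return -1;

    int count = 0;
    struct dirent *entry;
    /* readdir returns NULL at the end of the directory. */
    while ((entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, ".") == 0) continue;
        if (strcmp(entry->d_name, "..") == 0) continue;
        count++;
    }
    closedir(dir);
    return count;
}
```

The directory can change while it is being read, so the count is only a snapshot, in line with the annotation about the lack of directory locking.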
22. bytes. The device driver supports memory mapping, which gives the application easy access to the individual bytes of a single frame. It is not possible, however, to ask the device for previous frames.
   In UNIX, the major device number tells the type of the driver and the minor device number tells, roughly, the ordinal number of the device.
   Example: Linux Driver Model
   The driver model facilitates access to common features of busses with devices and to drivers with classes and interfaces. The structure maintained by the driver model is accessible via the sysfs filesystem.
   Common features of busses include listing devices connected to the bus and drivers associated with the bus, matching drivers to devices, hotplugging devices, suspending and resuming devices.
   > ls -R /sys/bus
   /sys/bus:
   pci pci_express pcmcia scsi usb
   /sys/bus/pci:
   devices drivers
   /sys/bus/pci/devices:
   0000:00:00.0 0000:00:1a.7 0000:00:1c.3 0000:00:1d.7 0000:00:1f.3
   0000:00:01.0 0000:00:1b.0 0000:00:1c.4 0000:00:1e.0 0000:01:00.0
   /sys/bus/pci/drivers:
   agpgart-intel ata_piix ehci_hcd ohci_hcd uhci_hcd
   ahci HDA Intel
   Common features of devices include listing interfaces provided by the device and linking to the class and the driver associated with the device. The driver provides additional features specific to the class or the interfaces or the device.
   > ls -R /sys/devices
   /sys/devices:
   pci0000:00
   /sys/devices/pci0000:00:
   0000:00:19.0
   /sys/devices/pci0000:00/000
23. blocks in the FAT file system. State the advantages and disadvantages of this way of storing the information.
   9. Describe how the information about the location of file data is stored in the EXT2 file system. State the advantages and disadvantages of this way of storing the information.
   10. Describe how the information about the directory structure is stored in the EXT2 file system. State the advantages and disadvantages of this way of storing the information.
   11. Describe how the information about the location of free blocks is stored in the EXT2 file system. State the advantages and disadvantages of this way of storing the information.
   12. Describe how the information about the location of file data is stored in the NTFS file system. State the advantages and disadvantages of this way of storing the information.
   13. Describe how the information about the directory structure is stored in the NTFS file system. State the advantages and disadvantages of this way of storing the information.
   14. Describe how the information about the location of file data is stored in the file system on a CD. State the advantages and disadvantages of this way of storing the information.
   15. Describe how the information about the directory structure is stored in the file system on a CD. State the advantages and disadvantages of this way of storing the information.
   16. Explain the principle of integrating multiple file systems in an operating system into a single name space.
   Exercises
24. are therefore built so that they always contain the parent object ID, the local object ID, the offset and the uniqueness. This guarantees that all objects will be kept together with their parent, for example the directory entries of one directory will be together.
   Blocks are allocated near each other and are tracked in a bitmap. The bitmaps are spread among the data blocks: there is always one bitmap block followed by as many data blocks as can be described by one bitmap block.
   Consistency is also interesting. A problem with a file system this complex is the situation when already existing structures must be overwritten because of tree balancing. If the system crashes at that moment, old data may get damaged. The original versions of the file system solved this by introducing an ordering on all writes to the disk, so that writes of data in the new positions preceded the deletion of data in the old positions. In the end, however, this turned out to be too complex. Now, when an item is removed, the position of the block is changed so that the old version is not overwritten: the nearest free place is found and the old version is stored in the preserve list. The preserve list is emptied when there are no blocks in memory to which items were being added. Even newer versions have a log.
   Besides the obvious space efficiency, the file system also has problems. One concerns the speed with files that are slightly smaller than blocks, because those are stored as two direct items. The other concerns the preserve list with a large number of small
25. hardware support for a cursor, drawing and so on.
   VGA mode 320x200x256: one byte per pixel, a palette of 256 colors set in registers, video RAM as a contiguous memory area.
   VGA mode 800x600x256: one byte per pixel, a palette of 256 colors set in registers, video RAM as a window moved in 64K steps.
   A further description of other modes, for example bit planes or linear memory.
   Incidentally, this shows how standardization is possible at different levels: the same registers; different registers but the same video RAM map; a parametrizable video RAM map; graphical primitives.
   Modern cards can do MPEG decoding, shading, lighting, whatever.
   Audio Devices
   To be done.
   Annotation: a sound is a sum of sinusoids (DFT). People hear from approx. 5 Hz to 15 kHz, so I need twice as much, i.e. approx. 40 kHz, for sampling. Then I have sample points and need to reconstruct the signal again: either I connect the points directly, which gives teeth, or more often Manhattan style. A highpass and a lowpass filter cuts off the higher harmonics; this is done in the sound card.
   Disk Storage Devices
   A disk is a device that can read and write fixed length sectors. Various flavors of disks differ in how sectors are organized. A hard disk has multiple surfaces where sectors of typically 512 bytes are organized in concentric tracks. A floppy disk has one or two surfaces where sectors of typically 512 bytes are organized in concentric tracks.
26. ... information about itself, the metadata (index 0). Other notable indexes are 1 for the MFT mirror, 2 for the transaction log, 3 for the root directory, 4 for the allocation bitmap, 5 for the bootstrap, 6 for the bad cluster file, and so on. Some of the hidden files can be listed with the commands dir /ah $bitmap, dir /ah $badclus, dir /ah $mftmirr and so on, but apart from listing them in the root directory nothing else can be done with them. In Windows 2000 even this no longer seems to work.
Notes: the MFT (master file table) contains information about everything that is on the disk, that is, it works like an inode table. One entry, one file. Surprisingly, the MFT is itself a file and contains information about itself (index 0 is the MFT self reference, index 1 is the MFT mirror).
Every file is a set of attributes, one of the attributes is the default one, which is the data stream. A record in the MFT contains a magic, an update sequence (this is needed so that old references can be recognized when a record is reused), a reference count, flags, and potentially a pointer to the base file record if this is an extension file record (used when everything does not fit into the base file record). File attributes follow. For each of them the record contains a name, a type and data. The data can be either resident, in which case they follow directly in the record, or non resident, in which case the record contains a run list, which is a sequence of blocks of clusters (similar to HPFS).
27. Note: this makes no sense on a single processor.
Atomic Operations
To be done.
Barriers
To be done.
Locks
Locks, also known as mutexes, are a simple means of synchronization on a critical section. A lock has the methods lock and unlock with the obvious function: at any moment, only one process can hold the lock. The implementation is simple. The lock method tests whether the lock is locked; if not, it locks it, if so, it lets the process wait. The unlock method starts the next waiting process, and if nobody is waiting, it unlocks the lock. The implementation of course needs atomic tests.
To be done: draw the implementation of a lock and show how important the atomic test is and how it can be ensured, either with an atomic instruction or by disabling interrupts. Show an example of how a mutex can solve some synchronization task, preferably plain mutual exclusion.
Linux provides the mutex from pthread. It is represented by the pthread_mutex_t data structure, initialized by calling int pthread_mutex_init (pthread_mutex_t *, const pthread_mutexattr_t *) and destroyed by calling int pthread_mutex_destroy (pthread_mutex_t *) on an unlocked mutex. It is used through the _lock, _trylock and _unlock methods. Mutex attributes set what happens when a thread tries to lock again a mutex that it has already locked: fast mutexes deadlock, error checking mutexes return an error, recursive mutexes remember how many times they were locked. A similar situation
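The remark that the implementation needs an atomic test can be illustrated with a minimal sketch in C11. It assumes the atomic instruction variant and, unlike the lock described in the text, it waits actively instead of putting the process to sleep. The names spin_lock_t, spin_trylock, spin_lock, spin_unlock and spin_example are hypothetical and do not come from the text.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for this sketch, not the pthread mutex. */
typedef struct {
    atomic_flag locked;
} spin_lock_t;

#define SPIN_LOCK_INIT { ATOMIC_FLAG_INIT }

/* One attempt to take the lock. The test of the old value and the
   setting of the new value happen as a single atomic step, so two
   callers can never both see the lock as free. */
static bool spin_trylock(spin_lock_t *lock)
{
    return !atomic_flag_test_and_set_explicit(&lock->locked,
                                              memory_order_acquire);
}

/* Active waiting: keep trying until the lock is taken. */
static void spin_lock(spin_lock_t *lock)
{
    while (!spin_trylock(lock)) {
        /* busy wait */
    }
}

/* Release the lock so that another caller can take it. */
static void spin_unlock(spin_lock_t *lock)
{
    atomic_flag_clear_explicit(&lock->locked, memory_order_release);
}

/* Single threaded walk through the lock states. */
static void spin_example(void)
{
    spin_lock_t lock = SPIN_LOCK_INIT;
    spin_lock(&lock);
    assert(!spin_trylock(&lock));   /* already locked */
    spin_unlock(&lock);
    assert(spin_trylock(&lock));    /* free again */
}
```

If the test and the set were two separate steps, an interrupt between them could let two processes enter the critical section, which is exactly why the text asks for an atomic test.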
28. Notes: ... disk, for example to get higher reliability. Mirroring; striping; parity with one parity disk, typically distributed parity (RAID 5). LVM stands for logical volume management: physical volumes and logical volumes with an N:N mapping.
Partitioning
To be done. Mention partitioning and logical volume management.
Example IBM Volume Partitioning
To be done.
Example Linux Logical Volume Management
Physical volumes, logical volumes, extents (size e.g. 32M), mapping of extents (linear or striped), snapshots.
Notes: with striping the data is stored block by block, for example in blocks of 256 KB. A snapshot stores the state of the disk (a freeze); only the diffs, that is the original blocks, are remembered for it, and the disk keeps running. Snapshots are useful for backups.
Notes on FLASH: short lifetime, tens of thousands of writes. Before something is stored, the corresponding block must be erased. Wear leveling.
Memory Storage Devices
Similar to disks are various memory devices based most notably on NOR and NAND types of FLASH memory chips. These memory chips retain their content even when powered off, but reading and writing them is generally slower and explicit erasing is required before writing. Erasing is only possible on whole blocks at a time. The NOR chips allow reading from and writing to arbitrary addresses, the NAND chips allow reading and writing on whole blocks at a time only. The individual blocks of the memory chips wear down by erasures, with the ty
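The note about parity with one parity disk can be made concrete with a small sketch. It assumes the usual XOR parity, which the notes do not spell out, and the function names parity_compute and parity_rebuild are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Compute the parity block as the XOR of the blocks at the same
   position on all data disks. */
void parity_compute(const uint8_t *const *data, size_t disk_count,
                    size_t block_size, uint8_t *parity)
{
    memset(parity, 0, block_size);
    for (size_t d = 0; d < disk_count; d++)
        for (size_t i = 0; i < block_size; i++)
            parity[i] ^= data[d][i];
}

/* Rebuild the block of one failed data disk from the surviving data
   disks and the parity block. The content of data[failed] is never
   read. */
void parity_rebuild(const uint8_t *const *data, size_t disk_count,
                    size_t failed, size_t block_size,
                    const uint8_t *parity, uint8_t *rebuilt)
{
    assert(failed < disk_count);
    memcpy(rebuilt, parity, block_size);
    for (size_t d = 0; d < disk_count; d++) {
        if (d == failed)
            continue;
        for (size_t i = 0; i < block_size; i++)
            rebuilt[i] ^= data[d][i];
    }
}
```

Because XOR is its own inverse, any single lost block can be recovered this way, but two simultaneous failures cannot. Distributed parity as in RAID 5 uses the same arithmetic and only rotates which disk holds the parity block.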
29. 1949: ... operation takes 4.5 milliseconds, a division operation takes 200 milliseconds.
1951: Electronic Discrete Variable Automatic Computer (EDVAC), a computer developed by University of Pennsylvania, uses vacuum tubes, program stored on magnetic wires and in internal memory, multiplication and division operations take 3 milliseconds.
References
1. Weik M. H.: The ENIAC Story. http://ftp.arl.mil/~mike/comphist/eniac-story.html
2. The Columbia University Computing History Website. http://www.columbia.edu/acis/history
3. The Manchester University Computing History Website. http://www.computer50.org
4. The EDSAC Website. http://www.cl.cam.ac.uk/UoCCL/misc/EDSAC99
Chapter 1 Introduction
Transistors
In the 1950s, computers used transistors. The operation times went down from milliseconds to microseconds. To maximize processor utilization, specialized hardware was introduced to handle input and output operations. The computers were running a simple operating system, responsible for loading other programs from punch cards or paper tapes and executing them in batches.
Hardware, Year, Software
1947: Transistor, a semiconductor device capable of amplifying or switching an electric current, was invented at Bell Laboratories by John Bardeen, Walter Brattain and William Shockley.
1952: IBM 701, a computer developed by IBM, uses vacuum tubes, multiplication and division operations take 500 microseconds. The first computer
30. ... parallel application.
Describe the Readers And Writers synchronization task. Draw a Petri net illustrating the synchronization task and present an example of the task in a parallel application.
Describe the Dining Philosophers synchronization task. Draw a Petri net illustrating the synchronization task and present an example of the task in a parallel application.
Explain how a deadlock can occur in the Dining Philosophers synchronization task. Propose a modification of the synchronization task that will remove the possibility of the deadlock.
Explain the difference between active and passive waiting. Describe when active waiting and when passive waiting is more suitable.
Present a trivial solution to the mutual exclusion problem without considering liveness and fairness. Use active waiting over a shared variable. Explain the requirements that your solution has on the operations that access the shared variable.
Present a solution to the mutual exclusion problem of two processes including liveness and fairness. Use active waiting over a shared variable. Explain the requirements that your solution has on the operations that access the shared variable.
Describe the priority inversion problem.
Explain how priority inheritance can solve the priority inversion problem for simple synchronization primitives.
Present a formal definition of a deadlock.
Present a formal definition of starvation.
Present a formal definition of a wait
31. ... entry contains a name, attributes and the first cluster. Bad clusters have a special mark in the FAT.
The disadvantages include inconvenient work with the FAT (it is large and the data concerning a single file cannot be easily extracted from it), inconvenient work with directories (short file names, few attributes, poor searching, possible fragmentation of directories by deleted names), and the overhead of large clusters.
Notes: there are two FAT tables, the table and its copy. The disk holds the FATs, then the root directory at the beginning, then the data divided into clusters. The FAT variants differ in how many bits a cluster number has, which limits the size: with 32 kB clusters, FAT12 reaches at most 131 MB.
A modification extends the file system to larger cluster numbers and longer file names. Larger numbers are simple, the addressing through an integer is just extended to 32 bits, no problem. Longer file names are stored, in unicode, in suitable places of extra directory entries that are marked with a nonsensical combination of attributes. The hypothesis is that this is for compatibility with old systems, for when something with a long file name is copied onto floppy disks.
Notes: a file is a group of clusters. The directory holds the file name and the number of the first cluster, let it be 123. The FAT table has an entry for every cluster, which holds the number of the following cluster when the file is spread over several clusters. Here, entry 123 holds the number 456, entry 456 holds the number 300, and entry 300 holds EOF. A sector typically has 512 B, with N sectors per cluster
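The cluster chain from the notes (123, then 456, then 300, then EOF) can be followed with a few lines of code. This is a minimal sketch under stated assumptions: the table is modeled as a plain array of 16 bit entries, and FAT_EOF is a single simplified end marker chosen for illustration, whereas real FAT variants reserve a range of values. The name fat_chain is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified end of chain marker, an assumption of this sketch. */
#define FAT_EOF 0xFFFFu

/* Follow the chain that starts at cluster `first` and store the
   visited cluster numbers into `out`. Returns the chain length, or 0
   when the chain leaves the table or does not fit into `out` (which
   also stops a corrupted, cyclic chain). */
size_t fat_chain(const uint16_t *fat, size_t fat_entries,
                 uint16_t first, uint16_t *out, size_t out_max)
{
    assert(fat != NULL && out != NULL);
    size_t count = 0;
    uint16_t cluster = first;
    while (cluster != FAT_EOF) {
        if (cluster >= fat_entries || count == out_max)
            return 0;
        out[count++] = cluster;
        cluster = fat[cluster];     /* entry holds the next cluster */
    }
    return count;
}
```

The sketch also shows the disadvantage named in the text: finding the n-th cluster of a file means walking n entries of the table, because the information about one file is scattered across the whole FAT.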
32. ... candidate for merging into a larger free block. The disadvantage is the potentially high internal fragmentation caused by the fixed set of block lengths.
An implementation of the buddy system needs to keep information about blocks and the lists of free blocks somewhere. This can be done, for example, in headers of the blocks themselves, which makes the blocks proper somewhat smaller, but the header does not need much information. Alternatively, a bitmap with one bit for every block and every level of the buddy system is placed next to the allocated memory.
Incidentally, while on the topic, allocators on multiprocessor systems have problems similar to those of schedulers over the ready queue, namely that too much concurrency slows them down. This is why hierarchical allocators are built, with local free block pools that spill over into a global free block pool when needed.
Example GNU LibC Heap Allocator
GNU LibC 2.2.4 uses the malloc by Doug Lea, which keeps headers at the beginning and at the end of every block. An allocated block has the length and a flag saying that it is allocated at the beginning, and the length at the end. A free block has the length and a flag saying that it is free at the beginning, followed by pointers to the previous and the next block in the group of blocks of the same size, and the length at the end. The headers are designed this way so that merging or list traversal can start from any block.
The allocator keeps 128 lists of free blocks for the exact sizes
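Two properties of the buddy system mentioned above, the internal fragmentation caused by the fixed set of block lengths and the ease of finding the merge candidate, can be sketched in a few lines. The sketch assumes power of two block sizes and offsets measured from the start of the managed area. The names buddy_block_size, buddy_of and buddy_waste are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest block size that satisfies the request: the minimum block
   size doubled until it fits. Assumes min_block is a power of two
   and the result does not overflow size_t. */
size_t buddy_block_size(size_t request, size_t min_block)
{
    size_t size = min_block;
    while (size < request)
        size <<= 1;
    return size;
}

/* Offset of the buddy of a block: the two buddies differ only in the
   address bit that equals the block size, so flipping it is enough. */
size_t buddy_of(size_t offset, size_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);  /* power of two */
    assert(offset % size == 0);                     /* aligned block */
    return offset ^ size;
}

/* Bytes lost to internal fragmentation for one request. */
size_t buddy_waste(size_t request, size_t min_block)
{
    return buddy_block_size(request, min_block) - request;
}
```

For example, a request of 65 bytes with a minimum block of 16 bytes receives a block of 128 bytes and wastes 63 of them, which is the worst case of almost one half.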
33. ... clogged with interrupts, but simply handles what it can keep up with and ignores the rest of the interrupts. Drivers can install handlers in the kernel.
The topic of servers and drivers that cannot keep up goes somewhat deeper. A Quality of Service monitor is supposed to run above the kernel scheduler, an application that can detect situations where a server or a driver cannot keep up with answering requests. The processor share of the server is kept high enough for the server to keep up with answering. This is always possible, because with a growing share the system approaches the extreme where the server consumes all the processor time and the driver that supplies the data loading the server does not run. The processor share of the driver is kept high enough to process the incoming data, but at most so high that the processor share of the server does not have to be reduced because of it. And that settles it: the computer does what it can, simply ignores the excess traffic, and does not get overloaded.
One more small detail about why domains are activated in such an odd way, always from the same place. The assumption is that every domain will contain something like user threads. Activating the domain then starts a user scheduler, which jumps to whichever thread it sees fit. The system offers domains the option to specify where the processor context should be stored on preemption, and every domain is expected to store the contexts of its individual threads separately.
Example Linux Dynamic Window Constrained Scheduler
Dynamic window con
34. void skb_trim (struct sk_buff *skb, unsigned int len);
References
1. Alan Cox: Network Buffers and Memory Management.
Packet Filtering
The networking layer must decide what to do with each packet. A packet can be delivered to a local recipient, forwarded to a remote recipient, or even dropped. This mechanism is configurable to avoid abuse of default rules for delivering, forwarding and discarding.
Example Linux Packet Filter
The packet filter framework defines several points where a packet can be classified and a decision can be taken based upon the classification. The points are identified by chains that are grouped into tables.
The filter table is for normal packets: the INPUT chain for incoming packets, the OUTPUT chain for outgoing packets, the FORWARD chain for packets that pass through.
The nat table is for packets that open new connections: PREROUTING, OUTPUT, POSTROUTING.
The mangle table is for packets that need special modifications: PREROUTING, INPUT, OUTPUT, FORWARD, POSTROUTING.
Each point contains a sequence of rules. A rule can classify packets using information from the packet header (source and destination address, protocol) or from packet processing (source and destination interface). Modules that classify packets can be added, available modules include file conditions, connection marks, connection rates, connection state, security context, random and others.
The action of the first
35. ... watchpoints, the possibility of getting a signal on access to a particular address, controlled through the /proc file system, mentioned here only as a curiosity.
Details: thrashing is handled by swapping out an entire process. Page coloring is done because of caches. Shared pages are not discarded until necessary. The kernel allocator uses slabs, which are blocks of a given length, and can reuse already initialized objects. Further information skipped.
Mauro, McDougall: Solaris Internals, ISBN 0-13-022496-0.
Example Mach And Spring
Mach, for example, attempted a better memory manager, and the whole mechanism was brought into a relatively perfect form in Spring. It can be lectured well from the technical report from Evry. Internally, Unix SVR4 also has a structure very similar to Spring, but there it is not accessible to the user. Memory objects are called segments. Pagers are represented by vnodes when they access file system objects. Besides them, there is also an anonymous pager for memory objects that do not directly correspond to any file system object.
Example Cluster Memory Management
It can be observed that reading a page from a disk is not necessarily faster than reading a page from a network. It can also be observed that physical memory of a system is rarely used in its entirety, except for caches. These two observations give rise to the idea of using spare physical memory of a cluster of systems instead of a disk for paging. A prototype o
36.
        ... 111b;                   \
555:    movl $0,(%esp);             \
        jmp 222b;                   \
.previous;                          \
.section __ex_table,"a";            \
        .align 4;                   \
        .long 111b,444b;            \
        .long 222b,555b;            \
.previous

#define __RESTORE_ALL               \
        __RESTORE_REGS              \
        addl $4, %esp;              \
333:    iret;                       \
.section .fixup,"ax";               \
666:    sti;                        \
        movl $(__USER_DS), %edx;    \
        movl %edx, %ds;             \
        movl %edx, %es;             \
        pushl $11;                  \
        call do_exit;               \
.previous;                          \
.section __ex_table,"a";            \
        .align 4;                   \
        .long 333b,666b;            \
.previous

#define SAVE_ALL                    \
        __SAVE_ALL;                 \
        __SWITCH_KERNELSPACE;

#define RESTORE_ALL                 \
        __SWITCH_USERSPACE;         \
        __RESTORE_ALL;

Example Kalisto Processor Context Switching
Kalisto processor context switching code is stored in the head.S file. The SAVE_REGISTERS and LOAD_REGISTERS macros are used to save and load processor registers to and from memory, typically stack. The switch_cpu_context function uses these two macros to implement the context switch.

.macro SAVE_REGISTERS base
        sw $zero, REGS_OFFSET_ZERO(\base)
        sw $at, REGS_OFFSET_AT(\base)
        sw $v0, REGS_OFFSET_V0(\base)
        sw $v1, REGS_OFFSET_V1(\base)
        sw $a0, REGS_OFFSET_A0(\base)
        sw $a1, REGS_OFFSET_A1(\base)
        sw $a2, REGS_OFFSET_A2(\base)
        sw $a3, REGS_OFFSET_A3(\base)
        sw $gp, REGS_OFFSET_GP(\base)
        sw $fp, REGS_OFFSET_FP(\base)
        sw $ra
37. Notes: ... once, and the dynamic data as a copy for every instance. The COW (copy on write) technique: an exception is raised on a write and the data gets copied. Relocation of course spoils this, because then the code cannot be shared as nicely, so it is good to expect the same location of a given library function (for example malloc) in all programs.
PLT: there is a table of the library functions in use, a call jumps into that table, that is, relatively, and the loader sets the jump addresses in the PLT.
ELF is a universal container for various blocks of various data, only the loaders deal with the CONTENT of the data. The header holds information about the program. Sections are dynamic blocks of data, the symbol table and the like. Segments are mapped into memory and executed. The objdump tool prints information about an ELF file.
Important sections: bss holds uninitialized data (naturally no data is loaded from the disk, only memory of the given size is allocated), data holds initialized data, got is the global offset table for relocated addresses, text holds the program code, init and fini hold allocation before main starts and deallocation after the program ends. Segments are aligned, so they need not be read but are simply mapped (a memory mapped file).
Example Executable And Linking Format
ELF has been developed by UNIX System Laboratories, a laboratory within AT&T working on UNIX System V. An ELF file consists of an ELF header, a section header table, a program header table and multiple ELF section
38. ... 20 corresponds to double. Details are in kernel/sched.c.
Linux has the BSD style getpriority call for finding out the priority of a process, which used to return the priority, or -1 as an error indicator. However, -1 is also a valid priority, so it was necessary to clear errno before the call and to check how things went after the call. Nowadays getpriority returns 20 - nice, that is, values from 40 to 1, precisely to avoid returning negative values.
Besides getpriority and setpriority for setting the nice value, it is also possible to call sched_getparam, sched_setparam, sched_setscheduler and sched_getscheduler to set and read the scheduler parameters.
Example Linux Early 2.6.X Series Scheduler
The early 2.6 series of kernels uses a scheduler that provides constant time scheduling complexity with support for process preemption and multiple processors.
The scheduler keeps a separate pair of an active and an expired queue for each processor and priority, the active queue being for processes whose quantum has not yet expired, the expired queue being for processes whose quantum has already expired. For priorities from 1 to 140, this makes 280 queues per processor. The scheduler finds the first non empty queue pair, switches the active and expired queues when the active queue is empty, and schedules the first process from the active queue.
The length of the quantum is calculated linearly from the priority, with higher quanta for lower priority. An interactivity
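The errno idiom described for getpriority can be sketched as follows. Note that the shifted range of 40 to 1 concerns the kernel system call; the C library wrapper still presents nice values from -20 to 19 to the program, so -1 remains a legitimate return value and the idiom is still needed there. The function name read_own_nice is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/resource.h>

/* Read the nice value of the calling process. Returns false on error
   and leaves *nice_value untouched in that case. */
bool read_own_nice(int *nice_value)
{
    assert(nice_value != NULL);

    /* -1 can be a valid nice value, so the only reliable error
       indicator is errno, which must be cleared before the call. */
    errno = 0;
    int result = getpriority(PRIO_PROCESS, 0);   /* 0 means the caller */
    if (result == -1 && errno != 0)
        return false;

    *nice_value = result;
    return true;
}
```

A caller would typically write `int nice; if (read_own_nice(&nice)) { ... }` and treat a false result as a failure reported through errno.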
39.
        DWORD dwReserved0;
        DWORD dwReserved1;
        TCHAR cFileName [MAX_PATH];         (MAX_PATH is 260)
        TCHAR cAlternateFileName [14];
} WIN32_FIND_DATA;

The functions are taken over from CP/M.
Sharing Support
When several processes access a file, it is of course necessary to define somehow what that will look like. The minimal solution is to guarantee the atomicity of individual operations, which is the default for example in UNIX and in MS DOS without share loaded.
Note: read and write operations are atomic, that is, it cannot happen that a half written integer or the like is seen.
A more sophisticated solution is the possibility of locking entire files, which is available for example in MS DOS with share loaded. When calling INT 21h fn 3Dh (File Open), it was possible to specify whether further opening for reading and for writing would be allowed. UNIX can do a similar thing with the flock call.
Still a bit more sophisticated is the possibility of locking parts of files for reading or for writing. Both UNIX, through fcntl, and even the unfortunate MS DOS with share can do this. The caller specifies the offset and the length of the locked block and the locking mode, which is usually shared, alias read lock, or exclusive, alias write lock. Locking a part of a file has one disadvantage, namely that a list of the existing locks must be remembered for every file and checked on relevant operations.
Note: an advisory lock is only information about the locking. When asked whether the file is locked, it says so, if
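Locking a part of a file through fcntl, with an offset, a length and a shared or exclusive mode as described above, can be sketched like this. The helper names set_region_lock, lock_region_shared, lock_region_exclusive, unlock_region and lock_example are hypothetical; the fcntl call, struct flock and the F_RDLCK, F_WRLCK, F_UNLCK and F_SETLK constants are the standard POSIX interface.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Apply one lock operation to `length` bytes starting at `offset`.
   F_SETLK does not block: it fails when another process holds a
   conflicting lock. */
static bool set_region_lock(int fd, short type, off_t offset, off_t length)
{
    assert(fd >= 0);
    struct flock request = { 0 };
    request.l_type = type;          /* F_RDLCK, F_WRLCK or F_UNLCK */
    request.l_whence = SEEK_SET;    /* offset counts from file start */
    request.l_start = offset;
    request.l_len = length;
    return fcntl(fd, F_SETLK, &request) == 0;
}

bool lock_region_shared(int fd, off_t offset, off_t length)
{
    return set_region_lock(fd, F_RDLCK, offset, length);
}

bool lock_region_exclusive(int fd, off_t offset, off_t length)
{
    return set_region_lock(fd, F_WRLCK, offset, length);
}

bool unlock_region(int fd, off_t offset, off_t length)
{
    return set_region_lock(fd, F_UNLCK, offset, length);
}

/* Lock two neighbouring regions of a temporary file in different
   modes and release both with a single unlock. */
bool lock_example(void)
{
    FILE *file = tmpfile();
    if (file == NULL)
        return false;
    int fd = fileno(file);
    bool ok = lock_region_exclusive(fd, 0, 100)
           && lock_region_shared(fd, 100, 50)
           && unlock_region(fd, 0, 150);
    fclose(file);
    return ok;
}
```

These locks are advisory in the sense of the note above: they only affect other processes that also ask through fcntl, and a process never conflicts with its own locks.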
40. Data are transferred either through the Data register or using Direct Memory Access.
The packet commands interface relies on the command block registers interface to issue a command that sends a data packet, which is interpreted as another command. The packet commands interface is suitable for complex commands that cannot be described using the command block registers interface.
Notes: DISK and other block devices. CAV means constant revolutions (the normal case), CLV means constant speed, with the heads near the edge of the disk the disk spins more slowly. CHS addressing (cylinder, head, sector) is physical, today LBA is used, which is logical and linear. ATA has registers for addressing and control and a status register (ready, busy). The disk automatically ... bad sectors by itself.
Request Queuing
Because of the mechanical properties of the disk, the relative speed of the computer and the disk must be considered. A problem arises when the computer issues requests for accessing consecutive sectors too slowly relative to the rotation speed, this can be solved by interleaving of sectors. Another problem arises when the computer issues requests for accessing random sectors too quickly relative to the access speed, this can be solved by queuing of requests. The strategy of processing queued requests is important.
The FIFO strategy of processing requests directs the disk to always service the first of the waiting requests. The strategy can suffer from excessive seeking across
41. ... Howard Aiken at Harvard University, John von Neumann at Princeton University, and others. The computers used relays or vacuum tubes, the former notoriously unreliable, the latter plagued with power consumption and heat generation. The computers were used to perform specialized calculations, which were initially programmed, or rather wired into the computer, using plug boards. Plug boards were later replaced by punch cards or paper tapes. There was no notion of an operating system.
Hardware, Year, Software
1944: Mark I or Automatic Sequence Controlled Calculator, a computer developed by IBM and Harvard University, uses relays, program stored on paper tapes, a multiplication operation takes 6 seconds, a division operation takes 12 seconds.
1946: Electronic Numerical Integrator And Computer (ENIAC), a computer developed by University of Pennsylvania, uses vacuum tubes, program stored on plug boards, a division operation takes 25 milliseconds.
1948: Selective Sequence Electronic Calculator, a computer developed by IBM, uses relays and vacuum tubes, program stored on paper tape and in internal memory, a multiplication operation takes 20 milliseconds, a division operation takes 33 milliseconds.
Electronic Delay Storage Automatic Calculator (EDSAC), a computer developed by University of Cambridge, uses vacuum tubes, program stored on paper tape and in internal memory, a multiplication
42.
        ... I/O ports at dc80 [size=128]
        Region 1: Memory at ff3ffc00 (32-bit, non-prefetchable) [size=128]
        Expansion ROM at ff400000 [disabled] [size=128K]
        Capabilities: [dc] Power Management version 2
                Flags: PMEClk DSI D1 D2 AuxCurrent=0mA PME(D0,D1,D2,D3hot,D3cold)
                Status: D0 PME-Enable DSel=0 DScale=2 PME

Check the example to see what the configuration registers reveal. The identification of the device actually says class 200h, vendor ID 10B7h, device ID 9200h, subsystem vendor ID 1028h, subsystem device ID 0D8h. This means class Ethernet, vendor 3Com, device 3C905C, subsystem vendor Dell, subsystem device unknown.
Example USB
The USB bus provides configuration in a form of a device descriptor. Devices are addressed by unique addresses (0 to 127), communication uses message or stream pipes between endpoints. A device connect, as well as supported speed, is recognized electrically by a hub, which indicates a status change to the host. The host queries the hub to determine the port on which the device is connected and issues power and reset command to the hub for the port. The host assigns a unique address to the device using the default address of 0 and the default control pipe with endpoint 0, and then queries and sets the device configuration.

> lsusb
Bus 1
  Dev 1 Vendor 0x0000 Product 0x0000
  Dev 2 Vendor 0x046d Product 0
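How the identification values above are laid out in the configuration registers can be shown with a short sketch. It relies on the standard PCI configuration header, where the 32 bit register at offset 00h holds the vendor ID in the low half and the device ID in the high half, and the register at offset 08h holds the revision in the low byte and the class code in the upper three bytes. The type pci_ident_t, the function names and the revision value used in the example are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Identification fields of one PCI function. */
typedef struct {
    uint16_t vendor_id;
    uint16_t device_id;
    uint8_t base_class;
    uint8_t sub_class;
    uint8_t prog_if;
    uint8_t revision;
} pci_ident_t;

/* A vendor ID of FFFFh means that no function answered the read. */
bool pci_function_present(uint32_t reg_00h)
{
    return (reg_00h & 0xFFFFu) != 0xFFFFu;
}

/* Split the registers at offsets 00h and 08h into their fields. */
pci_ident_t pci_decode_ident(uint32_t reg_00h, uint32_t reg_08h)
{
    pci_ident_t ident;
    ident.vendor_id = (uint16_t) (reg_00h & 0xFFFFu);
    ident.device_id = (uint16_t) (reg_00h >> 16);
    ident.revision = (uint8_t) (reg_08h & 0xFFu);
    ident.prog_if = (uint8_t) ((reg_08h >> 8) & 0xFFu);
    ident.sub_class = (uint8_t) ((reg_08h >> 16) & 0xFFu);
    ident.base_class = (uint8_t) (reg_08h >> 24);
    return ident;
}

/* The device from the listing: vendor 10B7h, device 9200h, class
   200h. The revision 78h is a made up value for the example. */
void pci_example(void)
{
    pci_ident_t ident = pci_decode_ident(0x920010B7u, 0x02000078u);
    assert(pci_function_present(0x920010B7u));
    assert(ident.vendor_id == 0x10B7 && ident.device_id == 0x9200);
    assert(ident.base_class == 0x02 && ident.sub_class == 0x00);
}
```

The class 200h from the text thus splits into base class 02h (network controller) and sub class 00h (Ethernet).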
43. Just a curiosity; Understanding is recommended; Understanding is optional; Understanding is essential; Just a curiosity; Understanding is optional; Just a curiosity; Understanding is optional; Understanding is essential; Understanding is recommended; Understanding is recommended; Understanding is optional; Understanding is optional; Understanding is essential; Understanding is essential; Understanding is optional; Understanding is essential; Understanding is essential; Understanding is recommended; Understanding is optional; Understanding is essential; Understanding is essential; Understanding is essential; Understanding is essential; Understanding is essential; Understanding is essential; Understanding is essential; Understanding is essential; Understanding is recommended; Understanding is essential; Understanding is optional; Understanding is recommended; Understanding is recommended; Understanding is recommended; Understanding is recommended; Understanding is recommended; Just a curiosity; Just a curiosity; Just a curiosity; Understanding is recommended; Understanding is recommended; Understanding is essential; Understanding is essential; Understanding is optional; Underst
44. Operations
Sockets
The most traditional interface of the network subsystem is the Berkeley socket interface. Historically, the Berkeley socket interface was developed at the University of California at Berkeley as a part of BSD 4.2, from 1981 to 1983. These days, it is present in virtually all flavors of Unix and Windows.
The Berkeley socket interface centers around the concept of a socket as an object that facilitates communication. The socket can be bound to a local address and connected to a remote address. Data can be sent and received over a socket.
Note: a socket is a kernel object, creating it yields a pointer to the kernel object.

int socket (int domain, int type, int protocol);

Domain specifies socket protocol class:
  PF_UNIX local communication
  PF_INET IPv4 protocol family
  PF_INET6 IPv6 protocol family
  PF_IPX IPX protocol family
  PF_NETLINK kernel communication
  PF_PACKET raw packet communication
Type specifies socket semantics:
  SOCK_STREAM reliable bidirectional ordered stream (note: TCP)
  SOCK_RDM reliable bidirectional unordered messages
  SOCK_DGRAM unreliable bidirectional unordered messages (note: exactly UDP)
  SOCK_SEQPACKET reliable bidirectional ordered messages
  SOCK_RAW raw packets
Protocol specifies socket protocol:
  0 class and type determine protocol
  other identification of supported protocol
The socket call creates the socket object. An error is returned i
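A minimal sketch of the socket call follows. It assumes a POSIX system, passes 0 as the protocol so that the class and type determine it, and reads the type back with getsockopt to show that the returned value is a descriptor referring to the kernel object. The helper names open_socket, socket_type and socket_example are hypothetical.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a socket of the given class and semantics. Returns a
   descriptor, or -1 with errno set when the call fails. */
int open_socket(int domain, int type)
{
    return socket(domain, type, 0);
}

/* Ask the kernel for the type of an existing socket. Returns the
   SOCK_ constant, or -1 when the query fails. */
int socket_type(int fd)
{
    int type = 0;
    socklen_t length = sizeof(type);
    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &length) != 0)
        return -1;
    return type;
}

/* Create a local stream socket, check its type and close it. */
void socket_example(void)
{
    int fd = open_socket(PF_UNIX, SOCK_STREAM);
    assert(fd >= 0);
    assert(socket_type(fd) == SOCK_STREAM);
    close(fd);
}
```

The new socket is neither bound nor connected yet; binding to a local address and connecting to a remote address are separate calls.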
45. Rehearsal
Authorization
Activities do Actions on Resources
Levels delimit Security and Integrity
Example Security Enhanced Linux
Rehearsal
Security Subsystem Implementation
Example DoD TCSEC Classification
Example NIST C...
Chapter 1 Introduction
Foreword
Origins
This material originated as a bunch of scribbled down notes for the Charles University Operating Systems lecture. As time went on and the amount of notes grew, I came to realize that the amount of work that went into looking up the information and writing down the notes is no longer negligible. This had two unfortunate implications. First, verifying the notes to keep the information within up to date became difficult. Second, asking the students to locate the information within individually became unrealistic.
This material is an attempt at solving both problems. By extending and publishing the notes, I hope to provide the students with a source of information and me with a source of feedback.
I realize some readers will find this material fragmented, incomplete
46. ... and SYSEXIT instructions. In both cases, the EAX register contains a number identifying the requested service and other registers contain other arguments of the requested service.
Since the system call interface is typically called from within the system libraries, having two versions of the system call code would require having two versions of the libraries that contain the system call code. To avoid this redundancy, the system call interface is wrapped by a virtual library called linux-gate, which does not exist as a file but is inserted by the kernel into the memory map of every process.

__kernel_vsyscall:
        int $0x80
        ret

Figure 2-12. Linux Gate Library Based On INT 80h

__kernel_vsyscall:
        push %ecx
        push %edx
        push %ebp
resume:
        mov %esp, %ebp
        sysenter
        jmp resume              ; hack for syscall resume
return:
        pop %ebp                ; this is where the SYSEXIT returns
        pop %edx
        pop %ecx
        ret

Figure 2-13. Linux Gate Library Based On SYSENTER And SYSEXIT

References
1. Johan Petersson: What Is linux-gate.so.1? http://www.trilithium.com/johan/2005/08/linux-gate
2. Linus Torvalds: System Call Restart. http://lkml.org/lkml/2002/12/18/218

Notes: thread pool, roughly how many threads? When the program mostly computes, as many threads as there are cores suffice. When it mostly does IO (waits for the disk most of the time), more are needed, but still not many, tens, at most hundreds. Kernel managed threads, user managed threads (green threads)
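The convention that a number identifies the requested service can be observed from C without writing assembly. The sketch below assumes Linux with the GNU C library, whose syscall function takes the service number as its first argument and leaves the choice of INT 80h or SYSENTER to the library and the kernel. The name raw_getpid is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Request the process identifier directly by service number, without
   going through the getpid wrapper. SYS_getpid is the number that
   ends up in the register identifying the service. */
long raw_getpid(void)
{
    return syscall(SYS_getpid);
}

/* Both paths lead to the same kernel service. */
void syscall_example(void)
{
    assert(raw_getpid() == (long) getpid());
}
```

Ordinary programs use the library wrappers; the generic entry point is mainly useful for services that have no wrapper.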
47. ... because when no interrupts arrive, no scheduling happens. Since interrupts are used to service devices, disabling interrupts can lead to failure in servicing devices. As such, disabling interrupts is only permitted to privileged processes, which should limit disabling interrupts to short periods of time. Disabling interrupts does not yield exclusive execution on systems with multiple processors.
Active Waiting
Active waiting is an approach to process synchronization where a process that waits for a condition does so by repeatedly checking whether the condition is true. In the following, multiple solutions to the mutual exclusion problem based on active waiting are developed to illustrate the concept.
Assume availability of shared memory that can be atomically read and written. A shared boolean variable bCriticalSectionBusy can be used to track the availability of the critical section. A naive solution to the mutual exclusion problem would consist of waiting for the variable to become false before entering the critical section, setting it to true upon entering, and setting it to false upon leaving.

while (bCriticalSectionBusy)
{
        // Active waiting cycle until the
        // bCriticalSectionBusy variable
        // becomes false
}

Notes: when an interrupt occurs here, things can go wrong. This leaves too much to chance and a livelock is still possible. This already works fairly reasonably.
bCr
48. ... routines. These are CL_TICK, called from the clock interrupt handler, CL_FORK and CL_FORKRET for initializing a new process, CL_ENTERCLASS and CL_EXITCLASS for handling the situations when a process enters a class or leaves it, CL_SLEEP, called when a process goes to sleep, and CL_WAKEUP, called when a process leaves sleep. Adding a custom class means writing these 7 routines and compiling the kernel.
The advantage of the described scheduler lies first in the fact that it never recalculates any large lists of priorities, and second in the fact that it can support realtime processes.
Probably the best known system with this scheduler is Solaris, which has a few minor changes. Solaris 7 offers users timesharing, interactive and realtime classes, similar to what was described above, and in addition the kernel is mostly preemptive, which improves the responsiveness of realtime processes.
Solaris 7 has the dispadmin command, which can print the tables with the scheduling parameters. The parameters for timesharing processes have the form of a table with the length of the quantum (200 ms for priority 0, 20 ms for priority 59), the priority after the quantum expires (always lower by 10), the priority after a sleep (50 + priority / 10 for most priorities), the maximum length of starvation (1 second for all priorities) and the priority when this length is exceeded (50 + priority / 10 for most priorities). The parameters for interactive processes are the same as for timesharing processes. The parameters
49. ... files it must be flushed often so that it can be emptied. The third are problems with memory mapped files when the file is not aligned. Plus, of course, there is the question of who is supposed to write it: the whole EXT2 has under 200K of sources, while ReiserFS, including the kernel patches, has more than a megabyte.
References
1. ReiserFS Whitepaper.
2. Kurz G.: The ReiserFS Filesystem.
3. Buchholz F.: The Structure of the ReiserFS Filesystem.
Chapter 5 File Subsystem
Integration Of File Subsystem With Memory Management
Caches.
Integration Of Multiple File Subsystems
To be done: mention the integration of multiple file systems into the kernel and the principle of mount points for providing a single name space. Stackable file systems through V nodes.
Notes: the virtual file system (VFS) is an interface of the operating system for working over several different file systems, it typically supports the intersection of their functions. Stackable FS: for example a CD with a diff stacked on top of it.
Example Linux Virtual File System
The design in Linux is straightforward. On an open call, the system looks at the beginning of the file name and, depending on whether the name is absolute or relative, takes the dentry of either the root directory or the current directory. Then it just parses the name step by step and tries to find each part of it in the dentry cache. If the part is not there, the lookup function of the parent dentry is used.
Mounting fits into this mechanism in a fairly straightforward way too. When something is mounted into a directory, its dentry will contain a pointer to the root dentry of the mounted file system. This dentry
50. ... context aside and later picking it up, denoted as context switching. Note that process context is not defined to be strictly equal to the process state, but instead vaguely incorporates those parts of the process state that are most relevant to context switching. The individual parts of the process state and the related means of context switching are discussed next.
Processor State
Note: the notes from above belong here.
The part of the process state that is associated with the processor consists of the processor registers accessible to the process. On most processors, this includes general purpose registers and flags, stack pointer as a register that stores the address of the top of the stack, program counter as a register that stores the address of the instruction to be executed.
The very first step of a context switch is passing control from the executing process to the operating system. As this changes the value of the program counter, the original value of the program counter must be saved simultaneously. Typically, the processor saves the original value of the program counter on the stack of the process whose context is being saved. The operating system typically proceeds by saving the original values of the remaining registers on the same stack. Finally, the operating system switches to the stack of the process whose context will be restored and restores the original values of the registers in an inverse procedure.
When separate notions of proce
51. count kolik na to ukazuje hardlink pointr na ka dej blok soubor p mo v inode pokud v t immutable striktn read only indirect append lze jen pfid vat sam u posledn HL gt sma u fajl pokud max 12 blok alokuje se novej blok do kter ho se nasypou pointry Chapter 5 File Subsystem double indirect triple indirect Owner ID x Size in bytes x Access time x Creation time Modification time Deletion Time Group ID Links count Blocks count x File flags Ptrs to blocks File version for NFS File ACL x Directory ACL Fragment address x Fragment number x Fragment size LOCKS 1 EXT2 TIND BLOCK 1 Secure del Sync update x Immutable Only ap za to riskovat ze novej FS bude blbnout Directory entry length Name length x File name x T 13 odk z na data ul16 i mode File mode lst odkaz simple indirect ul6 i uig 2nd odkaz double indirect u32 i size 3rd odkaz triple indirect u32 i atime tj nejrychlejc se Saha u32 l ctime na data na za tku soubor U32 i mtime neum ct e mam n blok us2 i dtime souvisle za sebou im v ta Ml cu T n z fajl tim vic
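The notes above mention block pointers stored directly in the inode, followed by indirect, double indirect and triple indirect blocks. A minimal sketch in C of how a logical block index could be mapped to one of those levels, assuming the classic ext2 layout with 12 direct pointers and 4-byte block numbers; the function and constant names are hypothetical and not taken from the kernel sources.

```c
#include <stdint.h>

enum { EXT2_DIRECT_BLOCKS = 12 };  /* assumed number of direct pointers */

/* Returns 0 for a direct block, 1 to 3 for single, double and triple
 * indirect blocks, and -1 when the index is out of the addressable range. */
static int ext2_indirection_level(uint64_t index, uint32_t block_size)
{
    uint64_t per_block = block_size / 4;   /* 32-bit block numbers per block */
    uint64_t span = 1;

    if (index < EXT2_DIRECT_BLOCKS)
        return 0;
    index -= EXT2_DIRECT_BLOCKS;

    for (int level = 1; level <= 3; level++) {
        span *= per_block;                 /* blocks reachable at this level */
        if (index < span)
            return level;
        index -= span;
    }
    return -1;
}
```

With 1024-byte blocks, indexes 0 to 11 are direct and index 12 is the first block reached through the single indirect block, which matches the remark that data at the beginning of a file is the cheapest to reach.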
52. data with the smallest possible space and time overhead, the ability to withstand system failures without damage to the stored data, the ability to protect the stored data against unauthorized access, the ability to coordinate sharing of the stored data. Abstractions And Operations Stream File Operations Among the simplest operations is sequential access to files by records or by bytes, followed by operations for random access. These almost always take the form of five operations: opening and closing a file, setting the position in the file, reading, and writing. These operations essentially correspond to today's notion of a file as a stream of bytes, or sometimes several streams of bytes. The exceptions are specialized file systems that allow internal structuring of files, for example in the form of a tree, but those belong more to urban legends. It is worth considering why file operations are typically divided into exactly these five. It is obviously possible to provide only read and write operations that specify the file name, the position, and the block size. The reasons for the five operations are the ability to avoid, in the usual way of working with files, repeating operations such as finding the position on disk from the file name and the position in the file, and the ability to combine opening a file with other operations such as locking or permission checking. The reason for the pair of operations is the ability to implement
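The five operations described above (open, close, set position, read, write) can be sketched with the POSIX calls. This is only an illustration under assumptions: the helper name and the temporary file path template are made up for the example, and error handling is kept minimal.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Writes len bytes at the given offset of a fresh temporary file, reads
 * them back into out, and returns the number of bytes read or -1. */
static ssize_t write_then_read_at(const char *data, size_t len,
                                  off_t offset, char *out)
{
    char path[] = "/tmp/stream-demo-XXXXXX";   /* hypothetical template */
    int fd = mkstemp(path);                    /* open (creates the file) */
    if (fd < 0)
        return -1;
    unlink(path);                              /* file disappears on close */

    ssize_t got = -1;
    if (lseek(fd, offset, SEEK_SET) == offset &&     /* set position */
        write(fd, data, len) == (ssize_t)len &&      /* write */
        lseek(fd, offset, SEEK_SET) == offset)       /* set position */
        got = read(fd, out, len);                    /* read */

    close(fd);                                       /* close */
    return got;
}
```

The name is resolved once in the open step, and every later call works with the descriptor only, which is the saving the text attributes to the five-operation design.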
53. dirty, access rights, nothing special. If the TLB does not contain the translation, the hardware can additionally search the Virtual Hash Page Table, which is a simple hash table with a fixed format. If the VHPT does not contain the translation either, a fault is raised and the operating system fills the TLB. The protection system is interesting. Each TLB entry contains a key; when the entry is used, this key is looked up in the Protection Key Registers, which contain the keys assigned to the current process. If the key is not found, a Key Miss Fault is raised (to allow virtualization of the PKR); if it is found, the hardware checks whether the key prohibits read, write, or execution access. The register system is also interesting. General purpose registers are named GR0 to GR31. These are the same as everywhere else. Besides them, registers GR32 to GR127 are available, which work as a register stack. Part of the register stack is reserved as the input area, part as the local area, and part as the output area. On a procedure call, the output area of the caller becomes the input area of the callee, and the sizes of the local area and the output area of the callee are 0; the alloc instruction can then be used to set up the local and output areas. Simple. If the 96 registers of the processor, which are internally used as a circular buffer, run out, there is an extra stack (BSP) where whatever does not fit is stored. Motorola 680x0 Address Translation Of the 32-bit address, 7 bits are a pointer into the root table; the URP and SRP registers contain the user and supervisor root table pointer
54. exist. A lock free solution guarantees some process will finish in a finite number of its own steps. This is a somewhat weaker category, with the practical implication that starvation of all processes will not occur and progress will not stop should any single process block or crash. [note: today's processors do prefetching and similar things, they read ahead, which breaks these synchronization algorithms; the solution is that the processor has a memory model, and Java and similar languages have a memory model; the OS should offer some API for synchronization] An obstruction free solution guarantees every process will finish in a finite number of its own steps provided other processes take no steps. This is an even weaker category, with the practical implication that progress will not stop should any single process block or crash. To be done: Wait free hierarchy based on consensus number: shared registers (1); test and set, swap, queue, stack (2); atomic assignment to N shared registers (2N - 2); memory to memory move and swap, queue with peek, compare and swap (infinity). Impossibility results: Visible object for N processes is an object for which any of N processes can execute a sequence of operations that will necessarily be observed by some process, regardless of the operations executed by other processes. An implementation of a visible object for N processes over shared registers takes at least N of those share
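The lock free guarantee above can be illustrated with a compare and swap retry loop. The following is a minimal sketch using C11 atomics; the function name is hypothetical. A failed compare and swap means some other thread succeeded in the meantime, so the system as a whole always makes progress even though an individual thread may retry.

```c
#include <stdatomic.h>

/* Lock-free add: retries the compare-and-swap until it succeeds.
 * Returns the value observed just before this thread's update. */
static int lockfree_fetch_add(atomic_int *counter, int delta)
{
    int seen = atomic_load(counter);
    while (!atomic_compare_exchange_weak(counter, &seen, seen + delta)) {
        /* On failure, seen now holds the current value; try again. */
    }
    return seen;
}
```

The sketch is lock free but not wait free: there is no bound on how many times one particular thread may have to retry.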
55. for kill; SIGUSR1 10 User defined signal 1; SIGSEGV 11 Illegal memory access; SIGUSR2 12 User defined signal 2; SIGPIPE 13 Broken pipe; SIGALRM 14 Timer alarm; SIGTERM 15 Request for termination; SIGSTKFLT 16 Illegal stack access; SIGCHLD 17 Child process status changed; SIGCONT 18 Request to continue when stopped; SIGSTOP 19 Request to stop; SIGTSTP 20 Request for stop sent from keyboard; SIGTTIN 21 Input from terminal when on background; SIGTTOU 22 Output to terminal when on background. Figure 2.27 Standard Signals Signals are processed by signal handlers. A signal handler is a procedure that is called by the operating system when a signal occurs. Default handlers are provided by the operating system. New handlers can be registered for some signals. typedef void (*sighandler_t) (int); sighandler_t signal (int signum, sighandler_t handler); SIG_DFL: use default signal handler. SIG_IGN: ignore the signal. struct sigaction { void (*sa_handler) (int); void (*sa_sigaction) (int, siginfo_t *, void *); sigset_t sa_mask; int sa_flags; } struct siginfo_t { int si_signo; /* Signal number */ int si_errno; /* Value of errno */ int si_code; /* Additional signal code */ pid_t si_pid; /* Sending process PID */ uid_t si_uid; /* Sending process UID */ int si_status; /* Exit value */ clock_t si_utime; /* User time consumed */ clock_t si_stime; /* System time consumed */ sigval_t si_value; /* Si
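Registering a handler with the sigaction structure listed above can be sketched as follows. This is a minimal example for SIGUSR1; the handler and helper names are hypothetical, and the handler only increments a counter because little else is safe to do inside a signal handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t usr1_count = 0;

/* Signal handler: restricted to async-signal-safe work. */
static void on_usr1(int signum)
{
    (void)signum;
    usr1_count++;
}

/* Installs on_usr1 for SIGUSR1; returns 0 on success, -1 on error. */
static int install_usr1_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;      /* simple handler, no siginfo_t */
    sigemptyset(&sa.sa_mask);     /* block no extra signals in the handler */
    sa.sa_flags = 0;
    return sigaction(SIGUSR1, &sa, NULL);
}
```

After install_usr1_handler succeeds, a call such as raise(SIGUSR1) or kill from another process runs on_usr1 instead of the default action, which would terminate the process.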
56. inodes. An inode contains basic information about a file and points to file data and extended attributes using structures called a data fork and an attribute fork. Depending on the size of the data referenced by the fork, the data is stored: (1) directly within the fork inode (the default size of an inode is 256 bytes, out of which 100 bytes are used by the basic information, leaving 156 bytes for the forks); (2) in extents listed within the fork inode (the default size of an inode provides enough space for up to 19 extents); (3) in a tree (when the file data is stored in a tree, the keys of the tree are offsets within the file and the leaves of the tree are extents). Directories use either short form, block form, leaf form, node form or tree form, depending on their size. All forms have a variable length entry containing the file name and the file inode. A short form directory stores entries directly within its inode. A block form directory stores entries in a single extent, which also contains a sorted array of file name hashes and an array of a few largest free entries in the extent. A leaf form directory stores entries in multiple entry extents and a single extent with a sorted array of file name hashes and an array of a few largest free entries in the entry extents. A node form directory stores entries in multiple entry extents, a tree of file name hashes, and an extent with an array of a few largest free entries in the entry extent
57. into relatively independent parts that provide simple individual features, thus keeping the complexity of the design manageable. Besides managing complexity, the structure of the operating system can influence key features such as robustness or efficiency. The operating system possesses various privileges that allow it to access otherwise protected resources such as physical devices or application memory. When these privileges are granted to the individual parts of the operating system that require them, rather than to the operating system as a whole, the potential for both accidental and malicious misuse of privileges is reduced. Breaking the operating system into parts can have an adverse effect on efficiency because of the overhead associated with communication between the individual parts. This overhead can be exacerbated when coupled with hardware mechanisms used to grant privileges. The following sections outline typical approaches to structuring the operating system. Chapter 1 Introduction Monolithic Systems A monolithic design of the operating system architecture makes no special accommodation for the special nature of the operating system. Although the design follows the separation of concerns, no attempt is made to restrict the privileges granted to the individual parts of the operating system. The entire operating system executes with maximum privileges. The communication overhead inside the monolithic operating system
58. is mapped into segments using a segment array. Cleaner process compacts the old segments by copying. Checkpointing process keeps the number of B tree change records that have to be applied during tree reconstruction down to a reasonable limit. References 1. Johnson J. E., Laing W. A.: Overview of the Spiralog File System. Digital Technical Journal 8(2), DEC, 1996. 2. Whitaker C., Bayley J., Widdowson R.: Design of the Server for the Spiralog File System. Digital Technical Journal 8(2), DEC, 1996. Example Reiser File System Another unusual system runs under Linux and focuses on efficiency when working with a large number of small files. The problem with small files is the allocation overhead, which is solved here by viewing the whole disk as a single B tree. The nodes of this tree are in blocks that are multiples of the sector size. A node is either indirect, in which case it contains only keys and pointers to children, or direct formatted, in which case it contains a list of items stored so that the item headers (directory item, indirect data, direct data) grow from the beginning of the node and the item bodies grow from the end, or direct unformatted, in which case it contains the data of a large file up to a multiple of the block size. This whole circus guarantees that small files will be stored together in a single block, which saves space. Where exactly each item is stored is given by a key. Key
59. is the same as the communication overhead inside any other software, and is considered relatively low. CP/M and DOS are simple examples of monolithic operating systems. Both CP/M and DOS are operating systems that share a single address space with the applications. In CP/M, the 16 bit address space starts with system variables and the application area and ends with three parts of the operating system, namely CCP (Console Command Processor), BDOS (Basic Disk Operating System) and BIOS (Basic Input Output System). In DOS, the 20 bit address space starts with the array of interrupt vectors and the system variables, followed by the resident part of DOS and the application area, and ending with a memory block used by the video card and BIOS. [Figure 1.10: Simple Monolithic Operating Systems Example] Most contemporary operating systems, including Linux and Windows, are also considered monolithic, even though their structure is certainly significantly different from the simple examples of CP/M and DOS. References 1. Tim Olmstead Memorial Digital Research CP/M Library. http://www.cpm.z80.de/drilib.html Layered Systems A layered design of the operating system architecture attempts to achieve robustness by structuring the architecture into layers with different privileges. The most privileged layer
60. it reads a copy of the sectors that will be overwritten into a paging file backed memory section that's associated with the corresponding shadow volume. It services read operations directed at the shadow volume of modified sectors from this memory section, and services reads to non modified areas by reading from the original volume. Because the backup utility won't save the paging file or the contents of the system managed \System Volume Information directory located on every volume, the snapshot driver uses the defragmentation API to determine the location of these files and directories, and does not record changes to them. By relying on the shadow copy facility, the Windows XP backup utility overcomes both of the backup problems related to open files. The shadow copy driver is actually only an example of a shadow copy provider that plugs into the shadow copy service (\Windows\System32\Vssvc.exe). The shadow copy service acts as the command center of an extensible backup core that enables ISVs to plug in writers and providers. A writer is a software component that enables shadow copy aware applications to receive freeze and thaw notifications in order to ensure that backup copies of their data files are internally consistent, whereas providers allow ISVs with unique storage schemes to integrate with the shadow copy service. For instance, an ISV with mirrored storage devices might define a shadow copy as the frozen half of a split mirrored v
61. to the classes their bandwidth. Whenever a packet arrives for a class that is over limit but has an under limit ancestor of a class lower than the current maximum, the maximum becomes this class, that is, the heuristic reacts to growing load by reducing the ability to borrow bandwidth. Whenever a class sends a packet and either has an empty queue or becomes regulated, the maximum is set to infinity, that is, the heuristic relaxes the restriction when the conditions change. References 1. Floyd S., Jacobson V.: Link Sharing and Resource Management Models for Packet Networks. IEEE/ACM Transactions on Networking 3(4), August 1995. Random Early Detection [note: I start dropping before the queues are full, for example at 90 % fill; applications learn earlier that there is a problem and start reacting in time by shrinking the window, so that congestion does not occur] The goal of the random early detection queuing algorithm is to avoid anomalies associated with algorithms that fill a queue first and drop a queue tail when the queue is filled. A weighted average of the queue length is kept, and within a range of minimum and maximum queue lengths, packets are marked or dropped with a probability proportional to the current weighted average of the queue length. This gives the flow control algorithms of the transport protocols an early warning before the queue is filled. References 1. Floyd S., Jacobson V.: Random Early Detection
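The random early detection rule described above can be sketched in a few lines of C. The sketch assumes the usual formulation with an exponentially weighted moving average and a drop probability that rises linearly between the minimum and maximum thresholds; the function names and the parameter max_p (the probability reached at the maximum threshold) are assumptions made for the example, not values given in the text.

```c
/* Exponentially weighted moving average of the queue length.
 * weight is the share given to the newest sample (0 < weight <= 1). */
static double red_update_average(double avg, double queue_len, double weight)
{
    return (1.0 - weight) * avg + weight * queue_len;
}

/* Probability of dropping (or marking) a packet for the current average:
 * 0 below min_th, 1 at or above max_th, linear in between up to max_p. */
static double red_drop_probability(double avg, double min_th,
                                   double max_th, double max_p)
{
    if (avg < min_th)
        return 0.0;
    if (avg >= max_th)
        return 1.0;
    return max_p * (avg - min_th) / (max_th - min_th);
}
```

Because the decision uses the average rather than the instantaneous length, a short burst does not trigger drops, while sustained growth of the queue does.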
62. [note: devices can be controlled by software (system control); applications do not touch devices directly but through the OS, which again means privileged mode, for example the instructions for device access (in, out on Intel) are privileged; if there are no special instructions for devices and I access them the same way as memory, access rights are used] The pursuit of efficiency leads to user threads. If their implementation is part of the application, the overhead of calling the kernel is saved, kernel memory is saved, and it is altogether more convenient, since one can even implement various extras; programs can run cooperatively without problems with the robustness of the operating system. An implementation of user threads has to solve a number of complications. Because switching of user threads is the responsibility of the application, if one of them calls the kernel and stays there, the switching stops. If the switching is preemptive, it can run into problems with reentrancy of libraries or syscalls, typically malloc. Threads can interfere with each other through global context, typically errno, lseek, brk. Threads cannot use multiple processors without kernel support. Once context switching is available, the principle of process scheduling is clear. The operating system keeps a list of processes that should run and assigns the computer to them in turns. A bit of terminology: the list of processes that should run is called the ready queue, and the processes in this list are in the ready to run state. Now it is possible to dr
63. [note: threads: remember how many downs each one did and, when it dies, perform the matching ups; with locks this does not matter, because] Semaphores Very similar to locks are semaphores (by the way, Dijkstra invented them sometime in 1965). A semaphore has the methods signal and wait, often unfortunately, thanks to Dijkstra, named after the Dutch words as P (passeren, to pass) and V (vrijgeven, to release), and an initial count that says how many processes may own the semaphore at the same time. Again, briefly outline the implementation with an atomic operation and the solution of some synchronization problem, for example producer and consumer. In Unix systems following System V, and therefore also in Linux, semaphores are provided by the system. These semaphores synchronize processes with separate address spaces, which is reflected in their interface. A semaphore can be created by calling int semget (key, number, flags), which returns the identifier of a set of semaphores globally named by the given key. The semaphores are then operated on by calling int semop (semid, ops_data, ops_number). Each operation in the set contains the number of the semaphore in the set, one of three operations on the semaphore, and flags. The operation is either adding a number to the semaphore (uninteresting), or testing the semaphore for zero (waits until the semaphore reaches zero), or subtracting a number from the semaphore (waits until the semaphore has a sufficiently large value). Of the flags, the interesting ones are IPC_NOWAIT, which says not to wait, and S
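The System V calls described above can be sketched as follows. The helper names are hypothetical; the sketch uses IPC_PRIVATE instead of a global key so that it does not collide with other semaphores, and it passes IPC_NOWAIT so that an operation that would block fails with EAGAIN instead of waiting.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>

/* semctl requires the caller to define this union (see semctl(2)). */
union semun_arg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Creates a set with one semaphore set to initial; returns its id or -1. */
static int sem_create(int initial)
{
    int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (id < 0)
        return -1;
    union semun_arg arg;
    arg.val = initial;
    if (semctl(id, 0, SETVAL, arg) < 0) {
        semctl(id, 0, IPC_RMID);
        return -1;
    }
    return id;
}

/* Adds delta to semaphore 0 without waiting; returns 0, or -1 with
 * errno set to EAGAIN when the operation would have to block. */
static int sem_change(int id, short delta)
{
    struct sembuf op = { .sem_num = 0, .sem_op = delta, .sem_flg = IPC_NOWAIT };
    return semop(id, &op, 1);
}

/* Removes the semaphore set from the system. */
static int sem_destroy(int id)
{
    return semctl(id, 0, IPC_RMID);
}
```

Calling sem_change(id, -1) corresponds to wait (P) and sem_change(id, 1) to signal (V). System V semaphore sets outlive the process, so the explicit sem_destroy matters.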
64. matching rule is used. An action is either a chain name or ACCEPT, DROP, QUEUE, RETURN. ACCEPT means process packet, DROP means discard, QUEUE means queue for user space application to decide, RETURN means continue previous chain. Chapter 6 Network Subsystem Modules that process packets can be added; available modules include marking, address translation and redirection, logging, routing and others. > cat /etc/sysconfig/iptables *filter :INPUT ACCEPT [0:0] :FORWARD ACCEPT [0:0] :OUTPUT ACCEPT [0:0] :INPUT_FROM_LOCAL - [0:0] :INPUT_FROM_WORLD - [0:0] :FORWARD_FROM_LOCAL - [0:0] :FORWARD_FROM_WORLD - [0:0] # Sort traffic -A INPUT -i lo -j INPUT_FROM_LOCAL -A INPUT -i eth0 -j INPUT_FROM_LOCAL -A INPUT -i tun0 -j INPUT_FROM_LOCAL -A INPUT -i tun1 -j INPUT_FROM_LOCAL -A INPUT -j INPUT_FROM_WORLD -A FORWARD -i lo -j FORWARD_FROM_LOCAL -A FORWARD -i eth0 -j FORWARD_FROM_LOCAL -A FORWARD -i tun0 -j FORWARD_FROM_LOCAL -A FORWARD -i tun1 -j FORWARD_FROM_LOCAL -A FORWARD -j FORWARD_FROM_WORLD # Input from local machines -A INPUT_FROM_LOCAL -j ACCEPT # Input from world machines -A INPUT_FROM_WORLD -p tcp --dport ssh -j ACCEPT -A INPUT_FROM_WORLD -p tcp --dport http -j ACCEPT -A INPUT_FROM_WORLD -p tcp --dport smtp -j ACCEPT -A INPUT_FROM_WORLD -m state --state ESTABLISHED,RELATED -j ACCEPT -A INPUT_FROM_WORLD -j REJECT
65. [note: when someone else wants the file, the server first asks me whether it may give it to them, and that way I know that I have a shared cache] for a lease period. The server enters grace period longer than any lease period after crash and only grants lock renewals during the grace period. References 1. RFC 1094: NFS Network File System Protocol Specification 2. RFC 1813: NFS Version 3 Protocol 3. RFC 3530: NFS Version 4 Protocol Example Server Message Block And Common Internet File System [note: SMB, CIFS, Samba: a Microsoft protocol, something like NFSv4 with various extras] TODO Some description, at least from RFC and SMB & CIFS protocol. Example Andrew File System [note: AFS callbacks: a commitment to notify the client when the file changes] The Andrew File System or AFS is a distributed file system initially developed at CMU. AFS organizes files under a global name space split into cells, where a cell is an administrative group of nodes. Servers keep subtrees of files in volumes, which can be moved and read only replicated across multiple servers, and which are listed in volume location database replicated across database servers. Clients cache files; writes are propagated on close or flush. A server sends file data together with a callback, which is a function that notifies of outdated file data in cache. When a write is propagated to the server, the server notifies all clients that cache the file data that their callback has been broken. C
66. only set once at the beginning. [note: prefetching: the processor guesses the stride with which I walk through memory, for example I go by 20 B, so a few following items are prefetched; it is therefore good to store data equidistantly] The PCI (Peripheral Component Interconnect) bus transfers multiple units of data in frames. A frame begins with transporting an address and a command that describes what type of transfer will be done within the frame. Next, data is transferred in bursts of limited maximum duration. [note: data is not pulled 1 B at a time but in larger units; I state the address, and once the memory has the data ready, data is transferred on every clock edge] In a single bus cycle, the bus master activates the FRAME signal to denote the start of the cycle, sets the C/BE (Command / Byte Enable) wires to describe the type of transfer (0010 for device read, 0011 for device write, 0110 for memory read, 0111 for memory write, etc.) and sets the A/D (Address / Data) wires to the address to be read from or written to. After the address and the command are transferred, the bus master uses the IRDY (Initiator Ready) signal to indicate readiness to receive or send data. The target of the transfer responds with DEVSEL (Device Select) to indicate that it has been addressed and with TRDY (Target Ready) to indicate readiness to send or receive data. [note: processor, front side bus, north bridge (RAM and graphics card, which sometimes also communicate with each other), direct media interface, south bridge (PCI, SATA, USB and so on)] When
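The command codes listed above can be written down as a small decoder. This is only an illustrative sketch: the enum and function names are hypothetical, and besides the four codes from this passage it includes the configuration read (1010) and configuration write (1011) codes that the PCI example elsewhere in the text mentions.

```c
#include <stddef.h>

/* C/BE command codes (4-bit values) named in the text. */
enum pci_command {
    PCI_CMD_IO_READ   = 0x2,   /* 0010, "device read" */
    PCI_CMD_IO_WRITE  = 0x3,   /* 0011, "device write" */
    PCI_CMD_MEM_READ  = 0x6,   /* 0110 */
    PCI_CMD_MEM_WRITE = 0x7,   /* 0111 */
    PCI_CMD_CFG_READ  = 0xA,   /* 1010 */
    PCI_CMD_CFG_WRITE = 0xB    /* 1011 */
};

/* Returns a readable name, or NULL for codes this sketch does not cover. */
static const char *pci_command_name(unsigned cbe)
{
    switch (cbe & 0xFu) {
    case PCI_CMD_IO_READ:   return "device read";
    case PCI_CMD_IO_WRITE:  return "device write";
    case PCI_CMD_MEM_READ:  return "memory read";
    case PCI_CMD_MEM_WRITE: return "memory write";
    case PCI_CMD_CFG_READ:  return "configuration read";
    case PCI_CMD_CFG_WRITE: return "configuration write";
    default:                return NULL;
    }
}

/* Returns 1 when the code is one of the covered write commands. */
static int pci_command_is_write(unsigned cbe)
{
    switch (cbe & 0xFu) {
    case PCI_CMD_IO_WRITE:
    case PCI_CMD_MEM_WRITE:
    case PCI_CMD_CFG_WRITE:
        return 1;
    default:
        return 0;
    }
}
```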
67. [note: with the 8k limit on an RPC request, some operations have to be split] [note: as a precaution against a server crash, the protocol is stateless, that is, for example, there is no opening and closing of files] const MNTPATHLEN = 1024; /* maximum bytes in a pathname argument */ const MNTNAMLEN = 255; /* maximum bytes in a name argument */ const FHSIZE = 32; /* size in bytes of a file handle */ typedef opaque fhandle[FHSIZE]; typedef string name<MNTNAMLEN>; typedef string dirpath<MNTPATHLEN>; union fhstatus switch (unsigned fhs_status) { case 0: fhandle fhs_fhandle; default: void; }; typedef struct mountbody *mountlist; struct mountbody { name ml_hostname; dirpath ml_directory; mountlist ml_next; }; typedef struct groupnode *groups; struct groupnode { name gr_name; groups gr_next; }; [note: the file handle uses the inode number: inode, fs id, generation, for the case that someone deletes the file under my hands; a new file with the same inode gets a higher generation] [note: there is no open, so I have to emulate it on the client, open is local only: I do a lookup, obtain the file handle, and LOCALLY return some local handle of my own] [note: access rights: there is no state, so I check ALL THE TIME, on every read and write] [note: locking is a problem: the server is stateless, a lock is stateful, and emulation on the client does not make much sense, so
68. needs to sleep, the system automatically converts it to a kernel thread, puts that thread to sleep and wakes the pinned thread. Synchronous Requests Interface for classes of devices. Problems with copying data at interfaces. User interface: blocking functions, functions with asynchronous signalling. Problems of asynchronous signalling on errors: error signalling, indication by a result code, indication by a global variable, asynchronous indication in the DOS style, smarter asynchronous indication. Chapter 4 Device Management Example Unix Driver Model Block devices: data is transferred in blocks, and the block size is given by the device properties. Blocks are addressable, which gives direct access to data. Block devices have a cache, queues, and service strategies. Character devices: data is transferred in individual bytes, with sequential access. Character devices have no cache; they have read and write routines. The division into character and block devices described above has its roots in the days when so called channel processors were used for I/O operations. These supported exactly two modes of data transfer from peripheral devices, either by individual bytes or by blocks. Nowadays this division is not well justified, and the deciding factor is rather whether access to data is sequential or random. An example is a device for digitizing video, which is accessed as a character device but provides data with the granularity of whole frames, not individual
69. for realtime processes are in the form of a table with a quantum length of 1 second for priority 0 and 100 ms for priority 59. The admin can change these tables. The interactive and timesharing classes share the same parameter table, which would suggest that they also share the same scheduling algorithm, but that need not be true. A cursory measurement showed no difference, so it is possible that the TS and IA classes exist for greater flexibility (the range of user priorities can be set for each of them separately) and by default really are the same. As a point of interest, Solaris 7 also offers a call that allows a lightweight process to ask the kernel not to take the processor away from it temporarily. This is useful, for example, when implementing spinlocks in user mode. For details, see the manpage for schedctl_init. Example Linux 2.4.X Series Scheduler What about the Linux scheduler? In the original scheduler (up to 2.5.2), processes are divided into two classes, normal and realtime. Realtime processes are scheduled either round robin or FIFO. At each scheduling, the first process with the highest goodness value is selected; for realtime processes this value is 1000 + the realtime priority, for normal processes it is the current priority (further adjusted to account for context switching). The current priority is simply the number of ticks of the timeslice that the process has left; at the moment when all processes have used up their ticks, their values are computed again from nice. API nice 0 corresponds to roughly a 200 ms timeslice, API nice
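The selection rule described above can be sketched as follows. The task structure and function names are hypothetical and much simpler than the kernel's own; in particular, the sketch leaves out the adjustments that account for context switching, which the text mentions but does not spell out.

```c
enum sched_policy { POLICY_NORMAL, POLICY_REALTIME };

struct task {                 /* hypothetical, not the kernel's task struct */
    enum sched_policy policy;
    int rt_priority;          /* used by realtime tasks only */
    int ticks_left;           /* ticks remaining in the current timeslice */
};

/* Goodness as described in the text: 1000 + realtime priority for
 * realtime tasks, the remaining ticks for normal tasks. */
static int goodness(const struct task *t)
{
    if (t->policy == POLICY_REALTIME)
        return 1000 + t->rt_priority;
    return t->ticks_left;
}

/* Returns the index of the first task with the highest goodness,
 * or -1 when there are no tasks. */
static int pick_next(const struct task *tasks, int count)
{
    int best = -1;
    for (int i = 0; i < count; i++)
        if (best < 0 || goodness(&tasks[i]) > goodness(&tasks[best]))
            best = i;
    return best;
}
```

The offset of 1000 ensures that any runnable realtime task wins over every normal task, since a normal task can never accumulate that many ticks.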
70. processes can cause by executing their longest quantum. Reportedly, by this criterion it is the most fair algorithm known (the year being sometime around 1997). And right away a second important point: the algorithm does not need to know the quantum lengths in advance. So there is no setting of periods and similar things; processes ask for a quantum and get it within the limits of their weight. Example Mach Scheduler Mach schedules only at the level of threads and ignores the existence of processes [...] the context of a thread. The basic principle of scheduling is still the same, priorities and accounting for CPU usage, so let us focus on multiprocessor support. First, Mach does not have cross processor scheduler events. The name suggests what that is: if an event occurs on one processor that allows a thread to run whose priority is higher than the priority of some other thread on another processor, this other thread is not interrupted but calmly finishes its quantum before the scheduler gets a say. Second, Mach introduces processor sets, sets of processors designated for executing threads. Each thread is assigned one processor set on which it is scheduled; processor sets are defined and assigned by the admin. It is thus possible to reserve, for example, fast processors for realtime tasks, or processors with special hardware for tasks that use it. A processor can belong to only one set, but while the system is running processors can, between sets,
71. the terms bottom half (the asynchronously called part of the driver, which mainly takes care of hardware requests) and top half (the synchronously called part of the driver, which mainly takes care of software requests) are used. This naming corresponds to a view of the architecture where hardware is at the lowest level, followed by drivers, then the operating system, then applications. The naming collides with the terms used in Linux, which denotes as the top half the immediately executed part and as the bottom half the deferred part of interrupt handling. I have seen this terminology labeled as the Linux terminology and the one mentioned above as the BSD terminology; many texts seem to avoid both. Asynchronous Requests Example Linux Tasklets The interrupt handling code in the kernel is not reentrant. When an interrupt handler executes, additional interrupts are disabled and a statically allocated stack is used. This simplifies the code but also implies that interrupt handling must be short, lest the ability of the kernel to respond to an interrupt promptly is affected. The kernel offers four related tools to postpone work done while servicing an interrupt, called soft irqs, tasklets, bottom half handlers and work queues. Soft irqs are functions whose execution can be requested from within an interrupt handler. All pending soft irqs are executed by the kernel on return from an interrupt handler, with additional interrupts enabled. Soft irqs that were r
72. the same also happens on the other side; these are called the client and server stub, or alternatively the client stub and server skeleton. Storing parameters into a message is called marshalling, the opposite unmarshalling. It depends on the type of the parameters being passed. Passed by value: The only problem may be incompatible representations of the parameter. This is solved either by establishing a common format (short messages, but sometimes both sides convert needlessly) or by stating the format in the message (flexible, but complicated stubs and longer messages). Passed by reference: The harder variant. A reference to well typed small data can be converted to bidirectional passing by value (simple and efficient, but does not have exactly the same semantics when cycles of references exist); for large data it is better when the server additionally asks the client for the data (more flexible, but more complicated protocols and inefficient dereference). Some references practically cannot be transferred; a typical example is passing a function as a parameter. Passing has further pitfalls that are not obvious at first sight. Among them are: Global variables: Naturally, they are not available at the server, but this is not evident from the semantics of a procedure call, so it tends to be forgotten. This matters mainly for things like global error results. Manually written stubs can take care of it, automatically generated ones cannot. System
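Marshalling a parameter passed by value into a common format, as described above, can be sketched like this. The example assumes a big endian wire format for a 32-bit integer (the convention used by many network protocols); the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Stores value into buf in big-endian order, independent of the host
 * byte order. Returns the number of bytes written. */
static size_t marshal_i32(uint8_t *buf, int32_t value)
{
    uint32_t u = (uint32_t)value;
    buf[0] = (uint8_t)(u >> 24);
    buf[1] = (uint8_t)(u >> 16);
    buf[2] = (uint8_t)(u >> 8);
    buf[3] = (uint8_t)u;
    return 4;
}

/* Reads a big-endian 32-bit integer from buf. */
static int32_t unmarshal_i32(const uint8_t *buf)
{
    uint32_t u = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
                 ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
    return (int32_t)u;
}
```

This is the "common format" option from the text: both sides convert even when they happen to share a representation, but the stubs stay simple and the message carries no format description.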
73. should be able to defend the need for process synchronization using a variety of practical examples. You should be able to describe the practical examples using formalisms that abstract from the specific details but preserve the essential requirement for synchronization. You should be able to define precise requirements on process synchronization related to both correctness and liveness. You should be able to demonstrate how process synchronization can be achieved using a variety of practical tools, including disabling interrupts, atomic reading and atomic writing of shared memory, atomic test and set over shared memory, atomic compare and swap over shared memory, message passing. You should understand how process synchronization interacts with process scheduling. You should be able to explain how process synchronization can lead to scheduling anomalies. You should demonstrate familiarity with both implementation and application of common synchronization primitives, including barriers, signals, locks, semaphores, condition variables, monitors. You should be able to select and apply proper synchronization primitives to common synchronization problems. Questions [note: volatile: says that the variable is accessed in parallel and must always be in memory; the processor does prefetching, so accesses may happen in a different order, and therefore I need the memory model of the processor; [...] of a language (.NET, Java)
74. [note: the original socket stays untouched and keeps listening, the new one is connected to the peer] int connect (int sockfd, const struct sockaddr *serv_addr, socklen_t addrlen); The connect call connects a socket that is SOCK_STREAM, SOCK_RDM, SOCK_SEQPACKET to a remote address, and sets a remote address of the socket otherwise. [note: sendto is for UDP: send without setting up a connection on the socket] ssize_t send (int sockfd, const void *buf, size_t len, int flags); ssize_t sendto (int sockfd, const void *buf, size_t len, int flags, const struct sockaddr *to, socklen_t tolen); ssize_t sendmsg (int sockfd, const struct msghdr *msg, int flags); struct msghdr { void *msg_name; /* optional address */ socklen_t msg_namelen; /* optional address length */ struct iovec *msg_iov; /* array for scatter gather */ size_t msg_iovlen; /* array for scatter gather length */ void *msg_control; /* additional control data */ socklen_t msg_controllen; /* additional control data length */ int msg_flags; }; [note: select, poll: the motivation is that accept is blocking, so when nothing arrives for a long time I hang there needlessly; select and poll work like accept on a SET of sockets and return when something shows up on ONE of them, and they have a timeout] The send family of calls sends data over a socket. Either the socket is connected or the remote address is specified. The write call can also be used but the flags cannot be specified in that case. ssize_t recv (int sockfd, void *buf, size_t
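The case where the remote address is specified with each call, rather than set once by connect, can be sketched with a UDP datagram sent over the loopback interface. The helper name is hypothetical, and the receive timeout of two seconds is an assumption added so that the example cannot block forever.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Sends len bytes as one UDP datagram to a loopback socket with sendto
 * and receives it with recv. Returns bytes received, or -1 on error. */
static ssize_t udp_loopback(const void *msg, size_t len, void *out, size_t cap)
{
    ssize_t got = -1;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;
    struct timeval tv = { 2, 0 };            /* assumed receive timeout */

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                       /* let the kernel pick a port */

    if (rx >= 0 && tx >= 0 &&
        bind(rx, (struct sockaddr *)&addr, sizeof addr) == 0 &&
        getsockname(rx, (struct sockaddr *)&addr, &alen) == 0 &&
        setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) == 0 &&
        sendto(tx, msg, len, 0, (struct sockaddr *)&addr, sizeof addr)
            == (ssize_t)len)
        got = recv(rx, out, cap, 0);

    if (rx >= 0) close(rx);
    if (tx >= 0) close(tx);
    return got;
}
```

The sending socket is never connected; calling connect on it first would let plain send be used instead of sendto.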
75. spin_init (pthread_spinlock_t *lock, int pshared); int pthread_spin_destroy (pthread_spinlock_t *lock); int pthread_spin_lock (pthread_spinlock_t *lock); int pthread_spin_trylock (pthread_spinlock_t *lock); int pthread_spin_unlock (pthread_spinlock_t *lock); With mutexes available in user space but threads implemented in kernel space, it is unavoidable that some mutex operations have to call the kernel. It is, however, possible to optimize mutex operations for the case without contention, so that the kernel does not have to be called when no scheduling is needed. Linux makes this optimization possible through its futex interface. [note: futex: sleep if the variable has a certain value; Fast Userspace muTEX; a lock again has the property that I have to jump into the kernel] int sys_futex (void *futex, int op, int val, const struct timespec *timeout); When called with op set to FUTEX_WAIT, the interface suspends the current thread if the value of the futex equals val. When called with op set to FUTEX_WAKE, the interface wakes at most val threads suspended on the futex. A simplified example of implementing a mutex using a futex is copied from Drepper: class mutex { private: // Mutex state variable, zero means free int val = 0; public: void lock () { int old; // Atomically increment the state and get the old value, which should be zero if mutex was free while (ol
76. start. The next process to start is therefore yet another bootstrap process, which loads the program image of the next process to start. This repeats until the operating system itself is loaded.

The reason for having a sequence of bootstrap processes, rather than a single bootstrap process that loads the operating system straight away, is that loading the program image of the operating system requires knowing the structure of the program image both on disk and in memory. This structure depends on the operating system itself, and hardcoding the knowledge of the structure in a single bootstrap process which resides in fixed memory would limit the ability of the computer to load an arbitrary operating system.

Relocating

The act of loading a program image is further complicated by the fact that the program image may have been constructed presuming that it will reside at a specific range of addresses, and may contain machine code instructions or static variables that refer to specific addresses from that range, using what is denoted as absolute addressing.

[Margin note: I shift the program somewhere relative to the presumed position and then add the shift to the addresses in the code. Technique B: avoid constructs with a fixed address, so that the code is PIC. Whether this is possible depends on the instruction set of the processor (relative jumps). It is more complicated and slower, but all addresses are relative and therefore fine.]

Declaring and accessing a global variable in C:

static int i; // declare a global variab
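The first technique from the margin note, shifting the image and adding the shift to the addresses stored in it, can be sketched in C as follows. This is only an illustration under stated assumptions: the function name relocate_image, the idea of a table of fixup offsets, and the use of 32-bit addresses are all hypothetical and are not taken from the text or from any particular executable format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical relocation step. The fixups array lists byte offsets within
   the image at which a 32-bit absolute address is stored. Each such address
   was computed for assumed_base and is adjusted for actual_base. */
void relocate_image(uint8_t *image, const size_t *fixups, size_t count,
                    uint32_t assumed_base, uint32_t actual_base)
{
    /* Unsigned arithmetic wraps, so the shift may go in either direction. */
    uint32_t delta = actual_base - assumed_base;

    for (size_t i = 0; i < count; i++) {
        uint32_t address;
        /* memcpy avoids unaligned access to the stored address. */
        memcpy(&address, image + fixups[i], sizeof address);
        address += delta;
        memcpy(image + fixups[i], &address, sizeof address);
    }
}
```

For example, an address 0x1004 stored in an image linked for base 0x1000 becomes 0x5004 once the image is relocated to base 0x5000. Position independent code avoids this pass altogether, at the cost noted in the margin.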
77. A compact disk has one surface where sectors of typically 2048 bytes are organized in a spiral track.

Addressing

Initially, sectors on a disk were addressed using the surface, track and sector numbers. This had several problems. First, implementations of the ATA hardware interface and the BIOS software interface typically limited the number of surfaces to 16, the number of cylinders to 1024 and the number of sectors to 63. Second, the fact that the length of a cylinder depends on the distance from the center of the disk makes it advantageous to vary the number of sectors per cylinder. Lately, sectors on a disk are therefore addressed using a logical block address that numbers sectors sequentially.

Example: ATA Disk Access

An ATA disk denotes a disk using the Advanced Technology Attachment (ATA) or the Advanced Technology Attachment with Packet Interface (ATAPI) standard.

[Margin notes on sound devices: that is why a song keeps playing for a while after I switch it off. There is an effort at compression, today at least differential ADPCM. MIDI has sampled sounds of real instruments.]

[Margin notes on other devices: NETWORK CARD: receives a packet, sends a packet; it can have a buffer, direct memory access, scatter gather, and can answer pings directly. POWER MGMT: switching devices off, fully or partially, to save electricity and reduce heat; various states and power levels, and a framework that decides about them. CLOCK: RTC rea
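The relation between the two addressing schemes described above can be illustrated with the conventional mapping from a cylinder, head and sector triple to a sequential block number. The sketch below is an assumption rather than something stated in the text: the name chs_to_lba and its parameters are hypothetical, and it follows the common convention that cylinders and heads are numbered from 0 while sectors are numbered from 1.

```c
#include <assert.h>
#include <stdint.h>

/* Geometry limits quoted in the text for early ATA and BIOS interfaces. */
enum { MAX_CYLINDERS = 1024, MAX_HEADS = 16, MAX_SECTORS = 63 };

/* Conventional mapping of a cylinder, head, sector address to a logical
   block address. Sectors are assumed to be numbered from 1. */
uint32_t chs_to_lba(uint32_t cylinder, uint32_t head, uint32_t sector,
                    uint32_t heads_per_cylinder, uint32_t sectors_per_track)
{
    return (cylinder * heads_per_cylinder + head) * sectors_per_track
           + (sector - 1);
}
```

With the quoted limits, the last addressable sector (cylinder 1023, head 15, sector 63) maps to block 1032191, so such a geometry covers 1032192 sectors in total.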
78. that was mass produced, as far as 19 computers can be considered a mass.

1956, hardware: IBM 350, a harddrive developed by IBM, capacity of 5 MB at 50 rotating magnetic discs with a diameter of 61 cm.
1957, hardware: IBM 709, a computer developed by IBM, uses vacuum tubes, multiplication and division operations take 240 microseconds.
1957, software: Fortran, a programming language developed by John W. Backus at IBM.
1958, hardware: IBM 7090, a computer developed by IBM, uses transistors, a multiplication operation takes 25 microseconds, a division operation takes 30 microseconds.

One of the most powerful computers of the time was IBM 7094. The computer could perform floating point operations in tens of microseconds and was equipped with 32k words of memory, one word being 36 bits. Specialized hardware provided channels for independent input and output operations that could interrupt the processor.

The IBM 7094 computer ran the Fortran Monitor System (FMS), an operating system that executed sequential batches of programs. A program was simply loaded into memory, linked together with arithmetic and input and output libraries, and executed. Except for being limited by an execution timeout, the program was in full control of the computer.

Executing sequential batches of programs turned out to be inflexible. At MIT, the first experiments with sharing the computer by multiple programs were made in 1958 and publishe
79. the exception by growing the block that contains stack and retrying the machine code instruction that caused the exception.

A multithreaded program requires as many stacks as there are threads. This makes placing the block that contains stack more difficult with respect to growing the block later, unless segmentation is used.

[Margin note: why not allocate pages directly, why split things into heap and stack? I usually want many pages at once, whereas an allocated object has a few tens of bytes and is allocated very often. The page allocation mechanism is therefore not suited for allocating objects.]

Heap

The process heap is used for dynamically allocated variables. The heap is stored in one or several continuous blocks of memory within the virtual address space. These blocks include a data structure used to keep track of what parts of the blocks are used and what parts of the blocks are free. This data structure is managed by the heap allocator of the process.

In a sense, the heap allocator duplicates the function of the virtual memory manager, for they are both responsible for keeping track of blocks of memory. Typically, however, the blocks managed by the heap allocator are many, small, short lived and aligned on cache boundaries, while the blocks managed by the virtual memory manager are few, large, long lived and aligned on page boundaries. This distinction makes it possible to design the heap allocator so that it is better suited for managing blocks containing d
80. the particular operating system and programming language.

[Margin note: I do not have to switch context to the kernel, but this cannot handle more advanced functions such as preempting threads or working on multiple processors. The context is switched at intervals of tens to hundreds of milliseconds.]

Example: Posix Process And Thread API (Unix, Linux)

To create a process, the Posix standard defines the fork and execve calls. The fork call creates a child process, which copies much of the context of the parent process and begins executing just after the fork call with a return value of zero. The parent process continues executing after the fork call, with the return value providing a unique identification of the child process. The child process typically continues by calling execve to execute a program different from that of the parent process.

To terminate a process, the Posix standard defines the exit and wait calls. The exit call terminates a process. The wait call waits for a child process to terminate and returns its termination code. Additional ways for a process to terminate, both voluntarily and involuntarily, exist.

[Margin note: when I open a file, I get a handle, which is a pointer somewhere into my address space, so I can tell at once whether it is a legal pointer, and after opening it I can see whether it is a legal request. Run time linking: at run time I ask for a library and call functions by name.]

pid_t fork(void);
int ex
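The fork, execve and wait sequence described above can be sketched as a small helper. This is a minimal illustration, not a definitive implementation: the name run_and_wait is hypothetical, the helper uses waitpid (a variant of wait that names the child explicitly), it passes an empty environment to the new program, and the exit code 127 for a failed execve is merely a shell-like convention chosen here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs the program at path in a child process and returns its exit code,
   or -1 when the child could not be created or did not exit normally. */
int run_and_wait(const char *path, char *const argv[])
{
    pid_t child = fork();
    if (child < 0)
        return -1;                      /* fork failed, no child exists */

    if (child == 0) {
        /* Child: fork returned zero, replace the program image. */
        char *const envp[] = { NULL };  /* assumption: empty environment */
        execve(path, argv, envp);
        _exit(127);                     /* reached only if execve failed */
    }

    /* Parent: fork returned the identification of the child. */
    int status;
    if (waitpid(child, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller might pass "/bin/sh" with the arguments "sh", "-c", "exit 7" and would get 7 back as the termination code of the child.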
81. typical compiled procedural programming languages.

For a contemporary processor, explain how the same machine instructions with the same arguments can access local variables and arguments of a procedure regardless of their absolute address in the virtual address space. Explain why this is important.

Explain what is the function of a heap allocator.

Explain why the implementation of the heap allocator for user processes usually resides in user space rather than kernel space.

Design an interface of a heap allocator.

Explain the problems a heap allocator implementation must solve on multiprocessor hardware and sketch appropriate solutions.

Explain the rationale behind the Buddy Allocator and describe the function of the allocation algorithm.

Explain the rationale behind the Slab Allocator and describe the function of the allocation algorithm.

Describe what a heap allocator can do to reduce the overhead of the virtual memory manager.

Explain the function of a garbage collector.

Define precisely the conditions under which a memory block can be freed by a reference tracing garbage collector.

Describe the algorithm of a copying garbage collector.

Describe the algorithm of a mark and sweep garbage collector.

Describe the algorithm of a generational garbage collector. Assume knowledge of a basic copying garbage collector and a basic mark and sweep garbage collector. Hint: Essential to the generational garbage collect
82. created small free blocks. [Margin note: tiny blocks that are good for nothing.] Worst fit was therefore tried as well, and it was nothing special either. Keeping separate lists for frequent block sizes is sometimes called quick fit. [Margin note: this is the most typical approach.] The buddy system probably belongs here too, that is, splitting partitions into halves with lists of blocks of the usual sizes; it has a problem with overhead for blocks whose lengths are exact powers of two. [Margin notes: worst fit is experimental; the buddy system splits free space into halves.]

Overhead statistics reported for specific applications are 4% for best fit, 7% for first fit on a FIFO list of free blocks, 50% for first fit on a LIFO list of free blocks, and 60% for the buddy system.

M. S. Johnstone & P. R. Wilson: The Memory Fragmentation Problem: Solved? ACM SIGPLAN 34(3), 1999.

To look at: P. R. Wilson & M. S. Johnstone & M. Neely & D. Boles: Dynamic Storage Allocation: A Survey And Critical Review. International Workshop on Memory Management, September 1995, ftp://ftp.cs.utexas.edu/pub/garbage/allocsrv.ps

[Margin note on SLAB: programs often have many objects of the same type. When I delete an object, the resulting block can be expected to be usable for another object of the same kind, and sometimes I do not even have to run the constructor.]

Buddy system

The main advantage of the buddy system is supposed to be that, when a block is freed, it is easy to find
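The sentence above is cut off, so the following is only an assumption about where it leads: a commonly cited property of buddy allocators is that the neighbour a freed block can merge with is found by simple arithmetic. The sketch below illustrates that property with hypothetical names (buddy_of, merged_offset), for blocks addressed by their byte offset from the start of the managed area.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of the buddy of a block. The size must be a power of two and the
   offset must be a multiple of the size, as in a buddy allocator where a
   block of size 2s is always split into two halves of size s. */
size_t buddy_of(size_t offset, size_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    assert(offset % size == 0);
    return offset ^ size;   /* flips the bit that tells the two halves apart */
}

/* Offset of the block of twice the size that results from merging a block
   with its buddy. */
size_t merged_offset(size_t offset, size_t size)
{
    return offset & ~size;  /* the lower of the two halves */
}
```

For instance, the buddy of the 64-byte block at offset 128 sits at offset 192, and merging the two gives a 128-byte block at offset 128. An allocator would additionally have to check that the buddy is free and of the same size before merging.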
83. wait. The poll call makes it possible to more precisely distinguish what events to wait for.

int getsockopt(int sockfd, int level, int optname, void *optval, socklen_t *optlen);
int setsockopt(int sockfd, int level, int optname, const void *optval, socklen_t optlen);

References

1. Hewlett Packard: BSD Sockets Interface Programmers Guide.

[Margin note on winsocks: a standard interface both to applications and to network protocols, so that when I add a new protocol, I plug it into the API.]

Example: Unix Sockets

Unix sockets represent a class of sockets used for local communication between processes. The sockets are represented by a file name or an abstract socket name.

struct sockaddr_un {
    sa_family_t sun_family;     /* set to AF_UNIX */
    char sun_path[PATH_MAX];    /* socket name */
};

int socketpair(int domain, int type, int protocol, int sockets[2]);

Important uses of the Unix sockets include the X protocol.

> netstat --unix --all
(servers and established)
Proto RefCnt Flags    Type    State      Path
unix  2      [ ACC ]  STREAM  LISTENING  /var/run/acpid.socket
unix  2      [ ACC ]  STREAM  LISTENING  /tmp/.font-unix/fs7100
unix  2      [ ACC ]  STREAM  LISTENING  /tmp/.gdm_socket
unix  2      [ ACC ]  STREAM  LISTENING  /tmp/.X11-unix/X0
unix  2      [ ACC ]  STREAM  LISTENING  /tmp/.ICE-unix/4088
unix  2      [ ACC ]  STREAM  LISTENING  /var/run/dbus/system_bus_socket
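A minimal sketch of local communication over Unix sockets, using the socketpair call listed above together with send and recv. The helper name echo_over_socketpair is hypothetical, and both ends of the pair are kept in a single process purely to keep the example short; in practice the two descriptors would usually end up in different processes, for example across a fork.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Sends text through one end of a connected pair of Unix sockets and reads
   it back from the other end. Returns the number of bytes received, or -1
   on failure. */
ssize_t echo_over_socketpair(const char *text, char *reply, size_t reply_size)
{
    int sockets[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sockets) < 0)
        return -1;

    ssize_t received = -1;
    size_t length = strlen(text);

    /* The pair is already connected, so no remote address is needed. */
    if (send(sockets[0], text, length, 0) == (ssize_t) length)
        received = recv(sockets[1], reply, reply_size, 0);

    close(sockets[0]);
    close(sockets[1]);
    return received;
}
```

Calling the helper with the text "ping" and a 16-byte reply buffer returns 4 and leaves the four characters in the buffer, without a terminating zero.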
84. 0:00:19.0
class config device driver irq net power vendor

/sys/devices/pci0000:00/0000:00:19.0/net
eth0

/sys/devices/pci0000:00/0000:00:19.0/net/eth0
address broadcast carrier device features flags mtu power statistics

To be done.

Network devices: include/linux/netdevice.h, struct net_device.

Block devices: include/linux/blkdev.h, struct request_queue.

Character devices: include/linux/cdev.h, struct cdev.

IOCTL with strace.

When a new device is connected to a bus, the driver of the bus notifies the udevd daemon, providing information on the identity of the device. The daemon uses this information to locate the appropriate driver in the driver database, constructed from information provided by the modules during module installation. When the appropriate driver is loaded, it is associated with the device, which thus becomes ready to use. The notifications can be observed using the udevmonitor command.

> udevmonitor --env
UEVENT[12345.67890] add /devices/pci0000:00/0000:00:1a.7/usb1/1-3/1-3:1.0
ACTION=add
DEVPATH=/devices/pci0000:00/0000:00:1a.7/usb1/1-3/1-3:1.0
SUBSYSTEM=usb
DEVTYPE=usb_interface
DEVICE=/proc/bus/usb/001/006
PRODUCT=457/151/100
INTERFACE=8/6/80
MODALIAS=usb:v0457p0151d0100dc00dsc00dp00ic08isc06ip50

This information is current for kernel 2.6.23.

References

1. Patrick Mochel: The Linux Kernel Device
85. 0000 12345 root 600 123456 2 dest
0x00000000 123456 root 600 234567 2 dest
0x00000000 1234567 nobody 777 345678 2 dest

Example: POSIX Shared Memory

To be done.

void *mmap(void *start, size_t length, int prot, int flags, int fd, off_t offset);
int munmap(void *start, size_t length);

Example: Windows Shared Memory

To be done.

HANDLE CreateFileMapping(HANDLE hFile, LPSECURITY_ATTRIBUTES lpFileMappingAttributes, DWORD flProtect, DWORD dwMaximumSizeHigh, DWORD dwMaximumSizeLow, LPCTSTR lpName);

[Margin note on messages: a message can be empty (just a ring), one integer, an array of bytes, or structured data. The operations are send(msg, target) and recv -> msg. Synchronous or asynchronous says whether the sender waits for delivery; there are also blocking and nonblocking variants.]

Flag PAGE_READONLY gives read only access to the committed region.
Flag PAGE_READWRITE gives read write access to the committed region of pages.
Flag PAGE_WRITECOPY gives copy on write access to the committed region.
Flag SEC_COMMIT allocates physical storage in memory or in the paging file on disk for all pages of a section.
Flag SEC_IMAGE says file is executable, mapping and protection are taken from the image.
Flag SEC_NOCACHE disables caching, used for shared structures in some architectures.
Flag SEC_RESERVE reserves all pages of a section without allocating physical storage; the reserved range of pages cannot be used by any other allocation operations until it is released.

If hFile is 0xFFFFFFFF, the calling process must al
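Since the POSIX shared memory example above is still marked as to be done, here is one possible sketch of the mmap and munmap calls in use. It makes several assumptions that are not in the text: the helper name share_value_with_child is hypothetical, the mapping is anonymous (MAP_ANONYMOUS is a widely available extension rather than part of the prototype shown above), and the sharing happens between a parent and a forked child.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Maps one shared integer, lets a child process store value into it, and
   returns what the parent then reads from the mapping, or -1 on failure. */
int share_value_with_child(int value)
{
    int *shared = mmap(NULL, sizeof *shared, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED)
        return -1;
    *shared = 0;

    int result = -1;
    pid_t child = fork();
    if (child == 0) {
        *shared = value;    /* the write lands in memory the parent sees */
        _exit(0);
    }
    if (child > 0 && waitpid(child, NULL, 0) == child)
        result = *shared;   /* the child has exited, so the write is done */

    munmap(shared, sizeof *shared);
    return result;
}
```

With MAP_PRIVATE instead of MAP_SHARED the parent would keep reading 0, because the write of the child would go to its own copy of the page.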
86. [Margin note: each bus has its own protocol, its own frequency and so on.]

both the initiator and the target are ready, one unit of data is transferred each clock cycle.

[Figure 1.7: PCI Bus Read Cycle. Signals shown: CLK, FRAME, AD (ADDR, DATA1 to DATA4), C/BE (READ, BYTE ENABLE), IRDY, TRDY, DEVSEL.]

[Figure 1.8: PCI Bus Write Cycle. Signals shown: CLK, FRAME, AD (ADDR, DATA1 to DATA4), C/BE (WRITE, BYTE ENABLE), TRDY, DEVSEL.]

Note that there are many variants of the PCI bus. AGP (Accelerated Graphics Port) is based on PCI clocked at 66 MHz and doubles the speed by transporting data on both the rising and the falling edge of the clock signal. PCI-X (Peripheral Component Interconnect Extended) introduces higher clock speeds and ECC for error checking and correction.

Bus Mastering

Multiprocessor systems and complex devices do not fit the concept of a single processor controlling the processor bus. An arbitration mechanism is introduced to allow any device to request control of the processor bus. PCI has an arbitrator who can grant the use of the bus to any connected device.

Example: ISA Bus

The ISA (Industry Standard Architecture) bus DMA cycle can be extended to support bus mastering. After the controller finished the DRQ and DACK handshake, the peripheral device could use the MASTER signal to request bus maste
87. ANDLE hEvent

[Margin note on manual reset events: everybody who waits on the event starts running.]

HANDLE CreateEvent(LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPTSTR lpszEventName);
HANDLE OpenEvent(DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpName);
BOOL SetEvent(HANDLE hEvent);
BOOL ResetEvent(HANDLE hEvent);
BOOL PulseEvent(HANDLE hEvent);

PulseEvent reportedly does not always work and its use is not recommended.

Monitors

A monitor makes it possible to restrict parallel access to an object in a programming language. It was invented by Hoare in 1974 and can be found for example in Concurrent Pascal, Modula and Java.

Monitors are probably easiest to demonstrate on Java. Java has the synchronized keyword, which can be attached to methods of an object. When a method is marked this way, the lock associated with its object is locked before the method is executed, which ensures synchronization. Incidentally, Java also offers synchronization of a block of code on an explicitly stated object; this is originally a slightly older concept that basically makes it possible to mark critical sections in the code.

For cases where it is necessary to wait for something right inside the monitor, extra functions are added to it. One of the possible designs adds the function delay(queue), which places a process into a queue
88. Directories are stored as files; their content is a B-tree with references to the contained files.

[Margin note, a list of MFT entries: 2 log, 3 root directory, a bitmap of blocks (free or used), a few odds and ends, file numbers.]

This is slightly simplified. The biggest drawback seems to be fragmentation. To prevent fragmentation of MFT, NTFS makes sure a free area called MFT zone exists after MFT. Each time the disk becomes full, MFT zone is halved.

Multiple Streams

[Margin note: a view of a file: any number of streams, some of them are data, some are attributes.]

The individual streams in a file are identified by names and can be accessed by opening a file named file:stream name:stream type. The default stream has no name and its type is $DATA, so file and file::$DATA are the same thing. This was used for an attack on Microsoft Information Server, which did not recognize the extension in names with an explicitly stated stream, so one could read the source code of scripts.

Amusingly, the file length reflects only the default stream, so data in other streams take up space on the disk but are not visible in directories. They also often do not get copied. [Margin note: Explorer is OK, but FAR is not.]

[Margin note on attributes: short attributes of constant length are kept in the MFT; attributes of variable length, mainly data, are kept in extents and can be compressed directly.]

Some specific attributes besides $DATA, inaccessible to applications:

Attribute: VOLUME VERSION. Content: Volume version.

[Margin note, cut off: streamy se j
89. Allocate and free objects of the cache:

void *kmem_cache_alloc(kmem_cache_t *cachep, int flags);
void kmem_cache_free(kmem_cache_t *cachep, void *objp);

Two implementations of the allocator exist in the kernel, called SLAB and SLUB. The allocators differ in the way they keep track of slabs and objects, SLAB being more complex, SLUB being more streamlined. Usage statistics are available with both allocators.

> slabinfo
Name             Objects  Objsize  Space   Slabs/Part/Cpu  O/S  O
blkdev_queue     24       1544     40.9K   5/0/2           5    1
blkdev_requests  24       288      12.2K   3/1/2           14   0
dentry           77974    208      16.8M   4104/0/2        19   0
inode_cache      2940     592      2.0M    491/0/2         6    0
mm_struct        93       856      98.3K   24/2/2          4    0
sigqueue         8        160      8.1K    2/0/2           25   0
task_struct      248      1808     524.2K  64/4/2          4    1

This information is current for kernel 2.6.23.

References

1. Jeff Bonwick: The Slab Allocator: An Object-Caching Kernel Memory Allocator.

[Margin note: a defence against memory leaks. When I program without a GC, I have to watch for leaks myself. A GC frees memory that is no longer reachable through pointers. Reference counting: each object carries a count of pointers to it; when there is a cyclic dependency between unreachable objects, I will not find them.]

Garbage Collectors

A traditional interface of a heap allocator offers methods for explicit allocating and freeing of blocks on the heap. Explicit allocating and freeing, however, is prone to memory leaks when a process fails to free an allocated block even though
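The weakness of reference counting mentioned in the margin note, namely that a cycle of otherwise unreachable objects is never reclaimed, can be demonstrated with a small sketch. Everything here is hypothetical and kept deliberately simple: the struct node type, its freed flag standing in for an actual call to free, and the helper reclaimed_after_release are illustrations, not part of any allocator interface from the text.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    int refs;            /* number of pointers referring to this node */
    struct node *peer;   /* optional reference held by this node */
    int freed;           /* set instead of actually freeing the node */
};

void node_acquire(struct node *n)
{
    n->refs++;
}

void node_release(struct node *n)
{
    if (--n->refs == 0) {
        n->freed = 1;
        if (n->peer != NULL)
            node_release(n->peer);  /* drop the reference the node held */
    }
}

/* Builds two nodes with one external reference each, links them in a chain
   or in a cycle, drops the external references, and returns the number of
   nodes that were reclaimed. */
int reclaimed_after_release(int make_cycle)
{
    struct node a = { 1, NULL, 0 };
    struct node b = { 1, NULL, 0 };

    a.peer = &b;
    node_acquire(&b);
    if (make_cycle) {
        b.peer = &a;
        node_acquire(&a);
    }

    node_release(&a);
    node_release(&b);
    return a.freed + b.freed;
}
```

Without the cycle both nodes are reclaimed. With the cycle each node still holds the last reference to the other, so both counts stay at one and nothing is reclaimed, which is the case a tracing garbage collector handles and plain reference counting does not.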
90. CFS class scheduler contains a tree of processes indexed by the weighted time consumed by each process. The pick_next_task function picks the process with the least consumed time, achieving fairness among processes in the process class or the process group. An extra kernel thread redistributes processes between processors.

Example: Windows Scheduler

Windows uses a priority based scheduler. The priority is an integer from 0 to 31, higher numbers denoting higher priorities. The range of 1-15 is intended for standard applications, 16-31 for realtime applications; memory management worker threads use priorities 28 and 29.

The priorities are not assigned to threads directly. Instead, the integer priority is calculated from the combination of one of seven relative thread priorities and one of four process priority classes.

               Idle  Normal  High  Realtime
Idle           1     1       1     16
Lowest         2     6       11    22
Below Normal   3     7       12    23
Normal         4     8       13    24
Above Normal   5     9       14    25
Highest        6     10      15    26
Time Critical  15    15      15    31

The priorities are further adjusted. Threads of the Idle, Normal and High priority class processes receive a priority boost on end of waiting inside kernel and certain other events, and a priority drop after consuming the entire allocated quantum. Threads of the Normal priority class processes receive a priority boost after their window is focused. Similar adjustment i
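The calculation of the integer priority from the two components is a plain table lookup, which the following sketch spells out. The enumerator and function names are hypothetical and do not correspond to Windows API constants; only the numbers come from the table above, and the later dynamic adjustments are not modelled.

```c
#include <assert.h>

/* Hypothetical names for the four process priority classes (columns). */
enum priority_class { CLASS_IDLE, CLASS_NORMAL, CLASS_HIGH, CLASS_REALTIME };

/* Hypothetical names for the seven relative thread priorities (rows). */
enum thread_priority {
    THREAD_IDLE, THREAD_LOWEST, THREAD_BELOW_NORMAL, THREAD_NORMAL,
    THREAD_ABOVE_NORMAL, THREAD_HIGHEST, THREAD_TIME_CRITICAL
};

/* Base priority for each combination, copied from the table in the text. */
static const int base_priority_table[7][4] = {
    {  1,  1,  1, 16 },   /* Idle */
    {  2,  6, 11, 22 },   /* Lowest */
    {  3,  7, 12, 23 },   /* Below Normal */
    {  4,  8, 13, 24 },   /* Normal */
    {  5,  9, 14, 25 },   /* Above Normal */
    {  6, 10, 15, 26 },   /* Highest */
    { 15, 15, 15, 31 },   /* Time Critical */
};

int base_priority(enum thread_priority thread, enum priority_class class)
{
    return base_priority_table[thread][class];
}
```

A thread with the Normal relative priority in a Normal class process thus starts at priority 8, while a Time Critical thread in a Realtime class process gets 31.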
91. DO

1. Describe the on-disk structure of the FAT file system. Illustrate the use of this structure in the operations of reading data from a file, given the path and name of the file and the position and length of the data in the file, and of writing data to a newly created file, given the path and name of the file and the length of the data. State the strengths and weaknesses of this file system.

2. Describe the on-disk structure of the EXT2 file system. Illustrate the use of this structure in the operations of reading data from a file, given the path and name of the file and the position and length of the data in the file, and of writing data to a newly created file, given the path and name of the file and the length of the data. State the strengths and weaknesses of this file system.

3. Design a file system that can efficiently support file names of unlimited length and links. Describe the structure of the data stored on the disk and the algorithms for reading and writing data from and to a file given by its name, including the path, and by a position within the file. Explain the strengths of your design.

4. Design a file system that can efficiently support both very short and very long files. Describe the structure of the data stored on the disk and the algorithms for reading and writing data from and to a file given by its name, including the path, and by a position within the file. Explain the strengths of your design.

Still a sketch. Understanding is essential. Understanding is essential. Understanding is optional. Understanding is opt
92. EM_UNDO, which makes sure that an operation on the semaphore is rolled back if the process that called it terminates. Either all of the operations are done or none of them is. There is also semctl for various other operations.

int semget(key_t key, int nsems, int semflg);
int semop(int semid, struct sembuf *sops, unsigned nsops);
int semtimedop(int semid, struct sembuf *sops, unsigned nsops, struct timespec *timeout);
key_t ftok(const char *pathname, int proj_id);

Unix also has semaphores that follow the POSIX specification. Their interface is understandably similar to pthread mutexes. They are initialized with sem_init; the other functions are for waiting, trying to wait, reading the value, signalling, and destroying the semaphore.

int sem_init(sem_t *sem, int pshared, unsigned int value);
int sem_destroy(sem_t *sem);
sem_t *sem_open(const char *name, int oflag, mode_t mode, unsigned int value);
int sem_unlink(const char *name);
int sem_wait(sem_t *sem);
int sem_trywait(sem_t *sem);
int sem_timedwait(sem_t *restrict sem, const struct timespec *abs_timeout);
int sem_post(sem_t *sem);
int sem_getvalue(sem_t *sem, int *sval);

In Windows, semaphores are similar to mutexes, except that they have no owners and they have a reference count.

HANDLE CreateSemaphore(LPSECURITY_ATTRIBUTES lpsa, LONG cSemInitial, LONG cSemMax, LPTSTR lpszSemName);
HANDLE OpenSemaphore(DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpName);
DWORD WaitForSingleO
93. EMR and IOW signals to synchronize the transfer.

[Margin note on DMA: the processor is set aside. The disk reads and sends data onto the bus as if it were sending them to the processor; the memory reads from the bus as if it were receiving them from the processor. The DMA controller has two counters, counting data and addresses, and it directs the device and the memory while the processor is set aside.]

[Margin note on addressing: it looks as if I needed two sets of addresses. That is why the address bus carries the memory address, and the device is addressed through the control bus (DMA channel 0-3). This holds for ISA, it can be done differently.]

[Margin note on what the processor does meanwhile: first, it has a cache, so it does not have to go to the bus; second, processor transfers and DMA transfers can be interleaved on the bus; third, there does not have to be only one bus.]

Advances In Processor Architecture

Instruction Pipelining

The basic architecture described earlier executes an instruction in several execution phases, typically fetching a code of the instruction, decoding the instruction, fetching the operands, executing the instruction, storing the results. Each of these phases only

[Margin note on pipelining: the processor has several parts; one fetches and decodes instructions, another executes them. Dependent instructions mean that I insert bubbles. Pipeline depth is 10 to 31 today, the ideal is about 10 to 20. On an interrupt, the pipeline is usually left to finish. A jump is an unpleasant instruction for the pipeline: Intel cancels the pipeline and jumps, MIPS still executes one more instruction, tj muzu se nejdr
94. Interrupt Controller

To provide means of requesting attention from outside, the processor is equipped with the interrupt and interrupt acknowledge signals. Before executing an instruction, the control unit of the processor checks whether the interrupt signal is set, and if it is, the control unit responds with setting the interrupt acknowledge signal and setting the program counter to a predefined address, effectively executing a subroutine call instruction.

To better cope with situations where more devices can request attention, the handling of the interrupt request signal is delegated to an interrupt controller. The controller has several interrupt request inputs and takes care of aggregating those inputs into the interrupt request input of the processor using priorities and queuing, and providing the processor with information to distinguish the interrupt requests.

Direct Memory Access Controller

To provide means of transferring data without processor attention, the processor is equipped with the hold and hold acknowledge signals. The control unit of the processor checks whether the hold signal is set, and if it is, the control unit responds with setting the hold acknowledge signal and holding access to the processor bus until the hold signal is reset, effectively relinquishing control of the processor bus.

To better cope with situations where more devices can transfer data without proce
95. Gateways for Congestion Avoidance

Example: Linux Packet Scheduling

Linux uses queuing disciplines associated with network devices to determine how packets should be scheduled. Some queuing disciplines can combine other queuing disciplines. Queuing disciplines are connected through classes. Filters tell what packets go to what class.

# Root qdisc is prio with 3 bands
tc qdisc add dev ppp0 root handle 1: prio bands 3

# Band 1 qdisc is sfq and filter is ICMP & SSH & DNS & outbound HTTP
tc qdisc add dev ppp0 parent 1:1 sfq perturb 16
tc filter add dev ppp0 parent 1: protocol ip prio 1 u32 match ip protocol 1 0xff flowid
tc filter add dev ppp0 parent 1: protocol ip prio 1 u32 match ip sport 22 0xffff flowid
tc filter add dev ppp0 parent 1: protocol ip prio 1 u32 match ip dport 22 0xffff flowid
tc filter add dev ppp0 parent 1: protocol ip prio 1 u32 match ip sport 53 0xffff flowid
tc filter add dev ppp0 parent 1: protocol ip prio 1 u32 match ip dport 53 0xffff flowid
tc filter add dev ppp0 parent 1: protocol ip prio 1 u32 match ip sport 80 0xffff flowid
tc filter add dev ppp0 parent 1: protocol ip prio 1 u32 match ip sport

# Band 2 qdisc is sfq and filter is anything unfiltered
tc qdisc add dev ppp0 parent 1:2 sfq perturb 16
tc filter add dev ppp0 parent 1: protocol ip prio 9 u32 match u8

# Band 3 qdisc is tbf and filter is outbound SMTP
96. A quick overview of other features includes a bad block map kept in reserved inode 1 and an administrator space reservation.

For a rough estimate: in Linux, an 8 GB partition has 1M inodes in total, of which about 4% are used for ordinary files; each block has 130 MB.

Extra attributes can be read and changed through lsattr and chattr. They include IMM (immutable), APP (append only), SYNC (synchronous update), COMPRESS, UNDELETE and SAFEDEL; the kernel ignores the last three, though.

Journalling mode for data is either writeback, ordered or journal. Writeback means data are not journalled. Ordered means data are written to normal location before corresponding metadata are journalled. Journal means both data and metadata are journalled. Journalling is done to a special file.

References

1. Tweedie S. C.: Journaling the Linux ext2fs Filesystem.

Example: NTFS File System

[Margin notes: NTFS is inspired by ext. The data area is divided into blocks of constant length.]

At the beginning of the disk there is only a boot sector with the dimensions of the disk and a pointer to the MFT; a copy of it is also stored somewhere at the end of the partition. The whole rest of the disk is addressed by clusters, which are power-of-two multiples of sectors, much like in FAT. Nothing but files lives on the disk; the information about them is stored in the MFT, or Master File Table. Each file is uniquely identified by an index into the MFT. The MFT itself is also a file with index
97. Each descriptor carries a version, or rather a sequence number, and when the list of descriptors contains several descriptors of the same type, the one with the highest version is considered. Because the list of descriptors does not have to be terminated, new descriptors that replace old ones can be appended to its end.

Related to this is the concept of the Virtual Allocation Table, which maps logical sectors of the disk to physical ones. All data on the disk are stored in logical sectors; when it is necessary, for example, to overwrite a part of a file, the directories and the rest of the file can stay where they are and only the mapping is adjusted.

Example: JFFS2 File System

JFFS2 is a journalling file system that accommodates the specific nature of the flash memory devices, which are organized in blocks that need to be erased before writing.

[Margin note on the internal file system of FLASH: do not keep writing to the same place, it wears out quickly. Log structured: all changes are like diffs, I keep writing further and further, cyclically, and overwrite what is no longer valid.]

The file system views the entire flash memory device as a log consisting of arbitrarily arranged blocks. The log contains records called nodes; nodes fill up blocks but may not span block boundaries, so that independent garbage collection of blocks remains possible. There are three types of nodes:

An inode node, which contains metadata and optionally a fragment of data belonging to a file. Compression of the fragments i
98. LPCWSTR lpFileName, STREAM_INFO_LEVELS InfoLevel, LPVOID lpFindStreamData, DWORD dwFlags);

BOOL FindNextStreamW(HANDLE hFindStream, LPVOID lpFindStreamData);

typedef enum _STREAM_INFO_LEVELS {
    FindStreamInfoStandard
} STREAM_INFO_LEVELS;

typedef struct _WIN32_FIND_STREAM_DATA {
    LARGE_INTEGER StreamSize;
    WCHAR cStreamName[MAX_PATH + 36];
} WIN32_FIND_STREAM_DATA;

The only thing worth noting on the stream enumeration interface is probably the inconsistent use of system constants, suggested by the need to add an arbitrary constant to MAX_PATH.

Cache Manager

Quoted from Mark Russinovich, David Solomon: Windows XP: Kernel Improvements Create a More Robust, Powerful, and Scalable OS.

In order to know what it should prefetch, the Windows XP Cache Manager monitors the page faults, both those that require that data be read from disk (hard faults) and those that simply require data already in memory be added to a working set (soft faults), that occur during the boot process and application startup. By default it records 120 seconds of the boot process, 60 seconds after all services have finished initializing, or 30 seconds after the shell starts, whichever occurs first. The Cache Manager also monitors the first 10 seconds of application startup. After collecting a trace organized into faults taken on MFT (if the application accesses files or dir
99. 1966, hardware: M, a memory circuit developed at IBM.
ARPANET, a network project at ARPA.
1970, software: ive Uniplexed Information and Computing System (UNICS, UNIX), an operating system developed at Bell Laboratories.
1971, software: Pascal, a programming language developed by Niklaus Wirth at ETH Zurich.
1972, software: SmallTalk, a programming language developed by Alan Kay at Xerox PARC.
1973, hardware: Mouse, an input device with a single ball, developed by Bill English at Xerox PARC.

A well known computer of the time, IBM System/360, was the first to introduce configurable assembly from modules. The computer used the OS/360 operating system, famous for its numerous bugs and cryptic messages. OS/360 supported executing multiple programs in parallel and introduced spooling of peripheral operations. Another famous operating system was Multiplexed Information And Computing Service (MULTICS), designed for providing public computing services in a manner similar to telephone or electricity. MULTICS supported memory mapped files, dynamic linking and reconfiguration.

An important line of minicomputers was produced by Digital Equipment Corporation. The first of the line was DEC PDP-1 in 1961, which could perform arithmetic operations in tens of microseconds and was equipped with 4k words of memory, one word being 18 bits. All this at a fraction of the size and cost of compara
100. Model

Power Management

Rehearsal

Questions

1. Describe the general architecture of a device driver and explain how this architecture allows asynchronous requests from hardware and synchronous requests from software to be processed concurrently.

2. Describe the usual course of interrupt handling as an asynchronous request for hardware service by a device driver. Describe the course from the moment of the interrupt to the moment the handling is finished. Assume the usual driver architecture, where the asynchronously and synchronously called parts of the driver communicate through a shared request queue. Describe each step so that it is clear who its participants are and where they obtain the information needed to carry out the step.

3. Describe the usual course of a system call as a synchronous request for software service by a device driver. Assume the usual driver architecture, where the asynchronously and synchronously called parts of the driver communicate through a shared request queue. Describe the course from the moment of the call to the moment the handling is finished. Describe each step so that it is clear who its participants are and where they obtain the information needed to carry out the step.

(note: usb)

Devices

Busses

Although busses are not devices in the usual sense, devices that represent busses are sometimes available to control selected features of the
101. NTER
    jnz ReadKey    ; repeat keyboard read until it is

Figure 2-9 CP/M BDOS System Call Example

When calling the BIOS module, the application placed arguments of the requested service in registers and called the BIOS entry point for the specific service. The entry point for the specific service could differ from system to system, but its distance from the beginning of the BIOS module was the same for all systems.

    jmp BOOT      ; cold boot
    jmp WBOOT     ; warm boot
    jmp CONST     ; console status
    jmp CONIN     ; console input
    jmp HOME      ; disk head to track 0
    jmp SETDMA    ; set memory transfer address
    jmp READ      ; read sector
    jmp WRITE     ; write sector

Figure 2-10 CP/M BIOS System Call Entry Points

Example: Intel 80x86 Processor Privileges

The Intel 80x86 processors traditionally serve as an example of why calling more privileged code from less privileged code can be slow.

On Intel 80286, an average MOV instruction took 2 clock cycles to execute. A call that changed the privilege level took over 80 clock cycles to execute. A call that switched the task took over 180 clock cycles to execute.

On Intel 80386, an average MOV instruction took 2 clock cycles to execute. A call that changed the privilege level took over 80 clock cycles to execute. A call that switched the task took over 300 clock cycles to execute.

Modern Pentium processors introduce the SYSENTER and SYSEXIT instructions for efficient implementation of the system call interface.
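The fixed-distance rule for BIOS entry points can be sketched in C. This is only an illustration, not CP/M code: it assumes that each table slot is one three-byte 8080 jmp instruction, and the names bios_entry_address, bios_call and the three service bodies are hypothetical.

```c
#include <assert.h>

/* An 8080 "jmp addr" instruction occupies three bytes: opcode plus 16-bit address. */
#define JMP_SIZE 3u

/* Positions of the first three services in the jump table. */
enum bios_service { BIOS_BOOT = 0, BIOS_WBOOT = 1, BIOS_CONST = 2, BIOS_SERVICES = 3 };

/* Address of a service entry point, given where this system placed its BIOS. */
unsigned bios_entry_address(unsigned bios_base, enum bios_service service)
{
    return bios_base + JMP_SIZE * (unsigned)service;
}

/* Hypothetical service bodies, standing in for machine-specific code. */
static int cold_boot(void) { return 100; }
static int warm_boot(void) { return 200; }
static int console_status(void) { return 0; }

/* C analogue of the jump table: fixed slot order, system-specific targets. */
static int (*const bios_table[BIOS_SERVICES])(void) = {
    cold_boot, warm_boot, console_status
};

/* Call a service through its slot, without knowing where the body lives. */
int bios_call(enum bios_service service)
{
    return bios_table[service]();
}
```

The base address differs between systems, but the offset of a given service from that base does not, which is what lets one application binary run on all of them.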
102. OID reserved);
    VOID CloseThreadpool (PTP_POOL ptpp);
    BOOL SetThreadpoolThreadMinimum (PTP_POOL ptpp, DWORD cthrdMic);
    VOID SetThreadpoolThreadMaximum (PTP_POOL ptpp, DWORD cthrdMost);
    VOID SubmitThreadpoolWork (PTP_WORK pwk);

Figure 2-25 Windows Thread Pool Calls

It is also possible to query various information on process timing.

    BOOL GetProcessTimes (HANDLE hProcess, PFILETIME lpCreationTime, PFILETIME lpExitTime, PFILETIME lpKernelTime, PFILETIME lpUserTime);

Figure 2-26 Windows Process Timing Call

Rehearsal

At this point, you should understand how multiple processes can run in parallel on a computer with multiple processors, and how an illusion of multiple processes running in parallel can be provided even when the number of processes is higher than the number of processors. You should be able to recognize important parts of process context and explain how efficient context switching can be done with each part of the process context.

You should see why it can be useful to split an activity of a process into multiple threads. You should understand why and which parts of the entire context remain shared parts of the process context and which parts become private parts of the thread context.

You should be able to design meaningful rules telling when to switch a context and what context to switch to, related to both the architecture of the operat
103. (table of contents, continued)
        Of Multiple File Subsystems
        Rehearsal
6. Network Subsystem
    Abstractions And Operations
        Sockets
        Remote Procedure Call
        Rehearsal
    Network Subsystem Internals
        Queuing Architecture
        Packet Filtering
        Packet Scheduling
        Example: Linux Packet Scheduling
        Rehearsal
    Network Subsystem Applications
        File Systems
        Computational Resource Sharing
        Single System Image
        Rehearsal
7. Security Subsystem
    Authentication
        Linux PAM Example
        Kerberos Example
104. Operating Systems

by Petr Tůma, Lubomír Bulej, Tomáš Bureš and Vlastimil Babka

This material is a work in progress that is provided on a fair use condition to support the Charles University Operating Systems lecture. It should not be used for any other purpose than to support the lecture. It should not be copied to prevent existence of outdated copies. It comes without warranty of any kind.

This is version 150M, generated on 2010-10-04 12:20:34. For the latest version, check http://dsrg.mff.cuni.cz/~ceres.

Table of Contents
1. Introduction
    Foreword
        Origins
        Structure
    Historic Perspective
        Stone Age
        Transistors
        Low Integration
        High Integration
    Basic Concepts
        Hardware Building Blocks
        Basic Computer Architecture
        Advances In Processor Architecture
        Advances In Memory Architecture
        Advances In Bus Architecture
105. REGS_OFFSET_RA(\base)
    .endm SAVE_REGISTERS

    .macro LOAD_REGISTERS base
        lw $ra, REGS_OFFSET_RA(\base)
        lw $fp, REGS_OFFSET_FP(\base)
        lw $gp, REGS_OFFSET_GP(\base)
        lw $a3, REGS_OFFSET_A3(\base)
        lw $a2, REGS_OFFSET_A2(\base)
        lw $a1, REGS_OFFSET_A1(\base)
        lw $a0, REGS_OFFSET_A0(\base)
        lw $v1, REGS_OFFSET_V1(\base)
        lw $v0, REGS_OFFSET_V0(\base)
        lw $at, REGS_OFFSET_AT(\base)
        lw $zero, REGS_OFFSET_ZERO(\base)
    .endm LOAD_REGISTERS

    switch_cpu_context:
        /* Allocate a frame on the stack of the old thread and update
           the address of the stack top of the old thread. */
        addiu $sp, -CONTEXT_SIZE            /* Allocate space on stack */
        sw $sp, ($a0)                       /* Save the old stack */
        SAVE_REGISTERS $sp                  /* Save general registers */

        /* Few other registers that the macro does not handle
           need to be saved as well. */
        mflo $t0
        mfhi $t1
        sw $t0, REGS_OFFSET_LO($sp)
        sw $t1, REGS_OFFSET_HI($sp)
        mfc0 $t0, $status
        sw $t0, REGS_OFFSET_STATUS($sp)

106.    la $t1, ~CP0_STATUS_IE_MASK         /* Disable interrupts */
        and $t0, $t1
        mtc0 $t0, $status

        lw $sp, ($a1)                       /* Switch to the new stack */

        /* Restore the registers in roughly the opposite order
           to fit the stack semantics. */
        lw $t0, REGS_OFFSET_LO($sp)
        lw $t1, REGS_OFFSET_HI($sp)
        mtlo $t0
        mthi $t1
        LOAD_REGISTERS $sp
        lw $k0, REGS_OFFSET_STATUS($sp)
        addiu $sp, CONTEXT_SIZE             /* Free space on stack */
        j $ra                               /* Return to the newly restored context */
        mtc0 $k0, $status

Memory State

In principle, the memory accessible to a process can be saved and restored to and from external storage, such as disk. For typical sizes of memory accessible to a process, however, saving and restoring would take a considerable amount of time, making it impossible to switch the context very often. This solution is therefore used only when context switching is rare, for example when triggered manually as in DOS or CTSS.

When frequent context switching is required, the memory accessible to a process is not saved and restored, but only made inaccessible to other processes. This requires the presence of memory protection and memory virtualization mechanisms, such as paging. Switching of the memory context is then reduced to switching of the paging tables and flushing of the associated caches.

When separate notions of processes and threads are considered, the memory state is typically associated with the process rather than the thread. The need for separate
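The same save-registers, switch-stack, restore-registers sequence can be tried in user space. The following is a minimal sketch that assumes a glibc-style <ucontext.h> is available; it is not the kernel routine from the listing, and the names worker, run_context_demo and trace are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)

static ucontext_t main_ctx, worker_ctx;
static int trace[4];     /* order in which the two contexts ran */
static int trace_len;

/* Runs on its own stack; each swapcontext saves one register set
   and restores the other, much like the assembly routine does. */
static void worker(void)
{
    trace[trace_len++] = 2;
    swapcontext(&worker_ctx, &main_ctx);   /* save worker, resume main */
    trace[trace_len++] = 4;
    /* Returning from here resumes the context named by uc_link. */
}

/* Returns the number of recorded steps, or -1 on allocation failure. */
int run_context_demo(void)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;
    trace_len = 0;

    getcontext(&worker_ctx);
    worker_ctx.uc_stack.ss_sp = stack;     /* private stack of the new context */
    worker_ctx.uc_stack.ss_size = STACK_SIZE;
    worker_ctx.uc_link = &main_ctx;        /* where to go when worker returns */
    makecontext(&worker_ctx, worker, 0);

    trace[trace_len++] = 1;
    swapcontext(&main_ctx, &worker_ctx);   /* save main, run worker */
    trace[trace_len++] = 3;
    swapcontext(&main_ctx, &worker_ctx);   /* let worker finish */

    free(stack);
    return trace_len;
}
```

Only the processor state and the stack pointer change hands here; both contexts keep sharing the same address space, which is the situation of threads within one process.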
107. TR lpFileName, DWORD dwDesiredAccess, DWORD dwShareMode, PSECURITY_ATTRIBUTES lpSecurityAttributes, DWORD dwCreationDisposition, DWORD dwFlagsAndAttributes, HANDLE hTemplateFile, HANDLE hTransaction, PUSHORT pusMiniVersion, PVOID pExtendedParameter);

    BOOL DeleteFileTransacted (LPCTSTR lpFileName, HANDLE hTransaction);
    BOOL CreateDirectoryTransacted (...);
    BOOL RemoveDirectoryTransacted (...);
    BOOL MoveFileTransacted (...);
    BOOL CopyFileTransacted (...);

The support for transactions is generic, driven by a system transaction manager and cooperating resource managers. Transactional operations can therefore be provided by other parts of the system, such as registry.

Rehearsal

Questions

1. Describe the usual operating system interface for accessing files using read and write operations. List the functions of the interface including their arguments and semantics.

2. Explain why the usual operating system interface for accessing files using read and write operations separates the operations of opening and closing a file, and the operation of setting the current position in the file, from the read and write operations themselves.

3. Describe the usual operating system interface for accessing files using memory mapping operations. List the functions of the interface including their arguments and semantics.

4. Describe the usual operating system interface
108. This distinction can take various forms, for example realtime processes in a global queue and the remaining processes in local queues, or dynamic moving between the global and the local queue depending on processor load. There, the cost of process migration, the cost of remote access to resources and similar factors have to be taken into account as well, but we leave that aside for now.

What Is The Interface

As illustrated by the individual examples, the interface to the scheduler is mostly determined by the scheduler itself.

Example: Windows Scheduler API

    BOOL SetPriorityClass (HANDLE hProcess, DWORD dwPriorityClass);
    DWORD GetPriorityClass (HANDLE hProcess);
    BOOL SetThreadPriority (HANDLE hThread, int nPriority);
    int GetThreadPriority (HANDLE hThread);
    BOOL SetProcessPriorityBoost (HANDLE hProcess, BOOL DisablePriorityBoost);
    BOOL SetThreadPriorityBoost (HANDLE hThread, BOOL DisablePriorityBoost);
    BOOL SetProcessAffinityMask (HANDLE hProcess, DWORD_PTR dwProcessAffinityMask);
    DWORD_PTR SetThreadAffinityMask (HANDLE hThread, DWORD_PTR dwThreadAffinityMask);

Figure 2-24 Windows Scheduler Calls

Windows also provides an interface that implements the thread pool scheduling pattern, where a pool of threads with predefined minimum and maximum size is used to handle incoming work requests.

    PTP_POOL CreateThreadpool (PV
109. and continue (queue) for waking up one process from the queue.

Guards

A somewhat less known synchronization tool allows writing a condition in front of a block of code, which must be satisfied before the block starts executing. This achieves essentially the same function as the classical use of condition variables, except that nobody signals the moment when the condition should be tested again.

That is also the reason why general guards are available almost nowhere: the moment for testing the condition is hard to determine. Guards are therefore made with the moments of testing the condition limited to events such as a method call.

An example of such a guard is select and rendezvous in Ada. It is written using the select and accept statements, which otherwise look like a case statement and a procedure declaration.

    task body Foo is
      i, j : integer;
    begin
      select
        when j > 0 =>
          accept Xyzzy (n : integer) do
            ...
          end Xyzzy;
      or
        ...
      end select;
    end Foo;

    task body Bar is
    begin
      Xyzzy (1);
    end Bar;

The function is straightforward. The select statement first evaluates all conditions. If a rendezvous is available for some satisfied condition, it is performed. Otherwise a random branch with a satisfied condition is executed, or the else branch. If no condition is satisfied, an exception is raised. The accept statement waits until somebody calls it.

Rehearsal

At this point, you
110. and also wait for an event, which is nothing more than an asynchronously delivered message containing a single integer.

Each domain asks for the processor share it wants in the form of two numbers, slice and period, both in timer ticks. The system then guarantees that in every period the application gets at least slice ticks, on the condition that the sum of all slice / period ratios in the system is less than 1. Otherwise there would not be enough processor time to satisfy all domains.

Internally, the scheduler is based on the earliest deadline first algorithm. It keeps a queue of domains that have not yet been satisfied in the given period, sorted by the end time of that period, and a queue of domains that have already been satisfied, sorted by the start time of the new period. The scheduler always runs the first domain in the queue of unsatisfied domains. If that queue is empty, it runs a random domain from the queue of satisfied domains. The following scheduler action is planned for the time of the nearest deadline or the end of the slice, whichever comes sooner.

Incidentally, the original description of the algorithm did not mention the scheduler action on exhausting the slice, which showed both in anomalies during the startup of the algorithm and in uncontrolled distribution of surplus processor time. Strange.

Example: three processes A, B, C. A wants a share of 1 per 2, B wants a share of 1 per 5, C wants a share of 2 per 10. Table: rows are time, columns are remaining slices and period deadlines for A, B, C.

As details
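The earliest deadline first rule with slice and period pairs can be sketched as a tick-by-tick simulation. This is a simplified model under stated assumptions, not the scheduler described in the text: it keeps one array instead of two sorted queues, breaks deadline ties by array order, and leaves the processor idle (shown as '-') instead of running a random satisfied domain. The names struct domain, edf_admissible and edf_simulate are hypothetical.

```c
#include <assert.h>
#include <string.h>

struct domain {
    char name;
    int slice;      /* ticks guaranteed in every period */
    int period;     /* period length in ticks */
    int remaining;  /* ticks still owed in the current period */
    int deadline;   /* end of the current period */
};

/* Admission test: the sum of slice / period must stay below 1. */
int edf_admissible(const struct domain *d, int n)
{
    double load = 0.0;
    for (int i = 0; i < n; i++)
        load += (double)d[i].slice / d[i].period;
    return load < 1.0;
}

/* Simulate the given number of ticks; out receives one character per tick
   plus a terminating zero, so it must hold ticks + 1 characters. */
void edf_simulate(struct domain *d, int n, int ticks, char *out)
{
    for (int i = 0; i < n; i++) {
        d[i].remaining = d[i].slice;
        d[i].deadline = d[i].period;
    }
    for (int t = 0; t < ticks; t++) {
        int pick = -1;
        for (int i = 0; i < n; i++) {
            if (t == d[i].deadline) {          /* a new period starts */
                d[i].remaining = d[i].slice;
                d[i].deadline += d[i].period;
            }
            /* among unsatisfied domains, prefer the earliest deadline */
            if (d[i].remaining > 0 &&
                (pick < 0 || d[i].deadline < d[pick].deadline))
                pick = i;
        }
        if (pick >= 0) {
            d[pick].remaining--;
            out[t] = d[pick].name;
        } else {
            out[t] = '-';                      /* every domain is satisfied */
        }
    }
    out[ticks] = '\0';
}
```

For the three processes of the example (1 per 2, 1 per 5, 2 per 10) the load is 0.9, and over ten ticks each process receives exactly its requested share with one tick to spare.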
111.    wMaxPacketSize 0x0005 bytes 5 once
        bInterval 10

Check the example to see what the device descriptor reveals. The interface class HID means a human interface device. The interface subclass BOOT means a device useful at boot. The interface protocol MOUSE means a pointing device.

A report descriptor would be used to describe the interface, but a parser for the report descriptor is complicated. Devices useful at boot can therefore be identified from the interface class, interface subclass and interface protocol.

The interrupt mentioned in the descriptor does not mean a processor interrupt, but an interrupt transfer, one of the four transfer types available for a specific transfer pipe.

References

1. Universal Serial Bus Specification 1.0.
2. Universal Serial Bus Specification 1.1.
3. Universal Serial Bus Specification 2.0.

(notes, partly legible) Parallel port: very handy for kernel debugging, because an indication can be given by a single write to a single address. One STROBE signal serves as a pulse for reading the data: put the data on the port, set the strobe, wait a while, clear it and go on. For input there are special channels meant for the printer, but input can be done over the data lines as well. It did not interrupt much.

(notes, partly legible) Serial port: TxData and RxData, one input wire and one output wire. Bits are pushed to the port at a clock rate, with a start bit and a stop bit for synchronization at the beginning and at the end. A UART or USART circuit converted a byte into a stream of bits, using a register with a buffer empty flag. Further wires: RTS
112.    ack.so service=system-auth
        session optional pam_console.so

The example says that logging in through the login service requires successful execution of the securetty, stack and nologin modules. The securetty module tests whether the root user is logging in from a terminal listed in /etc/securetty, and always succeeds for other users. The nologin module tests whether the file /etc/nologin exists, and if it does, succeeds only for the root user. The stack module includes all the tests of the system-auth service, which adds the env module (sets environment variables according to /etc/security/pam_env.conf), the unix module (verifies the name and password according to /etc/passwd and /etc/shadow) and the deny module (always fails, as the default choice). Newer versions have the include option instead of the stack module.

In general, the following options can be used: requisite (failure of the module causes an immediate return of an error), required (failure of the module causes a return of an error after the remaining modules are processed), sufficient (success of the module causes an immediate return of a successful result), optional (success or failure of the module matters only if it is the only module). Besides these, there are more complex methods of combining modules, which allow for each possible way of finishing

(notes, partly legible) Storing the rights for each pair of user and resource would be too large, and it also does not capture transitive relations and the like. An ACL (access control list) remembers the rights for each user.
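The way the four control options combine module results can be sketched as a small evaluation loop. This is a simplified model, not the Linux-PAM implementation: module results are given up front as plain flags, and the sketch assumes that a sufficient module cannot override an earlier required failure and that a failing sufficient module counts like a failing optional one. The names struct module and stack_result are hypothetical.

```c
#include <assert.h>

enum control { REQUISITE, REQUIRED, SUFFICIENT, OPTIONAL };

struct module {
    enum control control;
    int ok;               /* hypothetical precomputed module result */
};

/* Returns 1 when the whole stack succeeds, 0 when it fails. */
int stack_result(const struct module *m, int n)
{
    int required_failed = 0;  /* remembered, reported after the remaining modules */
    int decided = 0;          /* a requisite or required module took part */
    int soft_failed = 0;      /* a sufficient or optional module failed */

    for (int i = 0; i < n; i++) {
        switch (m[i].control) {
        case REQUISITE:
            if (!m[i].ok)
                return 0;                 /* immediate error */
            decided = 1;
            break;
        case REQUIRED:
            if (!m[i].ok)
                required_failed = 1;      /* error, but keep processing */
            decided = 1;
            break;
        case SUFFICIENT:
            if (m[i].ok && !required_failed)
                return 1;                 /* immediate success */
            if (!m[i].ok)
                soft_failed = 1;
            break;
        case OPTIONAL:
            if (!m[i].ok)
                soft_failed = 1;
            break;
        }
    }
    if (required_failed)
        return 0;
    if (decided)
        return 1;
    return !soft_failed;                  /* only soft modules were present */
}
```

With three required modules standing for securetty, stack and nologin, a failure of the last one fails the login even though the first two succeeded, which matches the description of the example.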
113. adline Driven scheduler actually also implements a modified version of the Unidirectional Sweep strategy, except that it assigns deadlines to all requests, and when the deadline of a request expires, it processes the expired request and continues from that position of the disk head.

The Complete Fairness Queueing scheduler is based on the idea of queueing requests from processes separately and servicing the queues in a round robin fashion, or in a weighted round robin fashion directed by priorities.

This information is current for kernel 2.6.19.

References

1. Hao-Ran Liu: Linux I/O Schedulers. http://www.cs.ccu.edu.tw/~lhr89/linux-kernel/Linux%20IO%20Schedulers.pdf

Failures

The disk stores some redundant data along the lines of a CRC. Handling of disk errors: retries, controller reset, errors in software. Management of bad blocks, or possibly bad tracks, in hardware. SMART diagnostics.

(notes, partly legible) How does one tell that a disk is failing? Typically the disk detects that something is wrong and reports that the data is no longer there, but by then it tends to be too late. Diagnostic mechanisms.

Caching

Whole track caching, read ahead, write back.

Mention mirroring and redundant disk arrays.

RAID 0 uses striping to speed up reading and writing. RAID 1 uses plain mirroring and therefore requires pairs of disks of the same size. RAID 2 uses bit striping and Hamming Code. RAID 3 uses byte striping and a parity disk. RAID 4 uses block striping
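The parity disk mentioned for RAID 3 can be illustrated with a few lines of C. This is a minimal sketch that assumes parity is the bytewise XOR of the data disks, and it models disks as rows of one flat in-memory array. The names xor_into, parity_compute and parity_rebuild are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* XOR the bytes of src into dst. */
static void xor_into(unsigned char *dst, const unsigned char *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] ^= src[i];
}

/* disks holds ndisks blocks of len bytes each, stored one after another. */
void parity_compute(unsigned char *parity, const unsigned char *disks,
                    size_t ndisks, size_t len)
{
    memset(parity, 0, len);
    for (size_t d = 0; d < ndisks; d++)
        xor_into(parity, disks + d * len, len);
}

/* Recover the block of the lost disk from the parity and the surviving disks. */
void parity_rebuild(unsigned char *out, const unsigned char *parity,
                    const unsigned char *disks, size_t ndisks, size_t len,
                    size_t lost)
{
    memcpy(out, parity, len);
    for (size_t d = 0; d < ndisks; d++)
        if (d != lost)
            xor_into(out, disks + d * len, len);
}
```

Because XOR is its own inverse, a single parity disk is enough to survive the loss of any one data disk, at the cost of reading all the surviving disks during the rebuild.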
114. age in Intel HEX consisted of lines starting with a colon and followed by a string of hexadecimal digits.

    :LLAAAATTxxxxCC

    LL     length of the data
    AAAA   address of the data in memory
    TT     indication of last line
    xxxx   data
    CC     checksum of the data

Figure 2-3 Intel HEX Format

The program image still consisted of the program code and the static variables, stored exactly in the same form as when the program is executing in memory.

Example: Program Image In DOS

For small programs, DOS employs the segmentation support of the Intel 80x86 processors to avoid the need for relocation. A program is expected to fit into a single segment and to always start at the same address within the segment, namely 100h. The file with the program image consisted of the program code and the static variables, stored exactly in the same form as when the program is executing in memory.

For large programs, DOS introduced the EXE format. Besides the program code and the static variables, the file with the program image also contained a relocation table. The relocation table was a simple list of locations within the program image that need to be adjusted, the adjustment being a simple addition of the program base address to the location. Besides the relocation table, the header of the file also contained the required memory
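Reading one line of the LLAAAATTxxxxCC format can be sketched as follows. The sketch assumes the common Intel HEX conventions that a record begins with a colon and that the checksum makes all bytes of the record sum to zero modulo 256. The names struct hex_record and hex_parse are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct hex_record {
    unsigned length;          /* LL */
    unsigned address;         /* AAAA */
    unsigned type;            /* TT, 01 marks the last line */
    unsigned char data[255];  /* xxxx */
};

static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

static int hex_byte(const char *s)
{
    int hi = hex_nibble(s[0]);
    int lo = hex_nibble(s[1]);
    return (hi < 0 || lo < 0) ? -1 : (hi << 4) | lo;
}

/* Parse one record without a trailing newline; 0 on success, -1 on error. */
int hex_parse(const char *line, struct hex_record *rec)
{
    unsigned char bytes[5 + 255];
    unsigned sum = 0;

    if (line[0] != ':')
        return -1;
    const char *p = line + 1;
    size_t digits = strlen(p);
    if (digits < 10 || digits % 2 != 0)
        return -1;
    size_t count = digits / 2;
    if (count > sizeof bytes)
        return -1;
    for (size_t i = 0; i < count; i++) {
        int b = hex_byte(p + 2 * i);
        if (b < 0)
            return -1;
        bytes[i] = (unsigned char)b;
        sum += (unsigned)b;
    }
    if (count != (size_t)bytes[0] + 5)   /* LL, AAAA, TT and CC add five bytes */
        return -1;
    if ((sum & 0xFFu) != 0)              /* checksum mismatch */
        return -1;

    rec->length = bytes[0];
    rec->address = ((unsigned)bytes[1] << 8) | bytes[2];
    rec->type = bytes[3];
    memcpy(rec->data, bytes + 4, rec->length);
    return 0;
}
```

A loader would call hex_parse for each line, copy the data to the given address, and stop at the record whose type marks the last line.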
115. ags, int fd, off_t offset);
    int munmap (void *start, size_t length);

Unless the flags say otherwise, the address is taken only as a hint. The system can map the file at a different address, which it returns. The address must be aligned to a page boundary. This is understandable, since memory mapped files are implemented by the file system in cooperation with the virtual memory manager, which asks for data on page faults.

Unlike the address, the length does not have to be aligned to a page boundary. For shorter files, the last page is padded with zeros, and the data written past the end of the file is discarded when the file is unmapped.

The protection given by the prot parameter is PROT_READ, PROT_WRITE, PROT_EXEC or PROT_NONE, or a combination. Again, due to the way this is implemented, it is clear that not all combinations will be available.

The main flags are MAP_SHARED for normal sharing of changes, MAP_PRIVATE for creating copies using copy on write, MAP_FIXED for a request to map exactly at the given address, and MAP_ANONYMOUS for mapping without a file. The MAP_PRIVATE and MAP_FIXED flags are obviously useful when loading applications into memory.

    void *mremap (void *old_address, size_t old_size, size_t new_size, unsigned long flags);

Perhaps the only interesting flag is MREMAP_MAYMOVE.

    int msync (void *start, size_t length, int flags);
    int posix_madvise (void *addr, size_t len, int advice);

Advice can be given on future use of the mapped file. Fla
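A shared mapping can be tried with a few calls. The sketch below assumes a POSIX system with a writable /tmp directory; the function name mmap_roundtrip and the temporary file name pattern are hypothetical. It changes a file through the mapping and then reads the change back with an ordinary read call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Writes "hello" to a temporary file, changes the first letter through
   a MAP_SHARED mapping, and reads the file back into out.
   Returns 0 on success, -1 on any failure. */
int mmap_roundtrip(char *out, size_t out_size)
{
    char path[] = "/tmp/mmap-demo-XXXXXX";
    const char text[] = "hello";
    size_t len = sizeof text - 1;

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                       /* the file disappears once closed */

    if (out_size <= len || write(fd, text, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }

    /* The address argument is NULL, so the system picks the address. */
    char *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }
    map[0] = 'H';                       /* a plain store changes the file */
    if (msync(map, len, MS_SYNC) != 0 || munmap(map, len) != 0) {
        close(fd);
        return -1;
    }

    ssize_t got = pread(fd, out, len, 0);
    close(fd);
    if (got != (ssize_t)len)
        return -1;
    out[len] = '\0';
    return 0;
}
```

With MAP_PRIVATE in place of MAP_SHARED, the store would go to a private copy of the page and the file would keep its original content.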
116. aised again after return from an interrupt handler, are executed by a kernel thread called ksoftirqd. A soft irq can execute simultaneously on multiple processors. The number of soft irqs is limited to 32. Soft irqs are used within the kernel, for example to update kernel timers and handle network traffic.

    // Registers softirq handler
    extern void open_softirq (int nr, void (*action) (struct softirq_action *), void *data);

    void open_softirq (...)
    {
        softirq_vec [nr].data = data;
        softirq_vec [nr].action = action;
    }

    // Schedules softirq handler
    inline fastcall void raise_softirq_irqoff (unsigned int nr)
    {
        or_softirq_pending (1UL << nr);
        if (!in_interrupt ()) wakeup_softirqd ();
    }

Two soft irqs are dedicated to executing low and high priority tasklets. Unlike a handler of a soft irq, a handler of a tasklet will only execute on one processor at a time. The number of tasklets is not limited. Tasklets are the main tool to be used for scheduling access to resources within the kernel.

Finally, bottom half handlers are implemented using tasklets. To preserve backward compatibility with old kernels, only one bottom half handler will execute at a time. Bottom half handlers are deprecated.

    #define DECLARE_TASKLET(name, func, data) struct tasklet_struct name = { NULL, 0, ATOMIC_INIT (0), func, data }

    void tasklet_schedule (struct tasklet_struct *t);
    void tasklet_disa
117. ak 32 kB.

A directory entry holds the name in the 8.3 format, the attributes (for example the directory flag), the number of the first cluster of the file, and the size of the file.

(notes, partly legible) The archive bit is set on every change of the file and can be cleared during backup, which gives a simple incremental backup. FAT works, but has problems on large disks. All the important data is in one place, so a damaged FAT is serious. There are few timestamps and no links. On large disks the FAT does not fit into memory. The information about a single file is scattered across the disk. A write to a file means three writes: the data, the FAT (a cluster was used) and the directory (the date was changed).

Example: HPFS File System

Although it comes from OS/2, it is a Microsoft product. A quotation from 1989 says that HPFS solves all the problems of the FAT file system and is designed to meet the demands expected into the next few decades.

At the beginning of the disk, 16 sectors are reserved for the bootstrap loader. The superblock follows, with the dimensions of the disk and pointers to the bad block list, the directory band and the root directory, and then the spare block. The rest of the disk is divided into 8 MB bands. Each band has a free sectors bitmap and a data area, which alternate so that neighboring bands adjoin either by their bitmaps or by their data areas.

Each file is represented by an F-node structure, which fits exactly into a sector. Each F-node contains various access control lists, attributes, usage history, last 15 chars of file name, plus an all
118. al.: Performance Evaluation of Cache Replacement Policies for the SPEC CPU2000 Benchmark Suite.
2. Sleator et al.: Amortized Efficiency of List Update and Paging Rules.

Hardware Implementation

Intel IA32 Address Translation

The processor has two layers of addressing. One translates logical addresses to linear addresses, the other translates linear addresses to physical addresses. We leave the first layer aside for now, only the second one is interesting for paging.

(note, partly legible) The first layer works with individual segments. When, for example, the stack runs out, only the stack is relocated.

Basic version: logical address 32 bits, physical address 32 bits. The translation is simple. CR3 is the base of the page directory, the first 10 bits of the address are the offset from there. That gives the base of the page table, the second 10 bits of the address are the offset from there. That gives the base of the page, the remaining 12 bits of the address are the offset.

Besides the 20 bits of base, a directory entry has 3 bits of user data, one size bit, one accessed bit, one cache disabled bit, one write through bit, one user / supervisor bit, one read / write bit, and one present bit.

Besides the 20 bits of base, a page entry has 3 bits of user data, one global bit, one dirty bit, one accessed bit, one cache disabled bit, one write through bit, one user / supervisor bit, one read / write bit, and one present bit.

When the global bit is set, the mapping of the given page is considered present in all address spaces and is not evicted from the TLB when CR3 changes.

When the page size bit is set, the directory entry does not point
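The 10 + 10 + 12 bit split and the two table lookups can be sketched over a small simulated memory. This is an illustration only: physical memory is modeled as an array of 32-bit words, only the present bit (bit 0) is checked, and the names split_linear and translate are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define ENTRY_PRESENT 0x1u         /* bit 0 of a directory or page entry */
#define ENTRY_BASE    0xFFFFF000u  /* upper 20 bits hold the base */

struct linear_parts {
    uint32_t dir_index;    /* first 10 bits, index into the page directory */
    uint32_t table_index;  /* second 10 bits, index into the page table */
    uint32_t offset;       /* remaining 12 bits, offset within the page */
};

struct linear_parts split_linear(uint32_t addr)
{
    struct linear_parts p;
    p.dir_index = addr >> 22;
    p.table_index = (addr >> 12) & 0x3FFu;
    p.offset = addr & 0xFFFu;
    return p;
}

/* phys models physical memory as 32-bit words; cr3 is the physical address
   of the page directory. Returns 0 and stores the result, or -1 on a fault. */
int translate(const uint32_t *phys, uint32_t cr3, uint32_t linear, uint32_t *out)
{
    struct linear_parts p = split_linear(linear);

    uint32_t dir_entry = phys[(cr3 >> 2) + p.dir_index];
    if (!(dir_entry & ENTRY_PRESENT))
        return -1;

    uint32_t table_base = dir_entry & ENTRY_BASE;
    uint32_t page_entry = phys[(table_base >> 2) + p.table_index];
    if (!(page_entry & ENTRY_PRESENT))
        return -1;

    *out = (page_entry & ENTRY_BASE) | p.offset;
    return 0;
}
```

Every translation costs two extra memory reads in this model, which is the reason the processor caches finished translations in the TLB.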
119. (table of contents, continued, partly legible)
        Rehearsal
    Devices
        Disk Storage Devices
        Memory Storage Devices
        Network Cards
        Parallel Ports
        Serial Ports
        Rehearsal
    Rehearsal
5. File Subsystem
    Abstractions And Operations
        Stream File Operations
        Example: Windows Stream File Operations
        Mapped File Operations
        Whole File Operations
        Directory Operations
        Sharing Support
        Consistency Support
        Rehearsal
    File Subsystem Internals
        Disk Layout
        Integration Of File Subsystem With Memory Management
        Integration
120. alls

Windows also offers fibers as a lightweight variant of threads that is scheduled cooperatively rather than preemptively. Fibers are created using the CreateFiber call, scheduled using the SwitchToFiber call, and terminated using the DeleteFiber call.

    LPVOID CreateFiber (SIZE_T dwStackSize, LPFIBER_START_ROUTINE lpStartAddress, LPVOID lpParameter);
    VOID SwitchToFiber (LPVOID lpFiber);
    VOID DeleteFiber (LPVOID lpFiber);

Figure 2-20 Windows Fiber Creation System Calls

Windows also allows a thread to associate thread local data with a key, and to retrieve thread local data of the current thread given the key.

    DWORD TlsAlloc (void);
    BOOL TlsFree (DWORD dwTlsIndex);
    BOOL TlsSetValue (DWORD dwTlsIndex, LPVOID lpTlsValue);
    LPVOID TlsGetValue (DWORD dwTlsIndex);

Figure 2-21 Windows Thread Specific Data Calls

To permit graceful handling of stack overflow exceptions, it is also possible to set the amount of space available on the stack during the stack overflow exception handling.

    BOOL SetThreadStackGuarantee (PULONG StackSizeInBytes);

Figure 2-22 Windows Stack Guarantee Call

Example: Java Thread API

Java wraps the operating system threads with a Thread class, whose run method can be redefined to implement the thread function. A thread begins executing when its start method is called. The stop method can be used to terminate the thread.

Example: Op
121. and unreadable. I also hope other readers will find this material current, detailed and interesting.

The notes are being extended and published in good faith and should be taken as such. And remember that you can always revert to other sources of information. Some are listed below.

References

1. Abraham Silberschatz: Operating System Concepts. Wiley 2002. ISBN 0471250600
2. Andrew S. Tanenbaum: Modern Operating Systems, Second Edition. Prentice Hall 2001. ISBN 0130313580
3. Uresh Vahalia: UNIX Internals: The New Frontiers. Prentice Hall 1995. ISBN 0131019082

Structure

It is a laudable trait of technical texts to progress from basic to advanced, from simple to complex, from axioms to deductions. Unfortunately, it seems pretty much impossible to explain a contemporary operating system in this way. When speaking about processes, one should say how a process gets started, but that involves memory mapped files. When speaking about memory mapped files, one should say how a page fault gets handled, but that involves devices and interrupts. When speaking about devices and interrupts, one should say how a process context gets switched, but that involves processes. And so on. This text therefore starts with a look at historic perspective and basic concepts, which gives context to the text that follows. There, forward and backward references are used shamelessly.

Historic Perspective

Stone Age

In 1940s, computers were built by
122. anding is optional. Understanding is essential. Understanding is optional. Understanding is optional. Understanding is recommended. Understanding is essential. Just a curiosity. Understanding is essential. Understanding is essential. Understanding is essential. Understanding is essential.

64. Understanding is essential.
65. Understanding is essential.
66. Understanding is essential.
67. Understanding is essential.
68. Understanding is essential.
69. Understanding is essential.
70. Understanding is recommended.
71. Understanding is recommended.
72. Understanding is essential.
73. Understanding is essential.
74. Understanding is essential.
75. Understanding is essential.
76. Understanding is essential.
77. Understanding is essential.
78. Understanding is recommended.
79. Understanding is essential.
80. Understanding is recommended.

Chapter 3 Memory Management

(notes) The address space of a process: code, static data (typically a part of the code), heap (new), stack.

Management Among Processes

Multiple Processes Together

(note) A problem: with many threads there are many stacks.

To be done.

Single Partition

A primitive solution, the bootstrap loads a program that has the entire memory for itself. The disadvantages are obvious: device drivers are missing and software is not portable.

The need for portability, operating systems emerge
123. ane, thanks to the busy lock, always in the dentry cache. When parsing a path, each dentry is also checked for a mounted file system. If there is one, its root dentry is taken instead.

(note, partly legible) Union FS: mounting something covers what was there before.

Example: Linux Union File System

Stackable filesystems. Whiteout files.

Rehearsal

Questions

1. Explain the considerations that influence the choice of the size of blocks as allocation units on a disk.

2. List the ways in which information about the blocks that hold file data can be stored on a disk. Illustrate the individual ways on existing file systems and evaluate them.

3. List the ways in which the directory structure can be stored on a disk. Illustrate the individual ways on existing file systems and evaluate them.

4. Explain the difference between a hard link and a symbolic link. Compare the advantages and disadvantages of both types of links.

5. List the ways in which information about free blocks can be stored on a disk. Illustrate the individual ways on existing file systems and evaluate them.

6. Describe how the FAT file system stores information about the placement of file data. State the strengths and weaknesses of this way of storing the information.

7. Describe how the FAT file system stores information about the directory structure. State the strengths and weaknesses of this way of storing the information.

8. Describe how information about the placement of free
124. for example ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM. The zones have per CPU lists of free pages to avoid collisions on multiprocessors. In each zone runs something the authors call LRU. For those interested, the code is mainly in Linux 2.6.9 mm/vmscan.c and Linux 2.6.9 include/linux/mmzone.h.

For each process, a map of its address space is kept in the mm_struct structure (Linux 2.4.22 include/linux/sched.h), which is a list of vm_area_struct structures (Linux 2.4.22 include/linux/mm.h). Each area has a start, a length, flags (code, data, shared, locked, growing) and possibly an associated file.

The necessary areas are created at process start. The user can then call only a few syscalls, such as mmap for memory mapped files (nowadays essentially a common thing) and shared memory (also nothing new under the sun), or brk for setting the end of the heap. Nothing much.

References

1. Linux 2.4.20.
2. Linux 2.6.9.
3. Mel Gorman: Understanding The Linux Virtual Memory Manager.

Example: Solaris

The classical division into HAL, because platform independence is needed, followed by the segment manager and the address space manager. The memory map is the classical code, heap, stack. The stack grows on demand.

Each segment has its manager, essentially virtual methods for segment types. The main ones are the seg_vn driver for file vnodes, seg_kmem for nonpageable kernel memory, and seg_map for the vnodes cache. Each manager supports advise (how the access will be made to
125. ... the file system stateless, which brings advantages in distributed systems. Operations tend to be available in both a synchronous and an asynchronous version.

Example: Linux Stream File Operations

int open(char *pathname, int flags);
int open(char *pathname, int flags, mode_t mode);
int creat(char *pathname, mode_t mode);
int close(int fd);

The open, creat and close operations open and close a file stream. The O_RDONLY, O_WRONLY, O_RDWR open mode flags tell whether the file is opened for reading, writing, or both. This information is useful for access right checks and potentially also for sharing and caching support. These flags can be combined with O_CREAT to create the file if needed, O_EXCL to make sure the file is newly created (the call fails if the file already exists), O_TRUNC to truncate the file if applicable, and O_APPEND to append to the file. The O_NONBLOCK flag indicates that operations on the file stream should not block. O_SYNC requests that operations that change the file stream block until the changes are safely written to the underlying media. The value of mode contains the standard UNIX access rights.

off_t lseek(int fildes, off_t offset, int whence);

The lseek operation sets the position in the file stream. The whence argument is one of SEEK_SET, SEEK_CUR, SEEK_END, indicating whether the offset argument is counted relative to the beginning, current position, or end of the file stream.

ssize_t read(int fd, void *buf, si
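To make the stream operations above concrete, here is a minimal sketch that creates a file with O_CREAT and O_TRUNC, writes into it, and then uses lseek with SEEK_END to learn its length. The helper names write_text and file_length are invented for this illustration and are not part of the original text; only open, write, lseek and close are real calls.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create (or truncate) a file and write text into it.
   Returns 0 on success and -1 on failure. */
int write_text(const char *path, const char *text)
{
    assert(path != NULL && text != NULL);
    /* O_CREAT creates the file if needed, O_TRUNC empties an existing one,
       0644 is the mode argument with the UNIX access rights. */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    size_t length = strlen(text);
    ssize_t written = write(fd, text, length);
    int closed = close(fd);
    return (written == (ssize_t) length && closed == 0) ? 0 : -1;
}

/* Hypothetical helper: return the file length by seeking to the end,
   or -1 when the file cannot be opened. */
off_t file_length(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return (off_t) -1;
    /* Offset 0 counted from SEEK_END is the position just past the last byte. */
    off_t end = lseek(fd, 0, SEEK_END);
    close(fd);
    return end;
}
```

Calling write_text on a scratch path and then file_length on the same path should report the number of bytes written. Using lseek for this is only a demonstration of the whence argument; a real program would more likely call fstat.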
126. ateMutex(LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPTSTR lpszMutexName);
HANDLE OpenMutex(DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpName);
DWORD WaitForSingleObject(HANDLE hHandle, DWORD dwMilliseconds);
BOOL ReleaseMutex(HANDLE hMutex);

The lpsa parameter determines security of sharing and is of no interest to us here. fInitialOwner says whether the mutex will be locked for the caller immediately after it is created. lpszMutexName makes it possible to name the mutex.

Two processes can share a mutex by creating it under the same name. Alternatively, it is possible to call HANDLE OpenMutex(DWORD fdwAccess, BOOL fInherit, LPTSTR lpszName). Another method of sharing, without a name, is calling BOOL DuplicateHandle(HANDLE hSourceProcess, HANDLE hSource, HANDLE hTargetProcess, LPHANDLE lphTarget, DWORD fdwAccess, BOOL fInherit, DWORD fdwOptions), which duplicates a handle. A handle cannot be passed directly because it is process specific.

Waiting is done using DWORD WaitForSingleObject(object, timeout), which returns OK, TIMEOUT, or OWNER TERMINATED. WaitForMultipleObjects works as well, see above. Mutexes have owners and a lock count, just like critical sections. Unlocking is done using BOOL ReleaseMutex(mutex).

A lock can exist in several versions, much like other synchronization primitives after all. You may thus encounter the term spin lock for a lock that waits for release actively (the term spinning stands for exactly
127. [Figure 2-8: ELF Segments Example]

The ELF file format also supports special techniques used to minimize the number of pages modified during relocation and linking. These include using the global offset table and the procedure linkage table.

The global offset table is a table created by the dynamic linker that lists all the absolute addresses that the program needs to access. Rather than accessing the absolute addresses directly, the program uses relative addressing to read the address in the global offset table.

The procedure linkage table is a table created by the dynamic linker that wraps all the absolute addresses that the program needs to call. Rather than calling the absolute addresses directly, the program uses relative addressing to call the wrapper in the procedure linkage table.

Calling Operating System

A process needs a way to request services of the operating system. The services provided by the operating system are often similar to the services provided by the libraries and in the si
128. bject(HANDLE hHandle, DWORD dwMilliseconds);
BOOL ReleaseSemaphore(HANDLE hSemaphore, LONG cRelease, LPLONG lplPrevious);

ReleaseSemaphore fails if the semaphore count would grow beyond the maximum specified when the semaphore was created. Incidentally, it is not possible to find out the current value of a semaphore without changing it, because ReleaseSemaphore requires a nonzero cRelease.

Condition Variables

(Margin notes: wait blocks, I am waiting; signal wakes up one waiter; broadcast wakes up all callers of wait. Example: I wait until a variable equals 10 and several processes change it. Active waiting in a loop is a bad idea.)

Semaphores and mutexes always wait for situations of the kind "an occupied resource has been released". It is often necessary to wait passively for complex conditions, for example when some thread displays the state of other threads in a GUI and has to repaint when the state changes. Condition variables, for example, come in handy there.

A condition variable has the methods wait, signal and broadcast. If a process calls wait, it starts waiting passively. If someone calls signal, one of the processes waiting at that moment wakes up. If someone calls broadcast, all processes waiting at that moment wake up.

The use for passive waiting on complex conditions is then obvious. The process that waits alternates wait with testing the condition in a loop; whoever can influence the evaluation of the condition calls signal or broa
129. ble(struct tasklet_struct *t);
void tasklet_enable(struct tasklet_struct *t);

When executed on return from an interrupt handler, soft irqs are not associated with any thread and therefore cannot use passive waiting. Since tasklets and bottom half handlers are implemented using soft irqs, the same constraint applies there as well.

When passive waiting is required, work queues must be used instead. A work queue is similar to a tasklet but is always associated with a kernel thread, trading the ability to execute immediately after an interrupt handler for the ability to wait passively.

#define DECLARE_WORK(name, func, data) struct work_struct name = { data, NULL, func }

// Create a work queue with a kernel thread to serve it
struct workqueue_struct *create_workqueue(const char *name);

// Request executing work by a given work queue
int queue_work(struct workqueue_struct *queue, struct work_struct *work);
int queue_delayed_work(struct workqueue_struct *queue, struct work_struct *work, unsigned long delay);

// Request executing work by the default work queue
int schedule_work(struct work_struct *work);
int schedule_delayed_work(struct work_struct *work, unsigned long delay);

void flush_workqueue(struct workqueue_struct *queue);

This information is current for kernel 2.6.23.

References
1. Matthew Wilcox: I'll Do It Later: Softirqs, Tasklets, Bottom Halves, Task Queu
130. ble mainframe computers.

High Integration

In the 1970s, large scale integration made personal computers a reality. The computers run operating systems that are anything from a simple bootstrap loader with a BASIC or FORTH interpreter glued on, to a full fledged operating system with support for executing multiple programs for multiple users on multiple computers connected by a network.

Year  Hardware and software
1976  Software: Control Program/Monitor (CP/M), an operating system developed by Gary Kildall at Intergalactic Digital Research, later renamed to just Digital Research.
1981  Hardware: IBM PC, a computer developed by IBM. Software: MS DOS, an operating system developed at Microsoft.
1982  Hardware: ZX Spectrum, a computer developed by Richard Altwasser at Sinclair Research.
1984  Software: Finder, an operating system developed by Steve Capps at Apple.

Basic Concepts

Historically, a contemporary operating system combines the functions of an extended machine and a resource manager. The extended machine separates applications from the low level platform dependent details by providing high level platform independent abstractions such as windows, sockets, files. The resource manager separates applications from each other by providing mechanisms such as sharing and locking.

Chapter 1 Introduction

Both the extended machine and the resource manager rely on established hardware concepts to build operating system structure and provide operating sy
131. bles do nothing.

Starvation And Deadlock

Starvation occurs when some process is postponed again and again even though it could run. This is easy to show for example on readers and writers, where a writer may simply keep waiting because someone is always reading. When solving synchronization problems, it is therefore often important that the algorithms used guarantee that starvation does not occur.

As for deadlock of synchronized processes, there are essentially three ways to deal with it: recovery, prevention, and avoidance.

Recovery is a technique where the system detects that a deadlock has occurred and removes it. This brings two problems at once: how to detect and how to remove. As long as the system knows who is waiting for whom, it can detect a deadlock simply by looking for cycles in a graph. As soon as it is not known who is waiting for whom, this cannot be done (e.g. active waiting, user level implementation of synchronization and threads). Substitute techniques such as watchdogs can be used, but they are not reliable.

Even if we accept that we can detect a deadlock, removing it is not trivial either. Resources can be divided into preemptive and nonpreemptive; only the former can be safely taken away from a process. Preemptive resources include, for example, physical memory; almost everything else is nonpreemptive. Taking a nonpreemptive resource away from a process
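The remark above about detecting deadlock by looking for cycles in a graph can be sketched as follows. This is a simplified illustration, not code from the original text: it assumes that each process waits for at most one other process, so the graph fits in a single array, and the name has_deadlock is made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* waits_for[i] is the index of the process that process i waits for,
   or -1 when process i is not waiting (an assumed, simplified model).
   Returns true if some chain of waiting processes closes into a cycle. */
bool has_deadlock(const int *waits_for, size_t count)
{
    assert(waits_for != NULL || count == 0);
    for (size_t start = 0; start < count; start++) {
        int current = (int) start;
        size_t hops = 0;
        /* A chain without a cycle visits each process at most once,
           so it must reach a non-waiting process within count hops. */
        while (current >= 0 && hops < count) {
            assert((size_t) current < count);
            current = waits_for[current];
            hops++;
        }
        if (current >= 0)
            return true;    /* still waiting after count hops: a cycle */
    }
    return false;
}
```

The check costs on the order of count squared steps, which is fine for a sketch. It also shows why detection breaks down once the system does not know who waits for whom: without the waits_for information there is nothing to traverse.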
132. blocks makes this attack more difficult.

Stack

The process stack is typically used for return addresses, procedure arguments, temporarily saved registers and locally allocated variables. The processor typically contains a register that points to the top of the stack. This register is called the stack pointer and is implicitly used by machine code instructions that call a procedure, return from a procedure, store a data item on the stack, and fetch a data item from the stack.

Example: Stack Pointer Of Intel IA32 Processors

The Intel IA32 processors have a stack pointer register called ESP. The CALL machine code instruction decrements the ESP register by the size of a return address and stores the address of the immediately following machine code instruction to the address pointed to by the ESP register. Symmetrically, the RET machine code instruction fetches the stored return address from the address pointed to by the ESP register and increments the ESP register by the size of a return address. The PUSH and POP machine code instructions can be used to store and fetch an arbitrary register to and from the stack in a similar manner.

Chapter 3 Memory Management

Note that the stack grows towards numerically smaller addresses. This simplifies the process memory management when only one stack block is present, as it can be placed at the very end of the virtual address space rather than in the middle of t
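The order of operations described above (decrement then store for PUSH, fetch then increment for POP) can be tried out on a toy model. The sketch below is only an illustration: the toy_stack structure and its functions are invented for this example, and sp is an offset into a small array rather than a real ESP register.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64

/* Toy model of a stack that grows towards smaller addresses. */
struct toy_stack {
    uint8_t memory[STACK_BYTES];
    uint32_t sp;                /* offset of the top item, playing the role of ESP */
};

/* An empty stack points just past the end of its memory block. */
void toy_init(struct toy_stack *stack)
{
    stack->sp = STACK_BYTES;
}

/* PUSH: decrement the stack pointer first, then store the item. */
void toy_push(struct toy_stack *stack, uint32_t value)
{
    assert(stack->sp >= sizeof value);
    stack->sp -= sizeof value;
    memcpy(&stack->memory[stack->sp], &value, sizeof value);
}

/* POP: fetch the item first, then increment the stack pointer. */
uint32_t toy_pop(struct toy_stack *stack)
{
    uint32_t value;
    assert(stack->sp + sizeof value <= STACK_BYTES);
    memcpy(&value, &stack->memory[stack->sp], sizeof value);
    stack->sp += sizeof value;
    return value;
}
```

After two pushes the stack pointer is eight bytes below its initial value, and the items come back in the reverse order, which is exactly the behavior CALL and RET rely on for nested calls.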
133. buses. Most notable are features for bus configuration.

Example: SCSI

The SCSI bus provides configuration in the form of an inquiry command. Devices are addressed using ID (0-7 or 0-15) and LUN (0-7). ID selects a device, LUN selects a logical unit within the device. Devices can communicate with each other by sending commands using command descriptor blocks.

Examples of commands include Test Unit Ready (00h), Sequential Read (08h), Sequential Write (0Ah), Seek (0Bh), Inquiry (12h), Direct Read (28h), Direct Write (2Ah). Commands can be queued and reordered.

Each device responds to the Inquiry command.

Inquiry command (fields listed from bit 7 to bit 0):
Byte 0      Operation Code (Inquiry, 12h)
Byte 1      Logical Unit Number, Reserved, EVPD
Byte 2      Page Code
Byte 3      Reserved
Byte 4      Allocation Length (Inquiry Reply Length, 96)
Byte 5      Control

Inquiry reply (fields listed from bit 7 to bit 0):
Byte 0      Peripheral Qualifier, Peripheral Device Type
Byte 1      RMB, Device Type Modifier
Byte 2      ISO Version, ECMA Version, ANSI Version
Byte 3      AENC, TrmIOP, Reserved, Response Data Format
Byte 4      Additional Length (n - 4)
Byte 5      Reserved
Byte 6      Reserved
Byte 7      RelAdr, WBus32, WBus16, Sync, Linked, Reserved, CmdQue, SftRe
Byte 8-15   Vendor Identification (MSB first)
Byte 16-31  Product Identification (MSB first)
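As a hedged illustration of the two layouts above, the following sketch fills in a 6 byte Inquiry command descriptor block and extracts the vendor identification from a reply buffer. The function names are made up for this example and no real SCSI transport is involved; placing the LUN in the top three bits of byte 1 follows the bit order of the table and is otherwise an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define INQUIRY_OPCODE       0x12
#define INQUIRY_CDB_LENGTH   6
#define INQUIRY_REPLY_LENGTH 96

/* Fill a command descriptor block for Inquiry addressed to the given LUN. */
void build_inquiry_cdb(uint8_t cdb[INQUIRY_CDB_LENGTH], uint8_t lun)
{
    assert(lun <= 7);
    memset(cdb, 0, INQUIRY_CDB_LENGTH);
    cdb[0] = INQUIRY_OPCODE;
    cdb[1] = (uint8_t) (lun << 5);      /* LUN in bits 7 to 5, EVPD left at 0 */
    cdb[4] = INQUIRY_REPLY_LENGTH;      /* allocation length */
}

/* Copy the vendor identification (reply bytes 8 to 15) into a C string
   and strip the trailing space padding. */
void inquiry_vendor(const uint8_t *reply, char vendor[9])
{
    assert(reply != NULL);
    memcpy(vendor, reply + 8, 8);
    vendor[8] = '\0';
    for (int i = 7; i >= 0 && vendor[i] == ' '; i--)
        vendor[i] = '\0';
}
```

On Linux, a block like this would typically be handed to the kernel through the SCSI generic interface, which is outside the scope of this sketch.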
134. ... in the pipeline. It seems there are no nice pictures.

UltraSparc Address Translation

This is very similar to the MIPS processors, with page sizes of 8, 64, 512 and 4096 KB. The virtual address has 44 bits, but split into two halves at the beginning and the end of the 64 bit space; the physical address has 41 bits. Again there is a context that determines which process a TLB entry belongs to, and a bit to ignore the context for global mappings. The page size is stored in two bits. Of the other bits, the one worth mentioning indicates the endianness of the data stored on the given page; it is called IE, from Invert Endianness. Further endianness information is in the address space ID, and further still in the instruction.

On a TLB miss, the MMU registers provide the address into the translation table in memory where, following simple hash rules, the needed TLB entry should be. If it is there, the MMU can load it into the TLB on request from the handler.

Pictures: UltraSparc 2 User's Manual, Chapter 15 MMU Internal Architecture. Figure 15-1 shows the format of a TLB entry: CONTEXT is the address space ID, V is valid, NFO is something, IE is invert endianness, L is lock entry, P is privileged, W is writable, G is global. Figure 15-2 shows the format of the translation table in memory; split says whether the 8k and 64k pages are hashed together or not.

ARM Address Translation

A series of processors from ARM, from the manual for ARM10E. Instruction and data TLB, 64 entries each, page table support
135. ... travel.

Handoff scheduling.

Scheduling On Multiprocessors

Scheduling becomes somewhat more interesting still in multiprocessor systems. Some of the problems were already hinted at: multiprocessor scheduling should not be a mere extension of the uniprocessor one, with a single ready queue from which processes are sent to all processors.

Why? The reason lies in what multiprocessor hardware looks like. Even in a very tightly coupled system, the processors hold local information gathered while a process runs, such as the address translation cache, the memory cache, branch prediction and so on. From this it is easy to see that system performance benefits when the scheduler keeps scheduling the same processes, or threads of the same process, on the same processors. This is sometimes called processor affinity.

The examples already mentioned were Solaris, Linux and Windows NT, which take processor affinity into account on multiprocessor hardware and make a modest effort to schedule the same processes on the same processors. Another example would be Mach.

Other problems remain, of course. One of them is sharing of the ready queue: the more processes share any resource, the more they will wait on it, and the ready queue is shared by every processor, with the kernel reaching into it all the time. A small improvement is, for example, to define local and global ready queues and to distinguish when to reach into which
136. cheduler by describing how the algorithm decides what process to run and for how long.
Define processor affinity and explain how a scheduler observes it.
Propose an interface through which a thread can start another thread and wait for termination of another thread, including passing the initial arguments and the termination result of the thread.

Exercises
1. Design a process scheduler that would support coexistence of batch, realtime and interactive processes. Describe how the processes communicate their scheduling requirements to the scheduler and what data structures the scheduler keeps. Describe the algorithm that uses these requirements and data structures to decide what process to run and for how long, and analyze the time complexity of the algorithm.

Process Communication

Means Of Communication

Commonly used means of communication between processes are:
sharing memory and exchanging information through this memory (margin notes: it can also be a file; with shared memory I have to deal with locking and the like),
sending messages between processes in various forms, with signals as a special variant.

Shared Memory

To be done.

Example: System V Shared Memory

To be done.

int shmget(key_t key, size_t size, int shmflg);
void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

ipcs -m
key shmid owner perms bytes nattch status
0x0000
137. ... procedures in some programming language.

Example: Spring Remote Procedure Call

The principle just described is used for example by Spring, where processes call each other through doors. A door call passes a buffer that can contain data, door identifiers and out of line data. The passing is either consume or copy, with clear semantics. The thread on the client side is suspended; on the server side, a thread is selected from the thread pool belonging to the door, and that thread executes the code associated with the door. Interfaces are described in IDL, which is compiled into client and server stubs; underneath these there are also subcontracts (ignore).

For marshalling, Spring originally used a fixed size buffer associated with each thread, but this turned out to be bad for two reasons. First, most calls transferred less than 128 bytes of data (90% under 48 bytes), so a buffer of several kilobytes was needlessly large. Second, the buffers were reserved statically and so consumed memory. As a solution, a stream interface was introduced with the methods put_int, put_short, put_char, put_bulk, put_door_identifier, put_aligned and the corresponding get methods. By default, a stream allocates a buffer of 128 bytes into which it stores structured data (door identifiers and out of line data) from the end and unstructured data (everything else) from the beginning. Structured data is translated, unstructured data is copied. When the buffer fills up, an extra overflow buffer is allocated.

Rehearsal A
138. comes free, but is only added to the wait queue and removed from the ready queue after the critical section becomes free. Such a process would continue waiting even though the critical section would be free.

Another major flaw of the naive solution is that the access to the shared queue variable is not synchronized. The implementation of the shared queue variable would be a critical section in itself.

Both flaws of the naive solution can be remedied, for example by employing active waiting both to make the decision to wait and the consecutive queue operations atomic, and to synchronize access to the shared queue variable. The solution is too long to be presented in one piece, though.

Passive waiting is useful when the potential for contention is relatively high or the duration of waiting is relatively long. Passive waiting also requires existence of another process that will wake up the passively waiting process.

(Margin notes: when going to sleep, if the flag is false, I do not fall asleep; Linux has a sleep-if style mechanism and others. Locks typically do not guarantee fairness.)

Nonblocking Synchronization

From a practical perspective, many synchronization problems include bounds on waiting. Besides the intuitive requirement of fairness, three categories of solutions to synchronization problems are defined:

A wait free solution guarantees every process will finish in a finite number of its own steps. This is the strongest category, where bounds on waiting always
139. copy of data finish. In the reclamation phase, the writer discards the past copy of data. The interface does not deal with writer synchronization.

Linux provides Read Copy Update as a part of the kernel. The rcu_read_lock and rcu_read_unlock functions delimit readers. The synchronize_rcu function synchronizes writers by waiting until there are no active readers. The rcu_assign_pointer and rcu_dereference macros make sure atomicity or ordering does not break synchronization. For simplicity, the interface does not care what data is accessed; all readers are synchronized against all writers.

void rcu_read_lock(void);
void rcu_read_unlock(void);
typeof(ptr) rcu_assign_pointer(ptr, val);
typeof(ptr) rcu_dereference(ptr);
void synchronize_rcu(void);

The interface permits many different implementations. When context switching can be prevented, a straightforward implementation can leave the reader synchronization empty and wait for a context switch on each processor for writer synchronization.

(Margin notes on semaphores:
A semaphore is for synchronization of processes. It has an int value and the operations up and down.
up: release, value++.
down: I wait until value > 0, then value-- and I go on.
A semaphore initialized to 1 is almost a lock. The differences: it has no owner, and an up on a lock with value 1 changes its value to 2, but that can be guarded against.
What it is good for: limiting the number of threads touching the disk, say to at most 5 at once; producer and consumer; readers and writers.
What to do when a thread crashes
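The margin note about limiting the number of threads that touch the disk to five at a time can be sketched with a POSIX counting semaphore. This is a minimal illustration that assumes a system with unnamed POSIX semaphores (such as Linux); the disk_ function names and the limit constant are invented for this example.

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stdbool.h>

#define MAX_DISK_THREADS 5

/* The semaphore value is the number of threads that may still enter. */
static sem_t disk_slots;

/* Initialize the semaphore to the limit; returns 0 on success. */
int disk_slots_init(void)
{
    return sem_init(&disk_slots, 0, MAX_DISK_THREADS);
}

/* down, blocking variant: wait passively until a slot is free. */
void disk_enter(void)
{
    while (sem_wait(&disk_slots) != 0 && errno == EINTR)
        ;   /* interrupted by a signal, try again */
}

/* down, non-blocking variant: take a slot only if one is free right now. */
bool disk_try_enter(void)
{
    return sem_trywait(&disk_slots) == 0;
}

/* up: give the slot back, possibly waking up a waiting thread. */
void disk_leave(void)
{
    int result = sem_post(&disk_slots);
    assert(result == 0);
    (void) result;
}
```

Every thread would wrap its disk access between disk_enter and disk_leave. As the notes point out, the semaphore has no owner, so nothing stops a thread from calling disk_leave without a matching disk_enter; keeping the calls paired is up to the program.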
140. cumbersome when each process has a completely distinct program and a completely distinct state: the programs have to be kept compatible and the information about the state must be exchanged explicitly. The need to simplify communication gives rise to the concept of threads as activities that share parts of program and parts of state within a process.

The introduction of threads redefines the term process. When speaking about processes alone, the term process is used to refer to both the program and state as passive entities and the act of execution as an active entity. When speaking about processes and threads together, the term process is used to refer to the program and state as a passive entity, and the term thread is used to refer to the act of execution as an active entity. A process is a passive shell that contains active threads executing within it.

Starting A Process

Starting a process means loading the program and initializing the state. The program typically expects to begin executing from a specific instruction with only the static variables initialized. The program then initializes the local and heap variables as it executes. Starting a process therefore boils down to loading the program code and the static variables, together called the program image, and setting the position of the currently executing instruction within the program to the instruction where the program expects to begin executing.

Bootstrapping

(Margin note: hardwired, fixed m
141. d = atomic_inc (val)) != 0) {    // margin note: this test is performed in user space
    // The old value was not zero, meaning mutex was not free.
    // Wait unless the value has changed since the increment.
    futex_wait (&val, old + 1);
  }
}

void unlock () {
  val = 0;
  // Wake a waiting caller if any.
  futex_wake (&val, 1);
}

References
1. Ulrich Drepper: Futexes Are Tricky. http://people.redhat.com/drepper/futex.pdf

Windows NT has mutexes as well, and two kinds of them at that. One kind is called critical sections, the other mutexes. (Margin note: a critical section is only for locking between threads, mutexes are for locking between processes.) First, critical sections:

void InitializeCriticalSection(LPCRITICAL_SECTION lpCriticalSection);
BOOL InitializeCriticalSectionAndSpinCount(LPCRITICAL_SECTION lpCriticalSection, DWORD dwSpinCount);
void EnterCriticalSection(LPCRITICAL_SECTION lpCriticalSection);
BOOL TryEnterCriticalSection(LPCRITICAL_SECTION lpCriticalSection);
void LeaveCriticalSection(LPCRITICAL_SECTION lpCriticalSection);

Critical sections in Windows NT remember their owner. A single thread can lock a critical section several times and must then unlock it the same number of times. They are more or less fast, but understandably do not work between processes.

For synchronization between processes, Windows NT uses kernel objects. Of these, the one that interests us at the moment is the mutex:

HANDLE Cre
142. ... so that some sets of pages are mapped for each process. The algorithms can be applied in different ways. The classic division is into local, where the algorithm is applied within a single process, and global, where the algorithm is applied within the whole computer.

The set of pages that a process is currently using is called the working set; its content changes as the process runs. When too many applications are running, their working sets do not fit into memory. Then a process is always run that needs to swap in some page, so a victim is found and swapping starts. Meanwhile another process is run that also needs to swap in some page, and so on round and round, so that nothing gets done. This is called thrashing.

With local algorithms, guarantees are easier to provide, for example to realtime applications, because it cannot happen that one application takes memory away from another. In general, however, global algorithms tend to win, combined with some minimum of pages for each process. The pages considered for eviction then also include kernel caches.

Even global algorithms mostly work by iterating over the individual processes one after another. By tending to evict pages first from one process and then from the next, they increase the probability that some processes will have their whole working set in memory.

Mention memory mapped files and copy on write.

References
1. Al Zoubi et
143. ... from 8 to 512 bytes in steps of 8, and for the nearest higher sizes from 512 bytes upwards roughly logarithmically. In the lists for the nearest higher sizes, the blocks are sorted by their actual size, and best fit is used for selection. When a block is freed, it is immediately coalesced with the neighboring blocks if possible.

It is also good for the allocator to maintain locality, that is, to place recently allocated blocks close to each other. This results in lower demands on the virtualization mechanism (paging), because recently allocated blocks are likely to be used together. To achieve this, the allocator first tries to find an exact fit. If none is available and the free block that served the last allocation can still be split, that block is used. Otherwise, best fit is used.

As another special optimization, the last free block of memory, which is the only one that can grow when more heap is needed, is considered larger than all the other blocks for the purposes of the best fit algorithm. This prevents the heap from being extended needlessly.

References
1. Doug Lea: A Memory Allocator. http://gee.cs.oswego.edu/dl/html/malloc.html

Example: Linux Kernel Slab Allocator

To be done.

// Create a slab cache
kmem_cache_t *kmem_cache_create(const char *name, size_t size, size_t offset, unsigned long flags, void (*ctor)(void *, kmem_cache_t *, unsigned long), void (*dtor)(void *, kmem_cache_t *, unsigned long)
144. d for a specific application purpose can have that purpose implemented by the bootstrap process. Such an approach, however, would be too limiting for computers designed for general use, which is why the bootstrap process typically only initializes the hardware and starts another process, whose program image is loaded from whatever source the bootstrap process supports.

Example: Booting IBM PC

(Margin note: i.e. there are 16 bytes available for it, typically holding a jump somewhere; it sits at the end so that the usable space starts at 0, for historical reasons.)

The IBM PC line of computers uses the Intel 80x86 line of processors, which start executing from address FFFF...FFF0h (exact address depending on the address bus width and hence on the processor model). A fixed memory with the BIOS program image resides at that address. The BIOS process initializes and tests the hardware of the computer as necessary and looks for the next process to start.

In the early models of the IBM PC line of computers, the BIOS process expected the program image of the next process to start to reside in the first sector of the first disk connected to the computer, to be exactly 512 bytes in size, and to end with a two byte signature of 55AAh. The BIOS process loaded the program image into memory at address 7C00h and, if the two byte signature was present, the BIOS process then began executing the next process to start from address 7C00h.

In many cases, the fixed size of 512 bytes is too small for the program image of the next process to
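The signature test performed by the early BIOS can be sketched in a few lines. The function name is made up for this illustration; the sketch assumes the signature is stored as byte 55h at offset 510 followed by byte AAh at offset 511, which is the usual reading of the two byte signature 55AAh at the end of a 512 byte sector.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BOOT_SECTOR_SIZE 512

/* Return true if the buffer looks like a bootable first sector:
   exactly 512 bytes that end with the bytes 55h and AAh. */
bool boot_sector_is_bootable(const uint8_t *sector, size_t size)
{
    assert(sector != NULL);
    if (size != BOOT_SECTOR_SIZE)
        return false;
    return sector[BOOT_SECTOR_SIZE - 2] == 0x55
        && sector[BOOT_SECTOR_SIZE - 1] == 0xAA;
}
```

Note that the signature leaves only 510 bytes for the program image itself, which is one reason why the first sector usually holds just a small loader for a larger image.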
145. d in 1959. Eventually, a system that can interrupt an executing program, execute another program, and then resume the originally interrupted program was developed. The system was called Compatible Time Sharing System (CTSS) and required a hardware modification of the IBM 7094 computer.

Low Integration

In the 1960s, integrated circuits appeared alongside transistors. Integration has paved the way for smaller computers, less power consumption, less heat generation, longer uptimes, larger memory, and lots of other related improvements. Cabinet sized minicomputers have appeared alongside room sized mainframe computers. The computers run operating systems that support executing multiple programs in parallel, with virtual memory provided by paging.

Year  Hardware and software
1961  Hardware: Integrated circuit, a technology to integrate multiple transistors within a single device, developed by Robert Noyce at Fairchild Semiconductors.
1963  Hardware: Mouse, an input device with two wheels, developed by Douglas Engelbart at SRI.
1964  Hardware: IBM System 360, a computer developed by IBM. The first computer with configurable assembly from modules. Software: Beginner's All Purpose Symbolic Instruction Code (BASIC), a programming language developed by J. Kemeny and T. Kurtz at Dartmouth College. Time Sharing System (TSS), an operating system developed at IBM.
1965  Software: MULTICS, an operating system developed at Bell Laboratories.
      Hardware: Dynamic Random Access Memory (DRA
146. d registers if the registers can do either read and write or conditional update, or at least N/2 of those shared registers if the registers can do both read and write and conditional update. The same goes for starvation free mutual exclusion.

Randomized synchronization primitives.

References
1. Maurice Herlihy: Wait Free Synchronization.
2. Faith Fich, Danny Hendler, Nir Shavit: On the Inherent Weakness of Conditional Synchronization Primitives.

Synchronization And Scheduling

Convoys

To be done.

Priority Inversion

Priority inversion is a situation where processes with a higher priority wait for something that is owned by processes with a lower priority. In the worst case, priority inversion can even lead to a deadlock. Priority inheritance can be a solution to priority inversion.

Inversion and active and passive waiting.

In principle, supporting priority inheritance looks simple. At the moment a process starts waiting for something, its priority is lent to the process that owns what is being waited for. The problem is that this works well for locks, which have a single owner, but with semaphores or condition variables it is no longer possible to find out which process will actually release the given resource, and therefore to whom the priority should be lent.

A solution, for example in Solaris: priority inheritance works directly for mutexes; read write locks raise the priority of the first owner but not of any further owners; condition varia
147. dcast.

Just one small refinement: because the test of the condition must be atomic, a condition variable is tied to a mutex that protects the tested condition. (Margin notes: condition variables guard conditions on variables; whoever is interested in the condition waits on it, whoever changes it signals on it.)

Implementation of a condition variable, an outline of its use inside a while loop.

Condition variable operations must of course be used with care. First, keep in mind that a condition variable does not remember a signal issued before a wait, unlike locks and semaphores. Second, a broadcast can start an unnecessarily large number of processes at once.

Condition variables are of course available in the pthread library. They are created using int pthread_cond_init(pthread_cond_t *, pthread_condattr_t *); the other methods are signal, broadcast, wait, timedwait (with a timeout) and destroy. There is apparently nothing to add.

int pthread_cond_init(pthread_cond_t *cond, pthread_condattr_t *cond_attr);
int pthread_cond_destroy(pthread_cond_t *cond);
int pthread_cond_signal(pthread_cond_t *cond);
int pthread_cond_broadcast(pthread_cond_t *cond);
int pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex);
int pthread_cond_timedwait(pthread_cond_t *cond, pthread_mutex_t *mutex, const struct timespec *abstime);

As a complete aside, a similar mechanism can be found for example in Java, where a thread can call the wait method on an arbitrary object; other threads then
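The pattern described above (a mutex protecting the tested condition, wait called inside a while loop, and a signal or broadcast after each change) can be sketched with the pthread calls listed in the text. The guarded_counter structure and the counter_ functions are invented for this illustration; error handling is left out to keep the sketch short.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* A counter whose changes other threads can wait for passively. */
struct guarded_counter {
    pthread_mutex_t mutex;      /* protects value, and with it the tested condition */
    pthread_cond_t changed;     /* signalled whenever value changes */
    int value;
};

void counter_init(struct guarded_counter *counter)
{
    pthread_mutex_init(&counter->mutex, NULL);
    pthread_cond_init(&counter->changed, NULL);
    counter->value = 0;
}

/* Whoever changes the condition announces the change. */
void counter_increment(struct guarded_counter *counter)
{
    pthread_mutex_lock(&counter->mutex);
    counter->value++;
    pthread_cond_broadcast(&counter->changed);
    pthread_mutex_unlock(&counter->mutex);
}

/* Whoever is interested in the condition waits, retesting after each wakeup. */
void counter_wait_for(struct guarded_counter *counter, int target)
{
    pthread_mutex_lock(&counter->mutex);
    while (counter->value < target)
        pthread_cond_wait(&counter->changed, &counter->mutex);
    pthread_mutex_unlock(&counter->mutex);
}

/* Thread body for a demonstration: increment the counter ten times. */
void *counter_increment_ten_times(void *argument)
{
    assert(argument != NULL);
    for (int i = 0; i < 10; i++)
        counter_increment(argument);
    return NULL;
}
```

Because the condition is tested under the mutex before every wait, it does not matter that a signal issued before the wait is not remembered: if the counter already reached the target, the waiting thread never calls pthread_cond_wait at all. The sketch uses broadcast rather than signal because several waiters might wait for different targets.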
148. ... cannot simply be done, and forcibly terminating a process can cause further problems. A partial solution is offered by transactions with the possibility of rollback and retry.

Deadlock prevention means that the processes are programmed so that they cannot deadlock. For processes to deadlock, four conditions must hold at the same time:
the processes must wait for resources in a cycle;
the processes must own the resources exclusively;
the processes must be able to acquire ownership of additional resources;
it must not be possible to return the resources.

The solutions are then based on removing one of these conditions. For the first, this is for example ordering the resources and acquiring them in the order given by this ordering; for the second, virtualization; for the third, locking all the resources at once; for the fourth, spin style locking of resources.

Deadlock avoidance means that the processes provide enough information in advance about which resources they will still wait for. This is of course problematic, but it is sometimes done.

The banker's algorithm also falls into the deadlock avoidance category. Its name comes from a model situation where a banker offers loans to customers up to a certain limit, and his total capital is smaller than the number of customers times the limit. On each loan request, the banker checks whether, after lending, enough money remains so that at least one customer can draw the full limit, pos
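The banker's check outlined above can be sketched for a single kind of resource (money). This is a hedged illustration in the common textbook form: a state counts as safe when the customers can be served one after another, each drawing up to the full limit and then repaying everything. The function names and the MAX_CUSTOMERS bound are assumptions of the sketch, not part of the original text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CUSTOMERS 16

/* limit[i] is the credit limit of customer i, loan[i] the amount lent so far,
   cash the money left in the bank. Returns true if all customers can still
   finish in some order. */
bool bank_state_is_safe(int cash, const int *limit, const int *loan, size_t count)
{
    assert(count <= MAX_CUSTOMERS);
    bool finished[MAX_CUSTOMERS] = { false };
    size_t remaining = count;
    bool progress = true;
    while (remaining > 0 && progress) {
        progress = false;
        for (size_t i = 0; i < count; i++) {
            if (!finished[i] && limit[i] - loan[i] <= cash) {
                /* The customer can draw the full limit and then repays it all. */
                cash += loan[i];
                finished[i] = true;
                remaining--;
                progress = true;
            }
        }
    }
    return remaining == 0;
}

/* Decide whether lending amount to the given customer keeps the state safe.
   The loan array is only changed for the trial and then restored. */
bool bank_may_lend(int cash, const int *limit, int *loan, size_t count,
                   size_t customer, int amount)
{
    assert(customer < count && amount >= 0);
    if (amount > cash || loan[customer] + amount > limit[customer])
        return false;
    loan[customer] += amount;
    bool safe = bank_state_is_safe(cash - amount, limit, loan, count);
    loan[customer] -= amount;
    return safe;
}
```

For example, with limits 6, 5, 4, 7, loans 1, 1, 2, 4 and 2 units of cash, lending one more unit to the second customer is refused because no customer could then reach the limit, while lending it to the third customer is allowed.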
149. ding is optional.
24. Just a curiosity.
25. Just a curiosity.
26. Understanding is optional.
27. Just a curiosity.
28. Understanding is recommended.
29. Understanding is recommended.
30. Just a curiosity.
31. Just a curiosity.
32. Just a curiosity.
33. Just a curiosity.

Chapter 5 File Subsystem

(Margin notes:
File systems: directories, files; a large number and a large volume; resilience, security, data sharing.
O R W C: open, close. Why open is separate: it is enough to do the checks, the lookup and so on only once, so that Read is as fast as possible.
The mode at open: access rights and the like.
append: works even when several applications write at once, it always seeks to the end.
Optimization for sequential reading: the notion of a current position in the file. seek: a change of the position.
close: flush on close, update of the information about the file, etc.
reopen.
sync: no buffering, the data goes straight to disk.
Asynchronous operations.)

A file system provides the abstractions of directories and files on top of disks and possibly other types of storage media. The same abstractions of directories and files can also be used for other purposes, for example to expose the state of the system or to mediate network communication.

The basic requirements placed on a file system are the ability to store a large number and a large volume
150. … conditional jumps to facilitate statistical prediction. Another solution: all possible branches are prefetched and the incorrect ones are discarded.
Superscalar Execution
An increase in speed and efficiency can be achieved by replicating parts of the processor and executing instructions concurrently. The superscalar execution is made difficult by dependencies between instructions, either when several concurrently executing instructions employ the same parts of the processor, or when an instruction uses results of another concurrently executing instruction. Both collisions can be solved by delaying some of the concurrently executing instructions, thus decreasing the yield of the superscalar execution.
An alternative solution to the collisions is replicating the part in the processor. For illustration, Intel Core Duo processors are capable of executing four instructions at once under ideal conditions. Together with instruction pipelining, AMD Hammer processors can execute up to 72 instructions in various stages.
An alternative solution to the collisions is reordering the instructions. This may not always be possible in one thread of execution, as the instructions in one thread typically work on the …
(Annotations: If I want to do more additions, I put in more adders. The compiler optimizes the code; it is better to interleave unrelated instructions. Branch prediction: I do not wait, I guess whether the jump will be taken and invalidate the work if the guess was wrong. Static prediction: a backward jump …)
151. … and write instructions operating across cache lines are not. Read-modify-write instructions can be made atomic using a special LOCK prefix.
Starting with the Intel Pentium 4 processors, the processor family introduced the MONITOR and MWAIT instruction pair. The MONITOR instruction sets up an address to monitor for access. The MWAIT instruction waits until the address is accessed. The purpose of the instruction pair is to optimize multiprocessor synchronization, because the processor is put into power saving mode while waiting for the access.
Also starting with the Intel Pentium 4 processors, multiple memory ordering models were introduced to enable optimization based on reordering of memory accesses. The basic memory ordering model works as follows:
• Reads can be issued speculatively.
• Reads by a single processor are carried out in the program order.
• Most writes by a single processor are carried out in the program order.
• Reads and writes by a single processor to the same address are carried out in the program order.
• Younger reads and older writes by a single processor to different addresses are not carried out in any particular order.
• Writes by a single processor are observed in the same order by other processors.
• Writes by multiple processors are not observed in any particular order by other processors.
• Writes to the same location are totally ordered …
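As an illustration of the atomic read-modify-write operations mentioned in the excerpt above, the following is a minimal spin lock sketch written with portable C11 atomics rather than with processor-specific instructions. The type `spin_lock_t` and the `spin_*` functions are hypothetical names; which instruction the compiler emits for the test-and-set operation depends on the target and is not specified here.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type: the flag is set while the lock is held. */
typedef struct {
    atomic_flag held;
} spin_lock_t;

void spin_init(spin_lock_t *lock)
{
    atomic_flag_clear(&lock->held);
}

/* One atomic read-modify-write: sets the flag and reports whether it was
 * clear before, that is, whether the caller has just acquired the lock. */
bool spin_try_lock(spin_lock_t *lock)
{
    return !atomic_flag_test_and_set_explicit(&lock->held,
                                              memory_order_acquire);
}

void spin_lock(spin_lock_t *lock)
{
    while (!spin_try_lock(lock)) {
        /* busy wait until the holder releases the lock */
    }
}

void spin_unlock(spin_lock_t *lock)
{
    atomic_flag_clear_explicit(&lock->held, memory_order_release);
}
```

The acquire and release orderings are chosen so that accesses inside the critical section are not reordered out of it, which is the usual requirement for a lock regardless of the memory ordering model of the processor.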
152. … what matters is the address translation mechanism, and that is described precisely. There are also variants of this processor with a simplified MMU. A nice picture is in the MIPS32 4K Processor Core Family Software User's Manual (MIPS32 4K Manual.pdf), Memory Management, 3.3 Translation Lookaside Buffer.
Alpha Address Translation
A processor from Compaq; the following is taken from the manual for the Alpha 21264. The virtual address has 48 or 43 bits (depending on a bit in the I_CTL register), the physical address has 44 bits, and the topmost bit is 0 for memory and 1 for devices. There is one TLB for instructions and one for data, both with round robin allocation, each with 128 entries. They map 8 KB pages either individually or in groups of 8, 64 or 512, with 8 bits of process ID. The TLB is handled by the so called PAL (Privileged Architecture Library) code, which is essentially privileged code close to microcode that is stored in ordinary memory.
The way entries are evicted from the TLB is interesting. The processor has the registers ITB_IA, ITB_IS, DTB_IAP, DTB_IA and DTB_IS. A write to xTB_IA evicts all entries from the data or the instruction TLB, DTB_IAP evicts all entries of a given process, and xTB_IS evicts all entries related to the written address.
This dreadful processor even has virtual registers. The ordinary registers are named R0 to R31, but when the programmer uses them, they are remapped to internal registers of the processor so as to minimize the number of false write-after-read and write-after-write dependencies between instructions …
153. … the CALL instruction without TSS takes only 20 clock cycles, and the processor context can be switched more quickly using common instructions. Specifically, PUSHAD saves all general purpose registers in 11 clock cycles, PUSH saves each of the six segment registers in 3 clock cycles, PUSHF saves the flags in 4 clock cycles. Inversely, POPF restores the flags in 9 clock cycles, POP restores each of the six segment registers in 3 clock cycles, POPAD restores all general purpose registers in 9 clock cycles.
An additional context switching support mechanism takes care of saving and restoring the state of processor extensions such as FPU (Floating Point Unit), MMX (Multimedia Extensions), SIMD (Single Instruction Multiple Data). These extensions denote specialized parts of the processor that are only present in some processor models and only used by some executing processes, thus requiring special handling.
The processor supports the FXSAVE and FXRSTOR instructions, which save and restore the state of all the extensions to and from memory. This support makes it possible to use the same context switch code regardless of which extensions are present.
The processor keeps track of whether the extensions context has been switched after the processor context. If not, an exception is raised whenever an attempt to use the extensions is made, making it possible to only switch the extensions context when it is actually …
154. … the instructions and data in a cache must be coherent with the original. This is a problem when other devices than a processor access memory. A cache coherency protocol solves the problem by snooping on the processor bus and either invalidating or updating the cache when an access by other devices than a processor is observed.
Instruction And Data Prefetching
Advances In Bus Architecture
Burst Access
The operations of the processor bus were optimized for the common case of transferring a block of data from consecutive addresses. Rather than setting an address on the address bus for each item of the block, the address is …
(Annotations: A cache stores pairs of key and data. In hardware there is a comparator for every key, so all keys are compared at once. 128 entries is too few, hence a partially associative cache: there are several caches, the address selects the right one and only that one is searched. Individual bytes are not cached; a cache line today is usually 64 B. Can a page be 3 KB? No, that would be terribly complicated in hardware, the size must be a power of two. The address therefore has three parts: one selects the cache, one is the key, one selects the right byte, laid out either as set select, key, line offset, or as key, set select, line offset. There are usually several cache levels: L1 takes 2 to 3 cycles, L2 about 10 cycles, RAM perhaps 200 cycles. Caches for data and for code are typically separate, because the two usually live in different places in memory. More processors mean more complicated control so that the right data is read and written.)
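The three-part address from the annotations above (key, set select, line offset) can be sketched as plain bit slicing. The 64-byte line comes from the annotations; the 32-bit address and the 128 sets are assumptions made only for this example, and the names `cache_addr_t` and `cache_split` are hypothetical.

```c
#include <stdint.h>

/* Assumed geometry: 64-byte lines (6 offset bits), 128 sets (7 set bits). */
enum { LINE_BITS = 6, SET_BITS = 7 };

typedef struct {
    uint32_t key;     /* compared against the keys stored in the set */
    uint32_t set;     /* selects which set of the cache to search    */
    uint32_t offset;  /* selects the byte within the cache line      */
} cache_addr_t;

/* Splits an address in the layout key | set select | line offset. */
cache_addr_t cache_split(uint32_t addr)
{
    cache_addr_t r;
    r.offset = addr & ((1u << LINE_BITS) - 1);
    r.set    = (addr >> LINE_BITS) & ((1u << SET_BITS) - 1);
    r.key    = addr >> (LINE_BITS + SET_BITS);
    return r;
}
```

Because all sizes are powers of two, each part is a contiguous bit field and the split needs only shifts and masks, which is the reason given in the annotations for ruling out sizes such as 3 KB.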
155. … not to a page table but directly to a page, which is then 4 MB large. The accessed and dirty bits are set by the processor as appropriate and are used by page replacement algorithms. The bits with access rights and the like are obvious and will be covered in more detail in some other course anyway. When an entry is not present, all the other bits are user defined.
To make things more confusing, since the Pentium Pro there is also the Physical Address Extension bit in CR4. When it is set, the directory entries and page entries are 64 bits long, CR3 holds a pointer to the page directory pointer table, and the translation has three levels: 2 bits of the address index the pointer table, 9 bits the directory table, 9 bits the page table and 12 bits the page; or 2 bits index the pointer table, 9 bits the directory table and 21 bits the page. The physical address then has 36 bits.
Intel IA64 Address Translation
Page sizes are 4, 8, 16, 64 and 256 KB and 1, 4, 16 and 256 MB. The virtual address has 54 bits (51 bits of address, 3 bits of region index), the physical address has 44 bits, but applications see a 64-bit virtual address (61 bits of address, 3 bits of region index). The region index points to one of eight region registers, each 24 bits wide. A region is essentially an address space, and the region index is part of the virtual address so that processes can look into the address spaces of other processes. Translation uses a TLB; besides the address and the region, an entry contains the usual bits present, cacheable, accessed …
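The two Physical Address Extension splits listed in the excerpt above (2 + 9 + 9 + 12 bits and 2 + 9 + 21 bits of a 32-bit virtual address) can be written out as shifts and masks. This is only a sketch of the index extraction; it does not walk real page tables, and the type and function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint32_t pointer;    /* index into the page directory pointer table */
    uint32_t directory;  /* index into the page directory               */
    uint32_t table;      /* index into the page table (small pages)     */
    uint32_t offset;     /* offset within the page                      */
} pae_index_t;

/* Small pages: 2 + 9 + 9 + 12 bits. */
pae_index_t pae_split_small(uint32_t va)
{
    pae_index_t r;
    r.pointer   = va >> 30;
    r.directory = (va >> 21) & 0x1FFu;
    r.table     = (va >> 12) & 0x1FFu;
    r.offset    = va & 0xFFFu;
    return r;
}

/* Large pages: 2 + 9 + 21 bits, the page table level is skipped. */
pae_index_t pae_split_large(uint32_t va)
{
    pae_index_t r;
    r.pointer   = va >> 30;
    r.directory = (va >> 21) & 0x1FFu;
    r.table     = 0;                  /* not used for large pages */
    r.offset    = va & 0x1FFFFFu;
    return r;
}
```

The 9-bit indexes follow from the entry size: with 64-bit entries, a 4 KB table holds 512 entries, and the 21-bit offset corresponds to a 2 MB page.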
156. … (Annotation: if I do not ask, I can do whatever I want.) … file locking. To limit the size of the list of locks, DOS for example requires that unlocking specify exactly the blocks that were locked. It is therefore not possible to lock a large block and unlock a piece from its middle, which removes the problems with block fragmentation. (Annotation: mandatory locking is a real lock that takes effect on any attempt to access the locked file.)
Example: Linux Sharing Operations
Unix distinguishes advisory and mandatory locking. Only advisory locks were implemented from the beginning, that is, locks that take effect only when a process asks about them. That is of course not very safe, and so mandatory locks, which are checked by the kernel, were added later. So that mandatory locks would not block at moments when existing applications did not expect it, it was decided that they would apply automatically to files with the set group ID bit set and the group execute bit cleared. (Annotations: locking of a file is remembered in memory; transactions.)
Mandatory locks were first supported by UNIX System V, I believe. An unpleasant property of mandatory locks is that their semantics is somewhat more complicated than that of advisory locks and not all systems always get it right. There is a specification, the UNIX System V Interface Definition, but hardly anyone follows it exactly. A nice list of deviations is in the documentation on locking in the Linux kernel. Mandatory locking can also cause deadlock …
157. … same data. Intel Pentium Pro processors do this.
An alternative solution to the collisions is splitting the instructions into micro instructions that are scheduled independently with a smaller probability of collisions. AMD Athlon processors and Intel Pentium Pro processors do this.
An alternative solution to the collisions is mixing instructions from several threads of execution. This is attractive especially because instructions in several threads typically work on different data. Intel Xeon processors do this.
An alternative solution to the collisions is using previous values of the results. This is attractive especially because the processor remains simple and a compiler can reorder the instructions as necessary without burdening the programmer. MIPS RISC processors do this.
(Annotations: … is taken because a loop is being executed, a forward jump is not. Saturating counter: I remember whether the jump is mostly taken or not and jump accordingly; there is a separate counter for each jump target address. A smart compiler knows this and schedules according to the number of adders, multipliers and so on. Why parallelism rather than a faster processor: because a faster processor heats up too much. Bimetallic effect, the chip warps. Diffusion speeds up with heating, so the processor dissolves in itself. At a high frequency the current does not run through the wires but, as it were, around them. So either there are more cores in one package or … Return addresses are cached, registers only, easily 16 levels.)
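The saturating counter mentioned in the annotations above can be sketched as a small table of two-bit counters. Following the annotation, the table is indexed by an address associated with the jump; the table size, the initial counter value and all names below are assumptions made for the example, not details from the text.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PREDICTOR_ENTRIES 256   /* hypothetical table size */

/* Each counter runs from 0 (strongly not taken) to 3 (strongly taken). */
typedef struct {
    uint8_t counter[PREDICTOR_ENTRIES];
} predictor_t;

static size_t predictor_slot(uint32_t address)
{
    return (address >> 2) % PREDICTOR_ENTRIES;
}

void predictor_init(predictor_t *p)
{
    for (size_t i = 0; i < PREDICTOR_ENTRIES; i++)
        p->counter[i] = 1;      /* start as weakly not taken */
}

/* Predicts taken when the counter is in the upper half of its range. */
bool predictor_predict(const predictor_t *p, uint32_t address)
{
    return p->counter[predictor_slot(address)] >= 2;
}

/* Moves the counter one step towards the observed outcome and saturates
 * at both ends, so a single exception does not flip a strong prediction. */
void predictor_update(predictor_t *p, uint32_t address, bool taken)
{
    uint8_t *c = &p->counter[predictor_slot(address)];
    if (taken) {
        if (*c < 3)
            (*c)++;
    } else {
        if (*c > 0)
            (*c)--;
    }
}
```

A loop branch that is taken many times and then falls through once keeps its taken prediction for the next run of the loop, which is the behavior the annotation describes as remembering whether the jump is mostly taken.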
158. … readers. Linux provides Read Write Locks as a part of the Posix Threads library.
int pthread_rwlock_init (pthread_rwlock_t *rwlock, const pthread_rwlockattr_t *attr);
int pthread_rwlock_destroy (pthread_rwlock_t *rwlock);
int pthread_rwlock_rdlock (pthread_rwlock_t *rwlock);
int pthread_rwlock_wrlock (pthread_rwlock_t *rwlock);
int pthread_rwlock_tryrdlock (pthread_rwlock_t *rwlock);
int pthread_rwlock_trywrlock (pthread_rwlock_t *rwlock);
int pthread_rwlock_unlock (pthread_rwlock_t *rwlock);
Windows provides Slim Reader Writer Locks that can be used within a single process.
VOID InitializeSRWLock (PSRWLOCK SRWLock);
VOID AcquireSRWLockShared (PSRWLOCK SRWLock);
VOID AcquireSRWLockExclusive (PSRWLOCK SRWLock);
VOID ReleaseSRWLockShared (PSRWLOCK SRWLock);
VOID ReleaseSRWLockExclusive (PSRWLOCK SRWLock);
Read Copy Update
To avoid some of the blocking associated with implementing the Readers And Writers Synchronization Problem using read write locks, the read copy update interface lets readers operate on past copies of data when updates are done. This is achieved by splitting an update into the modification and reclamation phases. In the modification phase, the writer makes the updated copy of data visible to new readers, but the past copy of data is retained for existing readers. In between the modification and reclamation phases, the writer waits until all readers of the past …
159. … reads asynchronously and then waits for them to complete before letting an application's startup continue.
Backup Support
Quoted from Mark Russinovich, David Solomon: Windows XP: Kernel Improvements Create a More Robust, Powerful, and Scalable OS.
A new facility in Windows XP, called volume shadow copy, allows the built-in backup utility to record consistent views of all files, including open ones. The shadow copy driver is a type of driver, called a storage filter driver, that layers between file system drivers and volume drivers (the drivers that present views of the disk sectors that represent a logical drive) so that it can see the I/O directed at a volume.
When the backup utility starts a backup operation, it directs the volume shadow copy driver (\Windows\System32\Drivers\Volsnap.sys) to create a volume shadow copy for the volumes that include files and directories being recorded. The volume shadow copy driver freezes I/O to the volumes in question and creates a shadow volume for each. For example, if a volume's name in the Object Manager namespace is \Device\HarddiskVolume0, the shadow volume might be named \Device\HarddiskVolumeShadowCopyN, where N is a unique ID.
Instead of opening files to back up on the original volume, the backup utility opens them on the shadow volume. A shadow volume represents a point-in-time view of a volume, so whenever the volume shadow copy driver sees a write operation directed at an original volume …
160. … directories, the files referenced, and the directories referenced, the Cache Manager notifies the prefetch component of the Task Scheduler that performs a call to the internal NtQuerySystemInformation system call requesting the trace data. After performing post-processing on the trace data, the Task Scheduler writes it out to a file in the \Windows\Prefetch folder. The file's name is the name of the application to which the trace applies followed by a dash and the hexadecimal representation of a hash of the file's path. The file has a .pf extension, so an example would be NOTEPAD.EXE-AF43252301.PF. An exception to the file name rule is the file that stores the boot's trace, which is always named NTOSBOOT-B00DFAAD.PF (a convolution of the hexadecimal-compatible word BAADF00D, which programmers often use to represent uninitialized data).
When the system boots or an application starts, the Cache Manager is called to give it an opportunity to perform prefetching. The Cache Manager looks in the prefetch directory to see if a trace file exists for the prefetch scenario in question. If it does, the Cache Manager calls NTFS to prefetch any MFT references, reads in the contents of each of the directories referenced, and finally opens each file referenced. It then calls the Memory Manager to read in any data and code specified in the trace that's not already in memory. The Memory Manager initiates all of the …
161. … execve (const char *filename, char *const argv [], char *const envp []);
pid_t wait (int *status);
pid_t waitpid (pid_t pid, int *status, int options);
void exit (int status);
Figure 2-15. Posix Process Creation System Calls
(Annotation: a bit better than fork.)
The Posix standard call to create a thread is pthread_create, which takes the address of the function executed by the thread as its main argument. The pthread_join call waits for a thread to terminate; a thread can terminate for example by returning from the thread function or by calling pthread_exit.
int pthread_create (pthread_t *thread, pthread_attr_t *attr, void *(*start_routine) (void *), void *arg);
int pthread_join (pthread_t th, void **return_value);
void pthread_exit (void *return_value);
Figure 2-16. Posix Thread Creation System Calls
(Annotations: start_routine is a pointer to the main function of the thread, that is, a pointer to the code of the function.)
The Posix standard also allows a thread to associate thread local data with a key and to retrieve thread local data of the current thread given the key.
int pthread_key_create (pthread_key_t *key, void (*destructor) (void *));
int pthread_setspecific (pthread_key_t key, const void *value);
void *pthread_getspecific (pthread_key_t key);
Figure 2-17. Posix Thread Specific Data Calls
Example: Windows Process And Thread API
The Windows API provides the CreateProcess call to create a process; two of the main arguments of the call are the name of the pr…
162. … registers or locally allocated variables.
Each distinct type of content typically occupies one or several continuous blocks of memory within the virtual address space. The initial placement of these blocks is managed by the loader of the operating system; the content of these blocks is managed by the process owning them.
The blocks that contain executable code and static data are of little interest from the process memory management point of view, as their layout is determined by the compiler and does not change during process execution. The blocks that contain stack and heap, however, change during process execution and merit further attention.
While the blocks containing the executable code and static data are fixed in size, the blocks containing the heap and the stack may need to grow as the process owning them executes. The need for growth is difficult to predict during the initial placement of the blocks. To avoid restricting the growth by placing either heap or stack too close to other blocks, they are typically placed near the opposite ends of the process virtual address space with an empty space between them. The heap block is then grown upwards and the stack block downwards as necessary.
When multiple blocks of memory within the virtual address space need to grow as the process owning them executes, the initial placement of the blocks becomes a problem. This can be partially alleviated by using hardware that supports large virtual address …
163. … OpenMP Thread API
The traditional imperative interface to creating and terminating threads can be too cumbersome, especially when trying to create applications that use both uniprocessor and multiprocessor platforms efficiently. The OpenMP standard proposes extensions to C that allow threads to be created and terminated declaratively rather than imperatively.
The basic tool for creating threads is the parallel directive, which states that the encapsulated block is to be executed by multiple threads. The for directive similarly states that the encapsulated cycle is to be iterated by multiple threads. The sections directive finally states that the encapsulated blocks are to be executed by individual threads. More directives are available for declaring thread local data and other features.
#pragma omp parallel private (iThreads, iMyThread)
{
  iThreads = omp_get_num_threads ();
  iMyThread = omp_get_thread_num ();
  ...
}
#pragma omp parallel for
for (i = 0 ; i < MAX ; i ++)
  a [i] = 0;
#pragma omp parallel sections
{
  #pragma omp section
  DoOneThing ();
  #pragma omp section
  DoAnotherThing ();
}
Figure 2-23. OpenMP Thread Creation Directives
Rehearsal
At this point, you should understand how the abstract concept of a running process maps to the specific things happening inside a computer. You should be able to describe how the execution of a process relates to the execution of machine code instructions b…
164. …ery Process Alone
Before delving into how multiple processes are run in parallel and how such processes communicate and synchronize, a closer look needs to be taken at what exactly a process is and how a process executes.
Process And Thread Concepts
An obvious function of a computer is executing programs. A program is a sequence of instructions that tell the computer what to do. When a computer executes a program, it keeps track of the position of the currently executing instruction within the program and of the data the instructions of the program use. This gives rise to the concept of a process as an executing program, consisting of the program itself and of the execution state that the computer keeps track of.
The abstract notions of program and state are backed by concrete entities. The program is represented as machine code instructions, which are stored as numbers in memory. The machine code instructions manipulate the state, which is also stored as numbers, either in memory or in registers.
It is often useful to have several processes cooperate. A cooperation between processes requires communication, which may be …
(Annotations: SMP, symmetric multiprocessor: several equal processors that all see the same memory, so three threads of the same process can run on three processors. The problem is memory speed versus processor speed. NUMA, Non Uniform Memory Access: every processor has its own memory, access to its own memory is fast, access to the memory of another processor is slow.)
165. …es, Work Queues and Timers.
Example: Windows Deferred Procedure Calls
The Windows kernel provides the option of postponing work done while servicing an interrupt through the deferred procedure call mechanism. The interrupt service routine can register a deferred procedure call, which gets executed later. The decision when to execute a deferred procedure call depends on the importance of the call, the depth of the queue and the rate of the interrupts.
// Registers DPC for a device
VOID IoInitializeDpcRequest (IN PDEVICE_OBJECT DeviceObject, IN PIO_DPC_ROUTINE DpcRoutine);
// Schedules DPC for a device
VOID IoRequestDpc (IN PDEVICE_OBJECT DeviceObject, IN PIRP Irp, IN PVOID Context);
// DPC
VOID DpcForIsr (IN PKDPC Dpc, IN struct _DEVICE_OBJECT *DeviceObject, IN struct _IRP *Irp, IN PVOID Context);
Example: Solaris Pinned Threads
Solaris services interrupts in dedicated threads. Because initializing a thread when an interrupt arrives would take long, interrupt threads are used as a restricted variant of kernel threads. On an interrupt, the active thread is marked as a pinned thread, which means that it is put to sleep but cannot be scheduled on another processor, because its context has not been fully saved. An interrupt thread is started to service the interrupt, and when it finishes, the pinned thread is woken up. If the interrupt thread called a function that would …
166. … of a cluster memory management has been implemented for OSF/1 running on DEC Alphas connected by ATM. The prototype classifies pages on a system node as local or global, depending on whether they are accessed by this node or cached for another node. The page fault algorithm of the prototype distinguishes two major situations:
• The faulted page for node X is cached as global on another node Y. The page can be fetched from network, a space for it has to be made on X. The faulted page from Y is exchanged for any global page on X. If X has no global page, a LRU local page is used instead.
• The faulted page for node X is not cached as global on any node. The page must be fetched from disk, a space for it has to be made in cluster and on X. A cluster wide LRU page on another node Y is written to disk. Any global page on X is written to Y. If X has no global page, a LRU local page is used instead. The faulted page is read from disk, where all pages are stored as a backup in case a node with global pages becomes unreachable.
To locate a page in the cluster, the prototype uses a distributed hash table. For each page in the cluster, the table contains the location of the page. Each node in the cluster manages part of the table.
To locate a cluster wide LRU page, the prototype uses a probabilistic LRU algorithm. The lifetime of the cluster is divided into epochs with a maximum epoch duration and a maximum eviction count. Each epoch, a coordinator is chosen …
167. … if the combination of class, type, protocol is not supported.
int bind (int sockfd, struct sockaddr *my_addr, socklen_t addrlen);
#define __SOCKADDR_COMMON(sa_prefix) sa_family_t sa_prefix##family
struct sockaddr_in {
  __SOCKADDR_COMMON (sin_);
  in_port_t sin_port;
  struct in_addr sin_addr;
  unsigned char sin_zero [sizeof (struct sockaddr) - __SOCKADDR_COMMON_SIZE - sizeof (in_port_t) - sizeof (struct in_addr)];
};
struct sockaddr_in6 {
  __SOCKADDR_COMMON (sin6_);
  in_port_t sin6_port;
  uint32_t sin6_flowinfo;
  struct in6_addr sin6_addr;
  uint32_t sin6_scope_id;
};
The bind call binds the socket to a given local address. The binding is typically necessary to tell the socket what local address to listen on for incoming connections.
int listen (int sockfd, int backlog);
The listen call tells the socket to listen for incoming connections and sets the length of the incoming connection queue.
int accept (int sockfd, struct sockaddr *addr, socklen_t *addrlen);
The accept call accepts an incoming connection on a listening socket that is SOCK_STREAM, SOCK_RDM, SOCK_SEQPACKET. The function returns a new socket and an address that the new socket is connected to and keeps the original socket …
(Annotations: The API behaves as if a connection were always being established. With listen, I am the one who accepts calls: when something arrives, the kernel opens the connection. Accept is blocking: when something arrives, I get a handle to the new socket.)
168. … of the operating system features.
MACH is a prominent example of a microkernel that has been used in contemporary operating systems, including the NextStep and OpenStep systems, and notably OS X. Most research operating systems also qualify as microkernel operating systems.
References
1. The Mach Operating System. http://www.cs.cmu.edu/afs/cs.cmu.edu/project/mach/public/www
2. Andrew Tanenbaum, Linus Torvalds: Debate On Linux. http://www.oreilly.com/catalog/opensources/book/appa.html
Virtualized Systems
Attempts to simplify maintenance and improve utilization of operating systems that host multiple independent applications have led to the idea of running multiple operating systems on the same computer. Similar to the manner in which the operating system kernel provides an isolated environment to each hosted application, virtualized systems introduce a hypervisor that provides an isolated environment to each hosted operating system. Hypervisors can be introduced into the system architecture in different ways.
• A native hypervisor runs on bare hardware, with the hosted operating systems residing above the hypervisor in the system structure. This makes it possible to implement an efficient hypervisor, paying the price of maintaining a hardware specific implementation.
• A hosted hypervisor partially bypasses the need for a hardware specific implementation by running on top of another ope…
169. … Buffer. Explain the role of the operating system in this process.
Hint: Do not concentrate on the addresses alone. The address translation process also reads and writes some flags in the entries of the Translation Lookaside Buffer. A thorough answer should also explain the relationship between the widths of the address fields and the sizes of the address translation structures.
5. At the level of individual address bits and entry flags, describe the process of obtaining physical address from virtual address on a processor that supports multilevel page tables. Explain the role of an operating system in this process.
Hint: Do not concentrate on the addresses alone. The address translation process also reads and writes some flags in the entries of the multilevel page tables. A thorough answer should also explain the relationship between the widths of the address fields and the sizes of the address translation structures.
6. Explain the relation between the page size and the size of the information describing the mapping of virtual addresses to physical memory, and list the advantages and disadvantages associated with using smaller and larger page sizes.
7. List the advantages and disadvantages of using multilevel page table as a data structure for storing the mapping of virtual to physical memory.
8. List the advantages and disadvantages of inverse page table as a data structure for storing the mapping of virtual t…
170. … for initialization and termination.
• .symtab and .dynsym sections that contain the symbol tables for static and dynamic linking. Symbol tables list symbols defined by the program. A symbol is defined by its name, value, size and type.
• .strtab and .dynstr sections that contain the string tables for static and dynamic linking. String tables list all strings used in the file. A string is referred to using its index in the table rather than quoted.
• .rel sections that contain the relocation information for other sections. Relocations are defined by their position, size and type.
> objdump -h /bin/bash
Sections:
Idx Name     Size      VMA       LMA       File off  Algn
  0 .interp  00000013  08047134  08047134  00000134  2**0
             CONTENTS, ALLOC, LOAD, READONLY, DATA
 12 .text    0006834c  0805b4e0  0805b4e0  000144e0  2**4
             CONTENTS, ALLOC, LOAD, READONLY, CODE
 21 .got     00000004  080d858c  080d858c  0009058c  2**2
             CONTENTS, ALLOC, LOAD, DATA
 23 .data    000053b4  080d8880  08048880  00090880  2**5
             CONTENTS, ALLOC, LOAD, DATA
 25 .bss     00004960  080ddc5c  …
             ALLOC
Figure 2-6. ELF Sections Example
> objdump -R /bin/bash
DYNAMIC RELOCATION RECORDS
OFFSET    TYPE
00722640  R_386_32
00722674  R_386_JUMP_SLOT
080d858c  R_386_GLOB_DAT
080ddc40  R_386_COPY
080ddc44  R_386_COPY
080d859c  R_386_JUMP_SLOT
080d85a0  R_386_JUMP_SLOT
…
171. … free algorithm.
19. Describe the interface of a lock and the semantics of its methods.
20. Describe the interface of a read write lock and the semantics of its methods.
21. Explain when a lock is a spin lock.
22. Explain when a lock is a recursive lock.
23. Explain why Windows offers Mutex and CriticalSection as two implementations of a lock.
24. Implement a solution for the mutual exclusion synchronization problem using locks.
25. Describe the interface of a semaphore and the semantics of its methods.
26. Implement a solution for the producer and consumer synchronization problem over a cyclic buffer using semaphores.
27. Describe the interface of a condition variable and the semantics of its methods.
28. Explain what a monitor is as a synchronization tool and what methods it provides.
Exercises
1. Implement a spin lock and then a recursive lock using the spin lock and the Sleep and Wake functions, as well as suitable functions of your choice for managing lists and other details, all without any guarantees as to parallelism. Explain how this implementation works on a multiprocessor hardware.
Notes
Still a sketch. Understanding is essential. Understanding is recommended. Understanding is recommended. Just a curiosity. Understanding is recommended. Just a curiosity. Just a curiosity. …
172. …g SMTP port 80 packets belong to class 1:3, which is this queuing discipline.
Together, the filter tells Linux to first send ICMP, SSH, DNS and outgoing web replies to ppp0. If there are no such packets, the filter tells Linux to send any packets except outgoing SMTP. If there are no such packets, the filter tells Linux to send outgoing SMTP with a bandwidth limit.
Rehearsal
Questions
1. Explain why copying of the transferred data can be a problem when implementing network access in an operating system. Assess how serious the problem is and state how it can be removed.
2. Explain the role of packet filtering in an operating system. Give examples of criteria by which packets can be filtered and examples of actions that filters can perform on packets.
3. Explain the role of the packet scheduler in an operating system.
4. Describe the Token Bucket algorithm for packet scheduling and explain its goal.
5. Describe the Stochastic Fair Queuing algorithm for packet scheduling and explain its goal.
6. Describe the Class Based Queuing algorithm for packet scheduling and explain its goal.
7. Describe the Random Early Detection algorithm for packet scheduling and explain its goal.
Network Subsystem Applications
File Systems
(Annotations: … sharing, remote access, reliability.)
173. … message passing with direct addressing.
9. Propose a blocking interface through which a process can send a message to another process. Use asynchronous message passing with indirect addressing.
Exercises
1. Design a process communication mechanism based on message passing suitable for a microkernel operating system. Describe the interface used by a process to communicate with other processes. Include specialized support for very short messages that would be communicated as quickly as possible and specialized support for very long messages that would be communicated as efficiently as possible. Compare the overhead introduced by the process communication mechanism with the overhead of a local procedure call.
Process Synchronization
When concurrently executing processes communicate among themselves or use shared resources, they can obviously influence each other. This influence can lead to errors that only exhibit themselves in certain scenarios of concurrent execution. Such errors are called race conditions.
Bernstein conditions from 1966 state that, given sets of inputs and sets of outputs for concurrently executing processes, race conditions can only occur when either sets of outputs of two processes overlap, or a set of inputs of a process overlaps with a set of outputs of other processes.
Race conditions are notoriously difficult to discover. Process synchronization provides means of avoiding …
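The Bernstein conditions from the excerpt above reduce to three set intersections, which can be sketched with bit masks where every bit stands for one shared variable. The representation and the names `access_sets_t` and `bernstein_may_race` are assumptions made for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* One bit per shared variable; a set bit means the process reads
 * (inputs) or writes (outputs) that variable. */
typedef struct {
    uint32_t inputs;
    uint32_t outputs;
} access_sets_t;

/* Returns true when a race condition is possible between two concurrently
 * executing processes: their outputs overlap, or the inputs of one
 * overlap with the outputs of the other. */
bool bernstein_may_race(access_sets_t a, access_sets_t b)
{
    return (a.outputs & b.outputs) != 0 ||
           (a.inputs  & b.outputs) != 0 ||
           (a.outputs & b.inputs)  != 0;
}
```

Two processes that only read the same variable and write different ones pass the check, while a process that reads what another writes does not. The check is conservative: overlapping sets make a race possible, not certain.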
174. ges in a circular list walked by two clock hands. Both clock hands advance whenever page replacement is needed; the page that the first hand points to is replaced if it is not accessed, the page that the second hand points to is marked as not accessed. The angle between the two hands determines the aggressiveness of the algorithm.

Least Recently Used replaces the page that was accessed the longest time ago. Since the information on when a page has been accessed last is rarely available, approximations are used. The algorithm exhibits very inappropriate behavior when large structures are traversed.

Least Frequently Used replaces the page that has been accessed least frequently lately. Since the information on how frequently a page has been accessed lately is rarely available, approximations are used. The algorithm exhibits very inappropriate behavior when usage frequency changes.

Algorithms that do not take into account how the process uses memory can exhibit Belady's anomaly, where in certain situations adding frames increases the number of page faults. An example is the FIFO algorithm with the reference string 0 1 2 3 0 1 4 0 1 2 3 4 in three and in four frames (the loading of the first pages must also be counted as page faults).

Chapter 3 Memory Management

(Annotation: on top of paging there is also segmentation: CS code segment, SS stack segment and others, within a single process; segments can be relocated. In a system with multiple processes it then looks
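The FIFO example of Belady's anomaly mentioned in the excerpt above can be checked with a short simulation. This is a minimal sketch, not taken from the source: the function name `fifo_faults` and the `MAX_FRAMES` limit are assumptions made for the illustration, and initial loads into empty frames are counted as page faults, as the text requires.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FRAMES 16   /* arbitrary limit chosen for this sketch */

/* Counts page faults of FIFO replacement over a reference string.
   Returns -1 when the number of frames is out of range. */
int fifo_faults(const int *refs, size_t count, int frames)
{
    int resident[MAX_FRAMES];
    int used = 0;     /* number of occupied frames */
    int oldest = 0;   /* index of the frame loaded the longest time ago */
    int faults = 0;

    assert(refs != NULL || count == 0);
    if (frames < 1 || frames > MAX_FRAMES)
        return -1;

    for (size_t i = 0; i < count; i++) {
        int hit = 0;
        for (int j = 0; j < used; j++) {
            if (resident[j] == refs[i]) {
                hit = 1;
                break;
            }
        }
        if (hit)
            continue;

        faults++;
        if (used < frames) {
            /* Free frame available, frames fill up in loading order. */
            resident[used++] = refs[i];
        } else {
            /* Replace the oldest page and advance to the next oldest. */
            resident[oldest] = refs[i];
            oldest = (oldest + 1) % frames;
        }
    }
    return faults;
}
```

For the reference string 0 1 2 3 0 1 4 0 1 2 3 4, the function yields 9 faults with three frames and 10 faults with four frames, which is the anomaly.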
175. gged, that is, each item is preceded by a description of what it is. The tag contains an out of line flag, the size of the data as a number of items, the size of an item in bits, and finally the type of the item, which is one of the standard types plus a port handle. The kernel interprets passing a port handle as passing the corresponding capability.

Messages are sent and received using the mach_msg call, whose arguments are the address of the message header, flags with bits such as expect reply, enable timeout and enable delivery notification, the size of the message, the size of the space for the reply, and the ports.

Remote Procedure Call

Calling a service of a server using a message from a client usually has the character of a procedure call, and so the code that manipulates messages on the client and on the server is separated and generated automatically. The idea dates roughly to 1984.

(Annotation: transparency, speed; errors: the communication can break, one side can crash.)

When a normal procedure is called, the parameters are stored on the stack, the procedure picks them up, does something and returns results. When a service on a server is called, the parameters are stored in a message, the server receives the message, does something and returns results. RPC creates a local procedure that takes the parameters from the stack, stores them in a message, calls the server, receives the results and returns them to the caller. So that the programmer of the server has an easy life too,
176. Figure 1.4 Timing Diagram Example (signals A0-19, D0-15)

What all operations of the processor bus have in common is the general order of steps, which typically starts with the processor setting an address on the address bus and a signal on the control bus that indicates presence of a valid address, and proceeds with the transfer of data. Any device connected to the processor bus is responsible for recognizing its address, usually through an address decoder that sends the chip select signal when the address of the device is recognized.

Example: ISA Bus

The ISA (Industry Standard Architecture) bus is synchronized by a clock signal ticking with the frequency of 8-10 MHz. In the first clock tick of a bus cycle, the bus master, which is typically the processor, sets the address on the address bus and pulses the BALE (Bus Address Latch Enable) signal to indicate that the address is valid.

In a read bus cycle, the bus master activates one of the MEMR (Memory Read) or IOR (Input/Output Read) signals to indicate either reading from memory or reading from an input device. The bus master waits the next four cycles for the memory or the device to recognize its address and set the data on the data bus.

Chapter 1 Introduction

(Annotation: I see that the bus
177. gnal value
int si_int;     // Integer value sent with signal
void *si_ptr;   // Pointer value sent with signal
void *si_addr;  // Associated memory address
int si_fd;      // Associated file descriptor

int sigaction (int signum, const struct sigaction *act, struct sigaction *oldact);

• sa_handler: signal handler with limited arguments
• sa_sigaction: signal handler with complete arguments
• sa_mask: what other signals to mask while in signal handler
• SA_RESETHAND: restore default signal handler after one signal
• SA_NODEFER: allow recursive invocation of this signal handler
• SA_ONSTACK: use alternate stack for this signal handler

Figure 2.28 Signal Handler Registration System Call

Due to the ability of signals to interrupt processes at arbitrary times, the actions that can be taken inside a signal handler are severely limited. Access to shared variables and system calls are not safe in general. This can be somewhat alleviated by masking signals.

int sigprocmask (int how, const sigset_t *set, sigset_t *oset);
int pthread_sigmask (int how, const sigset_t *set, sigset_t *oset);

• SIG_BLOCK: add blocking to signals that are not yet blocked
• SIG_UNBLOCK: remove blocking from signals that are blocked
• SIG_SETMASK: replace existing mask

Figure 2.29 Signal Masking System Call

Signals are usually reliable, even though unreliable signals did exist. Signals are delivered asynchronously, usually on return from system call. Multiple
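The registration and masking calls listed in the excerpt above can be combined as follows. This is a hedged sketch, not code from the source: the helper names `install_usr1_handler`, `run_with_usr1_blocked` and `signal_seen` are hypothetical, and the choice of SIGUSR1 and SIGUSR2 is arbitrary. The handler only sets a flag of type `volatile sig_atomic_t`, which stays within the limits on what a handler may safely do.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t got_signal = 0;

/* The handler only records that the signal arrived. */
static void on_usr1(int signum)
{
    (void) signum;
    got_signal = 1;
}

/* Registers on_usr1 for SIGUSR1, returns 0 on success and -1 on failure. */
int install_usr1_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    sigaddset(&sa.sa_mask, SIGUSR2);   /* also mask SIGUSR2 while in the handler */
    sa.sa_flags = 0;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Runs critical() with SIGUSR1 blocked, then restores the previous mask. */
int run_with_usr1_blocked(void (*critical)(void))
{
    sigset_t block, old;

    assert(critical != NULL);
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &block, &old) != 0)
        return -1;
    critical();
    return sigprocmask(SIG_SETMASK, &old, NULL);
}

/* Returns nonzero once the handler has run. */
int signal_seen(void)
{
    return got_signal;
}
```

A signal that arrives while `critical()` runs stays pending and is delivered after the previous mask is restored.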
178. gs describing the future use include POSIX_MADV_NORMAL for no predictable pattern, POSIX_MADV_SEQUENTIAL and POSIX_MADV_RANDOM for specific patterns, and POSIX_MADV_WILLNEED and POSIX_MADV_DONTNEED to determine whether data will be accessed in the near future.

Chapter 5 File Subsystem

(Annotation: sendfile sends a file from the disk straight to a network socket, that is, without copying it to user space and only then to the network card. mkdir, links: a symlink is like a Windows shortcut, there is one actual file and the others refer to it by name, symlinks can be relative and can span multiple file systems; a hardlink in a file system where

Example: Windows Mapped File Operations

HANDLE CreateFileMapping (
  HANDLE hFile,
  LPSECURITY_ATTRIBUTES lpFileMappingAttributes,
  DWORD flProtect,
  DWORD dwMaximumSizeHigh,
  DWORD dwMaximumSizeLow,
  LPCTSTR lpName);

This call creates an abstract object that represents the mapped file, but does not map anything yet. The meaning of the PAGE_READONLY, PAGE_READWRITE and PAGE_WRITECOPY flags is obvious. The SEC_COMMIT flag requests allocation of physical space in memory or on disk. The SEC_IMAGE flag indicates mapping of an executable file. The meaning of the SEC_NOCACHE flag is obvious. The SEC_RESERVE flag requests a reservation without allocation of physical space in memory or on disk. The file handle can be 0xFFFFFFFF, in which case the size of the mapped block must also be specified; the system sets aside the requested space similarly as when paging memory out, when s
179. he queue of requests can be implemented either by the computer in software or by the disk in hardware. The computer typically only considers the current track that the disk head is on, because it does not change without the computer commanding the disk to do so, as opposed to the current sector that the disk head moves over.

(Annotation: in practice some modification is used; several commands at once, that is, the disks can read more cleverly.)

Most versions of the ATA interface do not support issuing a new request to the disk before the previous request is completed, and therefore cannot implement any strategy to process the queue of requests. On the contrary, most versions of the SCSI and the SATA interfaces do support issuing a new request to the disk before the previous request is completed.

Chapter 4 Device Management

Example: SATA Native Command Queuing

A SATA disk uses Native Command Queuing as the mechanism used to maintain the queue of requests. The mechanism is coupled with First Party Direct Memory Access, which allows the drive to instruct the controller to set up Direct Memory Access for a particular part of a particular request.

Example: Linux Request Queuing

Linux 2.2.18 /drivers/block/ll_rw_blk.c

Although the Linux sources persistently use the name Elevator, Linux in fact orders the incoming requests by linear sector number, with the exception of requests that have been waiting too long (256 skips), for
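The ordering described in the excerpt above, by linear sector number with a limit on how many times a waiting request can be skipped, can be sketched roughly as follows. This is not the kernel code: the structure `request`, the field `skips_left` and the function `enqueue_sorted` are hypothetical names, and the exact rule used by Linux 2.2.18 is not recoverable from the truncated text. The caller is assumed to initialize `skips_left` to the allowed number of skips, 256 according to the text.

```c
#include <assert.h>
#include <stddef.h>

struct request {
    unsigned long sector;   /* linear sector number of the request */
    int skips_left;         /* how many more times it may be overtaken */
    struct request *next;
};

/* Inserts req into a queue ordered by sector number, without overtaking
   any queued request that has no skips left. */
void enqueue_sorted(struct request **head, struct request *req)
{
    assert(head != NULL && req != NULL);

    /* A new request must go behind the last request with no skips left. */
    struct request **slot = head;
    for (struct request **p = head; *p != NULL; p = &(*p)->next) {
        if ((*p)->skips_left <= 0)
            slot = &(*p)->next;
    }

    /* From there on, keep the queue ordered by sector number. */
    while (*slot != NULL && (*slot)->sector <= req->sector)
        slot = &(*slot)->next;

    /* Every request that ends up behind the new one was skipped once. */
    for (struct request *q = *slot; q != NULL; q = q->next)
        q->skips_left--;

    req->next = *slot;
    *slot = req;
}
```

The skip counter bounds how long a request far from the current insertion point can be delayed, which plain sorting by sector does not guarantee.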
180. he standardized registers contain vendor ID and device ID, subsystem vendor ID and subsystem device ID, flags, memory address ranges, port address ranges, interrupts, etc.

Devices are addressed using domains (0 ... 0FFFFh), busses (0 ... 0FFh), slots (0 ... 1Fh), functions (0 ... 7). A domain typically addresses a host bridge. A bus typically addresses a bus controller, a slot typically addresses a device.

> lspci -t
-[0000:00]-+-00.0
           +-01.0-[0000:01]----00.0
           +-02.0-[0000:02-03]----1f.0-[0000:03]----00.0
           +-1e.0-[0000:04]--+-0b.0
           |                 +-0c.0
           |                 \-0d.0
           +-1f.0
           +-1f.1
           +-1f.2
           +-1f.3
           +-1f.4
           \-1f.5

The example shows a computer with one domain, which has three bridges from bus 0 to busses 1, 2 and 4, one bridge from bus 2 to bus 3, one device with six functions on bus 0, one device on bus 1, one device on bus 3, three devices on bus 4.

> lspci
00:00.0 Host bridge: Intel Corp. 82860 860 (Wombat) Chipset Host Bridge (MCH) (rev 04)
00:01.0 PCI bridge: Intel Corp. 82850 850 (Tehama) Chipset AGP Bridge (rev 04)
00:02.0 PCI bridge: Intel Corp. 82860 860 (Wombat) Chipset AGP Bridge (rev 04)
00:1e.0 PCI bridge: Intel Corp. 82801 PCI Bridge (rev 04)
00:1f.0 ISA bridge: Intel Corp. 82801BA ISA Bridge (LPC) (rev 04)
00:1f.1 IDE interface: Intel Corp. 82801BA IDE U100 (rev 04)
00:1f.2 USB Controller: Intel Corp. 82801BA/BAM USB (Hub #1) (rev 04)
00:1f.3 SMBus: I
181. he virtual address space, where it can collide with other blocks that change during process execution.

Stack Addressing

The use of stack for procedure arguments and locally allocated variables relies on the fact that the arguments and the variables reside in a constant position relative to the top of the stack. The processor typically allows addressing data relative to the top of the stack, making it possible to use the same machine code instructions to access the procedure arguments and the locally allocated variables regardless of their absolute addresses in the virtual address space, as long as their addresses relative to the top of the stack do not change.

Example: Relative Addressing On Stack Of Intel IA32 Processors

The Intel IA32 processors have a base pointer register called EBP. The EBP register is typically set to the value of the ESP register at the beginning of a procedure, and used to address the procedure arguments and locally allocated variables throughout the procedure. Thus, the arguments are located at positive offsets from the EBP register, while the variables are located at negative offsets from the EBP register.

void SomeProcedure (int anArgument)
{
  int aVariable;
  aVariable = anArgument;
}

SomeProcedure:
  PUSH EBP        ;save original value of EBP on stack
  MOV EBP,ESP     ;store top of stack address in EBP
  SUB ESP,4       ;allocate space for aVariable
182. he writing and reading threads. Ordering is transitive. Accesses to all basic types besides long and double are atomic in Java.

References

1. James Gosling, Bill Joy, Guy Steele, Gilad Bracha: The Java Language Specification.

(Annotation: active waiting: I do the waiting myself, I keep being scheduled, and as long as I wait I only waste time. Passive waiting: the waiting thread is not scheduled, it sleeps; somebody else has to decide about waking it up. Passive waiting cannot be used, for example, when waiting for a peripheral device that cannot send interrupts. Passive waiting means being put to sleep and woken up, which is inefficient for short waits. Irrelevant on a single processor computer, on a multiprocessor it can be.)

Passive Waiting

Passive waiting is an approach to process synchronization where a process that waits for a condition does so by sleeping, to be woken by another process that has either caused or observed the condition to become true.

Consider the solution to the mutual exclusion problem using the AtomicSwap operation. A naive extension of the solution to support passive waiting uses the Sleep and Wake operations to remove and add a process from and to the ready queue.

(Annotation: the naive, non-working solution: I try to lock the critical section, for example with test and set; when that fails, I put myself into a queue and go to sleep; when it succeeds, or when somebody wakes me up, I execute the critical section; if somebody is waiting in the que
183. hile it is being written.

Dining Philosophers

(Annotation: I need exclusive access to something, for example a printer.)

Dining Philosophers models a scenario where several philosophers alternately think and dine at a round table. The table contains as many plates and forks as there are philosophers. A philosopher needs to pick two forks adjacent to his plate to dine.

The problem approximates a situation where several processes compete for an exclusive use of resources with the possibility of a deadlock.

Sleeping Barber

(Annotation: server, thread pool)

Sleeping Barber models a scenario where several customers visit a barber in a barber shop. The shop contains a limited number of seats for waiting customers. The barber serves customers one at a time or sleeps when there are no customers. A customer enters the shop and either wakes the barber to be served immediately, or waits in a seat to be served later, or leaves when there are no free seats.

The problem approximates a situation where several processes queue to get served by another process.

Means For Synchronization

The most trivial example of process synchronization is exclusive execution, which prevents all but one process from executing. Technical means of achieving exclusive execution include disabling processor interrupts and raising process priority.

Disabling interrupts yields exclusive execution in an environment that uses interrupts to schedule multiple processes on a single processor, simply
184. ... page, as a yardstick for evaluating the common algorithms, which mostly try to produce some meaningful prediction of the program behavior based on locality of reference; random and optimal page selection represent the limit situations of zero and perfect prediction.

The graphs published in ACM OS Review 10/97 look very nice here, they show the behavior of various kinds of applications when accessing memory. Well, perhaps not very nice, but it would probably not hurt to mention them at the beginning.

• First In First Out replaces the page that was loaded the longest time ago.

• Not Recently Used presumes that a read access to a page sets the accessed bit associated with the page, and that a write access to a page sets the dirty bit associated with the page. The operating system periodically resets the accessed bit. When a page is to be replaced, pages that are neither accessed nor dirty are replaced first, pages that are dirty but not accessed are replaced second, pages that are accessed but not dirty are replaced third, and pages that are accessed and dirty are replaced last.

• One Hand Clock is a variant of Not Recently Used that arranges pages in a circular list walked by a clock hand. The clock hand advances whenever page replacement is needed; a page that the hand points to is replaced if it is not accessed, and marked as not accessed otherwise.

• Two Hand Clock is a variant of Not Recently Used that arranges pa
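The One Hand Clock description in the excerpt above maps onto a short victim selection routine. The following is a sketch under assumptions: the accessed bit is kept in a plain structure field rather than in a hardware page table entry, and the names `frame`, `clock_state` and `clock_select_victim` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct frame {
    int page;        /* page currently held in this frame */
    bool accessed;   /* stands in for the hardware accessed bit */
};

struct clock_state {
    struct frame *frames;   /* frames arranged in a circular list */
    size_t count;           /* number of frames */
    size_t hand;            /* index the clock hand points to */
};

/* Advances the hand until it finds a frame that is not accessed and
   returns the index of that frame. Accessed frames passed on the way
   are marked as not accessed. */
size_t clock_select_victim(struct clock_state *c)
{
    assert(c != NULL && c->frames != NULL && c->count > 0);

    for (;;) {
        size_t current = c->hand;
        struct frame *f = &c->frames[current];

        c->hand = (c->hand + 1) % c->count;
        if (!f->accessed)
            return current;
        f->accessed = false;
    }
}
```

The loop always terminates, because after one full revolution every frame has been marked as not accessed.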
185. i zasobniku to si typicky pamatuje operacni system pro kazdy vlakno prectu umisteni zasobniku noveho vlakna odnekud Z pameti z toho zasobniku nactu kontext noveho vlakna obsah registru Mikrokernel pravidlo co jde vyndat vyndam nevyhody rezie prepinani kontextu nebo aspon urovne privilegii apod prepnuti prav jsou stovky taktu to je blby 40 programovat fakt oddelene je tezky pracny viz site a vrstvy ISO OSI VICE PROCESU Edited by Foxit Reader Copyright C by Foxit Corporation 2005 2009 MIKROKERNEL For Evaluation Only oddeleni procesu kvuli bezpecnosti ny kvuli robustnosti v echny AP a pomoc APIC je resetuje startup adresa se zad bu p es CMOS kdyz spadne word BIOS hack nebo p es APIC nemel by spadnout Funkci Intel MPS v et procesor doru ov n p eru en atd v sou asn dob prohlizec nahrazuj sti specifikace ACPI Jak u jednou procesory b probl m je jenom s p eru en m Syst m si adi e vzajemne ovlivneni p eru en ka d ho procesoru nastav jak pot ebuje pomoc APIC je mo n doru it na stavu procesu tak Inter Processor Interrupts IPI se hod nap klad na TLB nebo PTE invalidation tj registry ok pamet kazdy ma vlastni pametovy prostor strankovani bezici program nesmi menit strank tabulky cache mapovani pameti dva nebo vic modu behu procesoru user mode privileged mode v use
186. ... memory mapped files, but mainly because of the cache. To prevent semantic errors when files are accessed simultaneously using read and write and using mmap, read and write are internally implemented as mmap into a kernel buffer. When such a buffer is released, the page is recorded as free, but the information on which vnode and offset it contained is kept, so that until somebody uses the page, it remains a part of the cache and can be reused. A hash of pages by vnode and offset is maintained for this purpose.

The page reclaim algorithm is a clock with two hands. It runs the more often the more the system runs out of memory; the distance between the hands and the speed of their rotation are set in the kernel configuration at boot.

An interesting effect appeared when the file manager and the memory manager were integrated and disks became fast. When the cache fills up and a victim is sought for eviction, the hands start rotating too quickly (tens of MB per second with current disks, hundreds with disk arrays), which means that the accessed bits are also reset within seconds, and the algorithm starts evicting pages of programs too aggressively, because the programs simply do not manage to use the pages in such a short time. Solaris 7 therefore evicts cache pages in preference to pages of programs until it starts running out of memory. Solaris 8 does something completely different, about which I have no information.

Other interesting functions that exist
187. ... and parity disk. RAID 5 uses block striping and parity striping. RAID 6 uses block striping and double parity striping. The levels were initially defined in a paper of authors from the University of California, Berkeley, but vendors tend to tweak levels as they see fit. RAID 2 is not used, RAID 3 is rare, RAID 5 is frequent. RAID 0+1 and RAID 1+0 or RAID 10 combine RAID 0 and RAID 1.

(Annotation: guessing that a disk will fail soon. SMART, Self-Monitoring, Analysis and Reporting Technology: the disk is measured continuously; it reveals problems that I would not notice normally, for example that the disk only managed to read what I got from it on the third attempt; it can also run tests.)

Example: SMART Diagnostics

Linux 2.6.10: smartctl -a /dev/hda prints all device information.

Attributes have raw value and normalized value; raw value is usually but not necessarily human readable, normalized value is 1-254. A threshold of 0-255 is associated with the normalized value. Worst lifetime value is kept. If the value is less than or equal to the threshold, then the attribute failed.

Attributes are of two types, pre-failure and old age. Failed pre-failure attribute signals imminent failure. Failed old age attribute signals end of life. Attributes are numbered and some numbers are standardized.

(Annotation: attributes: old age only says how long I have had the disk; pre-fail means that the disk really is starting to fail; the lower the value, the w

(Annotation: partitioning: for example so that when the OS destroys the system disk, ... can still be used; RAID; so that it does not destroy the data as well; gives higher speed; protection against filling up
188. identifiers. When a value that has a meaning for the kernel of the client is passed, it need not mean the same thing on the server. Typical examples are file handles, port numbers and similar. Mention the conversion done when sending messages in Mach.

Another problem is error handling. Not much cleverness is possible here. The possible kinds of failures are known; it is simply necessary to expect that RPC can fail in a few more ways than a normal call, and to handle that in the program.

Efficiency is important when implementing RPC, because the entire system depends on it. The critical path of RPC is: call stub, prepare message buffer, marshal parameters, fill headers, call kernel send, context switch, copy stub buffer to kernel space, fill headers, set up interface, receive interrupt, check packet, find addressee, copy buffer to addressee, wake up addressee, context switch, unmarshal parameters, prepare stack, call server.

What takes long: marshalling, buffer copying, and with a bad implementation also header filling. This is usually solved by mapping and by scatter and gather network hardware (efficient only for longer messages).

Stubs and skeletons need to be generated automatically. The input of the generator are the definitions of procedure headers. Those of the programming language are usually not informative enough, so a language for describing the procedure headers, IDL, is defined, and used to generate both the stubs and skeletons and the head
189. identifying garbage in garbage collecting algorithms, and to discuss the principal differences between process memory management that relies on explicit garbage disposal and implicit garbage collection. You should understand the working of basic reference counting and reference tracing algorithms and see how the typical heap usage patterns lead to optimizations of the algorithms.

Based on your knowledge of how process memory management is used, you should be able to design an intelligent API that not only allows to allocate and free blocks of memory, but also helps to debug common errors in allocating and freeing memory.

Questions

1. Identify where the following function relies on the virtual address space to store executable code, static data, heap and stack.

void *SafeAlloc (size_t iSize)
{
  void *pResult = malloc (iSize);
  if (pResult == NULL)
  {
    printf ("Failed to allocate %zu bytes.\n", iSize);
    exit (ENOMEM);
  }
  return (pResult);
}

2. List four distinct types of content that reside in a virtual address space of a typical process.
3. Explain the advantages of segmentation over flat virtual memory address space.
4. Explain why random placement of allocated blocks in the virtual address space of a process can contribute to improved security.
5. Explain what the processor stack is used for with
190. ies for numerically lower priority values are often used to mean semantically higher priorities. In this text, the semantic meaning of priorities is used.

(Annotation: I put myself to sleep and the scheduler wakes me up again when my turn comes; this was common until about 2000 for code running in the kernel. Disadvantage: when a process contains an endless loop, the whole computer freezes.)

Dynamic Priorities

Processes are assigned initial priorities, actual priorities are calculated from the initial priorities and the times recently spent executing, the process with the highest actual priority is scheduled.

(Annotation: Round Robin with multiple FIFO queues: n queues for n priorities; I always pick the first thread from the highest queue, and when putting a thread with priority i back into a queue I give it a new priority: a thread that went to sleep voluntarily keeps the same priority; a thread that was put to sleep forcibly gets priority i changed by 1; a thread that blocked gets priority i changed by 1, because it probably worked with, for example, the disk, and somebody else is probably waiting for the disk too, so that the disk is kept busy long enough.)

Shortest Job First

An attempt at good support of batches: the batch that takes the shortest time is run first. This achieves the shortest average time to batch completion, because when several batches are waiting, their completion times add up, and so it is good to start with the shortest times.

This can partly be used for interactive processes as well. When several users at several terminals wait for a response, the process that will take the least time to deliver the response is run, which achieves the minimum average response time. Unple
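The argument for Shortest Job First in the excerpt above, that completion times of waiting batches add up, can be checked numerically. The sketch below is an illustration only; the function names `average_completion` and `shortest_job_first` are hypothetical, and all jobs are assumed to be ready at time zero and to run back to back without preemption.

```c
#include <assert.h>
#include <stdlib.h>

/* Orders job durations from the shortest to the longest. */
static int compare_durations(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;
    return (x > y) - (x < y);
}

/* Average completion time when the jobs run in array order. */
double average_completion(const int *durations, size_t count)
{
    long finished_at = 0;   /* time when the current job completes */
    long total = 0;         /* sum of completion times so far */

    assert(durations != NULL && count > 0);
    for (size_t i = 0; i < count; i++) {
        finished_at += durations[i];
        total += finished_at;
    }
    return (double) total / (double) count;
}

/* Reorders the jobs so that the shortest one runs first. */
void shortest_job_first(int *durations, size_t count)
{
    assert(durations != NULL);
    qsort(durations, count, sizeof(durations[0]), compare_durations);
}
```

With durations 6, 3, 3 run in that order, the completion times are 6, 9 and 12, giving an average of 9; after reordering to 3, 3, 6 they are 3, 6 and 12, giving an average of 7.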
191. info u32 d blocks Jx Z j u32 i flags TP Quenty adi5eni JAER 5 i block EXTO N BLOCKS bloku v ce soubory DE i u32 i version nepouZ v se je to slo it 32 idile amie fs tem u32 i dir acl adreBATe u32 i faddr Pee Hera u8 l i frag jm no max 255 u8 l i fsize islo inode typ souboru define EXT2 DIR BLOCKS 12 p vodn nijak net d n define EXT2 IND BLOCK EXT2 DIR BI vymaz n z prost edka define EXT2 DIND BLOCK EXT2 IND BLOCK je pekklo define EXT2 TIND BLOCK EXT2 DIND BLOCK 1 nov mu e soubory m t define EXT2 N BLOCKS v B stromu define EXT2 SECRM FL 0x00000001 ext3 urn lov n define EXT2 SYNC FL 0x00000008 define EXT2 IMMUTABLE FL 0x00000010 define EXT2 APPEND FL 0x00000020 u leta existuj lep FS d lo by se nahradit Directories are stored either unsorted or with hash tree indices ale uz leta ty ext fungujou nemaj tak z sadn nevyhody aby st lo struct ext2_dir_entry_2 ext4 u32 inode Inode number extents pamatuju si ul6 rec len zacatek a velikost kazd u8 name len souvisl velikosti blok u8 file type File type zase kdy mal tak char name EXT2 NAME LEN v inode pak odkaz na B st om H strom prealokac pfedem s define EXT2 NAME LEN 255 alokuje v tSi prostor na disku define EXT2_FT_REG_FIL 1 define EXT2_FT_DIR 2 define EXT2_FT_CHRDEV 3 define EXT2_FT_BLKDEV 4 define EXT2_FT_SYMLIN
192. ing system and the requirements of the applications. You should be able to explain the working of common scheduling algorithms in the light of these rules.

Questions

1. Explain how multiple processes can run concurrently on a single processor hardware.
2. Explain what is a thread and what is the relationship between threads and processes.
3. Explain what happens when the thread context is switched.
4. Explain what happens when the process context is switched.
5. Using a step by step description of a context switch, show how an implementation of threads in the user space and an implementation of threads in the kernel space differ in the way the context is switched.
6. Explain how the requirements of interactive and batch processes on the process scheduling can contradict each other.
7. List the typical requirements of an interactive process on the process scheduling.
8. List the typical requirements of a batch process on the process scheduling.
9. List the typical requirements of a realtime process on the process scheduling.
10. Explain the difference between soft and hard realtime scheduling requirements.
11. Define typical phases of a p
193. ing when they do not wait before returning. When a non blocking procedure needs to wait, it can replace blocking by polling or callbacks.

Message passing can use symmetrical, asymmetrical and indirect addressing. The symmetrical addressing requires both the sender and the receiver to specify the address of the other party. The asymmetrical addressing requires the sender to specify the address of the receiver. The indirect addressing requires both the sender and the receiver to specify an address of the same message queue.

The message sent from the sender to the receiver can be anything from a single integer number through an unformatted stream of bytes to a formatted structure of records.

Example: Posix Signals

Signals are messages that can be delivered to processes or threads. A signal is identified by a number, with numbers from 1 to 31 allocated to standard signals with predefined meaning and numbers from SIGRTMIN to SIGRTMAX allocated to real time signals.

(Annotation: a signal is a message consisting of a single number.)

Name      Number  Meaning
SIGHUP    1       Controlling terminal closed
SIGINT    2       Request for interrupt sent from keyboard
SIGQUIT   3       Request for quit sent from keyboard
SIGILL    4       Illegal instruction
SIGTRAP   5       Breakpoint instruction
SIGABRT   6       Request for abort
SIGBUS    7       Illegal bus cycle
SIGFPE    8       Floating point exception
SIGKILL   9       Request
194. instances of some signals may be queued.

int kill (pid_t pid, int sig);
int pthread_kill (pthread_t thread, int sig);

union sigval
{
  int sival_int;
  void *sival_ptr;
};

int sigqueue (pid_t pid, int sig, const union sigval value);

Figure 2.30 Signal Send System Call

Example: System V Message Passing

The System V message passing API can serve as the first example of message passing. A message looks simple there: it starts with a long message type, followed by an array of bytes whose length is given as another argument when calling the API. The calls are then trivial:

int msgsnd (int que, message *msg, int len, int flags);
int msgrcv (int que, message *msg, int len, int type, int flags);

When sending a message, it is possible to specify whether the calling process should block or whether an error should be returned when the buffer is full. As a minor detail, an error can be returned even to a blocked calling process, for example when the message queue is removed.

When receiving a message, the maximum size of the buffer is given, and the flags say whether larger messages should be truncated or whether an error should be returned. The message type can either be 0, which means any message, or a specific type, in which case the flags can say whether the first message of the given type or of a type other than the given one is returned. A negative argument means receiving the message with the lowest type that is less than or equal to the absolute value of the argume
195. inux Man Pages.
2. Morgan A. G.: Linux PAM Application Developer's Guide.
3. Morgan A. G.: Linux PAM System Administrator's Guide.

Kerberos Example

The problem with a central authority for identity verification is the possibility of forging its results. This is a threat especially in distributed systems, where it is easy to intercept the communication between the applications and the authority. To address this problem, security protocols based on the design by Needham and Schroeder are used. Their typical representative is Kerberos from MIT, RFC 1510.

(Annotation: from the fact that what I receive is encrypted with the right key, I know that it is authentic.)

The principle of the protocol is simple. It assumes the use of symmetric cryptography and the existence of an authority that has the secret keys of all the participants of the protocol. When a client wants to communicate with a server, it uses the following sequence:

• The client sends the authority a request for a connection with the server, in which it states its name, the name of the server, and a unique number U1.
• The authority verifies the right of the client to connect to the server.
• The authority sends the client a message encrypted with the secret key of the client KC, in which it states the unique number U1 previously sent by the client, a random key KR for the communication with the server, and a ticket T, which is once again the key KR and the name of the client, all encrypted with the secret key of the server KS.
• The client verifies the authenticity of the authority by
196. iod. These are followed by the processes that are not in danger of missing the window constraint; among those, the processes with stricter window constraints are taken first.

Example: Linux Hierarchical Start Time Fair Queuing Scheduler

Hierarchical start time fair queuing. Processes are the leaves of a tree hierarchy, where each node has a weight that says what part of the capacity of the parent node it uses. The meaning of the weights changes with the number of competing nodes; the sum of the weights is considered to be the entire capacity of the parent node.

At the moment a process requests a quantum, it is assigned a start timestamp, which is the maximum of its finish timestamp and the virtual time. The finish timestamp is the last start timestamp increased by L/W after executing a quantum of length L with weight W. The virtual time is the start timestamp of the currently running process, or the highest finish timestamp when no process is running. The process with the lowest start timestamp is always run.

As an example, consider three processes A, B, C (A weight 1, B 2, C 5) that keep requesting quanta of length 10. Table: rows are time, columns are start timestamp and finish timestamp.

The function is again straightforward. The virtual time moves forward, and the processes with their finish timestamps advance in it at a speed proportional to their weight. What is important is that the scheduler is fair with respect to the weights, that is, the deviation from the ideal ratio given by the weights never exceeds the deviation that
197. ional Understanding is essential Understanding is optional Understanding is optional 9 10 11 12 13 14 15 16 17 18 19 20 21 22 Understanding is essential Understanding is optional U U U Understanding is recommended U U U U Understanding is essential U U U nderstanding is optional nderstanding is recommended nderstanding is optional nderstanding is essential nderstanding is essential nderstanding is essential nderstanding is essential nderstanding is recommended nderstanding is optional nderstanding is recommended 23 Just a curiosity 24 25 Understanding is recommended Understanding is recommended 26 Just a curiosity 27 Just a curiosity 28 Just a curiosity 29 30 31 32 Understanding is optional Understanding is recommended Understanding is recommended Understanding is optional 33 Just a curiosity 34 Just a curiosity 35 Understanding is recommended 36 Understanding is recommended 37 Just a curiosity 38 Just a curiosity Chapter 5 File Subsystem 139 Chapter 5 File Subsystem 140 Chapter 6 Network Subsystem Podpora s t se d zhruba rozd lit do dvou st Prvn st je pouh zp stupn n s t aplikac m kv li p enosu dat druhou st je vystav n n jak ch zaj mav ch mecha nizm nad vlastn m p enosem dat Abstractions And
198. iosity

Chapter 7 Security Subsystem

(Annotation: model: roles and permissions)

Authentication

(Annotation: who it is; password and similar; identity verification)

Authentication is the problem of verifying whether an activity (a process, a user) is who it claims to be. A combination of a name and a password is usually used. Typically, the system contains a central authority that does the verification, and everyone else only asks this authority.

Linux PAM Example

PAM is a set of libraries that provides an API for identity verification. Its main feature is the ability to configure dynamically which applications will use which methods of identity verification. The functions are divided into four groups:

• Account management (managing accounts, creating them and similar)
• Authentication management (logging in)
• Password management (changing passwords and similar)
• Session management

PAM is configured by a file that, for each service (an application that wants to use PAM), states which module will be used for which group and how to behave when the module fails.

(Annotation: password, fingerprint reader; the rest of the system calls its functions.)

> cat /etc/pam.d/login
auth       required     pam_securetty.so
auth       required     pam_stack.so service=system-auth
auth       required     pam_nologin.so
account    required     pam_stack.so service=system-auth
password   required     pam_stack.so service=system-auth
session    required     pam_st

(Annotation: permissions; the idea: a matrix, for each
199. The first catch to starting a process comes with the question of who loads the program image. The typical solution of having a process load the program image of another process gets us to the question of who loads the program image of the very first process to be started. This process is called the bootstrap process and the act of starting the bootstrap process is called bootstrapping.

The program image of the bootstrap process is typically stored in the fixed memory of the computer by the manufacturer. Any of the ROM, PROM, EEPROM or FLASH type memory chips, which keep their contents even with the power switched off, can be used for this purpose. The processor of the computer is hardwired to start executing instructions from a specific address when the power is switched on. The fixed memory with the program image of the bootstrap process is therefore hardwired to reside on the same address. Computers designe

[Annotations: isto somewhere, of the EEPROM type, from which to start. When the PC is switched on, the BIOS is started. It initializes the HW and tries to load the OS, typically from the 1st disk, 1st sector, which gets executed. The bootloader thus has 256 B, so it typically again expects something somewhere, and only at the end is the OS kernel loaded. The BIOS needs some HW, for example the disk, understandably. It is good when it sets up some specific HW things, for example the chipset timer, so that the higher layers simply do not have to care about them.]
200. it no longer uses it. A garbage collector replaces the explicit freeing of blocks with an automatic freeing of blocks that are no longer used.

A garbage collector needs to recognize when a block on the heap is no longer used. A garbage collector determines whether a block is no longer used by determining whether it is reachable, that is, whether a process can follow a chain of references from statically or locally allocated variables, called roots, to reach the block.

Note that there is a difference between blocks that are no longer used and blocks that can no longer be used. This difference means that a garbage collector will fail to free blocks that can be used but are no longer used. In other words, a garbage collector exchanges the burden of having to explicitly free dynamically allocated variables for the burden of having to discard references to unused dynamically allocated variables. Normally, this is a benefit, because while freeing variables is always explicit, discarding references is often implicit.

[Annotations: run the trace once in a while. Reference tracing: transitively walk all references, deallocate whatever is not found. Roots: globals, locals including those in the calling functions. Starting from the roots I look for pointers; whatever a pointer leads to is not garbage. Originally stop the world: the program waits until the GC finishes. Today parallelism.]

Reference Tracing

Reference tracing algorithms. Copying. Mark and sweep. Mark and compact.

Reference Cou
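The reachability test described above can be sketched as the mark phase of a mark and sweep collector. This is only an illustration under stated assumptions: the `struct block` layout, the fixed `MAX_REFS` limit, and the function names are hypothetical and do not come from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REFS 4  /* hypothetical limit on references held by one block */

/* A hypothetical heap block: a mark bit and the references it holds. */
struct block {
    bool marked;
    struct block *refs[MAX_REFS];
};

/* Mark every block reachable from b by following references transitively. */
static void mark(struct block *b)
{
    if (b == NULL || b->marked)
        return;                 /* nothing to do, or already visited */
    b->marked = true;
    for (size_t i = 0; i < MAX_REFS; i++)
        mark(b->refs[i]);
}

/* Mark from all roots, then count the unreachable blocks and clear marks. */
static size_t collect(struct block **roots, size_t root_count,
                      struct block *heap, size_t heap_count)
{
    size_t garbage = 0;
    assert(heap != NULL || heap_count == 0);
    for (size_t i = 0; i < root_count; i++)
        mark(roots[i]);
    for (size_t i = 0; i < heap_count; i++) {
        if (!heap[i].marked)
            garbage++;          /* a real collector would free the block here */
        heap[i].marked = false; /* reset the mark for the next collection */
    }
    return garbage;
}
```

Because the mark bit stops the recursion at blocks that were already visited, cyclic references among reachable blocks do not cause trouble, and a cycle that no root leads to is still counted as garbage.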
201. iticalSectionBusy = true;
  // Code of critical section comes here
  bCriticalSectionBusy = false;

The principal flaw of the naive solution can be revealed by considering what would happen if two processes attempted to enter the critical section at exactly the same time. Both processes would wait for bCriticalSectionBusy to become false, both would see it become false at exactly the same time, and both would leave the active waiting cycle at exactly the same time, neither process noticing that the other process is about to enter the critical section.

Staying with two processes, the flaw of the naive solution can be remedied by splitting the bCriticalSectionBusy variable into two, each indicating the intent of one process to enter the critical section. A process first indicates its intent to enter the critical section, then checks if the other process indicates the same intent, and enters the critical section when alone or backs off when not.

  while (true)
  {
    // Indicate the intent to enter the critical section
    bIWantToEnter = true;
    // Enter the critical section if the other process
    // does not indicate the same intent
    if (!bHeWantsToEnter) break;
    // Back off to give the other process a chance
    // and continue the active waiting cycle
    bIWantToEnter = false;
  }
  // Code of critical section comes here
  bIWantToEnter = false;

The solution is safe in that, unlike its naive predecessor, it never lets mo
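The two-flag scheme described above can be sketched in C roughly as follows. The names `wants_to_enter`, `try_enter`, `enter` and `leave` are hypothetical, and the use of C11 sequentially consistent atomics is an assumption of this sketch: the pseudocode in the text uses plain variables, which a modern compiler or processor would be free to reorder.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* One intent flag per process; indices 0 and 1 identify the two processes. */
static atomic_bool wants_to_enter[2];

/* One pass of the waiting cycle: true when entered, false after backing off. */
static bool try_enter(int me)
{
    int other = 1 - me;
    assert(me == 0 || me == 1);
    /* Indicate the intent to enter the critical section. */
    atomic_store(&wants_to_enter[me], true);
    /* Enter if the other process does not indicate the same intent. */
    if (!atomic_load(&wants_to_enter[other]))
        return true;
    /* Back off to give the other process a chance. */
    atomic_store(&wants_to_enter[me], false);
    return false;
}

/* Active waiting: repeat the attempt until it succeeds. */
static void enter(int me)
{
    while (!try_enter(me)) {
        /* busy wait */
    }
}

/* Leaving the critical section withdraws the intent. */
static void leave(int me)
{
    atomic_store(&wants_to_enter[me], false);
}
```

A process would call `enter(0)` or `enter(1)` before its critical section and the matching `leave` after it. Splitting the attempt into `try_enter` only makes a single pass of the cycle visible for experimenting.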
202. [...]
    Synchronization Problems
    Means For Synchronization
    Synchronization And Scheduling
    What Is The Interface
    Rehearsal
3. Memory Management
  Management Among Processes
    Multiple Processes Together
    Separating Multiple Processes
    What Is The Interface
    Rehearsal
  Allocation Within A Process
    Process Memory Layout
    [...]
    [...]
    Rehearsal
4. Device Management
  Device Drivers
    Asynchronous Requests
    Synchronous Requests
    Power Management
203. [Annotation: ... return from the function and only then set the return value; it also follows that the way people write programs is not well suited for pipelining.]

employs some parts of the processor and leaves other parts idle. The idea of instruction pipelining is to process several instructions concurrently, so that each instruction is in a different execution phase and thus employs different parts of the processor. Instruction pipelining yields an increase in speed and efficiency.

For illustration, Intel Pentium 3 processors sported a 10 stage pipeline, early Intel Pentium 4 processors extended the pipeline to 20 stages, and late Intel Pentium 4 processors use 31 stages. AMD Opteron processors use a 12 stage pipeline for fixed point instructions and a 17 stage pipeline for floating point instructions. Note that it is not really correct to specify a single pipeline length, since the number of stages an instruction takes depends on the particular instruction.

One factor that makes instruction pipelining more difficult is conditional jumps. Instruction pipelining fetches instructions in advance, and when a conditional jump is reached, it is not known whether it will be executed or not, and what instructions should be fetched following the conditional jump. One solution, statistical prediction of conditional jumps, is used (AMD Athlon processors and Intel Pentium processors do this, AMD Hammer processors keep track of past results for 65536 con
204. ive addressing might be more cumbersome than absolute addressing.

Declaring and accessing a global variable in C:

  static int i;  // declare a global variable
  i = 0;         // access the global variable

The C code compiled into position independent Intel 80x86 assembler:

  .comm i,4,4            ;declare i as 4 bytes aligned at 4 bytes boundary
  call __get_thunk       ;get program starting address in ECX
  addl $_GOT_,%ecx       ;calculate address of global table of addresses in ECX
  movl $0,i@GOT(%ecx)    ;write value 0 into target address i relative from ECX

The assembler code compiled into position independent Intel 80x86 machine code:

  E8                     ;call
  1C000000               ;target address 0000001Ch distant from here
  81C1                   ;addl target ECX
  D9110000               ;value 000011D9h
  C781                   ;movl target address relative from ECX
  20000000               ;target address 00000020h distant from ECX
  00000000               ;value 00000000h

Figure 2-2. Relative Addressing Example

Example: Program Image In CP/M

CP/M avoids the need for relocation by expecting a program to always start at the same address, namely 100h. The file with the program image consisted of the program code and the static variables, stored exactly in the same form as when the program is executing in memory.

Example: Program Image In Intel HEX

When the program image was to be stored on media that was originally designed for storing text, such as some types of paper tapes or punch cards, formats such as Intel HEX were used. A program im
205. [Annotation: ... how I moved it.]

Mouse

Microsoft mouse. Serial 1200 bps, 7N1, 3 byte packets (sync, buttons, high bits X and Y, low bits X, low bits Y).

Mouse Systems mouse. Serial 1200 bps, 8N1, 5 byte packets (sync, buttons, X, Y, delta X since X, delta Y since Y).

PS/2 mouse. Serial 10000-16667 bps, 8O1, 3 byte packets (sync, buttons, direction, overflow, delta X, delta Y). The mouse can receive commands (0FFh reset) and recognizes 3 modes of operation: stream (sends data when the mouse moves), remote (sends data when polled), wrap (echoes received data).

[Annotations, PRINTERS etc.: smart devices, the substance is hidden. Originally the parallel port, originally character printing. Later a graphics mode: I switch into it with ESC and push graphics data there. Newer ones use PCL or PS, always some simple language.]

[Annotations, SOUND: electrical signal, amplitude, sampling. How fast to sample in order to keep up: for a human, twice the highest frequency in the signal is enough (Shannon, Nyquist).]

Video Devices

Division into command interface and memory mapped interface. Description of the properties of terminals with a command interface, standards of control commands. ANSI Escape Sequences, for example ESC [ n J (clear screen, 0 from cursor, 1 to cursor, 2 entire), ESC [ line ; column H (goto line and column), something for colors etc.

Description of terminals with a memory mapped interface, character and graphical displays, working with video RAM, accelerators
206. is in unlocking: fast and recursive mutexes can be unlocked by anyone, for a recursive mutex the owner is tested. Unlike locking, however, this is a nonportable detail, mentioned only to show that the concept of a lock owner exists as well.

Looking at the implementation, pthread_mutex_t is a tiny structure that contains, apart from empty fields, a queue of waiting threads, the state of the mutex, a counter of recursive locking, a pointer to the owner, and the type of the mutex. The implementation of the operations themselves is then simple. Sharing among processes, if the system supports it, is arranged through shared memory.

  int pthread_mutex_init (pthread_mutex_t *mutex, const pthread_mutexattr_t *mutexattr);
  int pthread_mutex_destroy (pthread_mutex_t *mutex);
  int pthread_mutex_lock (pthread_mutex_t *mutex);
  int pthread_mutex_trylock (pthread_mutex_t *mutex);
  int pthread_mutex_timedlock (pthread_mutex_t *restrict mutex, const struct timespec *restrict abs_timeout);
  int pthread_mutex_unlock (pthread_mutex_t *mutex);

For active waiting, the POSIX threads library provides spin locks, available through the pthread_spinlock_t data structure and the pthread_spin_* functions, which are analogous to the mutex functions, except that there is no timedlock variant. Upon initialization, the pshared flag specifies if the spin lock can be used only by threads inside one process, or also by different processes, provided that the spin lock is allocated in a shared memory area.

  int pthread
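A minimal sketch of how the mutex functions listed above are typically used to protect a shared variable. The names `counter`, `counter_lock`, `increment` and `try_increment` are hypothetical and serve only this illustration. The sketch uses a statically initialized mutex of the default type.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* A shared counter and the mutex that protects it. */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter;

/* Blocking variant: waits for the mutex, then increments the counter. */
static long increment(void)
{
    long value;
    int rc = pthread_mutex_lock(&counter_lock);
    assert(rc == 0);
    value = ++counter;
    rc = pthread_mutex_unlock(&counter_lock);
    assert(rc == 0);
    return value;
}

/* Nonblocking variant: returns 0 on success, or the error from trylock
   (EBUSY when the mutex is currently locked) without touching the counter. */
static int try_increment(void)
{
    int rc = pthread_mutex_trylock(&counter_lock);
    if (rc != 0)
        return rc;
    counter++;
    return pthread_mutex_unlock(&counter_lock);
}
```

The nonblocking variant is useful where a thread has other work to do when the lock is busy. The blocking variant is the common case.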
207. only a small part of the table and still satisfy a large number of queries for free blocks, and the possibility of storing it right in the free space. The possibility of using a list à la FAT.

Tracking bad blocks: a bad block file, marking of bad blocks.

Disk quotas, the mechanism of hard and soft quotas. Implementation principle: a table of open files, a table of owners of open files.

Performance

Low speed and a small number of blocks lead to a cache. The suitable strategy depends on the application; an adjustment for preferential caching of directories and I-nodes. Write back caching, dividing space between the write back cache and the read cache.

Minimizing head movement: placing directories in the middle of the disk, dividing a large disk into segments. Allocating files into adjacent blocks, defragmentation.

Reliability

The reliability requirement; the simplest solution is backup. Support for backup: archive attribute, link detection, snapshot. Backup to tapes, to disks, mirroring.

System consistency, the importance of certain disk areas. Preferential writing of directories, I-nodes, FAT, allocation maps and the like. Periodic sync. Consistency check at system boot.

Unerase, the drawbacks of bolted-on unerase, support in the system.

Example: FAT File System

[Annotations: FAT; boot sector; copy; FS configuration.]

The classic: a boot sector with the dimensions of the file system, followed by two copies of the FAT, followed by the root directory, followed by the data area. Directory
208. k A popular hack used to be to lock some file locally (mandatory) and then try to break this lock over NFS, which with a bit of luck could block the NFS server.

File locks are generally not suitable for frequent locking with fine granularity.

Example: Windows Sharing Operations

Locked and unlocked regions must match. It is not possible to lock a region and then unlock part of a region, or to lock multiple adjacent regions and then unlock the regions together. Locking does not prevent reading through memory mapping. Locks are unlocked on closing the locked file or terminating the owning process. Arbitrary time may elapse between closing or terminating and unlocking.

Consistency Support

To be done.

Example: Windows Transaction Operations

  HANDLE CreateTransaction (
    PSECURITY_ATTRIBUTES lpTransactionAttributes,
    LPGUID UOW,
    DWORD CreateOptions,
    DWORD IsolationLevel,
    DWORD IsolationFlags,
    DWORD Timeout,
    LPWSTR Description);
  BOOL CommitTransaction (HANDLE TransactionHandle);
  BOOL RollbackTransaction (HANDLE TransactionHandle);

A transaction context can be used to group together multiple operations and to provide multiple readers with a consistent past snapshot of data in the presence of a single writer. Most arguments of the context creation call are ignored and should be set to zero.

  HANDLE CreateFileTransacted (LPCTS
209. k can wake up using notify or notifyAll. As with classical condition variables, these methods can only be called while the corresponding object is locked.

By the way, condition variables can be written in two ways. With one, the signaller keeps running after signal, with the other, one of the waiting processes runs. The first way is sometimes also called Mesa Semantics, the second Hoare Semantics.

Windows provides Condition Variables that can be used within a single process.

  VOID InitializeConditionVariable (PCONDITION_VARIABLE ConditionVariable);
  BOOL SleepConditionVariableCS (
    PCONDITION_VARIABLE ConditionVariable,
    PCRITICAL_SECTION CriticalSection,
    DWORD dwMilliseconds);
  BOOL SleepConditionVariableSRW (
    PCONDITION_VARIABLE ConditionVariable,
    PSRWLOCK SRWLock,
    DWORD dwMilliseconds,
    ULONG Flags);
  VOID WakeConditionVariable (PCONDITION_VARIABLE ConditionVariable);
  VOID WakeAllConditionVariable (PCONDITION_VARIABLE ConditionVariable);

Events

Windows events. Parameters as usual, fManualReset says whether the event needs to be reset explicitly. Waiting is also done the same way (using WaitForXxx), but signalling is different.

  BOOL SetEvent (HANDLE hEvent)

signals the event. With non manual reset events, the event is reset again after one waiting thread starts running.

  BOOL ResetEvent (HANDLE hEvent)

resets a manual reset event.

  BOOL PulseEvent (H
210. kets.

The first pitfall is caused by excessive data copying. The individual modules that process packets may need to add headers or footers to the data, which may prompt a need for moving the data to make room for the headers or footers. With top desktop systems moving data in memory in hundreds to thousands of MB per second and top network systems moving data in wires in thousands of MB per second, even a small amount of data copying may be a problem.

The second pitfall is caused by excessive data dispatching. Many solutions exist, the traditional ones including hash tables, the wilder ones ranging from dispatcher shortcut caching to dispatcher code generation and dispatcher code upload.

Both pitfalls can be sidestepped by using smart hardware.

Example: Linux SK Buff Structure

To avoid data copying, the individual modules that process packets keep data in the sk_buff structure. The structure reserves space before and after data so that headers or footers can be added without data copying.

  struct sk_buff *alloc_skb (unsigned int size, int priority);
  void skb_reserve (struct sk_buff *skb, unsigned int len);
  int skb_headroom (const struct sk_buff *skb);
  int skb_tailroom (const struct sk_buff *skb);
  unsigned char *skb_put (struct sk_buff *skb, unsigned int len);
  unsigned char *skb_push (struct sk_buff *skb, unsigned int len);
  unsigned char *skb_pull (struct sk_buff *skb, unsigned int len);
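The idea of reserving space before and after the data can be illustrated with a small user space analogue. This is a sketch only: `struct pkt_buf` and the `pkt_*` functions are hypothetical names that mirror the roles of the functions listed above. They are not the kernel implementation of sk_buff.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user space analogue of sk_buff, not the kernel structure. */
struct pkt_buf {
    unsigned char storage[256];
    size_t data;  /* offset of the first used byte */
    size_t tail;  /* offset one past the last used byte */
};

/* Start with an empty buffer that keeps headroom bytes free at the front. */
static void pkt_init(struct pkt_buf *p, size_t headroom)
{
    assert(headroom <= sizeof p->storage);
    p->data = p->tail = headroom;
}

static size_t pkt_headroom(const struct pkt_buf *p) { return p->data; }
static size_t pkt_tailroom(const struct pkt_buf *p) { return sizeof p->storage - p->tail; }
static size_t pkt_len(const struct pkt_buf *p) { return p->tail - p->data; }

/* Append len bytes at the end; returns the start of the new area. */
static unsigned char *pkt_put(struct pkt_buf *p, size_t len)
{
    unsigned char *area = p->storage + p->tail;
    assert(len <= pkt_tailroom(p));
    p->tail += len;
    return area;
}

/* Prepend len bytes using the headroom; returns the new start of data. */
static unsigned char *pkt_push(struct pkt_buf *p, size_t len)
{
    assert(len <= pkt_headroom(p));
    p->data -= len;
    return p->storage + p->data;
}

/* Strip len bytes from the front; returns the new start of data. */
static unsigned char *pkt_pull(struct pkt_buf *p, size_t len)
{
    assert(len <= pkt_len(p));
    p->data += len;
    return p->storage + p->data;
}
```

A sending path would call `pkt_put` for the payload and then `pkt_push` once per protocol layer to prepend each header. A receiving path would call `pkt_pull` to strip headers. In neither case is the payload moved.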
211. [...] at typically [...] bytes per entry, the table comes out at around 4 MB. That is too much, which is why the following are used:

• Multilevel tables, where space need not be reserved for the page table of the entire virtual memory, but only for the part in use. In addition, parts of the table can be paged.

• Various page sizes, where fewer entries are needed to map the same amount of memory.

• Inverted page tables, which have an entry for each physical page and are thus in effect an associative memory that looks up by virtual address. Their advantage is that their size depends on the size of the physical memory rather than the virtual memory, but they are hard to search. Because inverted page tables are usually searched by hashing, and resolving collisions in hardware would be costly, they are alongside [...]

[Annotations: I give the offset, and the rest is the segment or the page number. This is how I map virtual pages to physical frames. Determining the page size: commonly 4 KB, but sometimes another size is useful, and it can sometimes be set manually. It would take up too much space, hence multilevel paging: each process has only the primary table and those lower level tables that it uses. The TLB is a cache for entries: either I find the entry in the TLB or, when it is not there, in the page tables. The pointer to the top level table is in a processor register, for example CR3. The page table is not really searched, it is simply accessed]
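The arithmetic behind the 4 MB figure and the way a two level table splits an address can be sketched as follows. The sketch assumes 32 bit virtual addresses, 4 KB pages, 4 byte entries and a 10 + 10 + 12 bit split. These parameters are assumptions chosen to match the 4 MB figure above, and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT  12u                          /* 4 KB pages (assumed) */
#define INDEX_BITS  10u                          /* bits per table level (assumed) */
#define INDEX_MASK  ((1u << INDEX_BITS) - 1u)
#define OFFSET_MASK ((1u << PAGE_SHIFT) - 1u)

/* The three parts of a 32 bit virtual address in a two level scheme. */
struct va_parts {
    uint32_t dir_index;    /* index into the top level table */
    uint32_t table_index;  /* index into the second level table */
    uint32_t offset;       /* offset within the page */
};

static struct va_parts split_va(uint32_t va)
{
    struct va_parts p;
    p.dir_index = (va >> (PAGE_SHIFT + INDEX_BITS)) & INDEX_MASK;
    p.table_index = (va >> PAGE_SHIFT) & INDEX_MASK;
    p.offset = va & OFFSET_MASK;
    return p;
}

/* Size in bytes of a single level table covering the whole 32 bit space. */
static uint32_t flat_table_bytes(uint32_t entry_bytes)
{
    assert(entry_bytes > 0 && entry_bytes <= 8);
    return (1u << (32u - PAGE_SHIFT)) * entry_bytes;
}
```

With these parameters a single level table needs 2^20 entries of 4 bytes each, which is 4 MB per process, whereas a two level table needs only the 4 KB top level table plus one 4 KB second level table for each 4 MB region actually in use.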
212. le [...]
  i = 0x12345678;  // access the global variable

[Annotation: rel or something.]

The C code compiled into Intel 80x86 assembler:

[Annotation: loader: the OS takes a look at it, does something, and runs it.]

  .comm i,4,4            ;declare i as 4 bytes aligned at 4 bytes boundary
  movl $0x12345678,i     ;write value 12345678h into target address i

The assembler code compiled into Intel 80x86 machine code:

  C705                   ;movl
  C0950408               ;target address 080495C0h
  78563412               ;value 12345678h

Figure 2-1. Absolute Addressing Example

When a program image uses absolute addressing, it must be loaded at the specific range of addresses it has been constructed for. Unfortunately, it is often necessary to load program images at arbitrary ranges of addresses, for example when multiple program images are to share one address space. This requires adjusting the program image by fixing all machine code instructions and static variables that refer to specific addresses using absolute addressing. This process is called relocation.

The need for relocation can be alleviated by replacing absolute addressing, which stores addresses as absolute locations in machine code instructions, with relative addressing, which stores addresses as relative distances in machine code instructions. The program image is said to contain position independent code when it does not need relocation. Constructing position independent code usually comes at a price, however, because in some cases, relat
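Relocation as described above amounts to adding the difference between the actual and the expected load address to every absolute address stored in the image. A minimal sketch, assuming 32 bit little endian addresses as in Figure 2-1 and a list of fixup offsets supplied with the image. The fixup list and the function names are hypothetical, and real executable formats record relocations in their own ways.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32 bit little endian value from the image. */
static uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Write a 32 bit little endian value into the image. */
static void store_le32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xFFu);
    p[1] = (unsigned char)((v >> 8) & 0xFFu);
    p[2] = (unsigned char)((v >> 16) & 0xFFu);
    p[3] = (unsigned char)((v >> 24) & 0xFFu);
}

/* Add delta to the absolute address stored at each fixup offset. */
static void relocate(unsigned char *image, size_t image_size,
                     const size_t *fixups, size_t fixup_count, uint32_t delta)
{
    for (size_t i = 0; i < fixup_count; i++) {
        assert(fixups[i] + 4 <= image_size);
        store_le32(image + fixups[i], load_le32(image + fixups[i]) + delta);
    }
}
```

Applied to the machine code in Figure 2-1 with a single fixup at offset 2, a delta of 1000h would turn the target address 080495C0h into 0804A5C0h while leaving the immediate value 12345678h untouched.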
213. len, int flags);
  ssize_t recvfrom (int sockfd, void *buf, size_t len, int flags, struct sockaddr *from, socklen_t *fromlen);
  ssize_t recvmsg (int sockfd, struct msghdr *msg, int flags);

  struct msghdr {
    void *msg_name;            // optional address
    socklen_t msg_namelen;     // optional address length
    struct iovec *msg_iov;     // array for scatter gather
    size_t msg_iovlen;         // array for scatter gather length
    void *msg_control;         // additional control data
    socklen_t msg_controllen;  // additional control data length
    int msg_flags;
  };

The recv family of calls receives data over a socket. The read call can also be used, but the flags cannot be specified in that case.

  int select (int setsize, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);
  int poll (struct pollfd *ufds, unsigned int nfds, int timeout);

  struct pollfd {
    int fd;
    short events;   // requested events
    short revents;  // returned events
  };

The select call is used to wait for data on several sockets at the same time. The arguments are sets of file descriptors, usually implemented as bitmaps. The file descriptors in readfds are waited for until a read would not block, the file descriptors in writefds are waited for until a write would not block, the file descriptors in exceptfds are waited for until an exceptional condition occurs. The call returns the number of file descriptors that meet the condition of the
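As an illustration of waiting on a descriptor, the following sketch wraps the poll call listed above for the simplest case of a single descriptor. The helper name `wait_readable` and its return convention are hypothetical. The sketch works with any file descriptor, so a pipe can stand in for a socket when experimenting.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Wait until a read on fd would not block.
   Returns 1 when readable, 0 on timeout, -1 on error. */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rc;
    assert(fd >= 0);
    pfd.fd = fd;
    pfd.events = POLLIN;   /* requested events */
    pfd.revents = 0;       /* returned events, filled in by poll */
    rc = poll(&pfd, 1, timeout_ms);
    if (rc <= 0)
        return rc;         /* 0 is a timeout, -1 is an error */
    /* A hang up also means that a read would return without blocking. */
    return (pfd.revents & (POLLIN | POLLHUP)) ? 1 : -1;
}
```

Waiting on several descriptors works the same way with a larger array of pollfd structures, after which the revents field of each entry has to be inspected.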
214. lients renew callbacks when opening files whose file data were sent some time ago.

AFS uses Rx, which is a proprietary RPC implementation over UDP. AFS uses Kerberos for authentication. AFS uses identities that are separate from system user identities.

Example: Coda File System

The Coda File System sports a design similar to AFS, with global name space, replicated servers and caching clients.

Servers keep files in volumes, which can be moved and read write replicated across multiple servers. Files are read from one server and written to all servers. Clients check versions on all servers and tell servers to resolve version mismatches.

Clients work in strongly connected, weakly connected and disconnected modes. The difference between connected and disconnected modes is that in the connected modes, the client hoards files, while in the disconnected mode, the client uses the hoarded files. The difference between strongly connected and weakly connected modes is that in the strongly connected mode, writes are synchronous, while in the weakly connected mode, writes are reintegrated.

Reintegration happens whenever there is a write to be reintegrated and the client is connected. Writes are reintegrated using an optimized replay log of mutating operations. Conflicts are solved manually.

Global File System

The Global File System is a distributed file system based on shared access to media rather
215. ltime clock baterka Chapter 4 Device Management p vodn ip GRAFICK KARTY z hodinek 4 EX which describes an interface between the disk and the computer The ATA standard US se poresit allows the disk to be accessed using the command block registers the ATAPI stan a letn as termin lov podobn jako hn dard allows the disk to be accessed using the command block registers or the packet bts vrbs commands ta p ipoj se videoram karta te obsahThe command block registers interface relies on a number of registers including thena zdroj hodinov a vykresluje Cylinder High Cylinder Low Device Head Sector Count Sector Number Com sign lu mand Status Features Error and Data registers Issuing a command entails reading Pt P SORORIS palate the Status register until its BSY and DRDY bits are cleared which indicates that the P eru rychlos se Gasto da 7 Se e es disk is ready then writing the other registers with the required parameter and finally 77777 ee ar chui writing the Command register with the required command When the Command reg df v animace zm nou palety P z i ister is written the disk will set the Status register to indicate that a command is being nes paleta pevn i ALARMS executed execute the command and finally generate an interrupt to indicate that the p eru m za 100 CRT registry kolik je Command has been executed
216. luation Only na 100 Chapter 2 Process Management potrebuje procesor stejne jako u procesu pamet take kernely bezi v semi privileged rezimu atd stejne principy ale min flexible problem sit karta gt simulace s initial ve kter m se proces vytv b hem vol n fork a stav zombie ve kter m vic sitovejch karet pomalee proces ek po vol n exit dokud parent neud l wait syscall volam privilegovane veci run gueue procesy kt jsou v kernelu zvlastni 1 32 ready to run How To Decide Who Runs skok kde instrukce 1 proces je running The responsibility for running processes and threads includes deciding when to runMa definovanou x l which process or thread called scheduling Scheduling accommodates various re adresu na kterou switching planovac prepina quirements such as responsiveness throughput efficiency skace tj nemuzu procesy mezi temito stavy si skocit kam chci roces prestane bezet 2 pira 7 pozada yield e Responsiveness requires that a process reacts to asynchronous events within BAN bun 4 sh Mis p kdyz mu dojde kvantum cas sonable time The asynchronous events may be for example a user input where a kdyz prijde proces s vetsi prompt reaction is required to maintain interactivity or a network request wheresyscall musi bejt co a prompt reaction is required to maintain quality of service nejvic robustni tj ana e Predictability requires that the operating system can pro
217. m

19. Explain what thrashing is and what causes it. Describe what the operating system can do to prevent thrashing and how the system can detect it.

20. Explain the concept of memory mapped files.

21. Explain the principle of the copy on write mechanism and provide an example of its application in an operating system.

Exercises

1. Consider a system using 32 bit virtual and 32 bit physical addresses. Choose and describe a way the processor in this system should translate virtual addresses to physical. Choose a page size and explain what kind of operation the page size is well suited for.

Design a data structure for mapping virtual addresses to physical and describe in detail all the records used in the structure. When designing the data structure, take into account the choice of the address translation mechanism and explain why the resulting data structure is well suited for the system.

Describe the involvement of the operating system in the process of translating a virtual address to physical, if any, and provide a sketch of the algorithms for handling a page fault and selecting a page for eviction.

Hint: Among other things, approaches to address translation differ in the range of exceptions that the operating system must handle. These might include access protection exception, address translation exception, page fault exception. Does your description of the operating system involvement cover all the exceptions applicable in the ch
218. [Annotations: maps into reserve tracks. With bad sectors it goes track by track, tick tick tick tick, bang bang, tick tick, as it jumps over to the reserve. ATAPI with packet [...]: I send commands through the data registers. How to make it faster: SSF (shortest seek first), elevator seek, in one direction only, which is fairer; unidir, bidir.]

• The Shortest Seek First strategy of processing requests directs the disk to service the request that has the shortest distance from the current position of the disk head. The strategy can suffer from letting too distant requests starve.

• The Bidirectional Elevator strategy of processing requests directs the disk to service the request that has the shortest distance from the current position of the disk head in the selected direction, which changes when no more requests in the selected direction are waiting. The strategy lets too distant requests starve at most [...] passes over the disk in both directions.

• The Unidirectional Sweep strategy of processing requests directs the disk to service the request that has the shortest distance from the current position of the disk head in the selected direction, or the longest distance from the current position of the disk head when no more requests in the selected direction are waiting. The strategy lets too distant requests starve at most one pass over the disk in the selected direction.

The strategy used to process t
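The selection step of the strategies above can be sketched as follows. The representation of pending requests as an array of track numbers and the function names are assumptions of this sketch. A real driver would also remove the chosen request from its queue and keep track of the current direction.

```c
#include <assert.h>
#include <stddef.h>

/* Absolute distance between two track numbers. */
static unsigned distance(unsigned a, unsigned b)
{
    return a > b ? a - b : b - a;
}

/* Shortest Seek First: index of the pending request closest to the head. */
static size_t pick_ssf(const unsigned *tracks, size_t count, unsigned head)
{
    size_t best = 0;
    assert(count > 0);
    for (size_t i = 1; i < count; i++)
        if (distance(tracks[i], head) < distance(tracks[best], head))
            best = i;
    return best;
}

/* Elevator step: index of the closest request in the selected direction,
   or count when no request is waiting in that direction, in which case
   the caller reverses the direction (bidirectional) or sweeps back
   to the farthest request (unidirectional). */
static size_t pick_in_direction(const unsigned *tracks, size_t count,
                                unsigned head, int upward)
{
    size_t best = count;
    for (size_t i = 0; i < count; i++) {
        int ahead = upward ? tracks[i] >= head : tracks[i] <= head;
        if (ahead && (best == count ||
                      distance(tracks[i], head) < distance(tracks[best], head)))
            best = i;
    }
    return best;
}
```

The starvation noted for Shortest Seek First shows up directly in `pick_ssf`: as long as new requests keep arriving near the head, a distant request never wins the comparison.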
219. Attribute               Content
$VOLUME_NAME            Disk's volume name
$VOLUME_INFORMATION     NTFS version and dirty flag
$FILE_NAME              File or directory name
$STANDARD_INFORMATION   File time stamps and hidden, system and read-only flags
$SECURITY_DESCRIPTOR    Security information
$DATA                   File data
$INDEX_ROOT             Directory content
$INDEX_ALLOCATION       Directory content
$BITMAP                 Directory content mapping
$ATTRIBUTE_LIST         Describes nonresident attribute headers

[Annotations: file & stream; file data is the default stream. It was also possible to create a custom stream, that is, to add something that is normally not visible. Nowadays this is being forbidden: it will not let me create something with a colon in the name, but internally it is still there. It directly supports snapshots, prefetch, encryption, compression. The fragmentation problem: the MFT itself can be fragmented. They try to put the MFT into a different part of the disk than where the data is. When the disk is very full, the MFT starts to fragment and things slow down a lot.]

Even more remarkably, for a long time Windows had no API for working with streams, so it was not easy to list the streams of a file and so on. A solution was offered by the BackupRead function, which produces from a file a special backup stream intended for backup. This stream contains the data needed for a complete reconstruction of files, hence also the streams, and its format is known. It seems that ACLs are stored as a stream as well.

  HANDLE FindFirstStreamW
220. metric is kept for all processes as a ratio of the time spent calculating to the time spent waiting for input or output. Processes with the priority range between 100 and 140, which is reserved for user processes, get their priority adjusted so that processes that calculate a lot are penalized and processes that wait for input or output a lot are rewarded. Processes that wait for input or output a lot are never moved from the active to the expired queue.

An extra kernel thread redistributes processes between processors.

References

1. Rohit Agarwal: Process Scheduler For Kernel 2.6.x

Example: Linux Late 2.6.X Series Scheduler

The late 2.6 series of kernels distinguish Completely Fair Scheduler (CFS) and Real Time (RT) classes, handled by separate scheduler modules with separate per processor run queue structures. The scheduler calls the put_prev_task function of the appropriate module to put a previously scheduled process among other runnable processes, and the pick_next_task function of the highest priority module to pick the next scheduled process from other runnable processes.

The per processor structure of the RT class scheduler contains an array of queues, one for each process priority. The pick_next_task function picks the process with the highest priority, until the time allocated to the process group or the process class in a single scheduler period is consumed.

The per processor structure of the
221. mn je e vy aduje vytezoval vizion stv e se t eba statisticky unix win podobn priority U a LZT Fair Share 0 a 50 kernel aby g a ee NR v kernelu nikdo ne um Tak guaranteed share scheduling proces m se zaru uje jejich procento ze dlouho pa kdy tam n co Strojov ho asu bud ka d mu zvl nebo po skupin ch nebo t eba po u ivatel ch zamkne chci a je to zase brzo odem en BOLA SEE Sees Earliest Deadline First startovac prio 50 Procesy maj deadlines do kdy musi n co ud lat pl nuje se v dy proces s nejbli Processor Usage Count deadline Deadlines se zpravidla rozd luj do hard realtime deadlines ty jsou kr tk 0 127 a nesm se pro vihnout soft realtime deadlines ty jsou kr tk a ob as se pro vihnout NICE 0 39 p id l m m ou dokud bude mo n zaru it n jak statistick limit pro vihnut timesharing PEL startu deadlines ty v podstat nemaj pevn asov limit ale m ly by se do n kdy st t batch deadlines ty maj limity v hodin ch a obvykle s nimi nebyvaj probl my nejen p dni SSL solaris Nemesis pro soft realtime t eba p ehr va e mi dave aktualni prio prio 0 160 nejlep proces dom na se v dycky pou t ze stejnyho mista i KEN realtime 100 160 tj n jak ho main adm peu Ege kernel 60 99 d m si jak asto mam b et t eba 10 100 10 ze eM MUS user 0 59 sta tik
222. mplest case, are also called similarly. The situation becomes more complex when access to protected resources, such as hardware devices or application memory, must be considered.

Typically, the privileges that govern access to protected resources are enforced by the processor. Depending on the privileges of the currently executing code, the processor decides whether to allow executing instructions that are used to access the protected resources. To prevent malicious code from accessing protected resources, constraints are imposed on the means by which the currently executing code can change its privileges, as well as the means by which less privileged code can call more privileged code.

The operating system possesses various privileges that allow it to access protected resources. Requesting services of the operating system therefore means calling more privileged code from less privileged code, and must be subject to the constraints that prevent malicious code from accessing protected resources. These constraints are met by the system call interface of the operating system.

• The system call interface must be efficient. Depending on the processor, this can become an issue, especially when calling more privileged code from less privileged code, because the constraints imposed by the processor can make the call slow or make the copying of arguments necessary.

• The system call interface must be robust. Malicious code should not be able to trick the opera
223. [Annotation: the kernel must be called; there is no need to implement privileged functions; fast context switching.]

Example: Linux Syslet API

A simplified example of reading a file using syslets is copied from Molnar.

References
1. Ingo Molnar: Syslet and Threadlet Patches. http://people.redhat.com/mingo/syslet-patches

Example: Windows System Call API On Intel 80x86

The libraries wrapping the system call interface are called in the same way as any other libraries.

  int MessageBox (
    HWND hwndOwner,
    LPCTSTR lpszText,
    LPCTSTR lpszTitle,
    UINT fuStyle);

  MessageBox (0, zMessageText, zWindowTitle, MB_OK | MB_SYSTEMMODAL);

  push MB_OK or MB_SYSTEMMODAL or MB_ICONHAND
  push offset zWindowTitle
  push offset zMessageText
  push 0
  call MessageBoxA    ; call the library
  add esp,16          ; remove arguments from stack

Figure 2-14. Library System Call Example

The system call interface uses either the interrupt vector 2Eh or the SYSENTER and SYSEXIT instructions. In both cases, the EAX register contains a number identifying the requested service, and the EDX register points to a stack frame holding the arguments of the requested service.

What Is The Interface

Typically, the creation and termination of processes and threads is directed by a pair of fork and join calls. The fork call forks a new process or thread off the active process or thread. The join call waits for the termination of a process or thread. The exact syntax and semantics depends on
224. n and Validation Scheme (CCEVS) run by NIST and NSA, which evaluates according to ISO Standard 15408, also known as the Common Criteria for IT Security Evaluation. The levels are labeled EAL1 to EAL7 and contain combinations of requirements in various classes. EAL1 is the simplest and basically means that the product works somehow. Up to EAL4, the level of testing increases but the demands on the product do not; products designed without considering the Common Criteria should pass at EAL4. EAL5 requires semiformal design and testing, EAL6 additionally requires semiformal verification of design, and EAL7 requires formal design and testing and formal verification of design.

For a small comparison: among operating systems, SGI IRIX is at EAL3 and Solaris 8, HP-UX and Windows 2000 are at EAL4; among databases, Oracle 8 is at EAL4; among smart cards, GemPlus JavaCard is at EAL5. Higher levels are apparently not awarded yet.

Notes
1. Still a sketch.
2. Understanding is essential.
3. Just a curiosity.
4. Just a curiosity.
5. Understanding is essential.
6. Understanding is essential.
7. Understanding is essential.
8. Understanding is essential.
9. Understanding is recommended.
10. Just a curiosity.
11. Understanding is recommended.
12. Just a curiosity.
13. Just a curiosity.

[Annotation: MAC versus DAC. MAC (mandatory access control) means rules on access rights that users must not change, for example that nobody may enter another user's home directory. SELinux: roles, types.]
225. n as the node with most idle pages from the last epoch. The coordinator collects a summary of page ages from each node in the cluster and determines the percentage of oldest pages within the maximum eviction count on each node in the cluster. The coordinator distributes this percentage to each node in the cluster and appoints a coordinator for the next epoch.

On each eviction, a node is chosen randomly, with the density function given by the distributed percentages; the chosen node then evicts its local LRU page.

Michael J. Feeley, William E. Morgan, Frederic H. Pighin, Anna R. Karlin, Henry M. Levy, Chandramohan A. Thekkath: Implementing Global Memory Management in a Workstation Cluster.

What Is The Interface

To be done.

Rehearsal Questions
1. Internal fragmentation leads to poor utilization of memory that is marked as used by the operating system. Explain how internal fragmentation occurs and how it can be dealt with.
2. External fragmentation leads to poor utilization of memory that is marked as free by the operating system. Explain how external fragmentation occurs and how it can be dealt with.
3. Explain how memory virtualization through paging works and what kind of hardware and operating system support it requires.
4. At the level of individual address bits and entry flags, describe the process of obtaining physical address from virtual address on a processor that only has the Translation Lookaside Buf
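The random choice of the evicting node described in this snippet can be sketched as a weighted selection. This is only an illustration: the function name `pick_eviction_node`, the representation of the distributed percentages as an array of weights, and the caller-supplied uniform random number `r` are assumptions, not details taken from the cited paper.

```c
#include <assert.h>
#include <stddef.h>

/* Pick a node index with probability proportional to weights[i].
 * weights[i] stands for the percentage of oldest pages on node i,
 * r is a uniform random number in [0, 1) supplied by the caller. */
size_t pick_eviction_node(const double *weights, size_t count, double r)
{
    assert(count > 0 && r >= 0.0 && r < 1.0);

    double total = 0.0;
    for (size_t i = 0; i < count; i++)
        total += weights[i];

    /* Walk the cumulative distribution until it exceeds the target. */
    double target = r * total;
    double running = 0.0;
    for (size_t i = 0; i < count; i++) {
        running += weights[i];
        if (target < running)
            return i;
    }
    return count - 1;   /* guard against rounding at the upper end */
}
```

A node holding half of the oldest pages is thus picked for about half of the evictions, which is the behavior the text asks for; the chosen node would then evict its local LRU page.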
226. [...] of a module, mainly success and failure but also many others, one can state whether the module should be ignored, whether the stack should fail or succeed immediately or at the end, and a few more options.

From the point of view of the programmer, using PAM is then straightforward. The main function is probably pam_authenticate for verifying the user; further functions are available for the remaining functions of the library. A peculiarity is the use of a conversation function, which is a callback function provided to the library by the application, so that the library can prompt the user when needed, for example to enter a password.

  #include <security/pam_appl.h>
  #include <security/pam_misc.h>

  static struct pam_conv conv = { misc_conv, NULL };

  int main (int argc, char **argv)
  {
    pam_handle_t *pamh = NULL;
    char *user;
    int retval;

    ...

    retval = pam_start ("check_user", user, &conv, &pamh);
    if (retval == PAM_SUCCESS)
      retval = pam_authenticate (pamh, 0);  /* Is user really himself ? */
    if (retval == PAM_SUCCESS)
      retval = pam_acct_mgmt (pamh, 0);     /* Is user account valid ? */
    if (retval == PAM_SUCCESS)
      ...
    pam_end (pamh, retval);
  }

[Annotation, partly illegible: one random integer per file; every issued capability is hashed with that integer and handed to the client; the hash is then used to check whether ...]

[Annotation: symmetric encryption; a central authority holds the keys of all clients; we trust it, which is acceptable.]

References
1. L
227. [Annotation to the figure, partly illegible: the bus is synchronous to CLK; BALE is the address enable signal; the bus master waits for data on D0-15.]

Figure 1-5. ISA Bus Read Cycle

In a write bus cycle, the bus master activates one of the MEMW (Memory Write) or IOW (Input/Output Write) signals to indicate either writing to memory or writing to an output device. The bus master sets the data on the data bus and waits the next four cycles for the memory or the device to recognize its address and data.

Figure 1-6. ISA Bus Write Cycle

[Annotation: Input from a peripheral such as a keyboard. Formerly the program watched for it in a polling loop; today interrupts are used. The device contacts the interrupt controller. Between every two instructions, the processor checks whether an interrupt has occurred. Depending on which interrupt it is, execution jumps to the corresponding address, taken from a table managed by the operating system. The context is saved. While an interrupt is being handled, interrupts are disabled. "Interrupt" usually means this hardware interrupt; a software interrupt or an exception behaves similarly but is less interesting. Interrupts should not be disabled for long, because peripherals do not have unlimited buffers and data gets lost after a while: 100 cycles is fine, 1 s is bad. After an interrupt, the system can resume some other application instead of the interrupted one, because the interrupted one has effectively been put to sleep.]

[Annotation: Transferring 0.5 kB of data through the processor means the data travels over the bus twice, from the disk to the processor and from the processor to memory. With DMA, the processor for a while]
228. node contains information such as file type, access rights, owners, timestamps, size, and the allocation map. The allocation map contains pointers to direct blocks, a pointer to a single level indirect block, which contains pointers to direct blocks, a pointer to a double level indirect block, which contains pointers to single level indirect blocks, and a pointer to a triple level indirect block, which contains pointers to double level indirect blocks. By default, 12 pointers to direct blocks reside in the inode and 1024 pointers to indirect blocks reside in a block.

Some versions of the filesystem could store file tails in block fragments. The inode block structure therefore contains block fragment related fields, which, however, are not used in the current filesystem implementations.

[Annotations, partly illegible: Superblock: metadata. The disk is split into areas that keep metadata together with their own data to shorten seeks; these block groups are typically tens of MB; the real blocks are clusters of several sectors. The inode table is static, so unless the filesystem holds only tiny files, much of it stays unused; typically only a few percent of the entries are used. An ext2 block group holds the free inode bitmap, the free block bitmap, and data in blocks of typically 4 kB, as defined in the superblock. Inodes in the table are identified by their ordinal number, which uniquely identifies a file; each block group has its own fragment of the inode table. Inode contents: flags (what the file is, instructions for the filesystem on how to treat it), size, timestamps, links]
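The allocation map described in this snippet decides, for each block of a file, how many indirect blocks must be read to find it. A minimal sketch follows, assuming the default figures quoted in the text (12 direct pointers in the inode, 1024 pointers per indirect block); the function name `indirection_level` and the constant names are hypothetical and do not come from any filesystem source.

```c
#include <assert.h>
#include <stdint.h>

enum { DIRECT_POINTERS = 12, POINTERS_PER_BLOCK = 1024 };

/* For the given block index within a file, return how many levels of
 * indirect blocks lead to it: 0 for a direct block, 1 to 3 for single,
 * double and triple indirection, -1 if the index is beyond the map. */
int indirection_level(uint64_t file_block)
{
    uint64_t limit = DIRECT_POINTERS;   /* blocks reachable so far */
    uint64_t span = 1;                  /* blocks added by one more level */

    if (file_block < limit)
        return 0;
    for (int level = 1; level <= 3; level++) {
        span *= POINTERS_PER_BLOCK;
        limit += span;
        if (file_block < limit)
            return level;
    }
    return -1;
}
```

With these figures, the map covers 12 + 1024 + 1024^2 + 1024^3 blocks, so only files larger than 12 blocks pay for any indirection at all.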
229. nsate for the possibility of a collision that would make multiple flows share one queue, the algorithm changes the hash function periodically.

Token Bucket

The token bucket algorithm is used when a single flow needs to observe a bandwidth limit. The flow is assigned a bucket for tokens with a defined maximum capacity. Tokens are added regularly and removed when data is sent; no tokens are added to a full bucket, and no data can be sent when no tokens are available. The speed of adding tokens determines the bandwidth limit. The capacity of the token bucket determines the fluctuation limit.

Hierarchical Token Bucket

To be done.

Class Based Queuing

The class based queuing algorithm is used when multiple flows need to share bandwidth. The flows are separated into hierarchical classes that specify their bandwidth requirements and can borrow unused bandwidth from each other.

A class has a level. The level of a leaf class is 1; the level of a parent class is one higher than the maximum level of its children.

A class is under limit if it transmits below the allocated capacity. A class is over limit if it transmits above the allocated capacity. A class is on limit otherwise.

A class is unsatisfied if it is under limit and it or its siblings have data to transmit. A class is satisfied otherwise.

A class is regulated if the class based queuing algorithm prevents it from sending data. A class is unregulated otherwise.

In the classic Formal Sharing implementation
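The token bucket rules above can be sketched in a few lines. The names `token_bucket`, `bucket_init`, `bucket_tick` and `bucket_try_send` are hypothetical; the sketch also assumes that one token pays for one unit of data and that the bucket starts full, neither of which is stated in the text.

```c
#include <assert.h>
#include <stdbool.h>

struct token_bucket {
    unsigned capacity;   /* maximum number of tokens, the fluctuation limit */
    unsigned tokens;     /* tokens currently available */
};

/* Assumption: the bucket starts full. */
void bucket_init(struct token_bucket *b, unsigned capacity)
{
    b->capacity = capacity;
    b->tokens = capacity;
}

/* Called at a regular interval; tokens beyond the capacity are dropped. */
void bucket_tick(struct token_bucket *b, unsigned added)
{
    unsigned room = b->capacity - b->tokens;
    b->tokens += (added < room) ? added : room;
}

/* Try to send the given number of data units, one token per unit.
 * Returns false and sends nothing when not enough tokens are available. */
bool bucket_try_send(struct token_bucket *b, unsigned units)
{
    if (units > b->tokens)
        return false;
    b->tokens -= units;
    return true;
}
```

The rate at which `bucket_tick` is called and the number of tokens it adds set the bandwidth limit, while `capacity` bounds how large a burst can be sent at once.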
230. [...] then I wake it up and leave the lock locked, otherwise I unlock it.

[Annotation: Problems. While testing the queue, I can be interrupted; meanwhile somebody else enters the queue and falls asleep, and is never woken up. The queue is a shared data structure; the solution is to lock the queue using active waiting, or to disable interrupts there. On a multiprocessor, I can be interrupted after unlocking the queue but before falling asleep, which again leads to endless sleep; the solution is to set a flag saying that I am about to sleep before unlocking the queue.]

and the oWaitingProcesses shared queue variable to keep track of the waiting processes.

  if (AtomicSwap (bCriticalSectionBusy, true))
  {
    // The critical section is busy, put
    // the process into the waiting queue
    oWaitingProcesses.Put (GetCurrentProcess ());
    // Wait until somebody wakes the process
    Sleep ();
  }

  // Code of critical section comes here
  ...

  // See if any process is waiting in the queue
  oWaitingProcess = oWaitingProcesses.Get ();

  if (oWaitingProcess)
  {
    // A process was waiting, let it enter the critical section
    Wake (oWaitingProcess);
  }
  else
  {
    // No process was waiting, mark the critical section as free
    bCriticalSectionBusy = false;
  }

One major flaw of the naive solution is that the decision to wait and the consecutive adding of the process to the wait queue and removing of the process from the ready queue are not performed atomically. It is possible that a process decides to wait just before the critical section be
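The AtomicSwap operation used in the pseudocode of this snippet corresponds to an atomic exchange. As a hedged illustration, the busy flag alone can be written with C11 atomics as follows; the function names `try_enter`, `enter_spinning` and `leave` are hypothetical, and the sketch covers only the active waiting variant, not the sleep and wake queue.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Zero-initialized, which means the critical section starts free. */
static atomic_bool critical_section_busy;

/* Atomically mark the section busy and report whether it was free before. */
bool try_enter(void)
{
    return !atomic_exchange(&critical_section_busy, true);
}

/* Active waiting: repeat the exchange until the previous value was false. */
void enter_spinning(void)
{
    while (atomic_exchange(&critical_section_busy, true)) {
        /* busy wait */
    }
}

/* Mark the critical section as free again. */
void leave(void)
{
    atomic_store(&critical_section_busy, false);
}
```

Because the test of the flag and the write of the new value happen in one atomic step, two processes can never both see the section as free, which is the property the naive test-then-set code lacks.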
231. ntel Corp. 82801BA/BAM SMBus (rev 04)
  00:1f.4 USB Controller: Intel Corp. 82801BA/BAM USB (Hub #2) (rev 04)
  00:1f.5 Multimedia audio controller: Intel Corp. 82801BA/BAM AC'97 Audio (rev 04)
  01:00.0 VGA compatible controller: ATI Technologies Inc Radeon RV100 QY [Radeon 7000/VE]
  02:1f.0 PCI bridge: Intel Corp. 82806AA PCI64 Hub PCI Bridge (rev 03)
  03:00.0 PIC: Intel Corp. 82806AA PCI64 Hub Advanced Programmable Interrupt Controller
  04:0b.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 78)
  04:0c.0 FireWire (IEEE 1394): Texas Instruments TSB12LV26 IEEE-1394 Controller (Link)
  04:0d.0 Ethernet controller: Intel Corp. 82544EI Gigabit Ethernet Controller (Copper)

Check the example to see what the bridges from the previous example are. Bus 1 is on board AGP going to ATI VGA, bus 2 is on board AGP going to PCI64 with APIC, bus 4 is on board PCI going to network cards.

Check the example to see what the devices from the previous example are. Device 00:1f is a single chip integrating ISA bridge, IDE, USB, SMB, audio.

  > lspci -vvs 04:0b.0
  04:0b.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 78)
    Subsystem: Dell: Unknown device 00d8
    Control: I/O Mem BusMaster SpecCycl MemWINV VGASnoop ParErr Stepping
    Status: Capt 66Mhz UDF FastB2B ParErr DEVSEL=medium >TAbort <TAbort <MAbc
    Latency: 64 (2500ns min, 2500ns max), Cache Line Size 10
    Interrupt: pin A routed to IRQ 23
    Region 0
232. [Annotation, partly illegible: ... whether or not parallelism is wanted, working on a snapshot, and so on.]

Reference counting algorithms. Cycles. Distribution.

[Annotation: a copy of the heap.]

Distinguishing Generations

It has been observed that objects differ in lifetime. In particular, many young objects die quickly, while some old objects never die. Separating objects into generations therefore makes it possible to collect a generation at a time, especially to frequently collect the younger generation using a copying collector and to seldom collect the older generation using a mark and sweep collector.

Collecting a generation at a time requires keeping remembered sets of references from other generations. Typically, all generations below a certain age are collected, therefore only references from older to younger generations need to be kept in remembered sets.

Dave Ungar: Generation Scavenging: A Non-Disruptive High Performance Storage Reclamation Algorithm. Richard E. Jones, Rafael Lins: Garbage Collection: Algorithms for Automatic Dynamic Memory Management.

Additional Observations

Note that having garbage collection may simplify heap management. Copying and compacting tends to maintain the heap in a single block, making it possible to always allocate new objects at the end of the heap, making allocation potentially as simple as a
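The observation that a compacted heap lets new objects always be allocated at its end can be illustrated by a bump allocator. This is a sketch under assumptions: the `bump_heap` structure, the `bump_alloc` function and the rounding of sizes to 8 bytes are illustrative choices, not part of any particular collector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct bump_heap {
    uint8_t *base;   /* start of the single heap block */
    size_t size;     /* total size of the block in bytes */
    size_t used;     /* bytes already allocated, all at the start */
};

/* Allocate at the end of the used part of the heap. Sizes are rounded up
 * to a multiple of 8 bytes; NULL means the heap is full, which is where
 * a real system would start a garbage collection. */
void *bump_alloc(struct bump_heap *heap, size_t bytes)
{
    size_t aligned = (bytes + 7u) & ~(size_t)7u;

    if (aligned < bytes || aligned > heap->size - heap->used)
        return NULL;

    void *result = heap->base + heap->used;
    heap->used += aligned;
    return result;
}
```

The whole allocation path is one bounds check and one addition, with no free list to search, which is what keeping the heap in a single block makes possible.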
233. [...] The flags of course also say whether to wait for a message.

Addressing uses message queues. A queue is created by calling int msgget (key, flags), which specifies the identifier of the queue and the flags. The identifier is global, or it can have the special value IPC_PRIVATE, which guarantees that a new queue is created. Access to the queue is governed by access rights in the flags, which are the same as for files, for example.

  int msgget (key_t key, int msgflg);

Example: Mach Message Passing

In Mach, processes are equipped with message queues managed by the kernel; these queues are called ports. The permissions to work with a port are stored in tables kept by the kernel for each process; these permissions are called capabilities. A process identifies a port by its handle, which is an index into the corresponding table of capabilities. A capability can grant the right to read from the port, to write to the port, or to write to the port once. Only one process can have the right to read from a port; that process is the owner of the port. On start, a process is equipped with several important ports, for example the process port for communication with the kernel; system calls are then messages sent to this port.

A message in Mach consists of a header, which contains the destination and reply port handles, the size of the message, and the message kind and function code fields that are ignored by the kernel, followed by a sequence of data fields that make up the message. A peculiarity of Mach is that the data of the message are
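The use of msgget with IPC_PRIVATE can be illustrated by a short round trip through a System V message queue. The sketch uses the standard msgget, msgsnd, msgrcv and msgctl calls; the `text_message` structure, the `queue_round_trip` helper and the 0600 access rights are assumptions made for the example.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* A message starts with a long type, followed by the payload. */
struct text_message {
    long mtype;
    char mtext[64];
};

/* Send text through a newly created private queue and read it back.
 * Returns 0 on success and -1 on failure. */
int queue_round_trip(const char *text, char *out, size_t out_size)
{
    struct text_message msg = { .mtype = 1 };
    int result = -1;

    /* IPC_PRIVATE guarantees a new queue; 0600 limits access to the owner. */
    int queue = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    if (queue == -1)
        return -1;

    strncpy(msg.mtext, text, sizeof msg.mtext - 1);
    if (msgsnd(queue, &msg, strlen(msg.mtext) + 1, 0) == 0) {
        struct text_message reply;
        /* Message type 0 asks for the first message in the queue. */
        if (msgrcv(queue, &reply, sizeof reply.mtext, 0, 0) != -1
            && out_size > 0) {
            strncpy(out, reply.mtext, out_size - 1);
            out[out_size - 1] = '\0';
            result = 0;
        }
    }

    msgctl(queue, IPC_RMID, NULL);   /* remove the queue again */
    return result;
}
```

Passing IPC_NOWAIT in the last argument of msgsnd or msgrcv would make the call return immediately instead of waiting, which is the flag-controlled waiting the text mentions.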
234. [...] entries for a file made of multiple fragments, but this does not seem to be used. Some fields are both endian, with both versions stored one after the other. Files are contiguous, interleaving is not used.

One interesting detail in ISO9660 are the path tables. So that directories do not have to be searched item by item, a list of all paths on the disk, sorted by depth and other criteria, is stored in the path table. For each path, the path table contains the sector of the corresponding directory and its parent.

The standard imposes a number of weird limits on the file system structure, such as maximum directory nesting depth of 8, only capital letters, digits and underscores in file names, no extensions in directory names, etc. For this reason, extensions such as Joliet and Rock Ridge have appeared.

[Annotations: The maximum nesting depth is 8; file names use capital letters only, without dashes; extensions such as Joliet are used on PCs. Path table: to avoid seeking along a path, there is a list of all directories.]

References
1. Erdelsky P. J.: ISO9660 Simplified For DOS/Windows.

Example: UDF File System

The standards are ISO13346, UDF and ECMA167. The basic principles are similar to ISO9660 and ECMA119. An interesting concept are the Prevailing Descriptors for appendable media

[Annotation: On appendable CDs, each descriptor has a version number and the newest descriptor always prevails.]
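The both endian fields mentioned above store the same number twice. A minimal sketch of encoding and decoding a 32 bit value follows; it assumes the little endian copy comes first and the big endian copy second, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Store value as 4 little endian bytes followed by 4 big endian bytes. */
void encode_both_endian_u32(uint32_t value, uint8_t out[8])
{
    for (int i = 0; i < 4; i++) {
        uint8_t byte = (uint8_t)(value >> (8 * i));
        out[i] = byte;       /* little endian copy in bytes 0 to 3 */
        out[7 - i] = byte;   /* big endian copy in bytes 4 to 7 */
    }
}

/* Decode both copies; returns -1 when they disagree, 0 otherwise. */
int decode_both_endian_u32(const uint8_t in[8], uint32_t *value)
{
    uint32_t little = 0, big = 0;

    for (int i = 0; i < 4; i++) {
        little |= (uint32_t)in[i] << (8 * i);
        big = (big << 8) | in[4 + i];
    }
    if (little != big)
        return -1;
    *value = little;
    return 0;
}
```

A reader on either kind of machine can pick the copy in its native byte order; the decoder here reads both copies only to show that they carry the same value.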
235. o physical memory
9. Explain what a Translation Lookaside Buffer is used for.
10. Describe the hardware realization of a Translation Lookaside Buffer and explain the principle and advantages of limited associativity.
11. How does the switching of process context influence the contents of the Translation Lookaside Buffer? Describe ways to minimize the influence.
12. Provide at least two criteria that can be used to evaluate the performance of a page replacement algorithm.
13. Explain the principle of the First In First Out page replacement algorithm and evaluate the feasibility of its implementation on contemporary hardware, along with its advantages and disadvantages.
14. Explain the principle of the Not Recently Used page replacement algorithm and evaluate the feasibility of its implementation on contemporary hardware, along with its advantages and disadvantages.
15. Explain the principle of the Least Recently Used page replacement algorithm and evaluate the feasibility of its implementation on contemporary hardware, along with its advantages and disadvantages.
16. Explain the principle of the Least Frequently Used page replacement algorithm and evaluate the feasibility of its implementation on contemporary hardware, along with its advantages and disadvantages.
17. Explain what is a working set of a process.
18. Explain what is a locality of reference and how it can be exploited to design or enhance a page replacement algorith
236. [...] around 1970, whose ferrite core memory with 60 bit words had special hardware that could move memory at a speed of 40 MB per second. Relocation was easy thanks to addressing through a base register. Of course, there is also the added need to remember the layout of the variable partitions, which is a problem that also appears in many similar situations, such as heap management, kernel memory management and swap management.

Separating Multiple Processes

So far, the entire discussion, perhaps with the exception of overlaying, assumed that several programs fit into the physical memory at the same time. Sometimes the situation is the opposite and a single program does not fit at all. This is why virtual memory was invented, surprisingly as early as around 1961.

[Annotation: the MMU translates addresses.]

Page Translation

The principle of paging is straightforward. The memory is divided into blocks of the same length and a table is created that maps virtual addresses to physical addresses. The problem, of course, lies in the table. The table must be able to cover the entire address space of a process, and there can be several tables for several processes. This leads to a problem with the size of the table: for an address space of 32 bits and pages of 4 KB, 20 bits of the address remain for the page number, with 4

[Annotation: a table mapping 1:1 would take more space than the mapped memory, which is why the mapping is done in blocks, for example of 4 KB, so that the lower bits of the address are not translated at all.]
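The split of a 32 bit address into a 20 bit page number and a 12 bit offset for 4 KB pages can be sketched as follows. The single level table `frame_of_page` and the function names are hypothetical simplifications; real page table entries also carry flags, which are left out here.

```c
#include <assert.h>
#include <stdint.h>

enum { PAGE_SHIFT = 12 };   /* 4 KB pages leave 12 untranslated bits */

/* The upper 20 bits select the page. */
uint32_t page_number(uint32_t virtual_address)
{
    return virtual_address >> PAGE_SHIFT;
}

/* The lower 12 bits are the offset within the page. */
uint32_t page_offset(uint32_t virtual_address)
{
    return virtual_address & ((1u << PAGE_SHIFT) - 1u);
}

/* Translate through a single level table holding one frame number
 * per page; the offset is copied without translation. */
uint32_t translate(const uint32_t *frame_of_page, uint32_t virtual_address)
{
    uint32_t frame = frame_of_page[page_number(virtual_address)];
    return (frame << PAGE_SHIFT) | page_offset(virtual_address);
}
```

A full single level table for this layout needs 2^20 entries per address space, which is the table size problem the text goes on to discuss.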
237. ocation structure. The allocation structure is either 8 runs stored directly in the F-node, where each run is a 32 bit starting sector and a 32 bit number of sectors, or a B-tree with 12 branches whose leaf nodes contain up to 40 runs. An interesting feature are the extended attributes: up to 64 KB of name and value pairs can be stored with each file, either directly in the F-node or in an extra run.

Directories are represented by the F-node structure just like files; a pointer to the F-node of the root directory is in the superblock. A directory has entries of varying length in 2 KB blocks that are arranged as a B-tree; with names of around 10 characters, about 40 entries fit into a 2 KB block. Each entry contains the name, the usage count and the F-node pointer.

One advantage of HPFS is the allocation strategy, thanks to which files are stored in contiguous blocks where possible, and thanks to which the F-node is close to the data of the file. Preallocation in units of 4 KB is used; excess blocks are returned when the file is closed. Read ahead and write back are a matter of course; the documentation claims that a usage pattern is remembered for each file and that both are governed by it.

Fault tolerance is also interesting. The system maintains a hotfix map; if it encounters

[Annotations, partly illegible: Recovery of deleted files. In FAT, deleting marks the clusters as free and erases the first character of the file name in the directory; when a directory changes a lot, it contains many invalid entries.]
238. [...] many processes, the overhead of recomputing the priorities grows. Next, the scheduler cannot guarantee anything, especially not response time or processor share. And finally, applications can influence priorities only through the nice factor, which does not exactly offer advanced control.

Example: Solaris Scheduler

The System V Release 4 scheduler is similar. The core of the scheduler consists of queues of ready to run processes by priority 0-160 and a scheduling routine that classically picks the process with the highest priority, using round robin among processes of the same priority. Each process belongs to a priority class, which is responsible for all decisions concerning the assignment of priority and the length of the quantum. By default, three priority classes are available: timesharing, system and realtime.

Timesharing. Uses priorities 0-59. The priority of a process is increased whenever the process waits for something or whenever it takes a long time to use up its quantum; the priority is decreased whenever the process uses up its quantum. The exact change of priority is determined by a table. As an example, a process with priority 40 drops to 30 after using up its quantum; after finishing a wait, or when the process does not use up its quantum within 5 seconds, the priority instead rises to 50. A waiting process gets priority 59; the change in the normal priority shows after the return to user mode. The nice value is also added to the result, so that the priority in user mode is determined both by the priority calculated by the system and by the nice val
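The timesharing rule above, where the priority drops after a used up quantum and rises after a wait, can be sketched as follows. This is a deliberately simplified assumption: the real scheduler reads the new priority from a table, whereas this sketch uses a uniform step of 10 chosen only to reproduce the single example given in the text (40 drops to 30, or rises to 50). All names are hypothetical.

```c
#include <assert.h>

enum { TS_MIN_PRIORITY = 0, TS_MAX_PRIORITY = 59, TS_STEP = 10 };

/* Keep the priority within the timesharing range 0-59. */
static int clamp_priority(int priority)
{
    if (priority < TS_MIN_PRIORITY)
        return TS_MIN_PRIORITY;
    if (priority > TS_MAX_PRIORITY)
        return TS_MAX_PRIORITY;
    return priority;
}

/* The process used up its whole quantum: it is penalized. */
int priority_after_quantum_expired(int priority)
{
    return clamp_priority(priority - TS_STEP);
}

/* The process finished waiting, or took long to use its quantum. */
int priority_after_wait(int priority)
{
    return clamp_priority(priority + TS_STEP);
}
```

The effect is that processes which mostly wait drift towards the top of the range, while processes which keep using up their quanta drift towards the bottom.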
239. ogram file to execute and the command line to supply to the process. The process can terminate by calling ExitProcess; the WaitForSingleObject call can be used to wait for the termination of a process.

  BOOL CreateProcess (                 // process creation
    LPCTSTR lpApplicationName,
    LPTSTR lpCommandLine,              // command line for launching the process
    LPSECURITY_ATTRIBUTES lpProcessAttributes,
    LPSECURITY_ATTRIBUTES lpThreadAttributes,
    BOOL bInheritHandles,
    DWORD dwCreationFlags,
    LPVOID lpEnvironment,
    LPCTSTR lpCurrentDirectory,
    LPSTARTUPINFO lpStartupInfo,
    LPPROCESS_INFORMATION lpProcessInformation
  );

  VOID ExitProcess (UINT uExitCode);   // when the process itself wants to terminate

  DWORD WaitForSingleObject (
    HANDLE hHandle,
    DWORD dwMilliseconds
  );

Figure 2-18. Windows Process Creation System Calls

Windows applications can create threads using the CreateThread call. Besides returning from the thread function, the thread can also terminate by calling ExitThread. The universal WaitForSingleObject call is used to wait for a thread to terminate.

  HANDLE CreateThread (
    LPSECURITY_ATTRIBUTES lpThreadAttributes,
    SIZE_T dwStackSize,
    LPTHREAD_START_ROUTINE lpStartAddress,
    LPVOID lpParameter,
    DWORD dwCreationFlags,
    LPDWORD lpThreadId
  );

  VOID ExitThread (DWORD dwExitCode);

Figure 2-19. Windows Thread Creation System C
240. [Annotations, partly illegible: the inode tree is keyed by inode; free space is tracked by a by-size tree, keyed by size, and a by-location tree, keyed by address, used when the next fragment should be as close as possible.]

References
1. Russinovich M.: Inside NTFS.
2. Wijk J. v.: NTFS Disk Structure Definitions.

Example: XFS

XFS has been designed by SGI and provides support for large numbers of large files in large directories accessed by large numbers of clients. This is achieved by using balanced trees for most structures and by providing metadata logging.

XFS divides the disk into groups; each group contains metadata and data areas. The group metadata area contains a copy of the superblock, pointers to roots of two group free block trees, a pointer to the root of a group inode tree, and a reserved group free block list. The data area is split into equal sized blocks. Most references used by XFS come in two flavors: a relative reference within a group, and an absolute reference, which is created by prepending the group identifier to the relative reference.

Free space is allocated by blocks. The two free block trees allow locating free space by block number and by extent size; each leaf of a tree points to a free extent. The reserved free block list contains free blocks reserved for growing the free block trees.

Files are represented by inodes. The inode tree allows locating an inode by inode number; each leaf of the tree points to a block with a sparse array of 64
241. omic sets of operations on behalf of clerks. Disks can be attached to multiple servers, but only one of those servers accesses the disks; one of the remaining servers is chosen to access the disks if the current server fails.

Clerks present clients with files that can have user defined attributes and multiple data streams. Servers store files in an infinite log.

At the top level, the server handles objects with unique identification, with multiple named cells for storing data accessed in one piece, and with multiple numbered streams for storing data accessed in multiple pieces. Files are mapped onto objects with attributes in cells and contents in streams. Directories are mapped onto objects with attributes and entries in cells.

At the medium level, the server handles the infinite log. Objects are mapped into a B-tree stored in the log; the keys are object identifiers, the leaves are objects. Cells and streams are mapped into B-trees of a single leaf of the object B-tree. When cells are stored in a B-tree, the keys are names of the cells and the leaves denote cell data. When a stream is stored in a B-tree, the keys are positions in the stream and the leaves denote stream extents. Optimizations that store short extents within their leaves also apply.

At the bottom level, the server handles segments. Segments are blocks of consecutive sectors 256 kB long. A segment consists of a data area and a commit record area that are written in two physical phases for each logical write. Log
242. ommitted or uncommitted; MODIFY, INSERT and DELETE on the data of an uncommitted file, with a parameter that says whether to commit; READ on a committed file; SIZE.

Files are represented by capabilities; files without capabilities are deleted automatically. Because it is not known who holds the capabilities, timeouts are used. Uncommitted files are deleted after 10 minutes. Committed files have an age parameter, which the Directory Server advances by calling touch; typically, touch is called once an hour and a file disappears 24 hours after the last touch.

The Directory Server is a general naming server that assigns names to capabilities. The basic operations are:
CREATE, DELETE: creating and removing a directory,
APPEND: inserting a capability into a directory,
REPLACE: replacing a capability in a directory, which is important for atomic update of files,
LOOKUP, GETMASKS, CHMOD: reading and setting of rights.

Example: Mach

To be done.

Example: Plan 9

Plan 9 is an experimental operating system developed around 1990 in Bell Labs. Plan 9 builds on the idea that all resources should be named and accessed uniformly.

To be done.

Rehearsal Questions
1. Describe the architecture of the NFS network file system, including the protocols used and the main operations of these protocols.
2. Explain what a file handle is and what items it usually contains in a network file system
243. on stack
  MOV EAX,[EBP+8]   ; fetch anArgument into EAX, which is 8 bytes below the stored top of stack
  MOV [EBP-4],EAX   ; store EAX into aVariable, which is 4 bytes above the stored top of stack
  MOV ESP,EBP       ; free space allocated for aVariable
  POP EBP           ; restore original value of EBP
  RET               ; return to the caller

In the example, the stack at the entry to SomeProcedure contains the return address on top, that is 0 bytes above the value of ESP, and the value of anArgument one item below the top, that is 4 bytes above the value of ESP. Saving the original value of EBP stores another 4 bytes to the top of the stack and therefore decrements the value of ESP by another 4 bytes; this value is then stored in EBP. During the execution of SomeProcedure, the value of anArgument is therefore 8 bytes above the value of EBP. Note that the machine code instructions used to access the procedure arguments and the locally allocated variables do not use absolute addresses in the virtual address space of the process.

Stack Allocation

Allocating the block that contains the stack requires estimating the stack size. Typically, the block is allocated with a reasonable default size and an extra page protected against reading and writing is added below the end of the allocated block. Should the stack overflow, an attempt to access the protected page will be made, causing an exception. The operating system can handle
244. [...] repeated testing of access, blocking lock for a lock that waits passively, recursive lock for a lock that can be locked multiple times by the same thread, read write lock for a lock that has separate modes of locking for reading and for writing, etc. More on locking in the section on transactions.

The implementation of unlocking a lock can differ in one detail. The last owner either unlocks the lock and starts one of the waiting processes, which then locks the lock again, or simply hands the lock over, still locked, to one of the waiting processes. The second method seems more efficient, but it has an unpleasant property in the situation where somebody tries to lock the lock at the moment when the old owner has already given it up but the new owner has not yet started running. In such a situation, the attempt to lock ends with the caller being blocked, which can be considered bad because an active process has to wait for a passive one. The dependencies between processes that arise this way are called convoys.

Read Write Locks

To implement the Readers And Writers Synchronization Problem, a variant of a lock that distinguishes between readers and writers is typically provided. The lock can be locked by multiple readers, but only by a single writer, and never by both readers and writers.

Read write locks can adopt various strategies to resolve contention between readers and writers. Often, writers take precedence over r
245. or is the ability to collect only part of the heap. How is this possible without the collector missing some references?

Notes
1. Still a sketch.
2. Understanding is recommended.
3. Understanding is recommended.
4. Understanding is recommended.
5. Understanding is optional.
6. Understanding is essential.
7. Understanding is essential.
8. Understanding is essential.
9. Understanding is recommended.
10. Understanding is recommended.
11. Understanding is optional.
12. Understanding is optional.
13. Understanding is optional.
14. Understanding is optional.
15. Understanding is optional.
16. Understanding is recommended.
17. Understanding is recommended.
18. Understanding is recommended.
19. Just a curiosity.
20. Understanding is essential.
21. Understanding is essential.
22. Understanding is optional.
23. Understanding is essential.
24. Understanding is optional.
25. Understanding is optional.
26. Understanding is essential.
27. Understanding is essential.
28. Understanding is optional.
29. Understanding is optional.
30. Understanding is essential.

Chapter 4. Device Management

Device Drivers

Traditionally, the operating system is responsible for controlling devices on behalf of applications. Even though applications could control devices directly, delegating the task to the operating system keeps the application
246. osen approach to address translation.

Besides the addresses, the data structure for address mapping typically contains many other fields that control access protection or help page replacement. Is your description of the data structure detailed enough to include these fields?

2. Consider the previous example, except the system is using 32 bit virtual and 36 bit physical addresses.
3. Consider the previous example, except the system is using 54 bit virtual and 44 bit physical addresses.

Allocation Within A Process

Process Memory Layout

A typical process runs within its own virtual address space, which is distinct from the virtual address spaces of other processes. The virtual address space typically contains four distinct types of content:

Executable code. This part of the virtual address space contains the machine code instructions to be executed by the processor. It is often write protected and shared among processes that use the same main program or the same shared libraries.

Static data. This part of the virtual address space contains the statically allocated variables to be used by the process.

Heap. This part of the virtual address space contains the dynamically allocated variables to be used by the process.

Stack. This part of the virtual address space contains the stack to be used by the process for storing items such as return addresses, procedure arguments, temporarily saved r
247. [...] hardware MMU. It supports pages of 1, 4, 16, 64 and 1024 kB; to confuse the enemy, it calls them tiny pages, small pages, large pages and sections. Some of them can be divided into four subpages. The page table has two levels, except for the 1024 kB page size. Protection is handled by introducing 16 domains: 16 registers describe the rights of supervisor and user processes to the domains, and each page belongs to a domain. It is also possible to use the TLB only.

Figures: ARM 1022E Technical Reference Manual, Chapter 4 Memory Management Units. Figure 4-1 shows the address translation. Figure 4-3 shows the format of a level 1 page table entry. Figure 4-5 shows the format of a level 2 page table entry. C is cacheable, B is write back bufferable, AP is something that distinguishes subpages, SBZ should be zero. There are no accessed and dirty bits; paging is not assumed.

Other interesting features of the processor: the code of every instruction can specify a condition under which the instruction is to be executed, which removes the need for branch prediction and the danger of prediction misses for small branches of code.

Software Implementation

To be done.

Besides the usual problems with access to shared structures, multiprocessor systems also have problems in situations where individual processors cache information related to memory management. There are two such situations.

Mappings in the TLB. When the address space mappings change, on a uniprocessor the
248. a stable run of the system and zero scheduler overhead are assumed. Another detail: a domain can state whether or not it wants to receive surplus processor time. Among the domains that want it, surplus processor time is distributed randomly with a quantum of about 100 us; nothing better has been devised so far and this is said to suffice. One more detail is what to do with domains that were waiting for an event. They are simply put back into the queue of unsatisfied domains as if they were being newly admitted; at worst they will consume surplus processor time. Domains that need only a small percentage of processor time but must react quickly can additionally specify a latency hint, which is used instead of the period to compute the deadline when the domain has waited longer than its period. Use for interrupt handling. Interrupt handling is unusual, but it removes one of the basic problems of the schedulers mentioned so far, namely that part of the activity associated with devices is in effect scheduled by signals from hardware rather than by the operating system. When an interrupt arrives, its handler in the kernel only masks it and sends an event to the domain responsible for servicing it. The domain is assumed to be waiting for the event, so the scheduler schedules it in accordance with its parameters, the domain services the device and enables the interrupt again. If the system then cannot keep up,
249. [Annotation: for example, I have two variables A and B initialized to 0; on one processor I have the code A = 1; read B, and on the other processor B = 1; read A. Normally at most one of the reads returns 0, and with interleaving both return 1, but because of prefetching both reads can also return 0. A barrier is something like clear: both in CSS, in other words: finish now what you have not done yet, and do not do what you were not supposed to do yet. Java also reorders in the style of the example; use synchronized and volatile.] 1. Explain what is a race condition. 2. Explain what is a critical section. 3. Explain the conditions under which a simple code fragment on an integer variable can lead to a race condition when executed by multiple threads. 4. Explain the conditions under which omitting the volatile declaration on an integer variable can lead to a race condition when the variable is accessed by multiple threads. 5. Describe the Mutual Exclusion synchronization task. Draw a Petri net illustrating the synchronization task and present an example of the task in a parallel application. 6. Describe the Rendez-Vous synchronization task. Draw a Petri net illustrating the synchronization task and present an example of the task in a parallel application. 7. Describe the Producer And Consumer synchronization task. Draw a Petri net illustrating the synchronization task and present an example of the task in a
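The two-processor example in the note at the start of item 249 (A and B both start at 0; one processor runs A = 1; read B, the other runs B = 1; read A) can be checked by brute force. The following C sketch enumerates every interleaving that keeps each processor's program order, which is an assumption standing in for a machine without reordering or prefetching, and records which pairs of read values occur. The names `run_interleaving`, `count_outcomes` and `outcome_seen` are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* Outcome table: seen[r1][r2] is true when some interleaving makes
   processor 1 read r1 from B and processor 2 read r2 from A. */
static bool seen[2][2];

/* Execute one interleaving. order[] lists which processor (0 or 1)
   performs its next instruction at each of the four steps. */
static void run_interleaving(const int order[4])
{
    int a = 0, b = 0;       /* shared variables, initialized to 0 */
    int r1 = -1, r2 = -1;   /* values read by each processor */
    int pc[2] = {0, 0};     /* per-processor program counters */

    for (int step = 0; step < 4; step++) {
        int cpu = order[step];
        assert(cpu == 0 || cpu == 1);
        if (cpu == 0) {
            if (pc[0] == 0) a = 1; else r1 = b;   /* A = 1; read B */
        } else {
            if (pc[1] == 0) b = 1; else r2 = a;   /* B = 1; read A */
        }
        pc[cpu]++;
    }
    assert(r1 >= 0 && r2 >= 0);
    seen[r1][r2] = true;
}

/* Enumerate all orders containing exactly two steps of each processor
   and return the number of distinct (r1, r2) outcomes observed. */
int count_outcomes(void)
{
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++)
            seen[i][j] = false;

    for (int mask = 0; mask < 16; mask++) {
        int order[4], ones = 0;
        for (int s = 0; s < 4; s++) {
            order[s] = (mask >> s) & 1;
            ones += order[s];
        }
        if (ones == 2)
            run_interleaving(order);
    }
    return seen[0][0] + seen[0][1] + seen[1][0] + seen[1][1];
}

/* True when the outcome (r1, r2) was observed by count_outcomes(). */
bool outcome_seen(int r1, int r2)
{
    return seen[r1][r2];
}
```

Under this simplified model the outcome where both reads return 0 never appears. The point of the note is that a real processor may reorder or prefetch so that it does appear, which is what barriers, or synchronized and volatile in Java, are meant to prevent.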
250. pical lifetime ranging between tens of thousands and tens of millions of erasures. Devices that masquerade as disks contain controllers that make sure the wear is spread evenly across the entire device rather than being focused on a few blocks. This is called wear levelling. Devices without wear levelling built into the controller need appropriate handling in software. Chapter 4 Device Management. Network Cards. Scatter gather. Checksumming. Segmentation. Parallel Ports. To be done. Serial Ports. To be done. Printers. To be done. Modems. To be done. Rehearsal Questions. 1. List the features that a hardware device that represents a bus typically provides. 2. List the features that a hardware clock device typically provides. 3. List the features that a hardware keyboard device typically provides. 4. List the features that a hardware mouse device typically provides. 5. List the features that a hardware terminal device with command interface typically provides. 6. List the features that a hardware terminal device with memory mapped interface typically provides. 7. List the features that a hardware disk device typically provides. 8. Explain the properties that a hardware disk interface must have to support hardware ordering of disk access requests. 9. Describe at least three strategies for ordering disk access requests. Evaluate how the strategies optimize the total time to execute the requests
251. r circuits. The illustration of constructing a flip flop, which in fact represents a single bit of memory, is below. Chapter 1 Introduction. Figure 1-2 Principle Of Composing Flip Flops From Gates. Note that besides well known construction patterns of useful circuits, approaches to design a circuit given the required logical function are also very well established. In fact, many circuits are sold as designs rather than chips; the designs are merged depending on the application and only then embedded in multipurpose chips. References 1. Ken Bigelow: Play Hookey Digital Logic Tutorial. http://www.play-hookey.com/digital Basic Computer Architecture. The figure depicts a basic architecture of a desktop computer available in the late 1970s. The architecture is simple but still representative of the contemporary desktop computers. Advances in the architecture since the 1970s are outlined in the subsequent sections. [Figure: basic computer architecture with a processor (general registers, instruction decoder, control unit), a memory (memory array, address logic) and a peripheral (peripheral logic, address logic), connected by the address bus, data bus and control bus, with HOLD and HLDA signals and an interrupt controller with INTR and INTA signals.] [Annotation: a gate does not work instantly]
252. r mode: dangerous operations such as changing the page table are prohibited; memory access rights for a process: read, write, execute. Cooperative And Preemptive Switching. A context switch can be preemptive, in which case the operating system takes the computer away from one process and assigns it to another, or cooperative, in which case the process must give up the computer voluntarily. Cooperative: smaller overhead, no switches at inconvenient moments. Preemptive: a more robust system, a process cannot usurp the computer. Switching In Kernel Or In Process. Switching the processor context is, by the way, practically everything that an implementation of threads has to do. It is enough to assign a stack and the CPU to each thread, which is code that an application can execute too. Therefore, if the operating system does not offer threads for some reason, the application can program them itself. Hence the terms user managed threads for threads that are implemented in the application, and kernel managed threads for threads that are implemented in the kernel. Another terminology uses user threads for threads that are implemented in the application and that the kernel does not know about, lightweight processes for threads that are implemented in the kernel and used by the application, and kernel threads for threads that are implemented in the kernel and that the application does not know about. [Annotation, truncated: the processor must be able to do both, and it must]
253. r synchronization is a shared memory that supports atomic compare and swap alongside atomic reads and writes, any fair deterministic solution of the mutual exclusion problem for N processes has been proven to need at least N/2 shared variables. [Annotation: with multiple processors there is a problem with caches: on a write to memory, that location must be invalidated in all caches.] Active waiting is useful when the potential for contention is relatively small and the duration of waiting is relatively short. In such cases, the overhead of active waiting is smaller than the overhead of passive waiting, which necessarily includes context switching. Some situations also require the use of active waiting, for example when there is no other process that would wake up the passively waiting process. [Annotation: this can be improved by identifying the fast path, that is, the most common case. Active waiting wastes processor time; passive waiting is better: when I wait, I tell the processor, and it puts me aside.] Example: Memory Model On Intel 80x86 Processors. The Intel 80x86 processors guarantee that all read and write instructions operating on shared memory are atomic when using aligned addresses. Other instructions may or may not be atomic, depending on the particular processor. In particular, read and write instructions operating within a single cache line are often atomic, while read
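Active waiting as described in item 253 is commonly built on an atomic test-and-set style operation. The following is a minimal sketch of such a busy-waiting lock using the standard C11 `atomic_flag`; the type `spin_t`, the macro `SPIN_INIT` and the function names are made up for this illustration and do not come from the text.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* A lock that is acquired by active (busy) waiting. */
typedef struct {
    atomic_flag busy;   /* set while some thread holds the lock */
} spin_t;

#define SPIN_INIT { ATOMIC_FLAG_INIT }

/* Try once; returns true when the lock was acquired. */
bool spin_try_lock(spin_t *lock)
{
    assert(lock != NULL);
    /* test-and-set returns the previous value: false means it was free */
    return !atomic_flag_test_and_set_explicit(&lock->busy,
                                              memory_order_acquire);
}

/* Spin until the lock is acquired. No context switch takes place,
   which is why this pays off only for short waits. */
void spin_lock(spin_t *lock)
{
    while (!spin_try_lock(lock)) {
        /* active waiting */
    }
}

/* Release the lock so that a spinning thread can acquire it. */
void spin_unlock(spin_t *lock)
{
    assert(lock != NULL);
    atomic_flag_clear_explicit(&lock->busy, memory_order_release);
}
```

A thread would call `spin_lock`, execute a short critical section, and call `spin_unlock`. For long waits a passive mechanism is usually preferable, as the text argues.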
254. race conditions by controlling or limiting the concurrency when executing code where race conditions can occur. This code is typically denoted as critical sections. Synchronization Problems. To better understand what kind of process synchronization is necessary to avoid race conditions, models of synchronization problems are used. Each model depicts a particular scenario of concurrent execution and presents particular requirements on process synchronization. Petri nets are often used to describe the synchronization problems. A Petri net consists of places and transitions. Places can hold tokens; transitions can fire by consuming input tokens and producing output tokens. Roughly, places correspond to significant process states, and transitions correspond to significant changes of process state. [Annotation: Petri nets. Places can contain tokens. Transitions pass tokens between places: a transition consumes some tokens from its inputs and produces one or more tokens on its outputs.] References 1. Carl Adam Petri: Kommunikation mit Automaten. Mutual Exclusion. Mutual Exclusion models a scenario where several processes with critical sections execute concurrently. The synchronization problem requires that no two processes execute their critical sections simultaneously. [Annotation: mutual exclusion is needed for example when I have a shared variable. A race condition is a timing-dependent error that depends on the particular scheduling. The guard is a token that I need to hold]
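The firing rule from item 254 (a transition consumes input tokens and produces output tokens) is small enough to simulate directly. The sketch below models one possible mutual exclusion net for two processes with a single guard token; the place names, the struct layout and the four transitions are assumptions made for this illustration, not a transcription of the figure in the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PLACES 5

/* Places: each process is either outside or inside its critical
   section, and one guard place holds the token needed to enter. */
enum { OUT1, IN1, OUT2, IN2, GUARD };

typedef struct {
    int consume[PLACES];   /* tokens taken from each input place */
    int produce[PLACES];   /* tokens put into each output place */
} transition_t;

typedef struct {
    int tokens[PLACES];    /* current marking of the net */
} marking_t;

const transition_t ENTER1 = { .consume = { [OUT1] = 1, [GUARD] = 1 },
                              .produce = { [IN1] = 1 } };
const transition_t LEAVE1 = { .consume = { [IN1] = 1 },
                              .produce = { [OUT1] = 1, [GUARD] = 1 } };
const transition_t ENTER2 = { .consume = { [OUT2] = 1, [GUARD] = 1 },
                              .produce = { [IN2] = 1 } };
const transition_t LEAVE2 = { .consume = { [IN2] = 1 },
                              .produce = { [OUT2] = 1, [GUARD] = 1 } };

/* Both processes start outside, the guard token is available. */
marking_t initial_marking(void)
{
    marking_t m = { { 1, 0, 1, 0, 1 } };
    return m;
}

/* A transition is enabled when every input place holds enough tokens. */
bool can_fire(const marking_t *m, const transition_t *t)
{
    assert(m != NULL && t != NULL);
    for (int p = 0; p < PLACES; p++)
        if (m->tokens[p] < t->consume[p])
            return false;
    return true;
}

/* Fire the transition if enabled; returns false and leaves the
   marking untouched otherwise. */
bool fire(marking_t *m, const transition_t *t)
{
    if (!can_fire(m, t))
        return false;
    for (int p = 0; p < PLACES; p++)
        m->tokens[p] += t->produce[p] - t->consume[p];
    return true;
}
```

Once `ENTER1` has fired, the guard place is empty, so `ENTER2` stays disabled until `LEAVE1` returns the token. This is the mutual exclusion requirement expressed in net form.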
255. rating system. From the bottom up, the system structure then starts with the host operating system that includes the hypervisor, and then the guest operating systems hosted above the hypervisor. Notes: Still a draft. Just a curiosity. Still a draft. Just a curiosity. Still a draft. Just a curiosity. Still a draft. Just a curiosity. Still a draft. Understanding is essential. Understanding is essential. Understanding is essential. Understanding is essential. Understanding is essential. Understanding is recommended. Understanding is recommended. Understanding is recommended. Understanding is essential. Understanding is essential. Understanding is optional. Understanding is optional. Understanding is optional. Understanding is recommended. Understanding is recommended. Understanding is recommended. Understanding is recommended. Understanding is recommended. [Annotation: process = program + state; program = memory contents; state = CPU registers (address of the instruction being executed), data (variables in memory: heap, stack), devices (graphics, disk). Thread: programs that share a piece of memory; process = program + its threads.] Chapter 2 Process Management. [Annotation, truncated: threads share memory: the heap is common, the stack is separate; they have a different stack point]
256. re of processor time used by each task roughly balanced. [Annotation: ready to run, because it wants something.] When resolving the conflicts between the individual scheduling requirements, it helps to consider classes of applications. An interactive application spends most of its time waiting for input. When the input does arrive, the application must react quickly. It is often stressed that a user of an interactive application will consider the application slow when a reaction to an input does not happen within about 150 ms. Fluctuations in reaction time are also perceived negatively. A batch application spends most of its time executing internal computations in order to deliver an external output. In order to benefit from various forms of caching that happen within the hardware and the operating system, a batch application must execute uninterrupted for some time. A realtime application must meet specific deadlines and therefore must execute long enough and often enough to be able to do so. [Annotation: process creation: fork. Zombie: the process has finished, but whoever started it may still want something from it, for example the exit code.] When faced with the conflict between the individual scheduling requirements, the operating system would therefore lean towards responsiveness for interactive applications, efficiency for batch applications, and predictability for realtime applications. The state diagram then makes it obvious what the task of scheduling is. It is to
257. re than one process into the critical section. Unfortunately, a process waiting to enter the critical section can be overtaken infinitely many times, violating the fairness property. Additionally, all processes waiting to enter the critical section can form an infinite cycle, violating the liveness property. A safe solution that also guarantees bounded waiting is known as the Dekker Algorithm.

    // Indicate the intent to enter the critical section
    bIWantToEnter = true;
    while (bHeWantsToEnter)
    {
      // If the other process indicates the same intent and
      // it is not our turn, back off to give the other
      // process a chance
      if (iWhoseTurn != MY_TURN)
      {
        bIWantToEnter = false;
        while (iWhoseTurn != MY_TURN) { }
        bIWantToEnter = true;
      }
    }

    // Code of critical section comes here

    iWhoseTurn = HIS_TURN;
    bIWantToEnter = false;

Another similar algorithm is the Peterson Algorithm. [Annotation: the same thing, simplified.]

    // Indicate the intent to enter the critical section
    bIWantToEnter = true;

    // Be polite and act as if it is not our
    // turn to enter the critical section
    iWhoseTurn = HIS_TURN;

    // Wait until the other process either does not
    // intend to enter the critical section or
    // acts as if it is our turn to enter
    while (bHeWantsToEnter && (iWhoseTurn != MY_TURN)) { }

[Annotation: this assumes that reads from memory and writes to memory are atomic, that is, also with multiple processors.]
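The Peterson fragment in item 257 is written from the point of view of one process. A version that runs on a current multiprocessor also needs the assumption from the annotation, atomic and ordered reads and writes, to be made explicit. The following sketch does that with sequentially consistent C11 atomics; the array layout and the names `peterson_enter` and `peterson_leave` are choices made for this illustration, not part of the original text.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Shared state for two processes numbered 0 and 1. Objects with
   static storage start out as false and 0. */
static atomic_bool want_to_enter[2];   /* intent flags */
static atomic_int whose_turn;          /* who may proceed on a tie */

/* Entry protocol of the Peterson algorithm for process `self`. */
void peterson_enter(int self)
{
    assert(self == 0 || self == 1);
    int other = 1 - self;

    /* Indicate the intent to enter the critical section */
    atomic_store(&want_to_enter[self], true);

    /* Be polite and act as if it is the other's turn */
    atomic_store(&whose_turn, other);

    /* Wait until the other process either does not intend to
       enter or acts as if it is our turn */
    while (atomic_load(&want_to_enter[other]) &&
           atomic_load(&whose_turn) != self) {
        /* active waiting */
    }
}

/* Exit protocol: withdraw the intent. */
void peterson_leave(int self)
{
    assert(self == 0 || self == 1);
    atomic_store(&want_to_enter[self], false);
}
```

The default `atomic_store` and `atomic_load` operations are sequentially consistent, which the algorithm relies on. With plain variables, a compiler or processor could reorder the write of the intent flag and the read of the other flag, and both processes could then enter together.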
258. ream File Operations. The OpenFile operation searches multiple hardcoded paths when a name without a path is supplied. [Annotation: memory mapped file limitations: the size of the file cannot be changed; writes are usually propagated.] Chapter 5 File Subsystem. Mapped File Operations. With the spread of paging it became common for every page of memory to be associated with data on disk, which can in principle also be used for access to files, provided the operating system lets applications specify which data on disk the pages are associated with. This facility is denoted by the term memory mapped files. Memory mapped files were supported, for example, by MULTICS as early as around 1965; today they are supported by practically all systems including Linux and Windows. The typical operations are the pair map and unmap, where map says which part of which file to map where in memory, and unmap cancels the mapping. An inherent problem of memory mapped files is changing the length of a file on write, because it is not possible to simply say that a write past the mapped block of memory should extend the file. Further problems arise when stream and mapped operations are used simultaneously to access a file; there, the operating system usually converts stream operations to mapped ones internally. [Annotation: all files, or rather access to the kernel buffers, are mapped into memory.] Example: Linux Mapped File Operations. void *mmap (void *start, size_t length, int prot, int fl
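The map and unmap pair from item 258 corresponds to the POSIX calls `mmap` and `munmap`. The sketch below maps a whole file read-only and scans it as ordinary memory. The helper name `count_bytes_mapped` and its error convention (returning -1) are assumptions made for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Count occurrences of the byte `wanted` in the file at `path`
   using a read-only private mapping. Returns -1 on any error. */
long count_bytes_mapped(const char *path, unsigned char wanted)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {
        /* mmap rejects a zero length, and there is nothing to count */
        close(fd);
        return 0;
    }

    /* map: which part of which file, with what protection */
    unsigned char *data = mmap(NULL, (size_t)st.st_size, PROT_READ,
                               MAP_PRIVATE, fd, 0);
    close(fd);   /* the mapping stays valid after the descriptor is closed */
    if (data == MAP_FAILED)
        return -1;

    long count = 0;
    for (off_t i = 0; i < st.st_size; i++)
        if (data[i] == wanted)
            count++;

    /* unmap: cancel the mapping */
    munmap(data, (size_t)st.st_size);
    return count;
}
```

The mapping length is fixed when `mmap` is called, which illustrates the limitation mentioned in the text: writing past the mapped block cannot extend the file.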
259. ring. The controller responded by relinquishing control of the bus to the peripheral device. Although not typical, this mechanism has been used for example by high end network hardware. Multiple Busses. The speed of the processor differs from the speed of the memory and other devices. To compensate for the difference, multiple busses are introduced in place of the processor bus from the basic architecture described earlier. A PC has a north bridge that connects the processor to memory and to the graphic card on AGP at one speed and to the south bridge at another speed, and a south bridge that connects integrated devices to the north bridge at one speed and PCI slots at another speed. [Figure 1-9: Multiple Busses Example, showing the processor, the front side bus, the north bridge with memory and graphics, the direct media interface, and the south bridge with SATA, PCI, PCI-X, USB2 and audio devices.] [Annotation: description as above. The OS is a resource manager and a virtual machine on top of the hardware. Only one processor starts at boot, the others start later.] Operating System Structure. The design of an operating system architecture traditionally follows the separation of concerns principle. This principle suggests structuring the operating system
260. rity Subsystem. 3. Explain what a capability is. Security Subsystem Implementation. The problem of the implementation is adhering to the declared security model. Example: DoD TCSEC Classification. Security classifications: the Trusted Computer System Evaluation Criteria (TCSEC, or Orange Book), the Canadian Trusted Computer Product Evaluation Criteria (CTCPEC), and the Information Technology Security Evaluation Criteria (ITSEC). The goal of these documents is to specify a standard set of criteria for evaluating the security capabilities of systems. DoD TCSEC: Level D. Systems that fail to meet requirements of any higher class. Level C1. Provides separation of users and data and access control on an individual basis, so that users can prevent other users from accidentally accessing or deleting their data. Level C2. In addition requires auditing of security related events. Level B1. In addition requires an informal statement of the security policy model and no errors with respect to that statement. Level B2. In addition requires a formal statement of the security policy model and no covert channels. Level B3. In addition requires testability of the formal statement of the security policy model. Level A1. In addition requires verifiability of the formal statement of the security policy model on the architecture level and verifiability of the informal statement of the security policy model on the implementation level. From the page http
261. rocess lifecycle and draw a transition diagram explaining when and how a process passes from one phase to another. 12. Explain cooperative context switching and its advantages. 13. Explain preemptive context switching and its advantages. 14. Explain the round robin scheduling algorithm by outlining the code of a function GetProcessToRun that will return a process to be scheduled and a time after which another process should be scheduled. 15. Explain the simple priority scheduling algorithm with static priorities by outlining the code of a function GetProcessToRun that will return a process to be scheduled and a time after which another process should be scheduled. 16. Explain the simple priority scheduling algorithm with dynamically adjusted priorities by outlining the code of a function GetProcessToRun that will return a process to be scheduled and a time after which another process should be scheduled. 17. Explain the earliest deadline first scheduling algorithm by outlining the code of a function GetProcessToRun that will return a process to be scheduled and a time after which another process should be scheduled. 18. Explain the function of the Solaris process scheduler by describing how the algorithm decides what process to run and for how long. 19. Explain the function of the Linux process scheduler by describing how the algorithm decides what process to run and for how long. 20. Explain the function of the Windows process s
262. a sector error, the sector is added to this map and a warning is displayed; at a suitable moment, the files lying in hotfixed sectors are moved elsewhere and the hotfix map is emptied. After a power outage, the dirty flag in the spare block shows that not everything is in order; recovery can then use the magic identifiers, which are present in all interesting structures, to find the F-nodes and directories, which are moreover linked to each other. [Annotation: recovery is somewhat heuristic.] [Annotation: FAT32 and long names: a long name takes several directory entries, one entry holds 13 characters.] A side note: a B tree is a balanced tree with data in all nodes, a B+ tree is a balanced tree with data only in the leaves. Example: EXT2 And EXT3 And EXT4 File Systems. [Annotation: supports runs of consecutive blocks. EXT2/3/4: all the more significant ones look similar. Structure: bootstrap.] The filesystem uses the classical structure, starting with a bootstrap area and continuing with blocks, each block containing a copy of the superblock, filesystem descriptors, free block bitmap, free inode bitmap, inode table fragment and data area. These blocks serve the same role as bands or groups in other filesystems and should not be confused with equal sized blocks within the data area. Free space is allocated by blocks. A free block bitmap is used to keep track of free blocks. A file is represented by an inode. The i
263. rom their home nodes to remote nodes. The decision to migrate a process is based on multiple criteria, which include the communication cost, the memory requirements, and the processor usage. To avoid thrashing, overriding importance is assigned to memory requirements. When accessing resources, a migrated process can either access the local resources of the remote node or the remote resources of the home node. In general, access to local resources is faster than access to remote resources, but some remote resources cannot be replaced by local resources. Mosix therefore intercepts accesses of migrated processes to resources and directs them towards local resources when transparency can be preserved, and towards remote resources otherwise. To facilitate access to remote resources, the migrated process communicates with its process deputy on the home node. Chapter 6 Network Subsystem. To guarantee transparency when accessing user credentials, Mosix requires that all computers in a cluster share the same UID and GID space. To guarantee transparency when accessing files but avoid doing all file system operations remotely, Mosix relies on DFSA (Direct File System Access) optimizations. These optimizations recognize cluster file systems that are mounted across the entire cluster and do most file system operations locally on such file systems. Mosix refuses to migrate processes that use shared memory etc. References 1. Mosix. http
264. decide when and which process to switch from the ready to run state to the running state and back. If no process is in the ready to run state, an artificial idle process, also called the idler, is run. [Annotation: HLT] References 1. James Dabrowski, Ethan Munson: Is 100 Milliseconds Too Fast? [Annotation: scheduling. Static: determined in advance, for mission critical hard real time applications and embedded systems, where all jobs are known in advance. Dynamic: with no preemption, I start a job and wait until it finishes, so I schedule whole jobs; with voluntary preemption, a job can interrupt itself; with full (forced) preemption, the scheduler can interrupt an activity; with voluntary and forced preemption I already schedule processes, threads, fibres, fibrils. No preemption: Round Robin, FIFO, Dispatcher. Improvements: priority, multiple queues. Aperiodic jobs are jobs started by time or by events.] Round Robin. Also called cyclic scheduling or FIFO: simply run the processes one after another. The question is the length of the quantum in proportion to the overhead of a context switch; too short a quantum decreases efficiency, too long a quantum worsens responsiveness. Static Priorities. Processes are assigned static priorities, and the process with the highest priority is scheduled. Either a constant quantum is used, or a shorter quantum is coupled with higher priority and vice versa. [Annotation: voluntary preemption: actively, when it suits me.] Confusion can arise when comparing priorit
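The Round Robin and Static Priorities descriptions in item 264 can be combined into one small selection routine: pick the ready process with the highest priority and rotate among processes of equal priority. The function name GetProcessToRun is borrowed from the rehearsal questions elsewhere in this text. The table layout, the constant 20 ms quantum, and the convention that a larger number means a higher priority are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define QUANTUM_MS 20   /* assumed constant quantum */

typedef struct {
    int id;
    int priority;   /* larger number means higher priority (assumption) */
    int ready;      /* nonzero when the process is ready to run */
} process_t;

typedef struct {
    const process_t *process;   /* NULL means run the idle process */
    int quantum_ms;             /* time after which to schedule again */
} decision_t;

/* Pick the ready process with the highest static priority. Ties are
   broken round robin: the scan starts just after the slot chosen
   last time, so processes of equal priority take turns. */
decision_t GetProcessToRun(const process_t *table, size_t count,
                           size_t *last)
{
    assert(table != NULL && last != NULL);

    decision_t decision = { NULL, QUANTUM_MS };
    size_t chosen = 0;

    for (size_t i = 1; i <= count; i++) {
        size_t slot = (*last + i) % count;
        if (!table[slot].ready)
            continue;
        if (decision.process == NULL ||
            table[slot].priority > decision.process->priority) {
            decision.process = &table[slot];
            chosen = slot;
        }
    }

    if (decision.process != NULL)
        *last = chosen;
    return decision;
}
```

Because the comparison is strict, the first equal-priority process found after the previously chosen slot wins, which gives the rotation. A low priority process runs only when no higher priority process is ready, which is the starvation risk of static priorities.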
265. s. Finally, a tree form directory uses trees to store the array of entries, the file name hashes, and the array of a few largest free entries in the entry extents. Attributes use either short form, leaf form, node form, or tree form, depending on their size. The forms of attribute storage are similar to the forms of directory storage, except that the names and values of attributes are kept together with name hashes, while the entries were kept separate from name hashes. Metadata modifications are journalled. References 1. SGI: XFS Filesystem Structure. http://oss.sgi.com/projects/xfs/papers/xfs_filesystem_structure.pdf 2. SGI: XFS Overview and Internals. http://oss.sgi.com/projects/xfs/training/index.html Example: CD File System. [Annotation: CDFS: read-only and its variations; seeking is slow; descriptors are both mandatory and optional.] The standards are ISO9660 and ECMA119. The disk is divided into sectors of usually 2048 bytes; the first 16 sectors are empty, reserved for a bootstrap loader. The rest of the disk is described by a sequence of volume descriptors, one per sector; the most important one is the Primary Volume Descriptor, with the address of the root directory, the path table and other less essential items (copyright, abstract, bibliographic information). Directories contain the usual stuff: a name of at most 30 characters and attributes; a file is given by its starting sector and length. In theory it is possible to specify multiple directory
266. s. Update: the data are cached, reads and writes operate on cache, a write by one processor updates caches of other processors. Exclusive: the data are cached, reads and writes by one processor invalidate caches of other processors. The LL and SC instructions can be used to implement a variety of test and set and compare and swap operations. The LL instruction reads data from memory, and additionally stores the address that was read in the LLaddr register and sets the LLbit register to true. The processor will set the LLbit register to false whenever another processor performs a cache coherent write to the cache line containing the address stored in the LLaddr register. The SC instruction stores data to memory if the LLbit register is true and returns the value of the LLbit register. References 1. MIPS Technologies: MIPS32 4K Processor Core Family Software User Manual. 2. Joe Heinrich: MIPS R4000 Microprocessor User Manual. Example: Memory Model In Java. The combination of portability with parallelism necessitated the introduction of a memory model into the Java programming language. The rules of the memory model are as follows: Operations of a single thread are carried out in the program order. The lock and unlock methods on the same object order the locking and unlocking threads. The start and join methods on the same object order the calling and called threads. Writing and reading the same volatile field orders t
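The way LL and SC combine into compare and swap, as described in item 266, can be shown with a small software model. The sketch below does not use real MIPS instructions. It simulates the LLaddr and LLbit registers of one processor in a struct and tracks a single address rather than a cache line. All names (`cpu_t`, `load_linked`, `store_conditional`, `remote_write`, `compare_and_swap`) are invented for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simulated per-processor link state. */
typedef struct {
    int *ll_addr;   /* models the LLaddr register */
    bool ll_bit;    /* models the LLbit register */
} cpu_t;

/* LL: read the data, remember the address, set LLbit. */
int load_linked(cpu_t *cpu, int *addr)
{
    assert(cpu != NULL && addr != NULL);
    cpu->ll_addr = addr;
    cpu->ll_bit = true;
    return *addr;
}

/* SC: store only if LLbit is still true; return the value of LLbit. */
bool store_conditional(cpu_t *cpu, int *addr, int value)
{
    assert(cpu != NULL && addr != NULL);
    if (cpu->ll_bit)
        *addr = value;
    return cpu->ll_bit;
}

/* A write by another processor to the linked location clears the
   LLbit of the observing processor. */
void remote_write(cpu_t *observer, int *addr, int value)
{
    assert(observer != NULL && addr != NULL);
    *addr = value;
    if (observer->ll_addr == addr)
        observer->ll_bit = false;
}

/* Compare and swap built from LL and SC: retry when SC fails,
   give up when the current value differs from the expected one. */
bool compare_and_swap(cpu_t *cpu, int *addr, int expected, int desired)
{
    for (;;) {
        if (load_linked(cpu, addr) != expected)
            return false;
        if (store_conditional(cpu, addr, desired))
            return true;
        /* another processor wrote in between: try again */
    }
}
```

The retry loop is the essential design point. A failed SC does not mean the comparison failed, only that the link was broken, so the value has to be read and compared again.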
267. s sor attention, the handling of the hold signal is delegated to a direct memory access controller. The controller has several transfer request inputs associated with transfer counters and takes care of taking over the processor bus and setting the address and control signals during transfer. Example: ISA Bus. The ISA (Industry Standard Architecture) bus DMA cycle is commenced by the peripheral device requesting a transfer using one of the DRQ (DMA Request) signals. There are 4 or 8 DRQ signals, DRQ0 to DRQ3 or DRQ7, and 4 or 8 corresponding DACK (DMA Acknowledge) signals, DACK0 to DACK3 or DACK7, each associated with one set of transfer counters in the controller. When the controller sees the peripheral device requesting a transfer, it asks the processor to relinquish the bus using the HRQ (Hold Request) signal. The processor answers with the HLDA (Hold Acknowledge) signal and relinquishes the bus. This typically happens at the end of a machine cycle. Once the bus is not used by the processor, the controller performs the device to memory or memory to device bus transfer in a manner that is very similar to the normal processor to memory or memory to processor bus transfer. The controller sends the memory address on the address bus together with the AEN (Address Enable) signal to indicate that the address is valid, responds to the peripheral device requesting the transfer using one of the DACK signals, and juggles the MEMW and IOR or the M
268. s and ELF segments. The sections carry information useful for static linking, the segments carry information useful for dynamic execution. An ELF file does not need to support both linking and execution: object files may contain only sections, and executable files may contain only segments. The ELF header is the very first part of an ELF file and describes the structure of the file. Besides the usual magic number that identifies the ELF format, it tells the exact type of the file, including address size and data encoding, and the processor architecture that the program is for. Other important fields include the starting address of the program.

    > objdump -f /bin/bash
    /bin/bash: file format elf32-i386
    architecture: i386, flags 0x00000112:
    EXEC_P, HAS_SYMS, D_PAGED
    start address 0x0805b4e0

    > objdump -f /lib/libc.so.6
    /lib/libc.so.6: file format elf32-i386
    architecture: i386, flags 0x00000150:
    HAS_SYMS, DYNAMIC, D_PAGED
    start address 0x00015070

Figure 2-5 ELF File Format Header Example. The section header table lists all the sections of the file. Each section has a name, a type, a position and length within the file, and flags. Examples of important sections include: .bss, a section that represents the uninitialized memory of the program; .data, a section that contains the static variables; .text, a section that contains the program code; .init and .fini, sections that contain the program code responsible
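The fields that item 268 attributes to the ELF header (magic number, address size, data encoding) sit in the first bytes of the file, so they can be checked without any ELF library. The following sketch inspects a byte buffer and is meant as an illustration only. The struct `elf_ident_t` and the function `parse_elf_ident` are names made up here. The byte offsets and values follow the ELF identification layout: bytes 0 to 3 hold the magic, byte 4 the class, byte 5 the data encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    int valid;          /* nonzero when the magic number matches */
    int address_bits;   /* 32 or 64, 0 when the class is unknown */
    int big_endian;     /* 1 big endian, 0 little endian, -1 unknown */
} elf_ident_t;

/* Inspect the identification bytes at the start of an ELF file. */
elf_ident_t parse_elf_ident(const unsigned char *bytes, size_t length)
{
    assert(bytes != NULL);

    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };
    elf_ident_t info = { 0, 0, -1 };

    if (length < 6 || memcmp(bytes, magic, sizeof magic) != 0)
        return info;
    info.valid = 1;

    /* byte 4: file class, 1 for 32-bit and 2 for 64-bit objects */
    if (bytes[4] == 1)
        info.address_bits = 32;
    else if (bytes[4] == 2)
        info.address_bits = 64;

    /* byte 5: data encoding, 1 for little and 2 for big endian */
    if (bytes[5] == 1)
        info.big_endian = 0;
    else if (bytes[5] == 2)
        info.big_endian = 1;

    return info;
}
```

For a file such as the elf32-i386 binaries in the objdump listing, the first six bytes would be expected to yield a valid header with 32 address bits and little endian encoding.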
269. s device independent and makes it possible to safely share devices among multiple applications. The operating system concentrates the code for controlling specific devices in device drivers. The details of controlling individual devices tend to depend on the device model, version, manufacturer and other factors. A device driver can hide these details behind an interface that is the same for a class of similar devices. This makes it possible to keep the rest of the operating system code largely device independent as well. To be done. Architecture of the I/O system. The presence of interrupts influences the structure of a driver: it has a part that services interrupt requests from hardware, which is called asynchronously whenever an interrupt arrives, and a part that services operation requests from software, which is called synchronously when an application or the operating system calls the driver. The two parts communicate mostly through queues and buffers, which raises the problem of locking the data shared in this way, because servicing an interrupt from hardware can run concurrently with servicing an operation from software. This problem is solved using mechanisms that allow deferring the execution of operations that are part of servicing a hardware interrupt: bottom half handlers and tasklets in Linux, deferred procedure calls in Windows, pinned interrupt thread pools in Solaris. To denote these two parts of a driver
270. s supported, a fragment should generally fit a memory page on the host. A dirent node, which contains the inode number of the directory that the entry belongs to, and the name and inode number of the file that the entry describes. A cleanmarker node, which marks a successfully erased block. All the inode and dirent nodes contain a version number. An update of a node is done by writing a new version of the node at the tail of the log. When the file system is mounted, the blocks of the log are scanned, not necessarily from head to tail due to independent garbage collection of blocks, creating an overview of the latest versions of all nodes. Garbage collection frees space for the tail of the log by picking a random block and copying whatever of its content is not outdated to the tail of the log. Statistical preference is given to blocks with at least some outdated content, so that proper balance between precise wear levelling and increased wear associated with copying is maintained. References 1. David Woodhouse: JFFS: The Journalling Flash File System. http://sources.redhat.com/jffs2/jffs2.pdf Example: Spiralog File System. This is an interesting system from Digital, based on a log structure. The file system consists of multiple servers and clerks. Clerks run near client applications and are responsible for caching and cache coherency and ordered write back. Servers run near disks and are responsible for carrying out idempotent at
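The mount-time scan from item 270, which builds an overview of the latest versions of all nodes, can be sketched over an in-memory array. The record layout below is a strong simplification and not the on-flash JFFS2 format; `node_t` and `mark_latest` are hypothetical names. The quadratic loop is chosen for brevity, since the scan order does not matter.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified log record: which inode it describes and its version. */
typedef struct {
    unsigned inode;
    unsigned version;
    int outdated;   /* set by the scan when a newer version exists */
} node_t;

/* Mark every node that is superseded by a higher version of the same
   inode and return the number of nodes that remain live. */
size_t mark_latest(node_t *nodes, size_t count)
{
    assert(nodes != NULL || count == 0);

    size_t live = 0;
    for (size_t i = 0; i < count; i++) {
        nodes[i].outdated = 0;
        for (size_t j = 0; j < count; j++) {
            if (nodes[j].inode == nodes[i].inode &&
                nodes[j].version > nodes[i].version) {
                nodes[i].outdated = 1;
                break;
            }
        }
        if (!nodes[i].outdated)
            live++;
    }
    return live;
}
```

Nodes marked as outdated are what garbage collection can discard when it copies the remaining content of a block to the tail of the log.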
271. s used for process affinity and priority inheritance. The scheduler always runs the thread with the highest priority. Multiple threads with the same priority are run in a round robin fashion. The time quanta are fixed at around 120 ms in server class Windows versions, and variable from around 20 ms for background processes to around 60 ms for foreground processes in desktop class Windows versions. Administrative privileges are required to set priorities from the realtime range. To avoid the need for administrative privileges for running multimedia applications, which could benefit from realtime priorities, Windows Vista introduces the Multimedia Class Scheduler Service. Processes can register their threads with the service under classes such as Audio or Games, and the service takes care of boosting the priority of the registered threads based on predefined registry settings. Example: Nemesis Deadline Scheduler. The Nemesis operating system (Cambridge University, 1994 to 1998) aims to support Quality of Service for distributed multimedia applications. Nemesis schedules domains; the kernel assigns the CPU to a domain by means of an activation, which is nothing other than an upcall of a routine specified in the corresponding domain control block. The exception is the situation where the CPU was taken away from the domain before the domain indicated readiness for another activation; in that case, execution continues at the point where the CPU was taken away. Besides the ready to run state, a domain can
272. segment, checkprot (check protection), fault (handle a page fault at the given address), lockop (lock and unlock pages), swapout (request to free as many pages as possible), sync (request to save dirty pages), and of course a bunch of others. The seg_vn driver can map files as shared or private. Private mapping uses copy on write, which remaps a page into anonymous memory when the page is written. As a minor decision, when there is enough memory, a new page is allocated and the data is copied; when there is not, the shared page is used and thereby ceases to be shared. Anonymous memory is interesting: on first use it is automatically zero filled, which is important. It is managed by the swapfs layer, which however does not work in the straightforward way of remembering a position in the swap partition for every page. If it did, swap would be consumed even before any paging started, so instead each anonymous page has an anon map structure that remembers the position on disk once the page is swapped out. Swap space is reserved, but not allocated, as early as the moment the page is allocated, which allows synchronous reporting of out of memory (unlocked physical memory also counts towards the available swap). AIX, for example, does not have this, and out of memory is reported by a signal. The integration of the file manager with the memory manager is also very interesting. For one thing because
273. Operating System Structure
2. Process Management
  Process Alone
    Process And Thread Concepts
    Starting A Process
    What Is The Interface
    Rehearsal
  Achieving Parallelism
    Multiprocessing On Uniprocessors
    Multiprocessing On Multiprocessors
    Cooperative And Preemptive Switching
    Switching In Kernel Or In Process
    Process Lifecycle
    How To Decide Who Runs
    What Is The Interface
    Rehearsal
  Process Communication
    Means Of Communication
    Shared Memory
    Message Passing
    Remote Procedure Call
    Rehearsal
  Process Synchronization
274. single pointer addition operation. Similarly, tracing does not concern dead objects, making deallocation potentially an empty operation. All of this gets a bit more complicated when destructors become involved, though, for a call to a destructor is not an empty operation. The asynchronous nature of calls to destructors makes them unsuitable for code that frees contended resources. A strict enforcement of referential integrity also requires garbage collection to handle situations where a call to a destructor associated with an unreachable block makes that block reachable again. Rehearsal. By now, you should understand what the memory layout of a typical process looks like. You should be able to describe how executable code, static data, heap and stack are stored in memory and what their specific requirements are with respect to process memory management. Concerning the stack, you should be able to explain how return addresses, function arguments and local variables are stored on the stack and how the contents of the stack can be elegantly accessed using relative addressing. Concerning the heap, you should be able to outline the criteria of efficient heap management and relate them to typical heap usage patterns. You should be able to explain the working of common heap management algorithms in the light of these criteria and outline heap usage patterns for which these algorithms excel and fail. You should be able to explain the principal approach to
275. size and the relative starting address.

Offset   Length   Contents
00h      2        Magic (0AA55h)
02h      2        Length of last block
04h      2        Length of file in 512B blocks (L)
06h      2        Number of relocation table entries (R)
08h      2        Length of header in 16B blocks (H)
0Ah      2        Minimum memory beyond program image in 16B blocks
0Ch      2        Maximum memory beyond program image in 16B blocks
0Eh      4        Initial stack pointer setting (SS:SP)
12h      2        File checksum
14h      4        Initial program counter setting (CS:IP)
18h      2        Offset of relocation table (1Ch)
1Ah      2        Overlay number
1Ch      Rx4h     Relocation table entries
Hx10h    Lx200h   Program image

Figure 2.4 DOS EXE Format

Linking

[Note: static and dynamic linking. With dynamic linking, the program carries information about what I want to call and at which place in the program the address of the callee should be. The dynamic linker is provided by the OS. It loads the libraries somewhere into memory, physically only once, virtually as many times as needed, which means the library has to be split: code only]

It is common for many programs to share the same libraries. The libraries can be linked to the program either statically, during compilation, or dynamically, during execution. Both approaches can have advantages: static linking creates independent program images robust to system upgrades, dynamic linking creates small program images efficient in memory usage. Both approaches require program image formats that describe the exported symbols that the program image provides and the symbols that the program image requires
276. state diagram: by switching context, a process gets from the ready to run state to the running state and back. More precisely, we can speak of the running in kernel state and add the running in application state. Transitions between these states happen through syscalls and returns from them, and through interrupts and returns from them. By calling sleep, a process gets from the running in kernel state to the asleep state, from which a call to wakeup gets it to the ready to run state. The names of these calls are only examples. Specifically for UNIX, there is also a state [Note: the kernel is, as it were, copied into the address space of every process; it is physically present once and mapped n times.] [Note: what if there is a bug in the kernel? Divide the kernel into layers with different privileges. Monolithic kernel: one piece of code, all privileges, privileged (kernel) mode, a bug means a crash. Microkernel: the kernel is only a small part of the OS. The OS is the management of processes, memory and devices, the user interface and the file system. Of this, the microkernel holds about half of the memory management code, devices, and all of the communication code. The rest can be extra processes: file system managers (possibly several), a pager (memory), some device drivers, applications. This also works well over a network (RPC).] Process Lifecycle [Note: virtualization is how to run several OSes on one piece of hardware, one level above multitasking. It suits applications that fight each other, for example several Apache servers or different versions of one system library, so that every application and every user has its own OS. An obvious case is hosting, where hardly anything fully loads a PC.]
277. so specify a mapping object size in dwMaximumSize. The function creates a file mapping object backed by the operating system paging file rather than by a named file.

LPVOID MapViewOfFile (HANDLE hFileMappingObject, DWORD dwDesiredAccess, DWORD dwFileOffsetHigh, DWORD dwFileOffsetLow, DWORD dwNumberOfBytesToMap);
LPVOID MapViewOfFileEx (HANDLE hFileMappingObject, DWORD dwDesiredAccess, DWORD dwFileOffsetHigh, DWORD dwFileOffsetLow, DWORD dwNumberOfBytesToMap, LPVOID lpBaseAddress);

Flags: FILE_MAP_WRITE, FILE_MAP_READ, FILE_MAP_ALL_ACCESS, FILE_MAP_COPY. The address is only suggested; if the address is not free, the call fails.

BOOL UnmapViewOfFile (LPCVOID lpBaseAddress);

The address must be from a previous MapViewOfFile(Ex) call. Message Passing. Message passing is a mechanism that can send a message from one process to another. The advantage of message passing is that it can be used between processes on a single system as well as between processes on multiple systems connected by a network, without having to change the interface between the processes and message passing. Message passing is synchronous when the procedure that sends a message cannot return until the message is received. Message passing is asynchronous when the procedure that sends a message can return before the message is received. The procedures that send or receive a message are blocking when they can wait before returning and non block
278. sobem e pokud current itatel gt 0 current itatel current jmenovatel pokud current jmenovatel current itatel 0 obnovit p vodn hodnoty pokud current itatel 0 current jmenovatel ozna proces za violated deadline request period Jako example t i procesy A B C A dovol 3 z 5 B 4 z 5 C 7 z 10 kvantum bude rovno period Tabulka dky as sloupce current window constraint deadline 0 3 5 1 4 5 1 7 10 1 1 3 4 2 3 4 2 6 9 2 run A missed B C 2 2 3 3 2 3 3 5 8 3 run C missed A B 47 Chapter 2 Process Management 46 3 2 2 4 1 2 4 4 7 4 run A missed B C 4 1 1 5 1 1 5 3 6 5 run B missed A C 5 3 5 6 4 5 6 3 5 6 run C missed A B Funkce je celkem jasn Dokud proces st h zmen uje se po et period zb vaj c ch do konce window Pokud se n hodou do konce window mohou v echny periody pro vihnout po et period do konce window se nezmen uje aby se t m zbyte n nevypl calo poveden window Kdy n jak proces nestihne zapo t se e nestihl Pokud se nestihlo fat ln proces se ozna za violated m se indikuje e scheduler nemohl zaru it window constraint Rozhodnut koho spustit pak bere nejprve ty procesy kter m nejv ce hroz pro vih nut periody Z t ch se berou nejprve ty procesy kter m hroz pro vihnut window constraint z t ch nejprve ty kter m ve window zb v v ce per
279. speed > capacity Example: Network File System [Notes: how to do it: use RPC on top of the original FS, which is what NFS is based on. Speed is a problem when 10 people use the server at once, since each of them is theoretically 10 times slower. Sharing: the environment must somehow be carried over, which means adjusting the API. Mount protocol. File handle = identification.] Three major versions of the NFS protocol are 2, 3 and 4. Version 2 of the NFS protocol introduces the NFS protocol and the mount protocol, both built over RPC. The mount protocol lets a client send a mount request to a server, and the server replies by sending a file handle of the mounted directory. The operations are MNT, UMNT, UMNTALL, DUMP (mount list) and EXPORT (export list). In theory, the file handle should be opaque (32 bytes) for the client. It typically contains a file system ID, an I-node and a generation ID. The NFS protocol then offers the usual file operations except open and close, because it is stateless: GET and SET on attributes, LOOKUP, READ, WRITE, CREATE, REMOVE, and MK and RM on directories. Statelessness naturally brings certain problems. The first is file permissions: UNIX normally tests them only on open, whereas NFS has to test them on every operation. As a solution, permissions on open are tested on the client, the UID and GID space is shared, and some checks are relaxed (the owner of a file may do anything, the execute right implies the read right). Another problem is deleting open files, which is again solved on the client. The last issue mentioned is the atomicity of operations
280. information about the exchange of information between activities is available, which it usually is not, it is not decidable whether a sequence of actions can exist that would ultimately allow an activity some action. This is why other models are devised as well. Another of the classic ones is based on security levels and integrity levels. Activities have clearances and data have classes. It is said that reading information from security classes higher than our clearance is not permitted, and neither is writing information into security classes lower than our clearance. Similarly, writing information into higher integrity classes is not permitted, and neither is reading information from lower integrity classes. This, however, has a different problem: a perfect implementation would have to track every bit of information, which would be costly, and so simplifications are used. The consequence is a gradual drift of data into higher security classes and lower integrity classes. Example: Security Enhanced Linux. The framework introduces policies that tell how subjects (processes) can manipulate objects (devices, files, sockets). Subjects and objects have types, which are stored in a security context in the form of a triplet of user, role, type. The security context of files is stored in extended attributes. To be done. Rehearsal. Questions. 1. Explain the term authorization. 2. Explain what an access control list is.
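The security and integrity level rules from the excerpt above can be written down as four small predicates. This is only a sketch under assumptions: levels are plain integers where a larger number means a higher class, and the function names are hypothetical rather than taken from any real system.

```python
def may_read_secret(clearance, security_class):
    """Reading from a security class above the clearance is not permitted."""
    return security_class <= clearance


def may_write_secret(clearance, security_class):
    """Writing into a security class below the clearance is not permitted."""
    return security_class >= clearance


def may_read_integrity(level, integrity_class):
    """Reading from an integrity class below the own level is not permitted."""
    return integrity_class >= level


def may_write_integrity(level, integrity_class):
    """Writing into an integrity class above the own level is not permitted."""
    return integrity_class <= level
```

The two pairs of rules mirror each other: secrecy lets information flow only upwards, integrity lets it flow only downwards, which is why data tends to drift towards higher security and lower integrity classes over time.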
281. sses where enough empty space can be set aside between the blocks without exhausting the virtual address space, or by using hardware that supports segmentation, where blocks can be moved in the virtual address space as necessary. Chapter 3 Memory Management. Example: Virtual Address Space Of A Linux Process. In Linux, the location of blocks of memory within the virtual address space of a process is exported by the virtual memory manager of the operating system in the maps file of the proc filesystem.

> cat /proc/self/maps
00111000-00234000 r-xp 00000000 03:01 3653725 /lib/libc-2.3.5.so
00234000-00236000 r-xp 00123000 03:01 3653725 /lib/libc-2.3.5.so
00236000-00238000 rwxp 00125000 03:01 3653725 /lib/libc-2.3.5.so
00238000-0023a000 rwxp 00238000 00:00 0
007b5000-007c?000 r-xp 00000000 03:01 3653658 /lib/ld-2.3.5.so
007c?000-007d0000 r-xp 00019000 03:01 3653658 /lib/ld-2.3.5.so
007d0000-007d1000 rwxp 0001a000 03:01 3653658 /lib/ld-2.3.5.so
008ed000-008ee000 r-xp 008ed000 00:00 0 [vdso]
08048000-0804d000 r-xp 00000000 03:01 3473470 /bin/cat
0804d000-0804e000 rw-p 00004000 03:01 3473470 /bin/cat
09ab8000-09ad9000 rw-p 09ab8000 00:00 0 [heap]
b7d88000-b7?88000 r--p 00000000 03:01 6750409 /usr/lib/locale/local
b7?88000-b7?89000 rw-p b7?88000 00:00 0
b7?96000-b7?97000 rw-p b7?96000 00:00 0
bfd81000-bfd97000 rw-p bfd81000 00:00 0 [stack]

The example shows the location of blocks of memory
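A line of the maps file can be split into its fields with a few lines of code. The sketch below assumes the usual layout of address range, permissions, offset, device, inode and an optional path; the `Mapping` type and the `parse_maps_line` helper are hypothetical names introduced for the illustration.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    start: int    # first address of the block
    end: int      # first address past the block
    perms: str    # for example "r-xp"
    offset: int   # offset of the block in the mapped file
    device: str   # major:minor device number
    inode: int    # inode of the mapped file, 0 for anonymous blocks
    path: str     # file name or a label such as [heap], empty if absent


def parse_maps_line(line):
    """Parse one line of /proc/<pid>/maps into a Mapping."""
    fields = line.split(None, 5)
    start, end = (int(part, 16) for part in fields[0].split("-"))
    path = fields[5].strip() if len(fields) > 5 else ""
    return Mapping(start, end, fields[1], int(fields[2], 16),
                   fields[3], int(fields[4]), path)


CODE = parse_maps_line("08048000-0804d000 r-xp 00000000 03:01 3473470 /bin/cat\n")
ANON = parse_maps_line("00238000-0023a000 rwxp 00238000 00:00 0\n")
```

On a Linux machine, the same function can be applied to every line read from `/proc/self/maps` to list the blocks of the running interpreter.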
282. sses and threads are considered, the processor context is typically associated with the thread rather than the process. Exceptions to this rule include special purpose registers whose content does not concern the execution of the thread but rather the execution of the process. Context switching and similar operations that involve saving and restoring the processor context, especially interrupt and exception handling and system calls, happen very frequently. Processors therefore often include special support for these operations. Example: Intel Processor Context Switching. The Intel 80x86 line of processors provides multiple mechanisms to support context switching. The simplest of those is the ability to switch to a different stack when switching to a different privilege level. This mechanism makes it possible to switch the processor context without using the stack of the executing process. Although not essential, this ability can be useful when the stack of the executing process must not be used, for example to avoid overflowing or to mask debugging. Another context switching support mechanism is the ability to save and restore the entire processor context of the executing process to and from the TSS (Task State Segment) structure as a part of a control transfer. One issue associated with this ability is efficiency. On Intel 80486, a control transfer using the CALL instruction with TSS takes 170 to 180 clock cycles. A control transfer using th
283. stacks is covered by keeping the pointer to the top of the stack associated with the thread rather than the process, often as a part of the processor state rather than the memory state. Exceptions to this rule include thread local storage, whose content does not concern the execution of a process but rather the execution of a thread. Other State. The process state can contain other parts besides the processor state and the memory state. Typically, these parts of the process state are associated with the devices that the process accesses, and the manner in which they are saved and restored depends on the manner in which the devices are accessed. Most often, a process accesses a device through the operating system rather than directly. The operating system provides an abstract interface that simplifies the device state visible to the process, and keeps track of this abstract device state for each process. It is not necessary to save and restore the abstract device state, since the operating system decides which state to associate with which process. In some cases, a process might need to access a device directly. In such a situation, the operating system either has to save and restore the device state, or guarantee exclusive access to the device. Multiprocessing On Multiprocessors [Note: see the notes above.] Parallelism on machines with multiple processors, or multiprocessors. Besides running multiple processes by repeatedly switching context
284. stem abstractions. The following sections summarize well known concepts found in contemporary hardware, and well known structure and abstractions found in contemporary operating systems. The sections are styled as a crash course on things either known in general or outside the scope of this book, presented to familiarize the reader with the background and terminology. Needless to say, none of the things outlined here is definitive. Rather than that, they simply appear as good solutions at this time and can be replaced by better solutions any time in the future. Hardware Building Blocks. Contemporary hardware builds on semiconductor logic. One of the basic elements of the semiconductor logic is a transistor, which can act as an active switching element, creating a connection between its collector and emitter pins when current is injected into its base pin. The transistor can be used to build gates that implement simple logical functions such as AND and OR, as sketched below. Figure 1.1 Principle Of Composing NAND And NOR Gates From Transistors. Note that the principal illustration uses bipolar transistors in place of more practical field effect transistors, and a simplified composition out of individual transistors in place of more practical direct gate construction. Gates that implement simple logical functions can be used to construct more complex functions such as buffers, shifters, decoders, arithmetic units and othe
285. strained scheduler k m n nap klad jako patch pro Linux Ka d proces m zadanou request period to je as za kter mus dostat timeslice a window constraint to je zlomek missed total vyjad uj c e z ka d ho okna o total period ch se sm pro vihnout missed period Scheduler vyb r procesy z kladn podle earliest deadline first algoritmu spust se proces kter mu nejd v vypr perioda P edpokl d se ale e jsou periody zaokrouhleny na nejbli konec timeslice tedy je pravd podobn e ad proces vypr perioda stejn Z t ch kter m kon perioda stejn se spust ten kter m nejmen window constraint Z t ch kter maj nulov itatel window constraint se spust ten kter m nejv t jmenovatel window constraint P i v em stejn m round robin Pokud se proces poda ilo obslou it p ed koncem deadline uprav se window con straint a deadline n sleduj c m zp sobem pokud current jmenovatel gt current itatel jmenovatel nebo pokud current jmenovatel current itatel gt current itatel current jmenovatel pokud current jmenovatel current itatel 0 obnovit p vodn hodnoty pokud je proces ozna en za violated obnovit p vodn hodnoty a odzna it ho deadline request period Pro v echny procesy neobslou en p ed koncem deadline se perioda vynech a win dow constraint a deadline se uprav n sleduj c m zp
286. system for working with directories. List the interface functions including their arguments and semantics. 5. Describe the usual operating system interface for locking files. List the interface functions including their arguments and semantics. 6. Explain the difference between advisory and mandatory locks for locking files. Explain why these kinds of locks exist. File Subsystem Internals. Disk Layout: Bunch of blocks, Tree, Log. [Note: the disk consists of blocks. I chop a file into blocks, pour them onto the disk and remember where the blocks of each file are.] Handling of Files. With the transition from tapes to disks, the first idea was to store files sequentially. This has two advantages, namely speed and low overhead. The disadvantages are the need to know the length of a file in advance and, understandably, fragmentation. The first idea for removing this is to build a linked list. The C64 had this on floppies, and the disadvantages are obvious: extremely slow random access, a block size that is not a power of two, and poor deletion. Another modification is to keep the linked list but pull it out of the blocks and put it into a table. This is the typical MS DOS solution. It becomes a disadvantage when the whole table does not fit into memory. People came up with keeping the table in pieces with the files, and the result is for example CP/M or UNIX. [Notes: the list of blocks can be stored on disk with the file (ext2), or somewhere aside (FAT). As an improvement, a range can be remembered for consecutive blocks (ext4, NTFS).] What to do when this
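The idea of pulling the linked list out of the blocks and into a table can be illustrated with a short sketch. It is not the MS DOS on-disk format: the table is a plain Python list, `END_OF_CHAIN` is a hypothetical marker value, and the cycle check is an added safety measure.

```python
END_OF_CHAIN = -1  # hypothetical marker for the last block of a file


def file_blocks(table, first_block):
    """Follow the chain in the allocation table and list the blocks of a file."""
    blocks = []
    seen = set()
    block = first_block
    while block != END_OF_CHAIN:
        if block in seen:
            raise ValueError("cycle in allocation table")
        seen.add(block)
        blocks.append(block)
        block = table[block]  # the table entry names the next block of the file
    return blocks


# A file stored in blocks 2, 5 and 3, in that order.
TABLE = [END_OF_CHAIN] * 8
TABLE[2] = 5
TABLE[5] = 3
TABLE[3] = END_OF_CHAIN
```

Random access still walks the chain, but the walk now touches only the table, which is fast as long as the whole table fits into memory.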
287. system looks like, both for the case when the operating system code resides in the user space and for the case when the operating system code resides in the kernel space. 8. Propose an interface through which a process can start another process and wait for termination of another process. Achieving Parallelism. The operating system is responsible for running processes as necessary. In the simplest case, the operating system runs processes one at a time, from beginning to completion. This is called sequential processing or batch processing. Running processes one at a time, however, means that each process usurps the whole computer for as long as it runs, which can be both inflexible and inefficient. The operating system therefore runs multiple processes in parallel. This is called multiprocessing. Multiprocessing On Uniprocessors. Multiprocessing on machines with a single processor, or uniprocessors, is based on the ability of the operating system to suspend a process by setting its state aside, and later resume the process by picking its state up from where it was set aside. Processes can be suspended and resumed frequently enough to achieve an illusion of multiple processes running in parallel. In multiprocessing terminology, the state of a process is called process context, with the act of setting the process
288. t this point, you should know how processes can exchange information. You should be able to distinguish the various ways of exchanging information based on their applicability, efficiency and utility. You should be able to characterize basic properties of message passing mechanisms and to relate these properties to both the architecture of the operating system and the requirements of the applications. Based on your knowledge of how processes communicate using message passing, you should be able to design an intelligent message passing API. You should be able to explain how remote procedure calls mimic local procedure calls and how certain issues limit ideal substitutability of the two mechanisms. You should be able to explain why the code of stubs can be generated and what information is necessary for that. Questions. 1. Propose an interface through which a process can set up a shared memory block with another process. 2. Define synchronous and asynchronous message passing. 3. Define blocking and non blocking message sending and reception. 4. Explain how polling can be used to achieve non blocking message reception. 5. Explain how callbacks can be used to achieve non blocking message reception. 6. Explain when a synchronous message sending blocks the sender. 7. Explain when an asynchronous message sending blocks the sender. 8. Propose a blocking interface through which a process can send a message to another process. Use synchronous messa
289. table is too large: CP/M adds further blocks linearly, UNIX branches I-nodes as a tree. [Note: a directory is a special file, a list of names, indexing.] Storing the allocation information of files in directory entries (as CP/M does) has one more considerable disadvantage. If the name of a file cannot be separated from its allocation information, hard links cannot be made. Handling of Directories. The trivial case is a single level directory, the concept of ROOT in MS DOS. Hierarchical directories store subdirectories in their parent directories. This is the DOS classic, and UNIX does the same. How to put links into a directory: the concept of a hard link and a symbolic link. Advantages and disadvantages of a hard link (speed, poor deletion, cannot cross a file system boundary). The same for a symbolic link. A detail with backing up and copying files is the need for a service that tells a link from a normal file. Handling of Free Space [Note: free space is typically a bitmap.] Allocating free space, the problem of block size. Considerations in favor of larger blocks are speed and low overhead, considerations in favor of smaller blocks are low fragmentation. Keeping track of free space: a list of free blocks and bitmaps. The function of bitmaps is clear. A list of free blocks seems disadvantageous, except for the ability to keep in memory
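Free space tracking with a bitmap, as mentioned above, can be sketched in a few lines. The `BlockBitmap` class and its first fit allocation policy are assumptions made for the illustration, not a description of any particular file system.

```python
class BlockBitmap:
    """One bit per disk block: 0 means free, 1 means used."""

    def __init__(self, block_count):
        self.block_count = block_count
        self.bits = bytearray((block_count + 7) // 8)

    def is_used(self, block):
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def allocate(self):
        """Mark the first free block as used and return it, or None when full."""
        for block in range(self.block_count):
            if not self.is_used(block):
                self.bits[block // 8] |= 1 << (block % 8)
                return block
        return None

    def free(self, block):
        """Mark a block as free again."""
        self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF
```

The bitmap costs one bit per block regardless of how much space is free, whereas a list of free blocks shrinks as the disk fills up.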
290. table, where the axes determine the activity and the resource, the permitted actions are recorded. Maintaining such a table in one piece would be impractical, however, so it is stored in groups, which leads to access control lists and capabilities. Access Control Lists. Access control lists are a technique where a list of activities and the actions permitted to them is stored with each resource. Many UNIXes support ACLs. There, users or groups are usually taken as the activities, and the actions are the classic RWX on files. The standard attributes of UNIX files are in fact based on the same principle. ACLs have a number of disadvantages. Probably the biggest one is that they are static with respect to activities, which is why ACLs are made for users and not for processes. This can lead to a situation where processes have unnecessarily strong rights, which is solved, among other things, by creating pseudo users for some processes or by additionally restricting rights. Another issue is scalability, since resources are forced to store information about activities, of which there can be a great many. Hence the attempts at inheriting rights from hierarchically superior objects and recording only the changes, at grouping rights, and so on. Capabilities. Capabilities are a technique where each activity carries a list of resources and the actions permitted on them. When accessing a resource, the activity presents its capability, which the system verifies. This is a mechanism that common systems do not have very often
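The relation between the access table, access control lists and capabilities can be made concrete with a small sketch. All names, including the sample activities and resources, are hypothetical. The point is only that both structures store the same table, grouped once by resource and once by activity.

```python
def acls_from_matrix(matrix):
    """Group the table by resource: each resource lists activities and actions."""
    acls = {}
    for (activity, resource), actions in matrix.items():
        acls.setdefault(resource, {})[activity] = set(actions)
    return acls


def capabilities_from_matrix(matrix):
    """Group the table by activity: each activity lists resources and actions."""
    caps = {}
    for (activity, resource), actions in matrix.items():
        caps.setdefault(activity, {})[resource] = set(actions)
    return caps


def acl_allows(acls, activity, resource, action):
    return action in acls.get(resource, {}).get(activity, set())


def capability_allows(caps, activity, resource, action):
    return action in caps.get(activity, {}).get(resource, set())


# Sample access table with made up activities and resources.
MATRIX = {
    ("alice", "notes.txt"): {"r", "w"},
    ("bob", "notes.txt"): {"r"},
    ("bob", "tool"): {"x"},
}
```

With ACLs the check starts at the resource, with capabilities it starts at the activity, which is why capabilities follow an activity around while ACLs stay with the resource.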
291. [Note: but it flips only after a moment, which is why I work in clock cycles, with a fixed time reserved for the flip.] Figure 1.3 Basic Computer Architecture Example. At the core of the architecture is the control unit of the processor. In steps timed by the external clock signal, the control unit repeats an infinite cycle of fetching a code of the instruction to be executed from memory, decoding the instruction, fetching the operands of the instruction, executing the instruction, and storing the results of the instruction. The control unit uses the arithmetic and logic unit to execute arithmetic and logic instructions. Processor Bus. The control unit of the processor communicates with the rest of the architecture through a processor bus, which can be viewed as consisting of three distinct sets of wires, denoted as address bus, data bus and control bus. The address bus is a set of wires used to communicate an address. The data bus is a set of wires used to communicate data. The control bus is a set of wires with functions other than those of the address and data buses, especially signals that tell when the information on the address and data buses is valid. The exact working of the processor bus can be explained by a series of timing diagrams for basic operations such as memory read and memory write. [Note: the x axis is time, the y axis is voltage.] Chapter 1 Introduction CLK binary clock si
292. than shared access to files. Conceptually, the file system uses a traditional disk layout with storage pools of blocks, bitmaps to keep track of block usage, distributed index nodes that point to lists of blocks stored in as many levels of a branching hierarchy as required by file size, and journals to maintain metadata consistency. The distribution relies on most data structures occupying entire blocks and on introducing a distributed block locking protocol. GFS supports pluggable block locking protocols. Three block locking protocols are currently available. DLM (Distributed Locking Manager) uses a distributed architecture with a distributed directory of migrating lock instances. GULM (Grand Unified Locking Manager) uses a client server architecture with replicated servers and majority quora. NOLOCK makes it possible to completely remove locking and use GFS locally. Computational Resource Sharing. Network Load Balancing. A classic application in a distributed system, where processes are moved to less loaded nodes. Classic systems strive for it as well, for example Mosix or Beowulf for Linux. Problem with uniform resource access. Example: Mosix. The goal of Mosix is to build clusters of homogeneous computers that allow transparent load balancing. Mosix has been developed since 1981 for various flavors of Unix and finally settled on Linux. Mosix spreads load among the computers in a cluster by migrating processes f
293. ting system into accessing protected resources on its behalf or into denying service. The system call interface must be flexible. Features such as wrapping or monitoring services provided by the operating system should be available. Adding new services without changing the system call interface for the existing services should be possible. Note that services provided through the system call interface are typically wrapped by libraries and thus look like services provided by libraries. This makes it possible to call all services in a uniform way. Example: CP/M System Call Interface. CP/M ran on processors that did not distinguish any privileges. Its system call interface therefore did not have to solve many of the issues related to efficiency and robustness that concern contemporary systems. Instead, the system call interface has been designed with binary compatibility in mind. When calling the BDOS module, the application placed a number identifying the requested service in register C, other arguments of the requested service in other registers, and called the BDOS entry point at address 5. The entry point contained a jump to the real BDOS entry point, which could differ from system to system. The services provided by BDOS included console I/O and FCB based file operations.

ReadKey: mvi c,1      ; keyboard read service
         call 5       ; call BDOS entry point
         cpi a,0Dh    ; is returned key code E
294. paging.

LPVOID MapViewOfFile (HANDLE hFileMappingObject, DWORD dwDesiredAccess, DWORD dwFileOffsetHigh, DWORD dwFileOffsetLow, DWORD dwNumberOfBytesToMap);
LPVOID MapViewOfFileEx (HANDLE hFileMappingObject, DWORD dwDesiredAccess, DWORD dwFileOffsetHigh, DWORD dwFileOffsetLow, DWORD dwNumberOfBytesToMap, LPVOID lpBaseAddress);
BOOL UnmapViewOfFile (LPCVOID lpBaseAddress);

Maps the object representing the mapped file. Flags: FILE_MAP_WRITE, FILE_MAP_READ, FILE_MAP_ALL_ACCESS, FILE_MAP_COPY. Right now I really do not know what happens when these flags contradict the flags given to CreateFileMapping, probably an error. Whole File Operations. To be done. Example: Linux Whole File Operations.

ssize_t sendfile (int out_fd, int in_fd, off_t *offset, size_t count);

To minimize the data copying overhead, it is possible to copy the content of one file to another. Currently, only sending from a file to a socket is supported. Example: Windows Whole File Operations. Directory Operations. The first operating systems started with a single level directory, and the main concerns were the format of names and attributes. Then came problems with searching and name collisions. Multiple level directories appeared, along with relative references with respect to the current directory. The last to appear was the concept of links, which completed the concept of the directory graph as it is known today. [Note: the file name is just an attribute, so several names can be mapped to one file.]
295. page, or to write any page, a program had to have a matching key in the PSW register. The PSW register could only be set in supervisor mode. Example: CDC Cyber 6000. Each application had to be allocated a single partition, starting at the address in the Reference Address (RA) register and limited by the length in the Field Length (FL) register. With fixed partitions it also became apparent that a few small applications can block the system for an unbearably long time when the number of partitions is small. To prevent this, processes were periodically set aside to disk (swapping). With fixed partitions, it also became more common for a program not to fit into physical memory at all. The answer was to load parts of the program gradually, as they were used (overlaying). Unfortunately, this slightly slows down procedure calls, slightly burdens the programmer, and does not solve the problem of a large heap. Variable Partitions. Because fixed partitions have high internal fragmentation, they are not very suitable for swapping. Variable partitions were therefore introduced, and the principle is obvious. The problem of variable partitions is external fragmentation, and possibly also the ability to change the size of segments at run time. Fragmentation could be solved by compacting segments at run time, but this is rather not done, because it takes a long time and need not be trivial due to relocation. Example: CDC Cyber 6000 mainframe ve v r
296. ... the structure of a set-associative TLB. A picture of the cache is in the MC68060 User's Manual, Section 5 Caches; Figure 5-4 explains the structure of a set-associative cache. The cache supports four modes of operation:

• write through caches reads and writes data directly
• copy back caches both reads and writes
• inhibited, precise, reads and writes data directly and guarantees that accesses happen in instruction order
• inhibited, imprecise, reads and writes data directly and allows some reads to overtake writes

MIPS32 Address Translation
This is familiar from Nachos. The chip only has a TLB with 48 entries; each entry maps two pages whose size ranges from 4 KB to 16 MB in multiples of four. An entry contains:

• one address space ID (compared with the ASID in CP0) and one virtual address
• two physical addresses, for the even and the odd page (smart, because the match is done on the virtual address)
• for the virtual address, a mask (determines the page size) and a global flag (the ASID is ignored)
• for each page, a dirty flag and a valid flag, plus details for cache coherency control (not interesting)

An extra instruction fills a TLB entry, either a random one or a selected one. The randomness is derived from counting instruction cycles. There is also a wired TLB entry index register that says how many of the first TLB entries the random choice should avoid. By the way, all of this is quite simplified, but that does not matter ...
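The paired-page TLB entry described above can be modelled roughly as follows. This is a simplified sketch under stated assumptions, not the real MIPS32 register layout: the structure, its field names and the `page_shift` encoding of the mask are hypothetical, and `vpn2` is assumed to be stored already shifted right.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model of one entry: one virtual tag, two physical pages. */
struct tlb_entry {
    uint32_t vpn2;        /* virtual address bits above the page pair */
    unsigned page_shift;  /* log2 of the page size: 12, 14, ..., 24 */
    uint8_t asid;         /* address space ID */
    bool global;          /* when set, the ASID is ignored */
    struct { uint32_t pfn; bool valid, dirty; } page[2];  /* even, odd */
};

/* Look up vaddr; returns true and fills *paddr on a valid hit. */
static bool tlb_lookup(const struct tlb_entry *tlb, size_t n, uint8_t asid,
                       uint32_t vaddr, uint32_t *paddr)
{
    for (size_t i = 0; i < n; i++) {
        const struct tlb_entry *e = &tlb[i];
        if ((vaddr >> (e->page_shift + 1)) != e->vpn2)
            continue;                         /* virtual tag does not match */
        if (!e->global && e->asid != asid)
            continue;                         /* entry of another address space */
        unsigned odd = (vaddr >> e->page_shift) & 1u;  /* even or odd page */
        if (!e->page[odd].valid)
            return false;
        *paddr = (e->page[odd].pfn << e->page_shift)
               | (vaddr & ((1u << e->page_shift) - 1u));
        return true;
    }
    return false;   /* miss: the operating system would refill the TLB */
}
```

The bit just above the page offset selects between the even and the odd page, which is why a single virtual tag comparison can serve two physical pages.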
297. ... available for all customers. If so, it lends; if not, it waits. The assumption is that if at least one customer can draw the full limit, that customer will eventually have to return something, and this will provide the money to satisfy the other customers.

What Is The Interface
Now that we know when, why and how to synchronize, it remains to look at what synchronization tools the operating system should give to applications. Naturally, things such as disabling and enabling interrupts drop out of the list, because the operating system cannot let applications touch those. Active waiting is similarly difficult, because processes do not always share memory. So what remains?

An important fact is that as long as we stay with processes that share memory, a single synchronization tool is enough to program all the others. This is where the popular exercises on implementing one synchronization tool using another come from.

Margin notes: The locking API is a lock with lock, unlock and a private bool locked. Whoever locks a locked lock waits until it is unlocked. Synchronization is then implemented as lock.lock, critical section, lock.unlock. A minimal implementation:

    lock:   while (TS (locked));
    unlock: locked = false;

A spinlock is a lock with active waiting. Recursive lock: what happens when I call lock on a lock that I already hold? Spinlock for active waiting ...
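The test-and-set lock from the notes can be written with C11 atomics roughly as follows. This is a sketch, not the notes' own implementation: the type and function names are made up for illustration, and `atomic_flag` stands in for the TS instruction.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type; the notes only name lock, unlock and a private flag. */
typedef struct {
    atomic_flag locked;
} spinlock_t;

#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once: returns true when the caller acquired the lock. */
static bool spin_trylock(spinlock_t *l)
{
    /* test-and-set returns the previous value of the flag */
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

/* Actively wait until the lock is acquired. */
static void spin_lock(spinlock_t *l)
{
    while (!spin_trylock(l)) {
        /* busy wait */
    }
}

static void spin_unlock(spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

Note that this lock is not recursive: a thread that calls `spin_lock` on a lock it already holds spins forever, which is exactly the question raised in the notes.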
298. ... it is also possible to design a system with several processors and run a different process on each. Typical examples are SMP (Symmetric Multiprocessor) architectures, where all processors see the same memory and peripherals, and NUMA (Non Uniform Memory Access) architectures, where all processors see the same memory and peripherals but access to some addresses is significantly optimized for some processors.

Hyperthreading: to be done.

Example: Intel Multiprocessor Standard
Assumes SMP. One processor is defined as the bootstrap processor (BSP), the others as application processors (AP); they are connected through the 82489 APIC. After reset, only the BSP is functional and all APs are in the HALT state, the APIC delivers interrupts only to the PIC at the BSP, and A20 masking is enabled (oh, that backward compatibility). The BIOS fills in a special data structure that describes the number of processors, the number of buses, the interrupt wiring and so on, and starts whatever system you have. The system then prepares the startup code for ...

Margin notes: Context switch: I write the registers out and load the registers in. Memory: through the mapping in the page tables. Devices: processes access them through the operating system, so nothing needs to be handled. Assembler instructions: most of them move something somewhere. Switching threads of one process: it is enough to switch the processor context, memory is shared. Typical procedure: I save the context on my own stack (the stack of the thread), remember the location ...
299. ... NFS files.

3. Explain what problems the statelessness of NFS causes when checking access rights, and how these problems are solved.
4. Explain what problems the statelessness of NFS causes when deleting open files, and how these problems are solved.
5. Explain what problems the statelessness of NFS causes when locking files, and how these problems are solved.

1. Still a sketch
2. Understanding is essential
3. Understanding is essential
4. Understanding is recommended
5. Just a curiosity
6. Just a curiosity
7. Understanding is essential
8. Understanding is essential
9. Understanding is essential
10. Just a curiosity
11. Understanding is recommended
12. Just a curiosity
13. Understanding is recommended
14. Understanding is recommended
15. Understanding is recommended
16. Understanding is optional
17. Understanding is optional
18. Understanding is recommended
19. Just a curiosity
20. Understanding is essential
21. Understanding is essential
22. Understanding is recommended
23. Understanding is optional
24. Understanding is optional
25. Just a curiosity
26. Just a curiosity
27. Understanding is essential
28. Understanding is recommended
29. Understanding is optional
30. Understanding is optional
31. Understanding is optional
32. Just a curiosity
33. Just a curiosity
34. Just a cur...
300. ... The length of the quantum is also maintained according to the priority calculated by the system. Lower priorities have a longer quantum, because they are expected to run less often, so when their turn finally comes, they should get something done.

System: Uses priorities 60 to 99 for processes running in the kernel. This class is internal to the system and users are not expected to meddle with it; the page daemon, for example, runs in it. A process that acquires critical resources during a kernel call also temporarily receives a priority from 60 to 99.

Realtime: Uses priorities 100 to 159. The priority of the process and the quantum assigned to it are set with the priocntl system call and do not change from the moment they are set.

The mess is that a realtime priority can be higher than a system priority, so it would sometimes be necessary to interrupt the kernel. This is normally not done, because a preemptive kernel would be complex; processes are switched most often when leaving kernel mode, at which point the runrun flag that indicates the need to switch context is checked. As a solution, the runrun flag is complemented with the kprunrun flag, which indicates the need to switch context inside the kernel, and points are defined at which it is safe to switch context even in the kernel. The kprunrun flag is then tested at these points. The result is a shorter delay before a ready to run realtime process starts running.

The design also expects that people will be able to add their own priority classes. Each class implements 7 service ...
301.
    unix  3  [ ]      STREAM  CONNECTED  /var/run/dbus/system_bus_socket
    unix  2  [ ]      DGRAM              @/var/run/hal/hotplug_socket
    unix  2  [ ]      DGRAM              @udevd
    unix  2  [ ACC ]  STREAM  LISTENING  /tmp/xmms_ceres.0
    unix  3  [ ]      STREAM  CONNECTED  /tmp/.X11-unix/X0
    unix  3  [ ]      STREAM  CONNECTED  /tmp/.ICE-unix/4088

Example: Linux Netlink Sockets
Netlink sockets represent a class of sockets used for communication between processes and kernel. The sockets are represented by a netlink family that is specified in place of protocol when creating the socket.

• NETLINK_ARPD: ARP table
• NETLINK_ROUTE: routing updates and modifications of IPv4 routing table
• NETLINK_ROUTE6: routing updates and modifications of IPv6 routing table
• NETLINK_FIREWALL: IPv4 firewall

Messages sent over the netlink socket have a standardized format. Macros and libraries are provided for handling messages of specific netlink families.

Example: Windows Winsock Sockets
From the application programmer perspective, Winsock sockets offer an interface that is in principle based on that of the Berkeley sockets. From the service programmer perspective, Winsock offers an interface that allows service providers to install multiple protocol libraries underneath the unified API. The interface, called SPI (Service Provider Interface), distinguishes two types of services, transport and naming, and allows layering of protocol libraries.

Remote Procedure Call
This is described in detail in the Middleware materials.
302. ... a page of size 2^n ...

Margin notes: The wires of the lower bits lead directly to memory. The cache is typically set-associative, with a 64 B line. I must not evict the page ... operating system ... I have to hash it, because it is the opposite of how I need to use it.

Because searching the tables on every memory access would be slow, the Translation Lookaside Buffer was invented; like every associative memory, however, it is expensive. Two more important things are related to the TLB. One is flushing the TLB when the address space is switched. The other is the idea of leaving the management and searching of page tables exclusively to the operating system and having only the TLB in hardware.

Page Replacement
Replacing pages is, understandably, a science; the point is to always evict the right page.

Margin notes: And when I have no free frame, it is too late, I must have one ready in advance. A random algorithm is not entirely bad; in the long run it tends to evict old pages.

The obvious criterion is to minimize the number of page faults while the application runs, so an optimal algorithm would always evict the page that will be needed furthest in the future. This cannot be determined in advance, but if the program runs deterministically, it is possible to run the program once to record the page faults and then again with optimal paging. This algorithm serves, together with the algorithm that selects a ...
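The optimal policy described above needs the whole reference string, which is why it only works on a recorded run. A small sketch that counts its page faults might look as follows; the function names and the fixed `MAX_FRAMES` limit are assumptions of this example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FRAMES 16   /* arbitrary limit of this sketch */

/* Position of the next use of page at or after from, or n if never used again. */
static size_t next_use(const int *refs, size_t n, size_t from, int page)
{
    for (size_t i = from; i < n; i++)
        if (refs[i] == page)
            return i;
    return n;
}

/* Count the page faults of the optimal policy on a recorded reference string. */
static size_t optimal_faults(const int *refs, size_t n, size_t frames)
{
    int resident[MAX_FRAMES];
    size_t used = 0, faults = 0;

    if (frames > MAX_FRAMES)
        frames = MAX_FRAMES;
    if (frames == 0)
        return n;                       /* every reference faults */

    for (size_t i = 0; i < n; i++) {
        size_t j;
        for (j = 0; j < used; j++)
            if (resident[j] == refs[i])
                break;
        if (j < used)
            continue;                   /* hit */
        faults++;
        if (used < frames) {            /* a free frame is still available */
            resident[used++] = refs[i];
            continue;
        }
        /* evict the resident page whose next use lies furthest in the future */
        size_t victim = 0, furthest = 0;
        for (j = 0; j < used; j++) {
            size_t d = next_use(refs, n, i + 1, resident[j]);
            if (d > furthest) {
                furthest = d;
                victim = j;
            }
        }
        resident[victim] = refs[i];
    }
    return faults;
}
```

Running practical algorithms on the same reference string and comparing their fault counts with this one gives a lower bound to measure them against.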
303. ... flushes the TLB. On multiprocessors, the TLB needs to be flushed on all processors, which implies synchronization whenever a mapping changes, and that is slow. A trick can solve this, for example on the R4000, where the process is simply moved to a new ASID, which invalidates all of its entries in the TLB.

Virtual address caches
Most caches use physical addresses, but because hardware with caches on virtual addresses can run faster, such caches appear from time to time. They have the same problem as the TLB.

Example: Linux
The HAL is an interface that makes the memory manager of the given platform accessible; the rest of the kernel assumes three-level paging. In Linux 2.4.20 see, for example, include/asm-i386/pgtable-2level.h and include/asm-i386/pgtable-3level.h. In Linux 2.6.9 see, for example, pgtable-2level.h, pgtable-2level-defs.h, pgtable-3level.h, pgtable-3level-defs.h and pgtable.h in include/asm-i386. This does not mean that the 3 levels of the kernel are somehow simulated on the 2 levels of the processor; the macros simply say that the second level has only 1 entry.

Physical pages are tracked by struct page (mem_map_t) structures in the mem_map list. The algorithm for selecting a victim is apparently LRU, without further details, because the kernel looks different from version to version. See include/linux/mm.h in Linux 2.4.20.

Physical pages are assigned to zones that reflect the restrictions on some ranges of physical memory ...
304. ... but in distributed systems they find considerable use. Examples are the capabilities of Amoeba, Mach or EROS, or the credentials in CORBA.

The problem with capabilities is the question of where to keep them. Naturally, it is not possible to simply hand them to processes, because the processes could try to forge them. One solution is to keep capabilities in protected memory, as Mach does (processes only have handles into their capability tables, and the tables themselves are in the kernel). Another solution is to encrypt capabilities, as Amoeba does. Each object has a 48-bit random number associated with it; this number plus the rights from the capability are run through a one-way function, and the result is added to the capability that the user has available. Short of a lucky guess, the user cannot change the capability to point to a different object or to carry different rights.

Although capabilities and access control lists look equivalent at first sight, there are important differences between them. Capabilities can belong to individual processes, so they can be used, for example, to protect data from disclosure by restricting the initial capabilities of untrusted processes.

Levels delimit Security and Integrity
Back to a somewhat higher level. The protection model based on the table mentioned above has one serious shortcoming: it is not clearly evident from it what it will protect and how. It is obvious that the rights will be transitive, but unfortunately, unless ...
305. ... provide guarantees on the behavior of scheduling.

• Turnaround requires that scheduling minimizes the duration of each task.
• Throughput requires that scheduling maximizes the number of completed tasks.
• Efficiency requires that scheduling maximizes the resource utilization.
• Fairness requires that the operating system can provide guarantees on the equal treatment of tasks with the same scheduling parameters.

Many of the requirements can conflict with each other. Imagine a set of short tasks and a single long task that, for the sake of simplicity, do not contend for resources other than the processor. A scheduler that aims for turnaround would execute the short tasks first, thus keeping the average duration of each task low. A scheduler that aims for throughput would execute the tasks one by one in an arbitrary order, thus keeping the overhead of context switching low. A scheduler that aims for fairness would execute the tasks in parallel in a round robin order, thus keeping the ...

Margin notes (process states): Running in userspace. Running in kernel, entered by a syscall or an interrupt, where everything is checked. In an application, syscalls are wrapped in some library, so I do not call them directly and they look like functions. Sleep, with separate queues for blocking calls (lock, disk, network). When an interrupt from the device arrives to say that what I waited for has happened, it wakes me up, or rather moves me to ready to run. While I wait in some sleep queue, I am not ...
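The turnaround argument can be checked with a few lines of arithmetic. The sketch below assumes that all tasks arrive at time 0 and run to completion one after another; the function name and the task durations used in the note are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Average completion time when tasks run to completion one after another
   in the given order. All tasks are assumed to arrive at time 0. */
static double average_turnaround(const unsigned *durations, size_t n)
{
    unsigned long clock = 0, total = 0;
    for (size_t i = 0; i < n; i++) {
        clock += durations[i];   /* the task finishes at this time */
        total += clock;
    }
    return n ? (double) total / (double) n : 0.0;
}
```

With three tasks of length 1 and one task of length 10, running the short tasks first gives completion times 1, 2, 3 and 13, while running the long task first gives 10, 11, 12 and 13. The total amount of work is the same, but the average turnaround differs considerably.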
306. ... within the virtual address space of the cat command. The first column of the example shows the address of the blocks, the second column shows the flags, and the third, fourth, fifth and sixth columns show the offset, device, inode and name of the file that is mapped into the block, if any.

The blocks that contain executable code are easily distinguished by the executable flag. Similarly, the blocks that contain read only and read write static data are easily distinguished by the readable and writeable flags and the file that is mapped into the block. Finally, the blocks with the readable and writeable flags but no file contain the heap and the stack.

The address of the blocks is often randomized to prevent buffer overflow attacks on the process. The attacks are carried out by supplying the process with an input that will cause the process to write past the end of the buffer allocated for the input. When the buffer is a locally allocated variable, it resides on the stack, and being able to write past the end of the buffer means being able to modify return addresses that also reside on the stack. The attack can therefore overwrite some of the input buffers with malicious machine code instructions to be executed and overwrite some of the return addresses to point to the malicious machine code instructions. The process will then unwittingly execute the malicious machine code instructions by returning to the modified return address. Randomizing the addresses of the ...
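The column layout described above can be parsed with a single `sscanf` call. The sketch below is illustrative only: the structure and function names are hypothetical, the buffer sizes are arbitrary, and the sample lines in the note are made up in the format of /proc/<pid>/maps.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One block of the virtual address space, as listed in /proc/<pid>/maps. */
struct map_entry {
    unsigned long start, end;   /* address range of the block */
    char flags[5];              /* for example "r-xp" */
    unsigned long offset;       /* offset within the mapped file */
    char device[16];            /* major:minor of the device */
    unsigned long inode;
    char name[256];             /* empty for blocks with no file */
};

/* Parse one line; returns nonzero on success. */
static int parse_maps_line(const char *line, struct map_entry *e)
{
    e->name[0] = '\0';          /* the name column is optional */
    int n = sscanf(line, "%lx-%lx %4s %lx %15s %lu %255[^\n]",
                   &e->start, &e->end, e->flags, &e->offset,
                   e->device, &e->inode, e->name);
    return n >= 6;
}

/* Blocks with code are distinguished by the executable flag. */
static int is_executable(const struct map_entry *e)
{
    return strchr(e->flags, 'x') != NULL;
}
```

For a line such as `08048000-0804c000 r-xp 00000000 03:01 12345 /bin/cat`, the parser reports an executable block backed by /bin/cat, while a line with flags `rw-p` and no file name would be a candidate for the heap or the stack.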
307. ... would contain code dealing with interrupt handling and context switching, the layers above that would follow with device drivers, memory management, file systems, user interface, and finally the least privileged layer would contain the applications.

MULTICS is a prominent example of a layered operating system, designed with eight layers formed into protection rings, whose boundaries could only be crossed using specialized instructions. Contemporary operating systems, however, do not use the layered design, as it is deemed too restrictive and requires specific hardware support.

References
1. Multicians, http://www.multicians.org

Microkernel Systems
A microkernel design of the operating system architecture targets robustness. The privileges granted to the individual parts of the operating system are restricted as much as possible, and the communication between the parts relies on specialized communication mechanisms that enforce the privileges as necessary. The communication overhead inside the microkernel operating system can be higher than the communication overhead inside other software; however, research has shown this overhead to be manageable.

Experience with the microkernel design suggests that only very few individual parts of the operating system need to have more privileges than common applications. The microkernel design therefore leads to a small system kernel, accompanied by additional system applications that provide most ...
308. ... 0xc01b

    > lsusb
    Bus 001 Device 002: ID 046d:c01b Logitech, Inc. MX310 Optical Mouse
    Bus 001 Device 001: ID 0000:0000

    > lsusb -vv -s 1:2
    Bus 001 Device 002: ID 046d:c01b Logitech, Inc. MX310 Optical Mouse
    Device Descriptor:
      bLength                18
      bDescriptorType
      bcdUSB
      bDeviceClass              (Defined at Interface level)
      bDeviceSubClass
      bDeviceProtocol
      bMaxPacketSize0
      idVendor           0x046d Logitech, Inc.
      idProduct          0xc01b MX310 Optical Mouse
      bcdDevice           18.00
      iManufacturer           1 Logitech
      iProduct                2 USB-PS/2 Optical Mouse
      iSerial                 0
      bNumConfigurations      1
      Configuration Descriptor:
        bLength               9
        bDescriptorType       2
        wTotalLength         34
        bNumInterfaces        1
        bConfigurationValue   1
        iConfiguration        0
        bmAttributes       0xa0
          Remote Wakeup
        MaxPower           98mA
        Interface Descriptor:
          bLength
          bDescriptorType
          bInterfaceNumber
          bAlternateSetting
          bNumEndpoints
          bInterfaceClass       Human Interface Devices
          bInterfaceSubClass    Boot Interface Subclass
          bInterfaceProtocol    Mouse
          iInterface
          Endpoint Descriptor:
            bLength
            bDescriptorType
            bEndpointAddress  0x81  EP 1 IN
            bmAttributes
              Transfer Type   Interrupt
              Synch Type      none
              Usage Type      Data

Margin notes: Devices: bus, kbd, network, printer, audio, gpm, scanner, power mgmt, disk, parallel port, sensors. Parallel port: 8 data wires, an address in the PC, and whatever was sent to that address; LEDs, for example, can be hung on it.
309. ... at that time were essentially just device drivers.

Example: CP/M

Table 3-1. Structure of CP/M memory

    Address          Content
    0000h - 0002h    Warm start vector (JMP)
    0005h - 0007h    System call vector (JMP)
    005Ch - 006Bh    Parsed FCB 1
    006Ch - 007Bh    Parsed FCB 2
    0080h - 00FFh    Command tail area
    0100h - BDOS     Transient program area
    BDOS - BIOS      BDOS
    BIOS - RAMTOP    BIOS

Fixed Partitions
When the system starts, memory is divided into fixed partitions, and one application is placed into each partition. Depending on the architecture of the system, there can be either separate queues of applications to be processed in the individual partitions, or a single shared queue of applications.

Example: IBM OS/360
IBM OS/360 called this multiprogramming with a fixed number of tasks (MFT). Multiprogramming with a variable number of tasks (MVT) was introduced later.

The classic problems are internal fragmentation and the placement of applications into partitions, which is also related to the problems of relocation and data protection.

Relocation is handled either without hardware support, by simply patching the binary code of the application, or with hardware support, in which case a base register is usually available. The base register has one small advantage, namely relocation while the application runs.

Protection is possible either by introducing access rights to pages or by limiting the address space.

Example: IBM 360
Memory was divided into 4 KB blocks; each block had a 4-bit key and a fetch protect flag. When reading a fetch protected ...
310. ... necessary.

Example: Linux Processor Context Switching
For examples of real processor context switching code for many different processor architectures, check out the sources of Linux. Each supported architecture has an extra subdirectory in the arch directory and an extra asm subdirectory in the include directory. The processor context switching code is usually stored in the file arch/*/kernel/entry.S.

The following fragment contains the code for saving and restoring processor context on the Intel 80x86 line of processors, from the Linux kernel before the changes that merged the support for 32-bit and 64-bit processors and made the code more complicated. The SAVE_ALL and RESTORE_ALL macros save and restore the processor registers to and from the stack. The fixup sections handle situations where segment registers contain invalid values that need to be zeroed out.

    #define SAVE_ALL \
            cld; \
            pushl %es; \
            pushl %ds; \
            pushl %eax; \
            pushl %ebp; \
            pushl %edi; \
            pushl %esi; \
            pushl %edx; \
            pushl %ecx; \
            pushl %ebx; \
            movl $(__USER_DS), %edx; \
            movl %edx, %ds; \
            movl %edx, %es;

    #define RESTORE_INT_REGS \
            popl %ebx; \
            popl %ecx; \
            popl %edx; \
            popl %esi; \
            popl %edi; \
            popl %ebp; \
            popl %eax

    #define RESTORE_REGS \
            RESTORE_INT_REGS; \
    1:      popl %ds; \
    2:      popl %es; \
    .section .fixup,"ax"; \
    3:      movl $0,(%esp); \
            jmp ...
311. ... by the processor and what these instructions look like.

You should be able to explain how the abstract concept of a process state maps to the content of memory and registers.

You should be able to outline how a process gets started and where the machine code instructions and the content of memory and registers come from.

You should understand how machine code instructions address memory and how the location of the program image in memory relates to the addressing of memory.

You should understand how an operating system gets to the point where it can start an arbitrary process from the point where the computer has just been turned on.

You should know what facilities enable a process to interact with the system libraries and the operating system.

Based on your knowledge of how processes are used, you should be able to design an intelligent API used to create and destroy processes and threads.

Questions
1. Explain what a process is.
2. Explain how the operating system or the first application process gets started when a computer is switched on.
3. Explain what it means to relocate a program, and when and how a program is relocated.
4. Explain what information is needed to relocate a program and where it is kept.
5. Explain what it means to link a program, and when and how a program is linked.
6. Explain what information is needed to link a program and where it is kept.
7. Explain what the interface between processes and the operating ...
312. ... yk se a tedy chyln hacky NFS3 u je m ...

    program MOUNTPROG {
        version MOUNTVERS {
            void      MOUNTPROC_NULL (void) = 0;
            fhstatus  MOUNTPROC_MNT (dirpath) = 1;
            mountlist MOUNTPROC_DUMP (void) = 2;
            void      MOUNTPROC_UMNT (dirpath) = 3;
            void      MOUNTPROC_UMNTALL (void) = 4;
            exports   MOUNTPROC_EXPORT (void) = 5;
            exports   MOUNTPROC_EXPORTALL (void) = 6;
        } = 1;
    } = 100005;

    typedef struct exportnode *exports;
    struct exportnode {
        dirpath ex_dir;
        groups  ex_groups;
        exports ex_next;
    };

Version 3 of the NFS protocol introduces the NLM protocol for managing locks, which can be used with any version of the NFS protocol. Recovery of locks after a crash is solved by introducing lease and grace periods. The server only grants a lock ...

Margin notes (locks): When the lease expires, the lock is invalid unless I have asked for an extension. When the server crashes, it does not matter; I wait to see who renews what. After the grace period has elapsed, I know about all valid locks.

Version 4 of the NFS protocol abandons statelessness, integrates the mount and NLM protocols, and introduces security, compound operations that can pass a file handle to each other, extended attributes, replication and migration, client caching ...

Margin notes (version 4): Authentication. Compound operations are chained. Commit on close. Delegations are a commitment of the NFS server to notify the client about possible changes to the file, something like a lock, when the file is wanted ...
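The lease and grace periods mentioned above boil down to two time comparisons. The following is a rough sketch of the idea only, not of the NLM protocol: the structure, the function names and the use of `time_t` seconds are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical record the server keeps for a granted lock. */
struct lease {
    time_t expires;   /* the lock is invalid from this moment on */
};

/* A lock is valid only until its lease expires. */
static bool lease_valid(const struct lease *l, time_t now)
{
    return now < l->expires;
}

/* The client renews the lease before it expires to keep the lock. */
static void lease_renew(struct lease *l, time_t now, time_t period)
{
    l->expires = now + period;
}

/* After a restart, the server grants no new locks during the grace period,
   so that clients can first reclaim the locks they held before the crash. */
static bool server_may_grant(time_t boot, time_t grace, time_t now)
{
    return now >= boot + grace;
}
```

Once the grace period has elapsed, every lock that was not reclaimed can be considered gone, so the restarted server knows about all valid locks without having kept any state across the crash.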
313. ... dynamically allocated variables than the virtual memory manager. Usually, the heap allocator resides within a shared library used by the processes of the operating system. The kernel of the operating system has a separate heap allocator.

Heap Allocators
The usual requirements on an allocator are speed (the ability to allocate and free memory quickly), economy (small overhead of allocator data and small fragmentation) and functionality (resizing, alignment, zero fill).

Allocators usually keep track of free and occupied memory using either lists or bitmaps. Bitmaps are efficient when allocating blocks of a size close to their granularity; their disadvantages are internal fragmentation and the awkward search for a free block of the required length. For linked lists, there is probably not much to add either: list overhead, external fragmentation, sequential search, separate lists of full and empty blocks, later separate lists of blocks of usual sizes (also known as zones), and merging of free blocks.

Several strategies can be used when allocating a new block. The simplest is first fit, or its modification next fit. Another one is best fit, which however ...

Margin notes: Lists of blocks. I have a list of pages that are contiguous. The header holds the block size and an owned flag. To allocate, I set the new size to how much was requested, set owned to true, and drop a new header right behind it. First fit: many small blocks at the beginning. Next fit: problems with sequential search.
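A first fit allocator with block headers, along the lines of the margin notes, can be sketched as follows. This is a toy under stated assumptions: the arena is a fixed static array measured in header-sized units, free blocks are never merged, and all names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Header in front of every block: size and an owned flag. */
struct header {
    size_t size;   /* payload length in header-sized units */
    bool owned;    /* true when the block is allocated */
};

#define ARENA_UNITS 128
static struct header arena[ARENA_UNITS];   /* the heap itself */

/* Start with a single free block that spans the whole arena. */
static void heap_init(void)
{
    arena[0].size = ARENA_UNITS - 1;
    arena[0].owned = false;
}

/* First fit: take the first free block that is large enough. */
static void *heap_alloc(size_t bytes)
{
    size_t want = (bytes + sizeof(struct header) - 1) / sizeof(struct header);
    if (want == 0)
        want = 1;
    for (size_t i = 0; i < ARENA_UNITS; i += 1 + arena[i].size) {
        struct header *h = &arena[i];
        if (h->owned || h->size < want)
            continue;
        if (h->size > want) {
            /* split: put a new header right behind the payload */
            struct header *rest = &arena[i + 1 + want];
            rest->size = h->size - want - 1;
            rest->owned = false;
            h->size = want;
        }
        h->owned = true;
        return h + 1;       /* the payload starts after the header */
    }
    return NULL;            /* no free block is large enough */
}

/* Free a block; merging of free neighbours is left out of this sketch. */
static void heap_free(void *ptr)
{
    struct header *h = (struct header *) ptr - 1;
    h->owned = false;
}
```

Because the search always starts at the beginning of the arena, small leftover blocks tend to accumulate there, which is the first fit weakness noted in the margin. The payload is only aligned as strictly as the header, which a real allocator would have to improve.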
314. ... size_t count);

    ssize_t write (int fd, const void *buf, size_t count);
    ssize_t pread (int fd, void *buf, size_t count, off_t offset);
    ssize_t pwrite (int fd, const void *buf, size_t count, off_t offset);

The read and write operations are trivial; more interesting are their vectorized and asynchronous counterparts.

    ssize_t readv (int fd, const struct iovec *vector, int count);
    ssize_t writev (int fd, const struct iovec *vector, int count);

    struct iovec {
        void *iov_base;
        size_t iov_len;
    };

    int aio_read (struct aiocb *aiocbp);
    int aio_write (struct aiocb *aiocbp);
    int aio_error (const struct aiocb *aiocbp);
    ssize_t aio_return (struct aiocb *aiocbp);
    int aio_suspend (const struct aiocb * const cblist [], int n, const struct timespec *timeout);
    int aio_cancel (int fd, struct aiocb *aiocbp);
    int lio_listio (int mode, struct aiocb * const list [], int nent, struct sigevent *sig);

    struct aiocb {
        int aio_fildes;
        off_t aio_offset;
        volatile void *aio_buf;
        size_t aio_nbytes;
        int aio_reqprio;
        ...
    };

    int posix_fadvise (int fd, off_t offset, off_t len, int advice);
    int posix_fallocate (int fd, off_t offset, off_t len);

Advice can be given on future use of the file. Flags describing the future use include POSIX_FADV_NORMAL for no predictable pattern, POSIX_FADV_SEQUENTIAL and POSIX_FADV_RANDOM for specific patterns, POSIX_FADV_NOREUSE for data accessed once, and POSIX_FADV_WILLNEED and POSIX_FADV_DONTNEED to determine whether data will be accessed in the near future.

Example: Windows St...
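As an illustration of the vectorized calls, the following sketch gathers two separate buffers into a single `writev` call. The helper name `write_two` is made up for this example; only `writev` and `struct iovec` come from the interface listed above.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Write a header and a body with a single system call.
   Returns the number of bytes written, or -1 on error. */
static ssize_t write_two(int fd, const char *head, const char *body)
{
    struct iovec vec[2];
    vec[0].iov_base = (void *) head;   /* writev does not modify the buffers */
    vec[0].iov_len = strlen(head);
    vec[1].iov_base = (void *) body;
    vec[1].iov_len = strlen(body);
    return writev(fd, vec, 2);
}
```

Compared with two `write` calls, the gathered write saves one system call and avoids copying the two parts into a temporary buffer first.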
