Home

Presentation Slides

image

Contents

1. Final Comments Python threads are a useful tool but you have to know how and when to use them e O bound processing only e Limit CPU bound processing to C extensions that release the GIL e Threads are not the only way Copyright C 2009 David Beazley http www dabeaz cot Copyright C 2009 David Beazley http www dabeaz cot Part 9 Processes and Messages Concept Message Passing Copyright C 2009 Davi An alternative to threads is to run multiple independent copies of the Python interpreter In separate processes Possibly on different machines Get the different interpreters to cooperate by having them send messages to each other id Beazley http www dabeaz co Message Passing send recv On the surface it s simple Each instance of Python is independent Programs just send and receive messages Two main issues What is a message What is the transport mechanism Copyright C 2009 David Beazley http www dabeaz com Messages e A message is just a bunch of bytes a buffer e A serialized representation of some data e Creating serialized data in Python is easy 114 Copyright C 2009 David Beazley http www dabeaz com pickle Module e A module for serializing objects Serializing an object onto a file import pickle pickle dump someobj e Unserializing an object from a file someobj pi
2. An Example Launching a subprocess and hooking up the child process via a pipe e Use the subprocess module import subprocess p subprocess Popen python child py stdin subprocess PIPE stdout subprocess PIPE p stdin write data Send data to subprocess p stdout read size Read data from subprocess Python Python p stdin pe sys stdin p stdout a pE sys stdout Copyright C 2009 David Beazley http www dabeaz com l 20 Pipes and Pickle e Most programmers would use the subprocess module to run separate programs and collect their output e g system commands e However if you put a pickling layer around the files it becomes much more interesting Becomes a communication channel where you can send just about any Python object Copyright C 2009 David Beazley http www dabeaz com l 2 l A Message Channel A class that wraps a pair of files channel py import pickle class Channel object def nitt self out_f in_f self out_f out_f self in f in f def send self item pickle dump item self out_f self out_f flush def recv self return pickle load self in_ Zi Send Receive implemented using pickle Copyright C 2009 David Beazley http www dabeaz com l 22 Some Sample Code e A sample child process child py import channel import sys ch channel Channel sys stdout sys stdin while True item ch recv ch send child item
3. e Parent process setup parent py import channel import subprocess p subprocess Popen python child py stdin subprocess PIPE stdout subprocess PIPE ch channel Channel p stdin p stdout Copyright C 2009 David Beazley http www dabeaz com 23 Some Sample Code e Using the child worker gt gt gt ch send Hello World Hello World gt gt gt ch send 42 Se An produced by the child gt gt gt ch send DEET 1 2 3 4 gt gt gt ch send host python org port 80 host python org port 80 gt gt gt This output is being e You can send almost any Python object numbers lists dictionaries instances etc Copyright C 2009 David Beazley http www dabeaz com l 24 Big Picture e Can easily have 0s 1000s of communicating Python en wee Herve Python J les Python Python Copyright C 2009 David Beazley http www dabeaz com Interlude e Message passing is a fairly general concept However it s also kind of nebulous in Python No agreed upon programming interface Vast number of implementation options Intersects with distributed objects RPC cross language messaging etc Copyright C 2009 David Beazley http www dabeaz com Part 10 The Multiprocessing Module Copyright C 2009 David Beazley http www dabeaz cot multiprocessing Module A new library module added in Python 2 6 Ori
4. e Consider this CPU bound function def count n while n gt 0 n 1 Sequential Execution count 100000000 count 100000000 e Threaded execution tl Thread target count args 100000000 tl start t2 Thread target count args 100000000 t2 start e Now you might expect two threads to run twice as fast on multiple CPU cores Copyright C 2009 David Beazley http www dabeaz com 79 Bizarre Results Performance comparison Dual Core 2Ghz Macbook OS X 10 5 6 Sequential 24 6s Threaded 45 5s 1 8X slower e f you disable one of the CPU cores Threaded 38 0s Insanely horrible performance Better performance with fewer CPU cores It makes no sense Copyright C 2009 David Beazley http www dabeaz com 80 Interlude e It s at this point that programmers often decide to abandon threads altogether e Or write a blog rant that vaguely describes how Python threads suck because of their failed attempt at Python supercomputing e Well yes there is definitely some suck going on but let s dig a little deeper 8l Copyright C 2009 David Beazley http www dabeaz com Part The Inside Story on Python Threads The horror The horror Col Kurtz 82 Copyright C 2009 David Beazley http www dabeaz com What is a Thread Python threads are real system threads e POSIX threads pthreads e Windows threads Fully man
5. An Introduction to Python Concurrency David Beazley http www dabeaz com Presented at USENIX Technical Conference San Diego June 2009 Copyright C 2009 David Beazley http www dabeaz cot This Tutorial Python An interpreted high level programming language that has a lot of support for systems programming and which integrates well with existing software in other languages e Concurrency Doing more than one thing at a time Of particular interest to programmers writing code for running on big iron but also of interest for users of multicore PCs Usually a bad idea except when it s not Copyright C 2009 David Beazley http www dabeaz cot Support Files ge Code samples and support files for this class http www dabeaz com usenix2009 concurrent Please go there and follow along Copyright C 2009 David Beazley http www dabeaz com An Overview We re going to explore the state of concurrent programming idioms being used in Python A look at tradeoffs and limitations Hopefully provide some clarity A tour of various parts of the standard library Goal is to go beyond the user manual and tie everything together into a bigger picture Copyright C 2009 David Beazley http www dabeaz com Disclaimers e The primary focus is on Python e This is not a tutorial on how to write concurrent programs or parallel algorithms e No mathematical proo
6. threading Condition Producer Thread Consumer Thread item produce_item with items cv with items_cv items append item x items pop 0 Do something with x e First you use the locking part of a CV synchronize access to shared data items Copyright C 2009 David Beazley http www dabeaz com 67 Condition Variables e Common Use Producer Consumer patterns items items_cv threading Condition Producer Thread Consumer Thread item produce_item with items_cv with items cv while not items items append item Mee ee items_cv notify x items pop 0 Do something with x Next you add signaling and waiting Here the producer signals the consumer that it put data into the shared list Copyright C 2009 David Beazley http www dabeaz com 68 Copyright C 2009 Davi Condition Variables Some tricky bits involving wait ZS Consumer Thread Before waiting you have with items_cv z while not items to acquire the lock items_cv wait items pop 0 Do something with x wait releases the lock when waiting and reacquires when woken Conditions are often transient and may not hold by the time wait returns So you must always double check hence the while loop 69 id Beazley http www dabeaz com Copyright C 2009 Davi Interlude Working with all of the synchronization primitives is a lot trickier than
7. args Py BEGIN ALLOW THREADS Threaded C code Py END ALLOW THREADS Copyright C 2009 David Beazley http www dabeaz com 0 l The GIL and C Extensions e The trouble with C extensions is that you have to make sure they do enough work e A dumb example mindless spinning void churn int n while n gt 0 n How big do you have to make n to actually see any kind of speedup on multiple cores Copyright C 2009 David Beazley http www dabeaz com 02 The GIL and C Extensions e Here s some Python test code def churner n count 1000000 while count gt 0 churn n C extension function count 1 Sequential execution churner n churner n Threaded execution tl threading Thread target churner args n t2 threading Thread target churner args n tl start t2 start Copyright C 2009 David Beazley http www dabeaz com l 03 The GIL and C Extensions Speedup of running two threads versus sequential execution 2 0 o CP 1 5 o Speedup 1 0 KR be Extension code Sp runs for 4 l og Ho microseconds A Won per call 0 2 500 5 000 7 500 10 000 n e Note 2 Ghz Intel Core Duo OS X 10 5 6 Copyright C 2009 David Beazley http www dabeaz com l 04 Why is the GIL there Simplifies the implementation of the Python interpreter okay sort of a lame excuse e Better suited for reference counting Pytho
8. Thread 4 e A check is simply made every 100 ticks 87 Copyright C 2009 David Beazley http www dabeaz com The Periodic Check What happens during the periodic check In the main thread only signal handlers will execute if there are any pending signals e Release and reacquisition of the GIL e That last bullet describes how multiple CPU bound threads get to run by briefly releasing the GIL other threads get a chance to run 88 Copyright C 2009 David Beazley http www dabeaz com What is a Tick Ticks loosely map to interpreter instructions gt gt gt import dis gt gt gt dis dis countdown def countdown n while n gt 0 0 SETUP_LOOP 33 to 36 print n 3 LOAD_FAST 0 n ninsl 6 LOAD CONST 1 0 i 9 COMPARE OP 4 gt e Instructions in Tick 12 JUMP_IF_FALSE 19 to 34 15 POP_TOP the Python VM 16 LOAD FAST 0 n i 19 PRINT_ITEM Tick2 20 PRINT NEWLINE 21 LOAD FAST 0 n Tick 3 24 LOAD CONST 2 1 27 INPLACE_SUBTRACT i 28 STORE_FAST 0 n Tick4 31 JUMP_ABSOLUTE 3 Copyright C 2009 David Beazley http www dabeaz com 89 Tick Execution e Interpreter ticks are not time based e Ticks don t have consistent execution times e Long operations can block everything gt gt gt nums xrange 100000000 gt gt gt 1 in nums gt tick 6 6 seconds False gt gt gt e Try hitting Ctrl C ticks are u
9. beazley Software Python 3 0 p multiprocessing Pool 2 Make a process pool digest_map for path dirs files in os walk TOPDIR for name in files fullname os path join path name digest_map fullname p apply_async compute digest fullname Go through the final dictionary and collect results for filename result in digest_map items digest_map filename result get e This runs in about 5 6 seconds Copyright C 2009 David Beazley http www dabeaz com 150 Part Alternatives to Threads and Processes Copyright C 2009 David Beazley http www dabeaz com Alternatives n certain kinds of applications programmers have turned to alternative approaches that don t rely on threads or processes Primarily this centers around asynchronous I O and I O multiplexing You try to make a single Python process run as fast as possible without any thread process overhead e g context switching stack space and so forth Copyright C 2009 David Beazley http www dabeaz com Two Approaches e There seems to be two schools of thought e Event driven programming e Turn all I O handling into events e Do everything through event handlers asyncore Twisted etc e Coroutines ge Cooperative multitasking all in Python e Tasklets green threads etc Copyright C 2009 David Beazley http www dabeaz com 53 Events and Asyncore asyncore
10. dabeaz cot Accessing Shared Data e The two threads Thread 1 Thread 2 Low level interpreter execution Thread 1 LOAD GLOBAL LOAD CONST 1 x 2 1 BINARY ADD STORE GLOBAL 1 x thread switch thread switch Thread 2 LOAD GLOBAL LOAD CONST BINARY_SUB STORE GLOBAL 1 x 1 x 2 1 Copyright C 2009 David Beazley http www dabeaz com 4 Low level interpreter code Thread 1 Thread 2 LOAD GLOBAL 1 x LOAD CONST 2 1 thread LOAD GLOBAL 1 x switch LOAD CONST 2 1 BINARY SUB E Ee STORE_GLOBAL 1 x BINARY_ADD ae STORE GLOBAL 1 x switch These operations get performed with a stale value of x The computation in Thread 2 is lost Copyright C 2009 David Beazley http www dabeaz com 42 Accessing Shared Data s this actually a real concern x 0 A shared value def foo global x for i in xrange 100000000 x 1 def bar global x for i in xrange 100000000 x 1 t1 threading Thread target foo t2 threading Thread target bar tl start t2 start tl join t2 join Wait for completion print x Expected result is 0 Yes the print produces a random nonsensical value each time e g 83412 or 1627732 Copyright C 2009 David Beazley http www dabeaz com 43 Race Conditions e The corruption of shared data due to thread scheduling is often known as a race condition e It s often quite diabolical a pr
11. library module e Implements a wrapper around sockets that turn all blocking I O operations into events from asyncore import dispatcher class MyApp dispatcher socket def handle_accept self accept def handle _connect self connect addr recv maxbytes def handle_read self send msg sites def handle write self Create a socket and wrap it s MyApp socket Copyright C 2009 David Beazley http www dabeaz com 54 Events and Asyncore e To run asyncore provides a central event loop based on I O multiplexing select poll import asyncore asyncore loop Run the event loop Event Loop select pol1l handle zi Copyright C 2009 David Beazley http www dabeaz com Asyncore Commentary Frankly asyncore is one of the ugliest most annoying mind boggling modules in the entire Python library e Combines all of the fun of network programming with the elegance of GUI programming sic e However if you use this module you can technically create programs that have concurrency without any threads processes Copyright C 2009 David Beazley http www dabeaz com Coroutines e An alternative concurrency approach is possible using Python generator functions coroutines e This is a little subtle but I ll give you the gist e Pret a quick refresher on generators Copyright C 2009 David Beazley http www dabeaz com 57
12. only works from other threads e A thread can t join itself 36 Copyright C 2009 David Beazley http www dabeaz com Daemonic Threads lf a thread runs forever make it daemonic t daemon True t setDaemon True e f you don t do this the interpreter will lock when the main thread exits waiting for the thread to terminate which never happens e Normally you use this for background tasks 37 Copyright C 2009 David Beazley http www dabeaz com Interlude e Creating threads is really easy e You can create thousands of them if you want Programming with threads is hard Really hard Q Why did the multithreaded chicken cross the road A to To other side get the Jason Whittington 38 Copyright C 2009 David Beazley http www dabeaz com Access to Shared Data e Threads share all of the data in your program e Thread scheduling is non deterministic ge Operations often take several steps and might be interrupted mid stream non atomic e Thus access to any kind of shared data is also non deterministic which is a really good way to have your head explode Copyright C 2009 David Beazley http www dabeaz cot Accessing Shared Data Consider a shared object x 0 And two threads that modify it Thread 1 Thread 2 It s possible that the resulting value will be unpredictably corrupted 40 Copyright C 2009 David Beazley http www
13. release Release the lock Similar to a normal lock except that it can be reacquired multiple times by the same thread e However each acquire must have a release ge Common use Code based locking where you re locking function method execution as opposed to data access Copyright C 2009 David Beazley http www dabeaz com 57 RLock Example e Implementing a kind of monitor object class Foo object lock threading RLock def bar self with Foo lock def spam self with Foo lock self bar Only one thread is allowed to execute methods in the class at any given time However methods can call other methods that are holding the lock in the same thread Copyright C 2009 David Beazley http www dabeaz com 58 Semaphores A counter based synchronization primitive m threading Semaphore n Create a semaphore m acquire Acquire m release Release e acquire Waits if the count is 0 otherwise decrements the count and continues release Increments the count and signals waiting threads if any Unlike locks acquire release can be called in any order and by any thread Copyright C 2009 David Beazley http www dabeaz com 59 Semaphore Uses e Resource control You can limit the number of threads performing certain operations For example performing database queries making network connections etc Signaling Semaphores can be
14. used to send signals between threads For example having one thread wake up another thread Copyright C 2009 David Beazley http www dabeaz com 60 Resource Control e Using a semaphore to limit resources sema threading Semaphore 5 Max 5 threads def fetch_page url sema acquire try u urllib urlopen url return u read finally sema release n this example only 5 threads can be executing the function at once if there are more they will have to wait 6l Copyright C 2009 David Beazley http www dabeaz com Thread Signaling e Using a semaphore to signal done threading Semaphore 0 Thread Thread 2 statements done acquire statements statements statements statements done release statements e Here acquire and release occur in different threads and in a different order Often used with producer consumer problems 62 Copyright C 2009 David Beazley http www dabeaz com Events e Event Objects e threading Event e isSet Return True if event set e set Set event e clear Clear event e wait Wait for event This can be used to have one or more threads wait for something to occur Setting an event will unblock all waiting threads simultaneously if any e Common use barriers notification Copyright C 2009 David Beazley http www dabeaz com 63 Event Example e Using an event to ensure proper initi
15. 09 David Beazley http www dabeaz com threading module e Python threads are defined by a class import time import threading class CountdownThread threading Thread def nit self count threading Thread init self self count count def run self This code while self count gt 0 print Counting down self count self count 1 the thread time sleep 5 return You inherit from Thread and redefine run executes in Copyright C 2009 David Beazley http www dabeaz com 3 3 threading module To launch create thread objects and call start tl CountdownThread 10 Create the thread object tl start Launch the thread t2 CountdownThread 20 Create another thread t2 start Launch e Threads execute until the run method stops Copyright C 2009 David Beazley http www dabeaz com 34 Functions as threads e Alternative method of launching threads def countdown count while count gt 0 print Counting down count count 1 time sleep 5 tl threading Thread target countdown args 10 tl start e Creates a Thread object but its run method just calls the given function Copyright C 2009 David Beazley http www dabeaz com 3 5 Once you start a thread it runs independently e Use t join to wait for a thread to exit t start Launch a thread Do other work wait for thread to finish t join Waits for thread t to exit e This
16. Generator Refresher ge Generator functions are commonly used to feed values to for loops iteration def countdown n while n gt 0 yield n n 1 for x in countdown 10 print x e Under the covers the countdown function executes on successive next calls gt gt gt c countdown 10 gt gt gt c next 10 gt gt gt c next 9 gt gt gt Copyright C 2009 David Beazley http www dabeaz com 58 An Insight Whenever a generator function hits the yield statement it suspends execution def countdown n while n gt 0 yield n n 1 e Here s the idea Instead of yielding a value a generator can yield control You can write a little scheduler that cycles between generators running each one until it explicitly yields Copyright C 2009 David Beazley http www dabeaz com l 59 Scheduling Example e First you set up a set of tasks def countdown_task n while n gt 0 print n yield n 1 A list of tasks to run from collections import deque tasks deque countdown_task 5 countdown_task 10 countdown_task 15 1 Each task is a generator function Copyright C 2009 David Beazley http www dabeaz com l 60 Scheduling Example Now run a task scheduler def scheduler tasks while tasks task tasks popleft try next task Run to the next yield tasks append task Reschedule except StopIteration pass Run it sched
17. aged by the host operating system e All scheduling thread switching e Represent threaded execution of the Python interpreter process written in C 83 Copyright C 2009 David Beazley http www dabeaz com The Infamous GIL e Here s the rub Only one Python thread can execute in the interpreter at once e There is a global interpreter lock that carefully controls thread execution e The GIL ensures that sure each thread gets exclusive access to the entire interpreter internals when it s running 84 Copyright C 2009 David Beazley http www dabeaz com GIL Behavior Whenever a thread runs it holds the GIL e However the GIL is released on blocking I O gt wo S IS BA SC L a SE Es Es wk SM Oe Ss x S d Cha AN GI V xS amp So w KX amp S So any time a thread is forced to wait other ready threads get their chance to run e Basically a kind of cooperative multitasking 85 Copyright C 2009 David Beazley http www dabeaz com CPU Bound Processing To deal with CPU bound threads the interpreter periodically performs a check By default every 100 interpreter ticks 86 Copyright C 2009 David Beazley http www dabeaz com The Check Interval The check interval is a global counter that is completely independent of thread scheduling oe 100 ticks S 100 ticks Main Thread gt 100 ticks Thread 3
18. alization init threading Event def worker init wait Wait until initialized statements def initialize statements Setting up statements Picea init set Done initializing Thread target worker start Launch workers Thread target worker start Thread target worker start initialize Initialize Copyright C 2009 David Beazley http www dabeaz com 64 Event Example e Using an event to signal completion def master item create_item evt Event Worker Thread worker send item evt cee gt item evt get_work Other processing processing processing Done evt set Wait for worker evt wait e Might use for asynchronous processing etc Copyright C 2009 David Beazley http www dabeaz com 65 Condition Variables e Condition Objects cv threading Condition lock cv acquire Acquire the underlying lock cv release Release the underlying lock cv wait Wait for condition cv notify Signal that a condition holds cv notifyAll Signal all threads waiting ge A combination of locking signaling e Lock is used to protect code that establishes some sort of condition e g data available Signal is used to notify other threads that a condition has changed state Copyright C 2009 David Beazley http www dabeaz com 66 Condition Variables e Common Use Producer Consumer patterns items items_cv
19. ckle load f e Here a file might be a file a pipe a wrapper around a socket etc Copyright C 2009 David Beazley http www dabeaz com l 5 pickle Module Pickle can also turn objects into byte strings import pickle Convert to a string s pickle dumps someobj Load from a string someobj pickle loads s You might use this embed a Python object into a message payload Copyright C 2009 David Beazley http www dabeaz com l 6 cPickle vs pickle There is an alternative implementation of pickle called cPickle written in C e Use it whenever possible it is much faster import cPickle as pickle pickle dump someobj f e There is some history involved There are a few things that cPickle can t do but they are somewhat obscure so don t worry about it Copyright C 2009 David Beazley http www dabeaz com Pickle Commentary e Using pickle is almost too easy e Almost any Python object works e Builtins lists dicts tuples etc e Instances of user defined classes e Recursive data structures e Exceptions Files and network connections e Running generators etc Copyright C 2009 David Beazley http www dabeaz com Message Transport Python has various low level mechanisms e Pipes ge Sockets e FIFOs e Libraries provide access to other systems e MPI XML RPC and many others Copyright C 2009 David Beazley http www dabeaz com l 9
20. eaz com Queue Example e Running the two processes if name main_ from multiprocessing import Process JoinableQueue q JoinableQueue Launch the consumer process cons_p Process target consumer args q cons_p daemon True cons D Start Run the producer function on some data sequence range 100 Replace with useful data producer sequence q Wait for the consumer to finish q join 143 Copyright C 2009 David Beazley http www dabeaz com Commentary e If you have written threaded programs that strictly stick to the queuing model they can probably be ported to multiprocessing e The following restrictions apply e Only objects compatible with pickle can be queued Tasks can not rely on any shared data other than a reference to the queue 144 Copyright C 2009 David Beazley http www dabeaz com Other Features e multiprocessing has many other features Process Pools e Shared objects and arrays Synchronization primitives e Managed objects ge Connections e Will briefly look at one of them Copyright C 2009 David Beazley http www dabeaz com 45 Process Pools e Creating a process pool p multiprocessing Pool numprocesses e Pools provide a high level interface for executing functions in worker processes e Let s look at an example Copyright C 2009 David Beazley http www dabeaz com l 46 Pool Example e Defi
21. eduling may be significant depends on the OS Copyright C 2009 David Beazley http www dabeaz com 93 CPU Bound Threads e As we saw earlier CPU bound threads have horrible performance properties Far worse than simple sequential execution 24 6 seconds sequential 45 5 seconds 2 threads e A big question Why e What is the source of that overhead Copyright C 2009 David Beazley http www dabeaz com 94 Signaling Overhead e GIL thread signaling is the source of that e After every 100 ticks the interpreter Locks a mutex e Signals on a condition variable semaphore where another thread is always waiting ge Because another thread is waiting extra pthreads processing and system calls get triggered to deliver the signal 95 Copyright C 2009 David Beazley http www dabeaz cot A Rough Measurement Sequential Execution OS X CPU e 736 Unix system calls e 117 Mach System Calls Two threads OS X CPU e 1149 Unix system calls e 3 3 Million Mach System Calls e Yow Look at that last figure 96 Copyright C 2009 David Beazley http www dabeaz cot Multiple CPU Cores e The penalty gets far worse on multiple cores e Two threads OS X CPU e 1149 Unix system calls e 3 3 Million Mach System Calls Two threads OS X 2 CPUs e 1149 Unix system calls e 9 5 Million Mach System calls 97 Copyright C 2009 David Beazley h
22. erformance you re not coding Python you re using C extensions This is what most of the big scientific computing hackers are doing It s called using the right tool for the job 23 Copyright C 2009 Davi Commentary Concurrency is usually a really bad option if you re merely trying to make an inefficient Python script run faster Because its interpreted you can often make huge gains by focusing on better algorithms or offloading work into C extensions For example a C extension might make a script run 20x faster vs the marginal improvement of parallelizing a slow script to run on a couple of CPU cores 24 id Beazley http www dabeaz com Part 3 Python Thread Programming 25 Copyright C 2009 David Beazley http www dabeaz com Concept Threads What most programmers think of when they hear about concurrent programming An independent task running inside a program e Shares resources with the main program memory files network connections etc Has its own independent flow of execution stack current instruction etc 26 Copyright C 2009 David Beazley http www dabeaz com Thread Basics python program py statement Program launch Python statement loads a program and starts WE executing statements D Copyright C 2009 David Beazley http www dabeaz com 27 Thread Basics python program py statement statemen
23. fs involving dining philosophers or anything like that will assume that you have had some prior exposure to topics such as threads message passing network programming etc Copyright C 2009 David Beazley http www dabeaz com Disclaimers like Python programming but this tutorial is not meant to be an advocacy talk e In fact we re going to be covering some pretty ugly e g sucky aspects of Python You might not even want to use Python by the end of this presentation e That s fine education is my main agenda Copyright C 2009 David Beazley http www dabeaz com Part Some Basic Concepts Copyright C 2009 David Beazley http www dabeaz com Concurrent Programming e Creation of programs that can work on more than one thing at a time Example A network server that communicates with several hundred clients all connected at once Example A big number crunching job that spreads its work across multiple CPUs Copyright C 2009 David Beazley htt Multitasking e Concurrency typically implies multitasking TaskA E ae run mun S neun task switch Task B run run e If only one CPU is available the only way it can run multiple tasks is by rapidly switching between them Copyright C 2009 David Beazley http www dabeaz com 9 Parallel Processing You may have parallelism many CPUs e Here you ofte
24. get consume_item item e Critical point You don t need locks here Copyright C 2009 David Beazley http www dabeaz com 74 Queue Signaling ge Queues also have a signaling mechanism q task_done Signal that work is done q join Wait for all work to be done e Many Python programmers don t know about this since it s relatively new e Used to determine when processing is done Producer Thread Consumer Thread for item in produce_items while True q put item item q get Wait for consumer consume_item item q join q task_done Copyright C 2009 David Beazley http www dabeaz com 75 Queue Programming e There are many ways to use queues e You can have as many consumers producers as you want hooked up to the same queue producer S consumer producer GEES consumer producer In practice try to keep it simple Copyright C 2009 David Beazley http www dabeaz com 76 Part 6 The Problem with Threads 77 Copyright C 2009 David Beazley http www dabeaz com An Inconvenient Truth e Thread programming quickly gets hairy End up with a huge mess of shared data locks queues and other synchronization primitives Which is really unfortunate because Python threads have some major limitations e Namely they have pathological performance 78 Copyright C 2009 David Beazley http www dabeaz com A Performance Test
25. ginally known as pyprocessing a third party extension module This is a module for writing concurrent Python programs based on communicating processes A module that is especially useful for concurrent CPU bound processing Copyright C 2009 David Beazley http www dabeaz cot Using multiprocessing e Here s the cool part You already know how to use multiprocessing Ata very high level it simply mirrors the thread programming interface e Instead of Thread objects you now work with Process objects Copyright C 2009 David Beazley http www dabeaz com 29 multiprocessing Example e Define tasks using a Process class import time import multiprocessing class CountdownProcess multiprocessing Process def init self count multiprocessing Process init self self count count def run self while self count gt 0 print Counting down self count self count 1 time sleep 5 return You inherit from Process and redefine run Copyright C 2009 David Beazley http www dabeaz com l 30 Launching Processes To launch same idea as with threads if name main_ pl CountdownProcess 10 Create the process object pl start Launch the process p2 CountdownProcess 20 Create another process p2 start Launch e Processes execute until run stops e A critical detail Always launch in main as shown required for Windows Copyrigh
26. it looks There are a lot of nasty corner cases and horrible things that can go wrong Bad performance deadlock livelock starvation bizarre CPU scheduling etc All are valid reasons to not use threads 70 id Beazley http www dabeaz com Part 5 Threads and Queues H Copyright C 2009 David Beazley http www dabeaz com Threads and Queues e Threaded programs are often easier to manage if they can be organized into producer consumer components connected by queues Thread send item Producer e Instead of sharing data threads only coordinate by sending data to each other e Think Unix pipes if you will 72 Copyright C 2009 David Beazley http www dabeaz com Queue Library Module Python has a thread safe queuing module e Basic operations from Queue import Queue q Queue maxsize Create a queue q put item Put an item on the queue q get Get an item from the queue q empty Check if empty q full Check if full e Usage You try to strictly adhere to get put operations If you do this you don t need to use other synchronization primitives Copyright C 2009 David Beazley http www dabeaz com 73 Queue Usage e Most commonly used to set up various forms of producer consumer problems from Queue import Queue q Queue Producer Thread Consumer Thread for item in produce_items while True q put item item q
27. level framework e The various components might be a mix of languages Python C C etc ge Concurrency may be a core part of the framework s overall architecture Python has to deal with it even if a lot of the underlying processing is going on in C 20 Copyright C 2009 David Beazley http www dabeaz cot Programmer Performance e Programmers are often able to get complex systems to work in much less time using a high level language like Python than if they re spending all of their time hacking C code The best performance improvement is the transition from the nonworking to the working state John Ousterhout Premature optimization is the root of all evil Donald Knuth You can always optimize it later Unknown 21 Copyright C 2009 David Beazley http www dabeaz com Performance is Irrelevant e Many concurrent programs are I O bound e They spend virtually all of their time sitting around waiting e Python can wait just as fast as C maybe even faster although haven t measured it e f there s not much processing who cares if it s being done in an interpreter One exception if you need an extremely rapid response time as in real time systems 22 Copyright C 2009 David Beazley htt Copyright C 2009 David Beazley htt You Can Go Faster Python can be extended with C code Look at ctypes Cython Swig etc If you need really high p
28. ley http www dabeaz com Processes e Tasks might run in separate processes PROCESS mst run run run i IPC Process amua gt gt VEIE OLETASIN TATE Task B run run run e Processes coordinate using IPC e Pipes FIFOs memory mapped regions etc Copyright C 2009 David Beazley http www dabeaz com l 5 Distributed Computing ge Tasks may be running on distributed systems Task A E Ta ti Task B e For example a cluster of workstations e Communication via sockets Copyright C 2009 David Beazley http www dabeaz com 6 Copyright C 2009 Davi Part 2 Why Concurrency and Python id Beazley http www dabeaz com Copyright C 2009 Davi Some Issues Python is interpreted What the hardware giveth the software taketh away Frankly it doesn t seem like a natural match for any sort of concurrent programming Isn t concurrent programming all about high performance anyways Se id Beazley http www dabeaz com Why Use Python at All Python is a very high level language And it comes with a large library e Useful data types dictionaries lists etc Network protocols e Text parsing regexs XML HTML etc e Files and the file system e Databases e Programmers like using this stuff Copyright C 2009 David Beazley http www dabeaz com Python as a Framework Python is often used as a high
29. n get simultaneous task execution Task A ieee es ee ATC eI E ee eo ee EE as run run run ae a cee a ach Tke Say kun o GE e Note If the total number of tasks exceeds the number of CPUs then each CPU also multitasks Copyright C 2009 David Beazley http www dabeaz com 0 Task Execution e All tasks execute by alternating between CPU processing and I O handling run z run run run 1 0 system Call e For I O tasks must wait sleep e Behind the scenes the underlying system will carry out the I O operation and wake the task when it s finished Copyright C 2009 David Beazley http www dabeaz com CPU Bound Tasks e A task is CPU Bound if it spends most of its time processing with little I O UC UC run run l run Examples e Crunching big matrices e mage processing Copyright C 2009 David Beazley http www dabeaz com I O Bound Tasks e A task is I O Bound if it spends most of its time waiting for I O Sale Le Lafe run vO run run Examples e Reading input from the user e Networking File processing e Most normal programs are I O bound Copyright C 2009 David Beazley http www dabeaz com Shared Memory e Tasks may run in the same memory space Tror run run run E E E T A A E A E EEE AE AR AETR ANE AE EEEE EE Sg Simultaneous access to objects ge Often a source of unspeakable peril Copyright C 2009 David Beaz
30. n s memory management scheme Simplifies the use of C C extensions Extension functions do not need to worry about thread synchronization e And for now it s here to stay although people continue to try and eliminate it Copyright C 2009 David Beazley http www dabeaz com Part 8 Final Words on Threads Copyright C 2009 David Beazley http www dabeaz com Using Threads e Despite some issues there are situations where threads are appropriate and where they perform well e There are also some tuning parameters Copyright C 2009 David Beazley http www dabeaz com I O Bound Processing e Threads are still useful for 1 O bound apps e For example A network server that needs to maintain several thousand long lived TCP connections but is not doing tons of heavy CPU processing e Here you re really only limited by the host operating system s ability to manage and schedule a lot of threads e Most systems don t have much of a problem even with thousands of threads Copyright C 2009 David Beazley http www dabeaz com Why Threads e f everything is I O bound you will get a very quick response time to any I O activity Python isn t doing the scheduling So Python is going to have a similar response behavior as a C program with a lot of I O bound threads Caveat You have to stay I O bound Copyright C 2009 David Beazley http www dabeaz cot
31. ne a function that does some work e Example Compute a SHA 512 digest of a file import hashlib def compute_digest filename digest hashlib sha512 f open filename rb while True chunk f read 8192 if not chunk break digest update chunk f close return digest digest This is just a normal function no magic Copyright C 2009 David Beazley http www dabeaz com 47 Pool Example e Here is some code that uses our function Make a dict mapping filenames to digests import os TOPDIR Users beazley Software Python 3 0 digest_map for path dirs files in os walk TOPDIR for name in files fullname os path join path name digest_map fullname compute_digest fullname e Running this takes about IOs on my machine Copyright C 2009 David Beazley http www dabeaz com l 48 Pool Example e With a pool you can farm out work e Here s a small sample p multiprocessing Pool 2 2 processes result p apply_async compute_digest README txt various other processing digest result get Get the result This executes a function in a worker process and retrieves the result at a later time The worker churns in the background allowing the main program to do other things 149 Copyright C 2009 David Beazley http www dabeaz com Pool Example e Make a dictionary mapping names to digests import multiprocessing import os TOPDIR Users
32. ninterruptible gt gt gt nums xrange 100000000 gt gt gt 1 in nums C C C nothing happens long pause KeyboardInterrupt gt gt gt Copyright C 2009 David Beazley http www dabeaz com 90 Thread Scheduling Python does not have a thread scheduler e There is no notion of thread priorities preemption round robin scheduling etc e For example the list of threads in the interpreter isn t used for anything related to thread execution e All thread scheduling is left to the host operating system e g Linux Windows etc Copyright C 2009 David Beazley http www dabeaz cot 91 GIL Implementation e The GIL is not a simple mutex lock e The implementation Unix is either A POSIX unnamed semaphore e Or a pthreads condition variable e All interpreter locking is based on signaling e To acquire the GIL check if it s free If not go to sleep and wait for a signal To release the GIL free it and signal Copyright C 2009 David Beazley http www dabeaz com 92 Thread Scheduling e Thread switching is far more subtle than most programmers realize it s tied up in the OS at Ne St e e e X 100 ticks X ba 100 ticks Thread cn SUSPENDED signal i signal R y T Thread Operating Context System Switch d signal signal signal p i Thread 2 SUSPENDED NE se ss e The lag between signaling and sch
33. ogram may produce slightly different results each time it runs even though you aren t using any random numbers e Or it may just flake out mysteriously once every two weeks Copyright C 2009 David Beazley http www dabeaz com 44 Thread Synchronization e Identifying and fixing a race condition will make you a better programmer e g it builds character e However you ll probably never get that month of your life back To fix You have to synchronize threads 45 Copyright C 2009 David Beazley http www dabeaz cot Part 4 Thread Synchronization Primitives 46 Copyright C 2009 David Beazley http www dabeaz cot Synchronization Options e The threading library defines the following objects for synchronizing threads Lock e RLock Semaphore BoundedSemaphore Event e Condition Copyright C 2009 David Beazley http www dabeaz com 47 Synchronization Options In my experience there is often a lot of confusion concerning the intended use of the various synchronization objects Maybe because this is where most students space out in their operating system course well yes actually e Anyways let s take a little tour Copyright C 2009 David Beazley http www dabeaz cot 48 Mutex Locks e Mutual Exclusion Lock m threading Lock Probably the most commonly used synchronization primitive e Primarily used to
34. opyright C 2009 David Beazley http www dabeaz com Everything is focused on messaging 135 Pipes e A channel for sending receiving objects cl c2 multiprocessing Pipe e Returns a pair of connection objects one for each end point of the pipe e Here are methods for communication c send obj c recv c send_bytes buffer c recv_bytes max c poll timeout Copyright C 2009 David Beazley http www dabeaz com Send an object Receive an object Send a buffer of bytes Receive a buffer of bytes Check for data 136 Using Pipes e The Pipe function largely mimics the behavior of Unix pipes However it operates at a higher level It s not a low level byte stream You send discrete messages which are either Python objects pickled or buffers Copyright C 2009 David Beazley http www dabeaz com 137 Pipe Example e A simple data consumer def consumer pl p2 pl close Close producer s end not used while True try item p2 recv except EOFError break print item Do other useful work here e A simple data producer def producer sequence output_p for item in sequence output_p send item Copyright C 2009 David Beazley http www dabeaz com l 38 Pipe Example if name main pl p2 multiprocessing Pipe cons multiprocessing Process target consumer args pl p2 cons start Close the input end in
35. ow Python works under the covers as well as some of the pitfalls and tradeoffs Copyright C 2009 David Beazley http www dabeaz com Thanks e hope you got some new ideas from this class e Please feel free to contact me http www dabeaz com e Also teach Python classes shameless plug Copyright C 2009 David Beazley http www dabeaz com
36. synchronize threads so that only one thread can make modifications to shared data at any given time 49 Copyright C 2009 David Beazley http www dabeaz com Mutex Locks e There are two basic operations m acquire Acquire the lock m release Release the lock Only one thread can successfully acquire the lock at any given time e f another thread tries to acquire the lock when its already in use it gets blocked until the lock is released 50 Copyright C 2009 David Beazley http www dabeaz com Use of Mutex Locks Commonly used to enclose critical sections x 0 x lock threading Lock Thread 1 Thread 2 x_lock acquire x_lock acquire Critical EE Genee Section x_lock release x_lock release e Only one thread can execute in critical section at a time lock gives exclusive access 5I Copyright C 2009 David Beazley http www dabeaz com Using a Mutex Lock e t is your responsibility to identify and lock all critical sections x 0 x_ lock threading Lock Thread 1 Thread 2 x_lock acquire x x l l x x 1 Sigs eae a x_lock release es er If you use a lock in one place but not another then you re missing the whole point All modifications to shared state must be enclosed by lock acquire release 52 Copyright C 2009 David Beazley http www dabeaz com Locking Perils e Locking looks straightforward e Until you star
37. t C 2009 David Beazley http www dabeaz com l 3 l e Alternative method of launching processes def countdown count while count gt 0 print Counting down count count 1 time sleep 5 if name main Ir pl multiprocessing Process target countdown args 10 pl start e Creates a Process object but its run method just calls the given function 132 Copyright C 2009 David Beazley http www dabeaz com Does it Work e Consider this CPU bound function def count n while n gt 0 n 1 Sequential Execution count 100000000 count 100000000 24 6s e Multiprocessing Execution pl Process target count args 100000000 pl start p2 Process target count args 100000000 p2 start Yes it seems to work Copyright C 2009 David Beazley http www dabeaz com Bee 133 Other Process Features e Joining a process waits for termination p Process target somefunc p start p join e Making a daemonic process p Process target somefunc p daemon True p start e Terminating a process p Process target somefunc Eech e These mirror similar thread functions Copyright C 2009 David Beazley http www dabeaz com 134 Distributed Memory data structures With multiprocessing there are no shared Every process is completely isolated Since there are no shared structures forget about all of that locking business C
38. t Creation of a thread vaz Launches a function D create thread f00 ee gt def foo Copyright C 2009 David Beazley http www dabeaz com 28 Thread Basics python program py statement statement create thread f00 wrt gt def foo statement Concurrent statement statement statement execution of statements 1 Copyright C 2009 David Beazley http www dabeaz com 29 Thread Basics python program py statement statement create thread f00 wr gt def foo statement statement statement statement Stas thread terminates EC on return or exit L statement return or exit statement 1 Copyright C 2009 David Beazley http www dabeaz com 30 Thread Basics W Key dea Thread is like a little e ie eae task Se independently runs Statenent inside your program E thread J D create thread foo gt def foo statement statement statement statement 1 1 statement return or exit statement 1 Copyright C 2009 David Beazley http www dabeaz com 3 threading module e Python threads are defined by a class import time import threading class CountdownThread threading Thread def __init_ self count threading Thread init self self count count def run self while self count gt 0 print Counting down self count self count 1 time sleep 5 return You inherit from Thread and redefine run 32 Copyright C 20
39. t adding it to your code e Managing locks is a lot harder than it looks 53 Copyright C 2009 David Beazley http www dabeaz com Lock Management e Acquired locks must always be released However it gets evil with exceptions and other non linear forms of control flow e Always try to follow this prototype x 0 x_lock threading Lock Example critical section x_lock acquire try statements using x finally x_lock release 54 Copyright C 2009 David Beazley http www dabeaz com Lock Management Python 2 6 3 0 has an improved mechanism for dealing with locks and critical sections x 0 x_lock threading Lock Critical section with x_lock statements using x e This automatically acquires the lock and releases it when control enters exits the associated block of statements 55 Copyright C 2009 David Beazley http www dabeaz com Locks and Deadlock e Don t write code that acquires more than one mutex lock at a time x 0 y 0 x_ lock threading Lock y_lock threading Lock with x_lock statements using x with y_lock statements using x and y e This almost invariably ends up creating a program that mysteriously deadlocks even more fun to debug than a race condition 56 Copyright C 2009 David Beazley http www dabeaz com RLock e Reentrant Mutex Lock m threading RLock Create a lock m acquire Acquire the lock m
40. the producer p2 close Go produce some data sequence xrange 100 Replace with useful data producer sequence pl Close the pipe pl close 139 Copyright C 2009 David Beazley http www dabeaz com Message Queues e multiprocessing also provides a queue e The programming interface is the same from multiprocessing import Queue q Queue q put item Put an item on the queue item q get Get an item from the queue e There is also a joinable Queue from multiprocessing import JoinableQueue q JoinableQueue q task_done Signal task completion q join Wait for completion 140 Copyright C 2009 David Beazley http www dabeaz com Queue Implementation Queues are implemented on top of pipes A subtle feature of queues is that they have a feeder thread behind the scenes Putting an item on a queue returns immediately allowing the producer to keep working The feeder thread works on its own to transmit data to consumers 141 Copyright C 2009 David Beazley http www dabeaz com Queue Example ge A consumer process def consumer input_q while True Get an item from the queue item input_q get Process item print item Signal completion input_q task_done ge A producer process def producer sequence output_q for item in sequence Put the item on the queue output_q put item 142 Copyright C 2009 David Beazley http www dab
41. ttp www dabeaz cot Multicore GIL Contention e With multiple cores CPU bound threads get scheduled simultaneously on different processors and then have a GIL battle Thread CPU 1 Thread 2 CPU 2 Release GIL signal Acquire ej Eaa S Wake run Acquire GIL fails Release GIL inal r Acquire GIL fails e The waiting thread T2 may make 100s of failed GIL acquisitions before any success Copyright C 2009 David Beazley http www dabeaz com 98 The GIL and C Code As mentioned Python can talk to C C e C C extensions can release the interpreter lock and run independently e Caveat Once released C code shouldn t do any processing related to the Python interpreter or Python objects e The C code itself must be thread safe Copyright C 2009 David Beazley http www dabeaz com 99 The GIL and C Extensions e Having C extensions release the GIL is how you get into true parallel computing e e Y Gu oO Af A Q Thread 2 gt na thon C extension 7 Python instructions Code i instructions Python A instructions Thread 2 i S III ead K K Cos XX Y M Ov A Copyright C 2009 David Beazley http www dabeaz com l 00 How to Release the GIL e The ctypes module already releases the GIL when calling out to C code e In hand written C extensions you have to insert some special macros PyObject pyfunc PyObject self PyObject
42. uler tasks This loop is what drives the application Copyright C 2009 David Beazley http www dabeaz com 6 l Scheduling Example Output e You ll see the different tasks cycling Copyright C 2009 David Beazley http www dabeaz com 62 Coroutines and I O It is also possible to tie coroutines to I O You take an event loop like asyncore but instead of firing callback functions you schedule coroutines in response to I O activity Scheduler loop select poll coroutine e Unfortunately this requires its own tutorial Copyright C 2009 David Beazley http www dabeaz com Coroutine Commentary e Usage of coroutines is somewhat exotic Mainly due to poor documentation and the newness of the feature itself e There are also some grungy aspects of programming with generators 164 Copyright C 2009 David Beazley http www dabeaz com Coroutine Info gave a tutorial that goes into more detail e A Curious Course on Coroutines and Concurrency at PYCON 09 http www dabeaz com coroutines Copyright C 2009 David Beazley http www dabeaz cot Part 12 Final Words and Wrap up Copyright C 2009 David Beazley http www dabeaz cot Quick Summary e Covered various options for Python concurrency Threads e Multiprocessing Event handling e Coroutines generators e Hopefully have expanded awareness of h

Download Pdf Manuals

image

Related Search

Related Contents

DM-D30 詳細取扱説明書  Sharper Image 8217SI Oven User Manual  TOP 600 XT PRO - Welkom op de site van Fitness Occasions  Dusk-to-Dawn Light  GUÍA DEL USUARIO 2288  LM6005 - TV Connections  Polk Audio Omni S2R Owner's Manual  GIGABYTE GA-K8VT800M User's Manual  Visualizar  Manual DL-DLT  

Copyright © All rights reserved.
Failed to retrieve file