Modern software developers have a lot of options for running application logic in a non-sequential way.
But you have to understand which option is best for the task at hand.
I wrote this text to give you the information you need to choose.
Why we need non-sequential code
Massive IO (input/output)
We use computers with separate subsystems for fast (and expensive) and slow (and cheap) memory - RAM (“memory”) and disk (even SSD).
We also mean the network when we speak about IO.
So suppose you wrote a web server that can handle requests pretty fast, but after that you need to send the response over the network, and that takes 100 times longer.
And during processing you spend, for example, 1 us on processing and 1 ms waiting for data from the DB.
The problem is that while you wait for the DB or the network, you do not serve other client requests. And you certainly could serve them: you have plenty of CPU power, and both the DB and the CPU are more efficient when they handle a number of requests in parallel.
If other clients arrive at that moment, they will only establish TCP connections at the operating system level and wait in a queue until your handler is free. The handler, in turn, is simply waiting for the operating system to complete the IO.
For example machine learning tasks need a lot of CPU power.
Here we have just the opposite of the situation described above. We use all our CPU power and would like to use even more (for example, a second CPU on the machine) just to serve client requests faster.
For example, our web server shows a weather forecast and has to download a new forecast each hour.
Downloading a new forecast is a long procedure, and we cannot stop serving current requests. We do not even need to download the new forecast as fast as possible: we can do that slowly, but the important thing is not to slow down the client requests that we serve while the new forecast is downloading.
1st picture - sequential way. Tasks are served sequentially, one at a time. 2nd task will wait for the end of 1st task.
2nd picture - pseudo-parallel way. We perform tasks in small pieces, so small tasks are served faster, and the 2nd task may be served sooner than the 1st. But of course this comes at the cost of slowing down the 1st task.
How to do that in practice?
The operating system does that with the help of the CPU - it gives each process a small chunk of time.
Inside a process we can, for example, use system calls for that. When one application ‘green thread’ makes a system call, we intercept it, and after calling the system we give control to another application ‘green thread’. When that one makes a system call, we do the same. And again and again. There is an additional advantage: most system calls mean IO, so the application ‘green thread’ has to wait for the system to do some IO anyway (read a file, send a packet to the network, etc). We use this ‘natural pause’ to execute another application ‘green thread’.
3rd picture - we use two CPUs, so we really execute in parallel.
CPython in general cannot use multi-threading effectively due to the Global Interpreter Lock (GIL).
The GIL exists in Python for historical reasons, and because CPython developers want to run the huge body of existing Python code as efficiently as possible.
Removing the GIL would require multi-threading mechanisms even for single-threaded code (because in general you cannot tell whether code is single- or multi-threaded), and that would slow down nearly all Python applications. That cost is too high (https://en.wikipedia.org/wiki/Global_interpreter_lock).
In fact, Python IO libraries and C code, for example in NumPy, release the GIL, so multi-threading works well in code with a lot of IO or with libraries written in C.
But multi-threading makes no sense for pure Python code: you will get even worse performance than with a single thread (and overly complicated code as well).
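A rough way to check this on your own machine is to time the same pure-Python loop run twice in sequence and then in two threads. This is only a sketch: `count_down`, `N` and the helper names are made up for illustration, and the timings depend on your hardware and Python build, so no particular numbers are claimed here.

```python
import time
from concurrent.futures import ThreadPoolExecutor

N = 1_000_000  # arbitrary amount of pure-Python work (an assumption)


def count_down(n):
    """CPU-bound pure-Python loop; it never waits for IO."""
    while n > 0:
        n -= 1
    return n


def run_sequential():
    # Two pieces of work, one after the other, in a single thread.
    return [count_down(N), count_down(N)]


def run_threaded():
    # The same two pieces of work, each in its own thread.
    with ThreadPoolExecutor(max_workers=2) as pool:
        return list(pool.map(count_down, [N, N]))


def timed(fn):
    """Return (result, elapsed seconds) for a call to fn()."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


if __name__ == "__main__":
    _, seq_time = timed(run_sequential)
    _, thr_time = timed(run_threaded)
    print(f"sequential: {seq_time:.3f}s, two threads: {thr_time:.3f}s")
```

If the GIL behaves as described above, the threaded run should not be noticeably faster than the sequential one.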
Ok with theory but how to do that in practice
Below are a number of ways to write non-sequential Python code.
Just run as many instances of our application as we want, and let the OS do all the dirty work of managing resources between these processes.
Of course you would not run hundreds of instances: the memory overhead is too heavy, and CPU resources are wasted on switching between processes.
And all the instances are separate - they can communicate, but this is not very efficient.
Example of multiprocessing usage:
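A minimal sketch of the idea, using the standard `multiprocessing.Pool`. The `count_primes` function and the list of limits are hypothetical, chosen only to give each process some CPU-heavy work; this is an illustration, not the original example.

```python
from multiprocessing import Pool


def count_primes(limit):
    """Count primes below `limit` by trial division (deliberately CPU-heavy)."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def main():
    limits = [10_000, 20_000, 30_000, 40_000]
    # Each chunk of work runs in a separate OS process with its own interpreter,
    # so the GIL of one process does not block the others.
    with Pool(processes=4) as pool:
        results = pool.map(count_primes, limits)
    for limit, primes in zip(limits, results):
        print(f"primes below {limit}: {primes}")


if __name__ == "__main__":
    main()
```

The `if __name__ == "__main__":` guard matters here: on platforms that start worker processes by re-importing the main module, it keeps the workers from launching pools of their own.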
And you can even use multiprocessing in an asyncio application.
Let the OS manage resources between our processes.
- The OS manages resources very well (for a reasonable number of processes).
- Absolute isolation. Very simple logic (if you do not need inter-process communication).
- No Python GIL.
- High cost to fork and keep running separate OS processes.
- High memory consumption (each process has a separate Python interpreter).
- High cost of switching between processes (this is hardware-assisted, but there is still a lot of work to do).
- Bad choice if you need a lot of communication between your “application threads”. There are a number of very convenient tools in the Python multiprocessing module, but inter-process communication is expensive in any case.
Where to use
- We need to use all CPUs for heavy computation - run at least as many processes as there are CPUs.
- Make effective use of IO waits - just run a small number of processes, so when one waits the others work.
- Background work without a lot of communication with the main application logic. Run a separate process for each background task. Consider using an appropriate tool like Celery.
Ancient way from our software engineering ancestors.
In Python the API is much the same as for multiprocessing.
As in multiprocessing, we delegate the CPU sharing task to the OS. But now we have more fine-grained control.
- The most efficient way to use all CPU.
- A lot of knowledge and documentation - this approach has been thoroughly researched over many years of use.
- Launching thread is cheaper than launching process, but still pretty expensive.
- The same about switching between threads.
- You cannot have thousands of threads. OK, you can if you want to, but in that case the CPU will mostly be busy switching between threads and will have very little time for useful work.
- Very complex to implement. You have to read a lot and spend years practicing before you feel confident and can write stable applications.
- Because of the Python GIL, this approach is good only for waiting for IO and for non-Python libraries.
Where to use
- Background tasks and waiting for IO.
- Small amount of parallel threads.
- Simple inter-thread communication, if you do not want to spend your whole life debugging.
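The "waiting for IO" case can be sketched with the standard `concurrent.futures.ThreadPoolExecutor`. Here `fake_io` is a hypothetical stand-in for a network or DB call, and the 0.2 second delay is arbitrary; `time.sleep` releases the GIL in the same way real blocking IO does.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(task_id, delay=0.2):
    """Stand-in for a network or DB call; sleeping releases the GIL."""
    time.sleep(delay)
    return f"task {task_id} done"


def run_in_threads(task_ids):
    # A small pool: a few threads are enough to overlap the waits.
    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map keeps the results in the same order as the inputs.
        return list(pool.map(fake_io, task_ids))


if __name__ == "__main__":
    start = time.perf_counter()
    results = run_in_threads(range(4))
    print(results, f"{time.perf_counter() - start:.2f}s")
```

Because the four waits overlap, the whole run should take roughly as long as one wait rather than four.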
Green threads and asyncio coroutines
Green threads are executed in an execution loop.
The execution loop takes green threads/coroutines one by one and executes them.
When a green thread/coroutine stops execution to wait for IO or something else, the execution loop gets the next one.
All this works just because green threads/coroutines stop at some moments by themselves and give control back to the execution loop, which starts the next one.
So this is “cooperative multi-threading”, and it works only if all of them are polite and give control back to the execution loop from time to time.
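The cooperative scheme described above can be sketched with plain generators. This is a toy model, not how gevent or asyncio are implemented internally: `worker`, `run_loop` and `demo` are hypothetical names, and each `yield` stands for the moment a green thread politely gives control back.

```python
from collections import deque


def worker(name, steps, log):
    """A toy 'green thread': does one step, then hands control back."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # give control back to the execution loop


def run_loop(tasks):
    """A toy execution loop: run each task until its next yield, round-robin."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)  # resume the task until it yields again
        except StopIteration:
            continue  # the task has finished, do not reschedule it
        queue.append(task)  # the task yielded, put it at the back of the queue


def demo():
    log = []
    run_loop([worker("A", 2, log), worker("B", 3, log)])
    return log


if __name__ == "__main__":
    print(demo())
```

The steps of the two workers come out interleaved. A worker that never reached a `yield` would block every other task, which is exactly why all of them have to be polite.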
Green threads are just an older version of the same technique that we now call coroutines.
asyncio is the official standard for Python, and a lot of libraries use it.
There are a number of green thread implementations. One example is gevent.
Monkey patching is a way to change system libraries so that they catch control on all IO calls and call the execution loop.
For example, if we want urllib to use the patched socket and cooperate:
from gevent import monkey
From this point urllib will send control to the execution loop each time somebody waits for network IO.
So just do the monkey patching, and after that you can use the patched library just as usual.
In this example we patched
If you comment out monkey.patch_all(), the output will be different:
In fact there is special
gevent so no need in
But my intention was to demonstrate the technique. You can also see how implicit and hard to see this is: looking at the code, you have no idea that it can in fact stop execution and run other code.
- We don’t complicate our code with explicit async/await, as with asyncio.
- Cheap switching between green threads (close to zero cost).
- We can have a very large number of green threads.
- No cost overhead
Monkey patching is implicit, and even a lot of experience won’t give you a good understanding of how your code actually works.
Where to use
In the worst case you simply cannot make it work, but in the best case you get multi-threading at nearly zero effort.
Python standard. Special operator in the language.
- As the Python motto says, “explicit is better than implicit”. Easy to understand and to debug.
- Cheap switching
- We can have a very large number of coroutines.
- No cost overhead
You have a lot of async/await in your code, and that complicates the application logic.
And async code has to be called from other async code.
Where to use
All libraries that you use have
This is now the Python standard, so with high probability this is your case.
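A minimal sketch of the explicit async/await style, using only the standard asyncio module. The `fetch` coroutine, its names and its delays are hypothetical stand-ins for real IO; every `await` is a visible point where the execution loop may switch to another coroutine.

```python
import asyncio


async def fetch(name, delay):
    """Stand-in for an IO call; `await` is the explicit switch point."""
    await asyncio.sleep(delay)
    return f"{name} ready"


async def main():
    # Both coroutines wait at the same time inside a single thread.
    # gather returns the results in the order the awaitables were passed.
    return await asyncio.gather(fetch("forecast", 0.1), fetch("profile", 0.05))


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Unlike the monkey-patched variant, you can see from the code alone where execution may pause.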
Just to complete the picture, I want to say a couple of words about callbacks.
In each potentially async function you specify the functions that will be called after this function completes.
Obviously, after that your application is some kind of finite-state machine.
You have a lot of ‘handlers’ instead of a ‘logic flow’.
In fact this is good for UI, which is naturally event-driven.
But for other purposes this is just spaghetti code.
I wrote some example - see it below.
I kept it simple, but you need just a little imagination to understand what a monster you will have if you continue to write the application this way.
At the end of the day you will have a mess of handlers without any understanding of how they are interconnected.
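As an illustration of the callback style (a hedged sketch, not the original example), here is a two-step flow built on the standard asyncio loop method `call_later`. The functions `fetch_user` and `fetch_orders`, the handlers and the data are all made up; each "async" function takes an `on_done` handler instead of returning a value.

```python
import asyncio


def fetch_user(loop, user_id, on_done):
    # Pretend the IO takes a moment, then call the handler with the result.
    loop.call_later(0.01, on_done, {"id": user_id, "name": "alice"})


def fetch_orders(loop, user, on_done):
    # Second step: it can only start from inside the first handler.
    loop.call_later(0.01, on_done, user, ["book", "pen"])


def run_callback_flow(user_id):
    loop = asyncio.new_event_loop()
    result = {}

    def on_orders(user, orders):
        # Last handler in the chain: store the result and stop the loop.
        result["summary"] = f"{user['name']} ordered {len(orders)} items"
        loop.stop()

    def on_user(user):
        # The logic flow continues here, not after the fetch_user call.
        fetch_orders(loop, user, on_orders)

    fetch_user(loop, user_id, on_user)
    try:
        loop.run_forever()
    finally:
        loop.close()
    return result["summary"]


if __name__ == "__main__":
    print(run_callback_flow(1))
```

Even with two steps, the order of execution no longer matches the order of the code: the flow jumps from `fetch_user` to `on_user` to `fetch_orders` to `on_orders`.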
libuv and others
In fact, for any execution loop the tricky part is how to detect that the data we were waiting for has arrived and that we have to continue this green thread/coroutine.
From the application logic point of view this is irrelevant.
But execution loop implementations will have different performance for different situations and on different OSes.
There are a lot of execution loop implementations.
Comparing them is a very complicated task that cannot be done ‘in general’; the results heavily depend on the application and the resources it uses.
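To make the "tricky part" a little more concrete, here is a small sketch of how a loop can ask the OS which socket has data, using the standard `selectors` module. It says nothing about how libuv or any particular loop is implemented; the function name `wait_for_data` and the `"task-1"` label are hypothetical.

```python
import selectors
import socket


def wait_for_data(timeout=1.0):
    """Register a socket, then ask the OS which registered sockets are readable."""
    sel = selectors.DefaultSelector()  # picks epoll, kqueue, select, etc. per OS
    reader, writer = socket.socketpair()
    reader.setblocking(False)
    # `data` lets us remember which task is waiting on this socket.
    sel.register(reader, selectors.EVENT_READ, data="task-1")

    writer.sendall(b"hello")  # simulate data arriving from the network

    received = []
    for key, _events in sel.select(timeout=timeout):
        # The loop now knows which waiting task can continue.
        received.append((key.data, key.fileobj.recv(1024)))

    sel.unregister(reader)
    sel.close()
    reader.close()
    writer.close()
    return received


if __name__ == "__main__":
    print(wait_for_data())
```

`DefaultSelector` chooses a different OS mechanism on different platforms, which is one reason loop performance varies between operating systems.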
In some cases it’s convenient to combine different techniques. For example, you can write an asyncio application with multiprocessing.
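One possible way to combine the two, sketched with the standard `ProcessPoolExecutor` and the loop's `run_in_executor` method. The `heavy_sum` function and its inputs are hypothetical; the point is only that the CPU-heavy work runs in other processes while the execution loop stays free.

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor


def heavy_sum(n):
    """CPU-bound work that should not block the execution loop."""
    return sum(i * i for i in range(n))


async def main():
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor(max_workers=2) as pool:
        # run_in_executor returns awaitables, so the loop stays free for IO
        # while the worker processes do the computation.
        jobs = [loop.run_in_executor(pool, heavy_sum, n) for n in (10, 1_000)]
        return await asyncio.gather(*jobs)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The function passed to the pool has to be defined at module level so that it can be sent to the worker processes.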