Multiple process or threads? And, why?

adr1an@programming.dev · 2 months ago

Multiple process or threads? And, why?

conrad82@lemmy.world · 2 months ago

Yes it is correct. TLDR; threads run one code at the time, but can access same data. processes is like running python many times, and can run code simultaneously, but sharing data is cumbersome.

If you use multiple threads, they all run on the same python instance, and they can share memory (i.e. objects/variables can be shared). Because of GIL (explained by other comment), the threads cannot run at the same time. This is OK if you are IO bound, but not CPU bound

If you use multiprocessing, it is like running python (from terminal) multiple times. There is no shared memory, and you have a large overhead since you have to start up python many times. But if you have large calculations you can do in parallell that takes long time, it will be much faster than threads as it can use all cpu cores.

If these processes need to share data, it is more complicated. You need to use special functions to share data, like queues and pipes. If you need to share many MB of data, this takes a lot of time in my experience (10s of milliseconds).

If you need to do large calculations, using numpy functions or numba may be faster than multiple processes, due to good optimizations. But if you need to crunch a lot of data, multiprocessing is usually the way to go