Suraj Kapoor

Director of Product @WayUp, formerly @Lerer Hippeau. Technologist, Optimistic Contrarian.

Concurrency: Threading vs Multiprocessing in Python

As programs become larger and more complex, they need to do more computations while maintaining speed. Concurrency is when multiple computations are performed in parallel, and there are techniques that enable us to do this such as Threading and Multiprocessing.

What is Threading?

Threading is when several programs run concurrently in a single process. Threads run in the same memory space so sharing information amongst threads is easy but precautions (the GIL) are needed to stop different threads to write to the same memory space, at the same time.

What is Multiprocessing?

Multiprocessing uses processes (instead of threads) to run programs concurrently in separate memory space taking advantage of multiple CPUs. The processes cannot share information or state and take up more memory space.

Threading vs Processing?

This is an excellent StackOverflow answer that outlines the pros and cons. As a newbie I am in the process of experimenting with both, although my experience with threading has not been great in that code seems a lot less intuitive to write, whereas multiprocessing seems to be a lot more straightforward.

I ran a simple test to count the length of each word in a 120K text file. It took approximately 90 seconds processing time without using concurrency technique. Using the multiprocessing module, processing time was only 12 seconds. My code is here .

What are Queues?

Queues are an ordered collection of items that allow addition of an item on one end and removal of an item from another. They are like a simplified version of a list with basic pop and append commands. Usually they follow the First in, First Out (FIFO) principle. Queues are used in conjunction with threading as they allow state to be shared in the process. Python’s multiprocessing module has it’s own Queue object, which works differently. This allows data to be shared between processes.

What is the GIL (Global Interpreter Lock)?

This is a mechanism that allows for the synchronization of threads to execute one at a time. It's also the reason that threading is not that effective in Python.