Threads vs Processes#

        graph TD
    P[Process] --> PM[Own Memory Space]
    P --> PT[Threads]
    PT --> T1[Thread 1]
    PT --> T2[Thread 2]
    PT --> T3[Thread 3]
    GIL[GIL - Global Interpreter Lock] --> |only one thread runs at a time| PT
    T1 --> |I/O bound - releases GIL| IO[I/O Operations]
    T1 --> |CPU bound - blocked by GIL| CPU[CPU Tasks]
    style GIL fill:#f96,stroke:#333
    style PM fill:#9cf,stroke:#333
    

Process has its own memory space: A process is an independent execution environment managed by the operating system, with its own distinct memory space, as shown at the top of the diagram.

CPython runs one thread at a time (due to the GIL): the Global Interpreter Lock is a mutex that ensures only one thread can execute Python bytecode at any given moment, even on multi-core processors.

Threads suit I/O-bound tasks (common usage): The diagram correctly shows that threads are typically used for I/O-bound tasks in Python. When an I/O operation occurs (e.g., waiting for a network response or a file read), the thread releases the GIL, allowing other threads to acquire it and run, thus achieving concurrency for these tasks.

CPU Bound tasks struggle with the GIL: The diagram implies that CPU-bound tasks are constrained by the GIL, as only one thread can actively compute at a time. Trying to use multiple threads for purely CPU-bound tasks in CPython will not achieve true parallelism and can even add overhead from constant context switching.
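A quick way to see the I/O-bound case is to run several "waiting" tasks in threads; here `time.sleep` stands in for a blocking I/O call, since, like real I/O, it releases the GIL:

```python
import threading
import time

def io_task():
    # time.sleep stands in for a blocking I/O call; like real I/O, it releases the GIL
    time.sleep(1)

start = time.time()
threads = [threading.Thread(target=io_task) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"3 I/O-bound tasks took {time.time() - start:.2f}s")  # ~1s total, not ~3s
```

All three sleeps overlap, so the total wall-clock time is about one second rather than three.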

Threads (only one GIL shared)#

import threading, time

from tasks import cpu_task

t1 = threading.Thread(target=cpu_task)
t2 = threading.Thread(target=cpu_task)

start = time.time()
t1.start(); t2.start()
t1.join(); t2.join()
print("Total time:", time.time() - start)

Process finished in 2.0034258365631104
Process finished in 2.012209892272949
Total time: 2.0227320194244385

Expected result:

  • Each task takes ~1–1.5 seconds

  • Total time ≈ sum of both, e.g. ~2–3 seconds

→ No parallel CPU execution, because there is only one GIL
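The `tasks` module isn't shown above; a hypothetical `cpu_task` consistent with the ~1 s per-task timings might look like this (the loop bound is a guess and machine-dependent):

```python
# tasks.py — hypothetical reconstruction; the original module is not shown
import time

def cpu_task():
    """Pure-CPU busy loop; it never releases the GIL while counting."""
    start = time.time()
    n = 10_000_000  # machine-dependent guess at roughly a second of work
    while n > 0:
        n -= 1
    print(f"Process finished in {time.time() - start}")
```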

Multiprocessing (each process has its own GIL)#

from multiprocessing import Process
import time

from tasks import cpu_task


p1 = Process(target=cpu_task)
p2 = Process(target=cpu_task)

start = time.time()
p1.start(); p2.start()
p1.join(); p2.join()
print("Total time:", time.time() - start)

Process finished in 1.1095759868621826
Process finished in 1.1105716228485107
Total time: 1.2385430335998535

A more direct explanation#

from multiprocessing import Process
p = Process(target=cpu_task)
p.start()

The OS does this:

  • fork/spawn → create a new OS-level process

  • New process loads a new CPython interpreter

  • That interpreter initializes its own GIL

  • That process runs your function independently on another CPU core

So by design:

  • 1 process → 1 interpreter → 1 GIL

  • 4 processes → 4 interpreters → 4 GILs

This is why multiprocessing can use multiple CPU cores.
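Whether fork or spawn is used can be checked with `multiprocessing.get_start_method()`; typically `fork` on Linux and `spawn` on Windows and macOS:

```python
import multiprocessing as mp

if __name__ == "__main__":
    # 'fork' duplicates the parent process; 'spawn' starts a fresh interpreter
    print(mp.get_start_method())
```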

Threads (threading module)#

  • A thread is a lightweight unit of execution inside a process.

  • Multiple threads share the same memory space of the process.

  • Threads are cheap to create, but:

    • In CPython, the Global Interpreter Lock (GIL) means only one thread executes Python bytecode at a time.

    • Threads are mostly useful for I/O-bound tasks (networking, reading files, waiting), not CPU-bound computation.

Example:

import threading

def worker():
    print("Thread is running")

t = threading.Thread(target=worker)
t.start()
t.join()

Multiple threads can read/write shared data, but you must be careful with locks to prevent race conditions.

Processes (os.fork() or multiprocessing)#

  • A process is an independent program execution with its own memory space.

  • Processes do not share memory (unless you use special shared memory or queues).

  • Processes are heavier than threads but:

    • They bypass the GIL, so multiple processes can run Python code in parallel on multiple CPU cores.

    • Useful for CPU-bound tasks.

Example using multiprocessing:

from multiprocessing import Process

def worker():
    print("Process is running")

p = Process(target=worker)
p.start()
p.join()

Each process is fully independent. Changes in one process’s memory don’t affect the other.


Key Differences Table#

| Feature | Thread (threading) | Process (os.fork / multiprocessing) |
| --- | --- | --- |
| Memory | Shared | Separate |
| Creation overhead | Low | High |
| Parallelism in Python | Limited by GIL (CPU-bound) | True parallelism possible |
| Crash isolation | A crash in the shared process brings down all threads | Independent; a crash doesn't affect other processes |
| Use case | I/O-bound tasks | CPU-bound tasks |


Important Note About os.fork()#

  • os.fork() is low-level: it duplicates the current process exactly as it is.

  • It’s dangerous in multi-threaded programs (like Jupyter) because it copies threads and resources, leading to potential deadlocks.

  • multiprocessing is safer because it handles process creation and communication for you.

So multiprocessing can achieve true parallelism#

  • By creating separate processes, multiprocessing sidesteps the GIL, while threading cannot bypass it for CPU-bound tasks.

        graph LR
    subgraph Process1[Process 1 - own GIL]
        G1[GIL 1] --> Core1[CPU Core 1]
    end
    subgraph Process2[Process 2 - own GIL]
        G2[GIL 2] --> Core2[CPU Core 2]
    end
    subgraph Process3[Process 3 - own GIL]
        G3[GIL 3] --> Core3[CPU Core 3]
    end
    OS[Operating System] --> Process1
    OS --> Process2
    OS --> Process3
    style Process1 fill:#9cf,stroke:#333
    style Process2 fill:#9f9,stroke:#333
    style Process3 fill:#f9c,stroke:#333
    

What is Threading#

Threading in Python is a technique for achieving concurrent execution by running multiple threads (smaller units of a process) in an overlapping manner; in CPython, threads interleave rather than run truly simultaneously.

        graph TD
    subgraph SingleProcess[Single Python Process]
        GIL[GIL - Global Interpreter Lock]
        T1[Thread 1] -->|acquire lock| GIL
        T2[Thread 2] -->|waiting| GIL
        T3[Thread 3] -->|waiting| GIL
        T4[Thread 4] -->|waiting| GIL
        GIL -->|execute bytecode| CPU[CPU]
    end
    style GIL fill:#f96,stroke:#333
    style SingleProcess fill:#eee,stroke:#999
    
  • Single Process: All threads (T1, T2, T3, T4) operate within a single Python process, as the enclosing box in the diagram shows.

  • The Global Interpreter Lock (GIL): The diagram visually represents the GIL as a single lock mechanism that all threads must contend for to execute Python bytecode.

  • No True Parallelism (for CPU-bound tasks): The key takeaway, which the diagram illustrates, is that even with multiple threads available, only one thread can hold the lock and execute code at any given moment. This effectively serializes the execution of CPU-bound threads, preventing them from running in parallel across multiple CPU cores. The execution timeline below shows threads taking turns running brief bursts of work.

        graph LR
    subgraph Timeline[Execution Timeline - CPU Bound Threads]
        T1A[T1 runs] --> T2A[T2 runs] --> T1B[T1 runs] --> T2B[T2 runs]
    end
    note["Only one thread runs at a time — no speedup for CPU-bound work"]
    style Timeline fill:#eee,stroke:#999
    
import threading
import time
import os

def cpu_intensive_task(count_to):
    """A function that performs a CPU-bound operation."""
    while count_to > 0:
        count_to -= 1

def run_single_threaded(iterations, count_target):
    """Runs the task once in the main thread."""
    start_time = time.time()
    for _ in range(iterations):
        cpu_intensive_task(count_target)
    end_time = time.time()
    print(f"--- Single-threaded execution time for {iterations} task(s): {end_time - start_time:.4f} seconds ---")

def run_multi_threaded(num_threads, count_target):
    """Runs the task using multiple threads."""
    start_time = time.time()
    threads = []

    for i in range(num_threads):
        t = threading.Thread(target=cpu_intensive_task, args=(count_target,), name=f"Thread-{i}")
        threads.append(t)
        t.start()

    for t in threads:
        t.join()

    end_time = time.time()
    print(f"--- Multi-threaded ({num_threads} threads) execution time: {end_time - start_time:.4f} seconds ---")

if __name__ == "__main__":
    TARGET_COUNT = 50000000
    NUM_THREADS = 4

    print(f"System has {os.cpu_count()} CPU cores available.")
    run_single_threaded(NUM_THREADS, TARGET_COUNT)
    run_multi_threaded(NUM_THREADS, TARGET_COUNT)

    print("\nObservation: The execution times are very similar, confirming that threads in Python's CPython interpreter cannot run CPU-bound tasks in parallel.")

Thread Lifecycle#

        stateDiagram-v2
    [*] --> New: threading.Thread() created
    New --> Runnable: thread.start()
    Runnable --> Running: GIL acquired / scheduled by OS
    Running --> Runnable: GIL released / preempted
    Running --> Blocked: I/O wait / sleep / lock / join
    Blocked --> Runnable: I/O complete / timeout / lock released
    Running --> Terminated: run() method returns or raises
    Terminated --> [*]
    

Python Thread Lifecycle States

  • New (Created): A thread is in this state once the threading.Thread object is created but before its start() method is called.

  • Runnable (Ready/Alive): After thread.start() is called, the thread is considered “alive” and moves to the runnable state.

  • Running: The thread is actively executing its tasks. Due to the GIL in CPython, only one Python thread can truly be in the running state at any given time.

  • Blocked/Waiting/Sleeping: The thread may temporarily pause for I/O, synchronization, sleeping, or joining.

  • Terminated (Dead): The thread reaches this final state when its run() method completes. A terminated thread cannot be restarted.
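The lifecycle transitions can be observed through `Thread.is_alive()`:

```python
import threading
import time

def work():
    time.sleep(0.5)  # thread is Blocked (sleeping) while the main thread checks on it

t = threading.Thread(target=work)
print(t.is_alive())  # False — New: created but not yet started
t.start()
print(t.is_alive())  # True — alive: Runnable/Running/Blocked
t.join()
print(t.is_alive())  # False — Terminated
```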

CPython’s internal threading mechanism#

        graph TD
    subgraph CPython[CPython Interpreter]
        GIL[GIL - Global Interpreter Lock]
        subgraph Threads
            T1[Thread 1\nstart_new_thread C]
            T2[Thread 2\nstart_new_thread C]
            T3[Thread 3\nstart_new_thread C]
        end
        T1 -->|acquire T-state lock| GIL
        T2 -->|wait for T-state lock| GIL
        T3 -->|wait for T-state lock| GIL
        GIL --> BC[Execute Python Bytecode]
    end
    style GIL fill:#f96,stroke:#333
    style CPython fill:#eef,stroke:#669
    
  • Threads: Separate flows of execution created via _thread.start_new_thread, which is implemented in C

  • T-state locks: Internal thread state locks in CPython

  • CPython Implementation: The GIL is an internal CPython mechanism restricting simultaneous execution of Python bytecode
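That low-level primitive is exposed in Python as `_thread.start_new_thread`, which `threading.Thread` wraps; a minimal sketch:

```python
import _thread
import time

results = []

# the C-level primitive that threading.Thread ultimately calls
_thread.start_new_thread(results.append, ("ran in a raw thread",))

time.sleep(0.5)  # crude join: _thread provides no join(), so just wait
print(results)
```

In practice, `threading.Thread` is preferred: it adds joining, naming, and lifecycle management on top of this primitive.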

threading.local#

import threading

data = threading.local()

def worker():
    data.x = threading.current_thread().name
    print(data.x)

threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()

Thread Pool#

from multiprocessing.pool import ThreadPool

def work(x):
    return x * x

pool = ThreadPool(processes=4)
results = pool.map(work, [1, 2, 3, 4])
pool.close()
pool.join()
print(results)

[1, 4, 9, 16]
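The same pattern is available through the standard `concurrent.futures` API, which is the more common choice in modern code:

```python
from concurrent.futures import ThreadPoolExecutor

def work(x):
    return x * x

# the context manager shuts the pool down automatically
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(work, [1, 2, 3, 4]))

print(results)  # [1, 4, 9, 16]
```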

Threads share same global scope#

import threading
import time

count = 0

def try_count(sleep_seconds, name):
    global count
    count += 1
    time.sleep(sleep_seconds)
    print(f"\nIn thread: {name}: count = {count}")


t1 = threading.Thread(target=try_count, args=(1, "T1"))
t2 = threading.Thread(target=try_count, args=(2, "T2"))
t3 = threading.Thread(target=try_count, args=(1, "T3"))

t1.start()
t2.start()
t3.start()
In thread: T1: count = 3
In thread: T3: count = 3
In thread: T2: count = 3

thread-safe#

  • The sample code above is not thread-safe

  • Even though Python has a GIL, operations like:

    count += 1
    

    are not atomic. They expand into:

    • load count

    • add 1

    • store count

    Two threads can interleave and cause race conditions.

  • We can use a lock to fix the race condition:

lock = threading.Lock()

def try_count(sleep_seconds, name):
    global count
    with lock:
        count += 1

    time.sleep(sleep_seconds)
    print(f"\nIn thread: {name}: count = {count}")
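The non-atomic expansion of `count += 1` can be verified with the `dis` module (exact opcode names vary by Python version):

```python
import dis

count = 0

def increment():
    global count
    count += 1

# shows separate load / add / store steps;
# a thread switch can occur between any two of them
dis.dis(increment)
```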

threading.local for per-thread state#

  • If you want per-thread state instead of shared state, use threading.local()

import threading
import time

thread_local = threading.local()

def try_count(sleep_seconds, name):
    thread_local.count = getattr(thread_local, "count", 0)
    thread_local.count += 1
    time.sleep(sleep_seconds)
    print(f"\nIn thread: {name}: count = {thread_local.count}")


t1 = threading.Thread(target=try_count, args=(1, "T1"))
t2 = threading.Thread(target=try_count, args=(2, "T2"))
t3 = threading.Thread(target=try_count, args=(1, "T3"))

t1.start()
t2.start()
t3.start()
In thread: T1: count = 1
In thread: T3: count = 1
In thread: T2: count = 1

What is os.fork#

        graph TD
    Parent[Parent Process\npid = os.fork] -->|returns child PID| ParentCont[Parent continues\nwith pid != 0]
    Parent -->|fork| Child[Child Process\nexact copy of parent]
    Child -->|returns 0| ChildCont[Child runs\nwith pid == 0]
    Parent -->|shares at fork time| Mem[Memory / File Descriptors\nCode / Open Files]
    Child -->|copy-on-write| Mem
    style Parent fill:#9cf,stroke:#333
    style Child fill:#9f9,stroke:#333
    
import os

pid = os.fork()  # POSIX-only; not available on Windows

if pid == 0:
    print("I am the child process!")
else:
    print(f"I am the parent process. My child has PID {pid}")

Use case: Pipes for Parent–Child Communication#

Python’s os.pipe() creates two file descriptors:

  • r — read end

  • w — write end

After fork(), the parent and child each inherit both ends, so you must close the unused ends in each process.

        graph LR
    subgraph Pipe[os.pipe - r, w]
        W[Write end w] --> R[Read end r]
    end
    subgraph Parent[Parent Process]
        PW[keep write end w] -->|os.write| W
        PR[close read end r]
    end
    subgraph Child[Child Process]
        CR[keep read end r] -->|os.read| R
        CW[close write end w]
    end
    style Parent fill:#9cf,stroke:#333
    style Child fill:#9f9,stroke:#333
    style Pipe fill:#ffc,stroke:#999
    
import os

r, w = os.pipe()

pid = os.fork()

if pid == 0:
    # --- Child process ---
    os.close(r)
    message = b"Hello from child!"
    os.write(w, message)
    os.close(w)
else:
    # --- Parent process ---
    os.close(w)
    data = os.read(r, 1024)
    print("Parent received:", data.decode())
    os.close(r)

Use case: Bidirectional communication#

import os
import time

pr, pw = os.pipe()  # parent writes, child reads
cr, cw = os.pipe()  # child writes, parent reads

pid = os.fork()

if pid == 0:
    # --- Child ---
    os.close(pw)
    os.close(cr)
    msg = os.read(pr, 1024)
    print("Child got:", msg.decode())
    reply = b"Message received!"
    os.write(cw, reply)
    os.close(pr)
    os.close(cw)
else:
    # --- Parent ---
    os.close(pr)
    os.close(cw)
    print("[Parent] Started, waiting 2 seconds before sending message...")
    time.sleep(2)
    os.write(pw, b"Hello child!")
    reply = os.read(cr, 1024)
    print("Parent got:", reply.decode())
    os.close(pw)
    os.close(cr)

Use case: call a function from child process#

        graph TD
    Fork[os.fork] -->|pid == 0| Child[Child Process]
    Fork -->|pid != 0| Parent[Parent Process]
    Child --> CallFunc[call my_func 10 20]
    Child --> Exit[os._exit 0]
    Parent --> PrintPID[print child PID]
    style Child fill:#9f9,stroke:#333
    style Parent fill:#9cf,stroke:#333
    
import os

def my_func(x, y):
    print(f"Child running my_func with {x=} and {y=}")

pid = os.fork()

if pid == 0:
    my_func(10, 20)
    os._exit(0)  # exit the child without running parent cleanup handlers
else:
    print(f"Parent: child PID is {pid}")
    os.waitpid(pid, 0)  # reap the child so it doesn't linger as a zombie

Child processes get a copy of the parent's global variables#

        graph TD
    subgraph Parent[Parent Process]
        PG[global_var = 42\nshared_list = ...]
    end
    Parent -->|os.fork - copy-on-write| Child[Child Process]
    subgraph Child[Child Process]
        CG[global_var = 42\nshared_list = ...\ncopied from parent]
    end
    PG -->|child reads same value at fork| CG
    Child -->|modifies its own copy| CM[child global_var = 99\nparent unaffected]
    style Parent fill:#9cf,stroke:#333
    style Child fill:#9f9,stroke:#333
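A minimal demonstration of this copy-on-write behavior (POSIX-only, since it uses os.fork()):

```python
import os

global_var = 42

pid = os.fork()

if pid == 0:
    # Child: modifies only its own copy of the global
    global_var = 99
    print(f"Child sees global_var = {global_var}")
    os._exit(0)
else:
    os.waitpid(pid, 0)  # wait for the child to finish
    print(f"Parent still sees global_var = {global_var}")
```

The child prints 99 while the parent, after the child exits, still prints 42: each process mutates its own copy.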