A GIL-less Future for Python

The Global Interpreter Lock (GIL) has been a fundamental part of CPython's design for decades. However, a new era is dawning for Python developers, with the possibility of building CPython without the GIL.

Introduction to GIL

The Global Interpreter Lock (GIL) is a mutex that protects access to Python objects, preventing multiple native threads from executing Python bytecode simultaneously. While it simplifies the interpreter's implementation, it has long been a bottleneck for CPU-bound multi-threaded applications.

Overview of CPython Changes

Removing the GIL requires substantial changes to CPython internals, including:

  • Reference Counting: Making reference counting thread-safe.
  • Memory Management: Replacing the internal allocator with a thread-safe one.
  • Container Thread-Safety: Using per-object locks for containers like lists and dictionaries.
  • Garbage Collection: Adapting the cycle collector so it no longer relies on guarantees the GIL used to provide.
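
The per-object locking idea from the list above can be mimicked in pure Python. This is only an illustration under stated assumptions: LockedList is a hypothetical wrapper class invented for this sketch, while CPython's real per-object locks live in C and are not exposed this way.

```python
import threading

class LockedList:
    """Hypothetical wrapper: every instance carries its own lock."""

    def __init__(self):
        self._items = []
        self._lock = threading.Lock()  # one lock per object, not one global lock

    def append_if_absent(self, value):
        # The membership check and the append form one critical section,
        # so two threads cannot both insert the same value.
        with self._lock:
            if value not in self._items:
                self._items.append(value)

    def snapshot(self):
        # Copy under the lock so callers never see a half-updated list.
        with self._lock:
            return list(self._items)

def worker(shared, values):
    for v in values:
        shared.append_if_absent(v)

shared = LockedList()
threads = [threading.Thread(target=worker, args=(shared, range(100)))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

items = shared.snapshot()
print(len(items))  # 100: each value was inserted exactly once
```

Because each container has its own lock, threads working on different containers never contend with each other, which is the main appeal over a single global lock.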

Memory Management

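The proposal includes replacing pymalloc with mimalloc, a general-purpose thread-safe allocator. The change also helps with garbage collection and container thread-safety: mimalloc's internal data structures let the collector find Python objects without maintaining a separate linked list, and they support lock-free reads of lists and dictionaries.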

Garbage Collection (Cycle Collection)

The garbage collector will require significant changes, including using a "stop-the-world" pause to provide the thread-safety guarantees the GIL used to provide, and eliminating generational garbage collection in favor of a non-generational collector.
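
For readers wondering why a cycle collector exists alongside reference counting at all, here is a small illustration using the standard gc and weakref modules. The Node class is made up for this example, and the snippet shows ordinary CPython behavior rather than anything specific to a GIL-free build.

```python
import gc
import weakref

class Node:
    """Made-up class used only to build a reference cycle."""
    def __init__(self):
        self.other = None

gc.disable()               # keep the automatic collector out of the way for the demo
a, b = Node(), Node()
a.other, b.other = b, a    # a -> b -> a: a reference cycle
probe = weakref.ref(a)     # lets us observe whether the first node is still alive

del a, b                   # the refcounts never reach zero because of the cycle
alive_before = probe() is not None

gc.collect()               # the cycle collector finds and frees the pair
alive_after = probe() is not None
gc.enable()

print(alive_before, alive_after)  # True False
```

Reference counting alone would leak the two nodes; the cycle collector is the part that has to be made safe when several threads can mutate objects at once.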

The Historical Context of GIL

The Global Interpreter Lock (GIL) has been a part of Python since 1992, a year after Python's public release. During the early '90s, multi-core CPUs were not common, and the GIL was an effective solution to share the interpreter between different operating system threads. Guido van Rossum, Python's creator, reflected on the era, stating that the GIL became infamous when chip designers began putting multiple CPUs on one chip, leading to pressure for parallel processing.

Why GIL Was Retained

Several factors influenced the GIL's retention in Python. It was easy to implement, and single-threaded programs ran fast. The GIL also prevented certain bugs, such as deadlocks, that can occur with fine-grained locks. However, much like Linux's big kernel lock and FreeBSD's giant lock, the GIL allows only one thread to execute at a time, which limits efficiency on multi-core hardware.

The Push to Remove GIL

With the rise of neural network-based AI models and the need to exploit multiple types of parallelism, the GIL became a significant drawback. In languages other than Python, threads can run different parts of an AI model on separate CPU cores. The GIL blocked this possibility in Python, leading to a growing consensus for its removal. A poll conducted by the Python team last month gauged support for free threading and PEP 703, after which the Steering Council announced its intent to accept the proposal to make the GIL optional.

A Cautious Approach to GIL Removal

The process of removing the GIL is long and cautious. PEP 703 introduces a new build configuration flag, --disable-gil, for building CPython without the GIL. The rollout is divided into short-term, mid-term, and long-term stages, with the GIL-free version eventually becoming the default Python interpreter. This transformation is seen as a win for the AI ecosystem and is led by Sam Gross, with Meta dedicating a serious engineering team to the effort.
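
Since the GIL-free interpreter is a separate build, scripts may want to check which kind of interpreter they are running on. The sketch below assumes the names that shipped with Python 3.13, the Py_GIL_DISABLED config variable and sys._is_gil_enabled(), and falls back gracefully on older versions. The gil_status helper is a made-up name for this example.

```python
import sys
import sysconfig

def gil_status():
    """Return a (free_threaded_build, gil_enabled_now) pair."""
    # Py_GIL_DISABLED is 1 in free-threaded builds and absent or 0 otherwise.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

    # sys._is_gil_enabled() was added in Python 3.13. On older versions
    # the GIL is always on, so fall back to True.
    checker = getattr(sys, "_is_gil_enabled", None)
    gil_enabled_now = checker() if checker is not None else True
    return free_threaded_build, gil_enabled_now

build, enabled = gil_status()
print(f"free-threaded build: {build}, GIL currently enabled: {enabled}")
```

The two values can differ: a free-threaded build may still turn the GIL back on at runtime, for example when it loads an extension module that has not declared itself safe to run without it.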

Impact on Other Languages

An exciting outcome of Python's move towards a GIL-less future is its potential effect on how developers choose between Python and languages like Rust. Developers who previously turned to Rust for multi-threaded computation may now find Python without the GIL viable. This shift could lead to broader adoption of Python in areas where the GIL was previously a limitation.

Multi-Threading with GIL

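With the GIL in place, multi-threading in Python is restrictive for CPU-bound work. You can create multiple threads, but only one of them can execute Python bytecode at a time. Here's a simple example of two CPU-bound threads running under the GIL:

import threading
import time

def count_down(n):
    # Pure-Python loop: CPU-bound, so the running thread holds the GIL
    while n > 0:
        n -= 1

N = 10_000_000
start = time.perf_counter()

# Creating two threads
thread1 = threading.Thread(target=count_down, args=(N,))
thread2 = threading.Thread(target=count_down, args=(N,))

# Starting the threads
thread1.start()
thread2.start()

# Waiting for both threads to complete
thread1.join()
thread2.join()

print(f"Elapsed: {time.perf_counter() - start:.2f}s")

Because both threads run a pure-Python loop, they take turns holding the GIL, so the elapsed time is roughly what running the two loops back to back would take, even on a multi-core processor. Threads that spend their time in blocking calls such as time.sleep() or I/O are a different case: they release the GIL while waiting, so they do overlap.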

GIL Removal and True Parallelism

Without the GIL, Python can achieve true parallelism, allowing multiple threads to run simultaneously on different CPU cores. This can lead to significant performance improvements in CPU-bound tasks. Here's an example of code that might benefit from GIL removal:

from concurrent.futures import ThreadPoolExecutor

def compute_heavy_task(n):
    # A CPU-bound task: sum of squares in a pure-Python loop
    total = 0
    for i in range(n):
        total += i * i
    return total

# Using a thread pool to spread the tasks across worker threads
with ThreadPoolExecutor() as executor:
    results = list(executor.map(compute_heavy_task, [200_000] * 16))

print(results[0])

Without the GIL, the worker threads can run these tasks on separate CPU cores, which should shorten the total run time. With the GIL, they still execute one at a time.
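
One way to check whether threads actually help on a given interpreter is to time the same CPU-bound work serially and through a thread pool. The sketch below is a rough harness, not a rigorous benchmark. The names busy_sum, timed, and WORK are illustrative, and no timings are shown because they depend on the machine and on whether the interpreter was built with or without the GIL.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def busy_sum(n):
    """Illustrative CPU-bound task: sum of squares in a pure-Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed(fn):
    """Run fn() and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start

WORK = [100_000] * 8  # eight equal chunks of work

def run_serial():
    return [busy_sum(n) for n in WORK]

def run_threaded():
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(busy_sum, WORK))

serial, t_serial = timed(run_serial)
threaded, t_threaded = timed(run_threaded)

print(f"serial:   {t_serial:.3f}s")
print(f"threaded: {t_threaded:.3f}s")
```

On a standard build the two timings should come out close to each other; a clear gap in favor of the threaded run is what a GIL-free build is expected to deliver for work like this.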

Impact on Reference Counting

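Removing the GIL requires making reference counting thread-safe without slowing down the common single-threaded case. Here's a simplified sketch of how reference counting might be implemented without the GIL. Helper names such as current_thread_id() are placeholders rather than CPython APIs: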

 
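#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

typedef struct _object {
  uintptr_t ob_tid; // owning thread id
  uint32_t ob_ref_local; // local reference count for the owning thread
  _Atomic ptrdiff_t ob_ref_shared; // shared reference count for other threads (Py_ssize_t in CPython)
  // ...
} PyObject;

// Placeholder: the address of a thread-local variable is unique per thread
static _Thread_local char thread_marker;
static uintptr_t current_thread_id(void) { return (uintptr_t)&thread_marker; }

// Incrementing the reference count
void INCREF(PyObject *obj) {
  if (obj->ob_tid == current_thread_id()) {
    obj->ob_ref_local++; // owning thread: plain, non-atomic increment
  } else {
    atomic_fetch_add(&obj->ob_ref_shared, 1); // other threads: atomic increment
  }
}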

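This example illustrates how reference counting can be adapted to support multi-threading without the GIL. The scheme, known as biased reference counting, lets the owning thread use fast non-atomic updates on the local count while other threads fall back to atomic operations on the shared count. In PEP 703 the shared field also carries a few state bits, which this sketch leaves out.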
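
The same split between a local and a shared count can be modeled in Python to make the control flow easier to follow. This is a toy model only: BiasedRefCount is a made-up class, and a threading.Lock stands in for the atomic instruction a real implementation would use.

```python
import threading

class BiasedRefCount:
    """Toy model: one owner thread, a fast local count, a locked shared count."""

    def __init__(self):
        self.owner = threading.get_ident()    # plays the role of ob_tid
        self.local = 1                        # plays the role of ob_ref_local
        self.shared = 0                       # plays the role of ob_ref_shared
        self._shared_lock = threading.Lock()  # stand-in for an atomic add

    def incref(self):
        if threading.get_ident() == self.owner:
            self.local += 1                   # owner: no synchronization needed
        else:
            with self._shared_lock:           # non-owner: synchronized update
                self.shared += 1

    def total(self):
        return self.local + self.shared

obj = BiasedRefCount()
obj.incref()                                  # called from the owning thread

others = [threading.Thread(target=obj.incref) for _ in range(3)]
for t in others:
    t.start()
for t in others:
    t.join()

print(obj.local, obj.shared, obj.total())     # 2 3 5
```

The design bet is that most objects are only ever touched by the thread that created them, so the cheap path is the one taken almost all the time.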

Sources

  1. GitHub Repository - nogil-3.12
  2. GitHub Repository - nogil
  3. Research Paper on Biased Reference Counting
  4. A GIL-less Future for Python Beckons Developers
  5. Poll: Feedback to the SC on making CPython free-threaded and PEP 703
  6. PEP 703 – Making the Global Interpreter Lock Optional in CPython
  7. Python 2->3 transition was horrifically bad
  8. A fast, free threading Python
  9. Python may Not be Great for Backend but is Still Preferred for ML