

Understanding Pickle to Make Decorators Work With Multiprocessing

Last Updated on January 20, 2025 by Editorial Team

Author(s): Han Qi

Originally published on Towards AI.

Photo by Hert Niks on Unsplash
import os
from multiprocessing import Pool
import time
from functools import wraps
import heartrate

port_base = 10000


def initialize_worker():
    # This function runs only in the worker processes
    process_id = os.getpid()
    port = port_base + process_id % 10000  # Unique port for each process
    print(f"Tracing on port {port} for process {process_id}")
    heartrate.trace(browser=True, port=port)


def track_execution_time(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        start_time = time.time()
        print(f"Starting task at {start_time}")
        result = func(*args, **kwargs)
        end_time = time.time()
        print(f"Ending task at {end_time}")
        print(f"Task duration: {end_time - start_time}")
        return result

    return wrapper


@track_execution_time
def meaningless_task(dummy_text_partial):
    words = dummy_text_partial.split()
    lorem_count = sum(1 for word in words if word.lower() == "lorem")

    for i in range(5):
        time.sleep(1)

    return lorem_count


def main():
    dummy_text = """
    Lorem ipsum dolor sit amet, consectetur Lorem adipiscing Lorem elit
    """

    # The with statement closes and terminates the pool on exit
    with Pool(processes=2, initializer=initialize_worker) as pool:
        results = pool.map(meaningless_task, dummy_text.split(","))

    print("Word count results:", results)


if __name__ == "__main__":
    main()

The code above uses 2 workers in a multiprocessing pool to count the number of times the string lorem appears in each clause produced by splitting a sentence on commas.

The worker processing logic is not the point of this article, but it returns Word count results: [1, 2] because "Lorem ipsum dolor sit amet" contains 1 lorem and "consectetur Lorem adipiscing Lorem elit" contains 2.

Heartrate (https://github.com/alexmojaki/heartrate) is only needed for the fancy execution tracking; install it with pip install heartrate.
Otherwise, delete import heartrate, delete the whole initialize_worker function, and remove initializer=initialize_worker from Pool.

The problem

If you comment out the line @wraps(func), you should get:
AttributeError: Can't pickle local object 'track_execution_time.<locals>.wrapper'

Why is this a problem

  1. Multiprocessing requires pickling the worker function (meaningless_task in the above example).
  2. Pickling a function requires being able to find that function at the global scope of its module.
  3. Decorators wrap functions and return another function of the same name (if using @ syntax). These wrapped functions (def wrapper) are defined inside a decorating function (def track_execution_time). The wrapped function goes out of scope once the decorating function returns, and so cannot be found in global scope.
Photo by Val Vesa on Unsplash

How does functools.wraps solve the problem?

wraps copies attributes from the raw function to the decorated function, so pickle can get what it needs.

From https://docs.python.org/3/library/functools.html#functools.update_wrapper, wraps copies attributes defined in WRAPPER_ASSIGNMENTS (__module__, __name__, __qualname__, __annotations__, __type_params__, and __doc__)

Which attribute does pickle need?

From https://docs.python.org/3/library/pickle.html#what-can-be-pickled-and-unpickled:

Note that functions (built-in and user-defined) are pickled by fully qualified name, not by value. [2] This means that only the function name is pickled, along with the name of the containing module and classes. Neither the function’s code, nor any of its function attributes are pickled. Thus the defining module must be importable in the unpickling environment, and the module must contain the named object, otherwise an exception will be raised.

Pickle needs __qualname__ of the function being pickled to be a globally accessible name. track_execution_time.<locals>.wrapper in the AttributeError above describes the path from the module’s global scope, but the wrapper is not accessible anymore.

Why use wraps

You don’ t need to, but it’s nice to have wraps copy the other useful attributes in case you want to use them, like __docs__ to show documentation.

def track_execution_time(func):
    # @wraps(func)
    def wrapper(*args, **kwargs):
        start_time = time.time()
        print(f"Starting task at {start_time}")
        result = func(*args, **kwargs)
        end_time = time.time()
        print(f"Ending task at {end_time}")
        print(f"Task duration: {end_time - start_time}")
        return result

    wrapper.__qualname__ = func.__qualname__

    return wrapper

You could remove @wraps(func) and assign wrapper.__qualname__ = func.__qualname__ like above; after the assignment, both approaches give wrapper the same __qualname__ (meaningless_task).

Why does pickle need __qualname__

Pickle needs the name of the object being pickled, so it can use that name to find the definition when unpickling.
It’s time to go down the rabbit hole to another example to learn pickling fundamentals.

Photo by Tine Ivanič on Unsplash
import pickle


def add(x, y):
    return x + y


# with open("func.pkl", "wb") as f:
#     pickle.dump(add, f)

pickled = pickle.dumps(add)

# del globals()["add"]

# globals()["add"] = lambda x, y: x * y

# with open("func.pkl", "rb") as f:
#     loaded_add = pickle.load(f)

loaded_add = pickle.loads(pickled)

print(loaded_add(2, 3))

The above code should pickle and unpickle successfully, and print 5 after adding 2 + 3.

Deleting the pickled object between pickling and unpickling

If you uncomment del globals()["add"], you should see AttributeError: Can't get attribute 'add' on <module '__main__' from '/home/hanqi/code/pickling/test_pickle.py'>

That means unpickling failed.

Pickling requires the __qualname__ of the object being pickled to be globally accessible. By artificially deleting it, the unpickling step is unable to find it.

This is a slightly different problem from before. Previously, we could not even pickle. Here we can pickle but cannot unpickle.
However, this unpickling failure indirectly explains why __qualname__ must be correctly specified in the heartrate example.

Inserting fake implementation to mess with unpickling

If you uncomment globals()["add"] = lambda x, y: x * y, you will see the output 6 instead of 5 (add = lambda x, y: x * y works too), because addition has been replaced by multiplication (2 * 3 = 6).

This code overwrites the def add previously defined.

This shows that pickle does not care what the implementation of the pickled code object was initially.
Any code at runtime has an opportunity to change the unpickled implementation, as long as it rebinds the same name seen during pickling.

Pickle uses the __qualname__ (add in this case) to look up whatever add is bound to in the OS process that is unpickling, such as the injected wrong implementation of multiplication instead of addition.

You can even assign arbitrary constants like add = 2 before unpickling and get TypeError: 'int' object is not callable

The above example uses a single process. In reality, pickle is more commonly used to pass objects across different files or even machines. For example, a machine learning model is trained and pickled on a training machine with development libraries, then quantized and deployed on another machine with different hardware characteristics more suited for inference.

You can play with the commented code of pickle interfacing with files, and try to move the unpickling code to another file and revisit the theory here.

Photo by Gregoire Jeanneau on Unsplash

Won’t unpickling try to load the undecorated function?

Since we set wrapper.__qualname__ = func.__qualname__, it is reasonable to ask how pickle avoids linking back to the original, undecorated function.

The answer is that when unpickling, the function has already been decorated. Repeating the key point above:

pickle does not care what is the implementation of the code object that was pickled initially.

Linking to the same part of docs (https://docs.python.org/3/library/pickle.html#what-can-be-pickled-and-unpickled) and quoting the same section but zooming in:

the defining module must be importable in the unpickling environment

We just have to ensure that whatever object is bound to the pickled __qualname__ contains the implementation we want. The first time the file containing the original function is run, the decorator has already run, so all further references to the original function in def main refer to the decorated version.

Multiprocessing Theory

During multiprocessing, Python uses one of three start methods to create child processes (fork/spawn/forkserver): https://stackoverflow.com/questions/64095876/multiprocessing-fork-vs-spawn

  • set with multiprocessing.set_start_method("spawn")
  • check with multiprocessing.get_start_method()
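A minimal sketch of setting and checking the start method (square is a hypothetical worker, not from the article's code):

```python
import multiprocessing as mp

def square(x):
    return x * x

if __name__ == "__main__":
    # must be called at most once, before any Pool or Process is created
    mp.set_start_method("spawn")
    print(mp.get_start_method())            # spawn

    with mp.Pool(processes=2) as pool:
        print(pool.map(square, [1, 2, 3]))  # [1, 4, 9]
```

mp.get_context("spawn") is an alternative that scopes the start method to one context object instead of setting it globally.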

In fork (Linux or Windows WSL, though no longer the default starting with Python 3.14), each child process inherits memory from the parent process and starts executing from the forking point, where the tasks are dispatched at Pool(processes=2, initializer=initialize_worker) (https://stackoverflow.com/a/60910365).

In spawn (the default on Windows and macOS), each child process re-executes the source from the top of the script and re-imports the required objects, which is why the if __name__ == "__main__" guard matters.

forkserver is a hybrid that creates another server first, before child processes are created from the server and inherit memory from it.

In all 3 cases, when a child process is started, the function has already been decorated, so pickling and unpickling works with the decorated version.

Exactly how this works depends on understanding the lower-level code shown in https://stackoverflow.com/a/71690229, which is beyond the scope of this article.

Why use decorators

Why not avoid decorators by implementing all the extra logic within the worker function?
Because that requires making edits to the worker function that may not be desired.

Decorators decouple the worker function from additional logic.
Such additional logic is also reusable and composable, such as stacking multiple decorators in flask (https://explore-flask.readthedocs.io/en/latest/views.html#caching).
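A sketch of that composability, stacking two hypothetical decorators (log_calls and double_result are illustrative names, not flask's):

```python
from functools import wraps

def log_calls(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        print(f"calling {func.__name__}")
        return func(*args, **kwargs)
    return wrapper

def double_result(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        return 2 * func(*args, **kwargs)
    return wrapper

@log_calls
@double_result
def add(x, y):
    return x + y

print(add(1, 2))  # prints "calling add", then 6
```

The worker function add stays untouched; each concern lives in its own reusable decorator, and (thanks to wraps) the outer decorator still sees the name "add".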

Why heartrate

Because that was the use case from which this article was inspired.

Heartrate lets the user comfortably verify, from the counts along the left border, that the correct number of loops executed, instead of staring at printed stdout in the terminal or opening log files.
Longer bars mean more hits; lighter colours mean more recent.

In other code, such counts could let you infer how many tasks were sent to each worker (if one loop iteration processes one task), which helps with task-allocation decisions among workers, such as using the lower-level Process API instead of Pool.

You can expect 2 browser windows to open, each representing 1 process.

You can see the counts updating from the time.sleep(1). Increase range(5) to range(500) if your browser opens too slowly for you to see changes before the code completes.

There is a stack trace at the bottom for more details.
You can add initialize_worker in the parent process too to prove it is not executing the worker functions.

In case you get stuck during range(500), such as the code ignoring Ctrl+C (I have no idea why that happens unpredictably, about 70% of the time), you can kill the process by closing the terminal window in your IDE.

Helpful bash commands (executed line by line, interactively) to inspect your parent-child relationships and ports:

ps -eo pid,ppid,cmd | grep heartrate # find parent pid and child pid
pstree -p 126489 # assumes you know parent pid

Port Generation

Heartrate requires a distinct port for each process, and I initially tried using a decorator to generate incrementing ports per process. That failed because the decorator runs only once, in the parent, not once per worker.

The initializer of Pool also runs the same function in every worker process with the same initargs (not shown above), which leads to port clashes if more than one heartrate instance is started.
In the end, the idea for generating unique ports is deriving them from os.getpid(). I would love feedback on this or other solutions to the port issue, or suggestions for tools similar to Heartrate (since it's 4 years old).
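The derivation in initialize_worker boils down to the following (worker_port is a hypothetical helper name). Note the scheme is not collision-free: two pids that agree modulo 10000 map to the same port.

```python
import os

port_base = 10000

def worker_port(pid=None):
    # map a process id onto the range [port_base, port_base + 9999]
    pid = os.getpid() if pid is None else pid
    return port_base + pid % 10000

print(worker_port(126489))  # 16489
print(worker_port(6489))    # 16489 again: these two pids would clash
```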

Summary

Both during pickling and unpickling, the pickled object must be globally accessible.
If it is not accessible, that could be because the definition exists but the path to it is wrong (decorator example), or because the definition was deleted (test_pickle example).

This article only describes function-based decorators and worker functions. Class-based decorators (reference 2) and worker classes are even more complex.

References

  1. Functions are pickled by __qualname__: https://docs.python.org/3/library/pickle.html#what-can-be-pickled-and-unpickled
  2. Class based decorators: https://gael-varoquaux.info/programming/decoration-in-python-done-right-decorating-and-pickling.html
  3. Stacking decorators: https://explore-flask.readthedocs.io/en/latest/views.html#caching
  4. Multiprocessing methods: https://stackoverflow.com/questions/64095876/multiprocessing-fork-vs-spawn
  5. Memory copying behaviour between fork and spawn: https://stackoverflow.com/a/60910365
  6. Pitfalls of fork copying some but not all objects: https://pythonspeed.com/articles/python-multiprocessing/
  7. Handling slow multiprocessing: https://pythonspeed.com/articles/faster-multiprocessing-pickle
  8. Multiprocessing source: https://stackoverflow.com/a/71690229
