
Python and Multi-CPU-Arch

Last Updated on October 8, 2022 by Editorial Team

Author(s): Murli Sivashanmugam

Originally published on Towards AI, the world's leading AI and technology news and media company.

Fixing Python toolchain breakages on Mac M1/M2 reliably

Photo by Christopher Gower on Unsplash

Introduction

One of the design goals of Python is to be platform agnostic and to run scripts without modification across environments. Python does a pretty good job of ensuring this compatibility as long as applications and libraries are written in pure Python. As Python's popularity and adoption grew, more and more libraries started using Python extensions to include natively compiled code to boost performance and efficiency. Popular Python libraries like pandas and NumPy can handle large amounts of data efficiently thanks to their native extensions. As more and more libraries started shipping natively compiled code, platform compatibility issues started showing up in Python environments. If you are using an M-series (M1/M2) MacBook, it is very likely you have run into such issues. This article looks at how Python package managers like pip and conda manage platform-dependent binaries, why they break, and how to get past those breakages on Mac M1/M2 setups in a simpler, more reliable, and replicable way.

Python binaries, like any other binaries, are compiled separately for different CPU architectures and platforms (Windows/Mac/Linux). When a Python library is installed on a system, the package manager collects and processes metadata that is specific to the local CPU and platform. If the library includes any native extensions, they need to be compiled into CPU- and platform-specific binaries to run locally. A brief understanding of how Python package managers like pip and conda manage CPU and platform dependencies will help in understanding their shortcomings and how to get around them.
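
For reference, you can check which CPU architecture and platform identifiers the local Python build reports; a quick check (the output shown is illustrative and will differ per machine and Python build):

python -c "import platform, sysconfig; print(platform.machine(), sysconfig.get_platform())"
# Example output on an M1/M2 MacBook: arm64 macosx-11.0-arm64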

PIP Multi-Arch Support

PIP distributes Python packages in two forms: source distributions (sdists) and pre-built 'wheel' packages, which can contain pure Python code as well as compiled native extensions. To get a better perspective on how PIP maintains different distribution formats, visit the PyPI page of a library of your choice and click on "Download files". On that page, the "Source Distribution" section lists the packages available in source format, and the "Built Distributions" section lists all the packages available as pre-built binaries. For example, the downloads page for the pandas library shows both its source and pre-built distributions. These distribution packages follow the naming convention specified below.

{dist}-{version}(-{build})?-{python}-{abi}-{platform}.whl

Each section in {brackets} is a tag, or a component of the wheel name, that carries some meaning about what the wheel contains and where the wheel will or will not work. Example:

pandas-1.5.0-cp311-cp311-macosx_11_0_arm64.whl

pandas - the library name
1.5.0 - the library version
cp311 - the minimum Python version required (CPython 3.11)
cp311 - the required application binary interface (ABI)
macosx_11_0_arm64 - the platform tag, which is further subdivided into:
– macosx - operating system
– 11_0 - minimum required macOS SDK version
– arm64 - CPU architecture
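
As a rough sketch of the tag check pip performs, the 'packaging' library (which pip itself vendors) can parse a wheel filename and compare its tags against the tags the running interpreter supports. This is a simplified illustration, not pip's full resolution logic:

# Requires: pip install packaging
from packaging.tags import sys_tags
from packaging.utils import parse_wheel_filename

wheel = "pandas-1.5.0-cp311-cp311-macosx_11_0_arm64.whl"
name, version, build, wheel_tags = parse_wheel_filename(wheel)

# sys_tags() yields the tags this interpreter/platform accepts, in priority order
supported = set(sys_tags())
print(name, version, "compatible:", any(tag in supported for tag in wheel_tags))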

The PIP naming convention also supports wildcards to optimize package bundles. For example, 'chardet-3.0.4-py2.py3-none-any.whl' supports both Python 2 and Python 3, has no dependency on a specific ABI, and can be installed on any platform and CPU architecture. Many Python libraries use these wildcard options to reduce the number of package bundles they need to publish. For more information on Python wheels and PIP, please refer to What Are Python Wheels and Why Should You Care?
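
pip can also report which tags it considers compatible on the current machine; 'pip debug --verbose' lists them (pip flags this command's output as unstable, so treat it purely as a diagnostic aid; the tags shown below are illustrative):

pip debug --verbose | grep -A 3 "Compatible tags"
# Illustrative output on an M1/M2 MacBook with Python 3.11:
# Compatible tags: ...
#   cp311-cp311-macosx_11_0_arm64
#   cp311-abi3-macosx_11_0_arm64
#   cp311-none-macosx_11_0_arm64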

Why does PIP install fail?

Most of the time, PIP installation fails for one of two primary reasons. First, if a pre-built binary is not available in the repository, PIP will compile the native extension source code on the host system. To do that, it expects the build tools and other dependent libraries to be available on the host. Installing these build dependencies locally can become a challenge, especially when the dependency tree grows deep.

Second, because of wildcards in wheel package names. Apple introduced the ARM-based M1/M2 CPU architecture in MacBooks relatively recently. Some older wheel packages for macOS were listed with 'any' as the CPU architecture because x86 was the only supported architecture at the time. If PIP resolves a package dependency to one of these older versions, it will install that package on the newer CPU architecture, assuming it would run. An example of this issue is the package 'azure-eventhub', which depends on another library called 'uamqp'. 'uamqp' lists a universal/wildcard package for macOS that is incompatible with the M1/M2 arm64 processor. If you install 'azure-eventhub' on M1/M2, the package installs successfully, but importing it throws a runtime exception.
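
One way to see such a mismatch directly is to compare the architecture the running Python reports with the architecture of the compiled extension a package actually installed. The glob and the output below are illustrative; the exact extension filename and path differ per package and version:

# Architecture of the running Python interpreter
python -c "import platform; print(platform.machine())"
# arm64

# Architecture of a compiled extension shipped with the installed package
file $(python -c "import uamqp, glob, os; print(glob.glob(os.path.join(os.path.dirname(uamqp.__file__), '*.so'))[0])")
# Illustrative output on a broken install: .../uamqp/...darwin.so: Mach-O 64-bit bundle x86_64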

Conda Multi-Arch Support

Conda goes one step further in ensuring platform portability. Conda packages not only Python libraries but also their dependent libraries, binaries, compilers, and the Python interpreter itself, for different operating systems and CPU architectures. This ensures the entire toolchain is portable across environments. Since all the dependent binaries are packaged along with the Python libraries, conda does not expect any dependencies on the local system except for the standard C libraries. So, if conda provides better portability and fixes the shortcomings of PIP, what could go wrong? The issue is that not all Python packages are available in conda. It is common to use pip within a conda environment to install packages that conda does not provide, and that exposes you to the shortcomings of PIP again. The 'azure-eventhub' package is, once more, an example of this.
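
You can check which platform subdirectory your conda installation targets by default; on an Apple silicon machine it typically reports osx-arm64 (output is illustrative):

conda info | grep -i "platform"
#   platform : osx-arm64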

If you run into such a platform compatibility issue and search for solutions in forums, you will come across different options like installing a specific version of Python or of the library, installing the library via another packaging system like 'brew', installing an alternate package, and so on. Many such fixes are not reliable for production and may not be replicable across other systems. Curated below are three options that are simpler, more reliable, and replicable ways to get past Python platform compatibility issues. They are:

  • Pip Install from Source
  • Conda & Rosetta
  • Docker Multi-Arch Builds

Pip Install from Source

If the build dependencies for a package's native code are minimal, you can recompile it on the host system. When a Python toolchain (set of libraries) fails to install, it most likely won't be the top-level package that breaks, but a dependent package nested somewhere in the dependency tree. You can use the following command to instruct pip not to install the binary version of a specific package. For example, the following command skips the pre-built binary of 'uamqp' and compiles it from source.

pip install --no-binary uamqp azure-eventhub
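
If this workaround needs to be repeatable across machines, the same flag can live in a requirements file, since pip requirements files accept --no-binary as an option line. A minimal sketch:

# requirements.txt
--no-binary uamqp
azure-eventhub

Installing with 'pip install -r requirements.txt' then builds uamqp from source on every machine that uses the file.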

Conda & Rosetta

Another approach is to take advantage of Rosetta, Apple's translation layer for running x86_64 binaries on Apple silicon. The simplest way to run the x86 version of Python over Rosetta is to use conda's platform override option. Example:

CONDA_SUBDIR=osx-64 conda create -n myenv_x86 python=3.10
conda activate myenv_x86
conda config --env --set subdir osx-64
# Confirm the environment is using the x86_64 build of Python
python -c "import platform; print(platform.machine())"

The CONDA_SUBDIR environment variable overrides conda's CPU architecture while the conda create command runs. The conda config command then overrides the CPU architecture inside the new environment permanently, so you do not need to set CONDA_SUBDIR for further commands in that environment. After creating a new environment with the conda platform overridden to x86, it behaves like any other conda environment. You can run pip install in this environment, and it will install the x86 versions of Python libraries. Switching between multiple conda environments remains seamless in the same terminal, and even other tools like VS Code work without any issues.
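
As a quick sanity check after creating the environment, a pip install inside it should pull x86_64 wheels and import cleanly (package choice and output are illustrative):

conda activate myenv_x86
pip install pandas
python -c "import pandas, platform; print(platform.machine())"
# x86_64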

Docker Multi-Arch

The third option again takes advantage of Rosetta, but via Docker. This is the most portable and seamless option for working across multiple environments and users. Docker's multi-platform feature can be used to force-build x86 docker images on M1/M2 MacBooks. When docker run is given an x86 image, it internally employs Rosetta to run it. The following are the steps to build an x86 cross-platform docker image.

Sample Dockerfile:

FROM ubuntu
RUN apt-get update
RUN apt-get install -y python3
CMD ["/usr/bin/python3", "-c", "import platform; print(\"Platform:\", platform.machine())"]

Build x86 docker image:

$ docker build --platform linux/amd64 . -t img_x86

Run x86 docker image:

$ docker run --platform linux/amd64 -it img_x86
Platform: x86_64

For better portability across multiple users and environments, the platform option of the FROM instruction in the Dockerfile can be used. This ensures the x86 image is used even if the --platform option of the build command is not specified by the user.

Sample Dockerfile:

FROM --platform=linux/amd64 ubuntu
RUN apt-get update
RUN apt-get install -y python3
CMD ["/usr/bin/python3", "-c", "import platform; print(\"Platform:\", platform.machine())"]

This Dockerfile will build an x86 docker image without the --platform docker build option.

$ docker build . -t img_x86
$ docker run -it img_x86
Platform: x86_64
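
To confirm which OS and architecture a built image targets, the image metadata can be inspected:

$ docker image inspect --format '{{.Os}}/{{.Architecture}}' img_x86
linux/amd64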

Conclusion

The options mentioned above may not be the only ways to fix Python platform compatibility issues reliably, but I believe they serve as a generic, no-brainer approach for many of us to quickly get past such issues, avoid frustration, and save the time spent looking for a custom solution. Hopefully, in the near future, the Python ecosystem will further evolve and mature to handle multiple CPU architectures and platforms seamlessly, without any additional involvement from the user.

