Python and Multi-CPU-Arch

Last Updated on October 8, 2022 by Editorial Team

Author(s): Murli Sivashanmugam

Originally published on Towards AI, the world’s leading AI and technology news and media company.

Fixing Python toolchain breakages on Mac M1/M2 reliably

Photo by Christopher Gower on Unsplash

Introduction

One of the design goals of Python is to be platform agnostic and to run scripts without modification across environments. Python does a pretty good job of ensuring this compatibility as long as applications and libraries are written in pure Python. As Python’s popularity and adoption increased, more and more libraries started using Python extensions to include natively compiled code to boost performance and efficiency. Popular Python libraries like pandas and NumPy can handle large amounts of data efficiently thanks to their native extensions. As more and more Python libraries started using natively compiled code, platform compatibility issues started showing up in Python environments. If one is using an Mx (M1/M2) based MacBook, it’s very likely one has witnessed platform compatibility issues in a Python environment. This article talks about how Python package managers like pip and conda manage platform-dependent binaries, why they break, and how to get over them on Mac M1/M2 setups in a simpler, more reliable, and replicable way.

Python binaries, like any other binaries, are compiled separately for different CPU architectures and platforms (Windows/macOS/Linux). When a package manager installs a Python library on a system, it collects and processes metadata that is specific to the local CPU and platform. If the Python library includes any native extensions, they need to be compiled into a CPU- and platform-specific binary to run locally. A brief understanding of how Python package managers like ‘pip’ and ‘conda’ manage CPU and platform dependencies will help in understanding their shortcomings and how to get over them.
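To see what the local interpreter reports about its own CPU and platform, the standard library modules ‘platform’ and ‘sysconfig’ can be queried directly. This is a minimal sketch showing the values a package manager must reconcile wheels against:

```python
import platform
import sysconfig

# The CPU architecture of the running interpreter; on an M1/M2 Mac this is
# 'arm64', on an Intel Mac or typical Linux box 'x86_64'.
print(platform.machine())

# The combined OS/architecture platform string used when selecting binaries,
# e.g. 'macosx-11.0-arm64' or 'linux-x86_64'.
print(sysconfig.get_platform())
```

Running this inside a Rosetta-backed x86 environment on an M1/M2 Mac prints 'x86_64', which is a quick way to confirm which architecture a given environment is actually using.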

PIP Multi-Arch Support

PIP uses the ‘wheel’ packaging standard for distributing Python packages. ‘Wheel’ is a packaging system for distributing pure Python scripts and native extensions, in both source code and compiled binary formats. To get a better perspective on how PIP maintains different distribution formats, visit the PyPI page of a library of your choice and click on “Download files”. On this page, the “Source Distribution” section lists the packages available in source format, and the “Built Distributions” section lists all the packages available in pre-built binary formats. For example, the PyPI page for the ‘pandas’ library shows both its source and pre-built distributions. These distribution packages follow the naming convention specified below.

{dist}-{version}(-{build})?-{python}-{abi}-{platform}.whl

Each section in {brackets} is a tag or a component of the wheel name that carries some meaning about what the wheel contains and where the wheel will or will not work. Example:

pandas-1.5.0-cp311-cp311-macosx_11_0_arm64.whl

pandas — the library name
1.5.0 — the library version
cp311 — Python tag (built for CPython 3.11)
cp311 — Application binary interface (ABI) tag
macosx_11_0_arm64 — Platform tag, which is further subdivided into:
– macosx — Operating system
– 11_0 — Minimum required macOS SDK version
– arm64 — CPU architecture
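As an illustration of this convention, the tags can be pulled apart with plain string handling. This is a minimal sketch; the parse_wheel_name helper is hypothetical and written only for this example (real tooling, such as the ‘packaging’ library used by pip, handles the full specification):

```python
def parse_wheel_name(filename):
    # Strip the '.whl' suffix and split on '-'. Dashes in distribution
    # names are normalized to underscores, so the split is unambiguous.
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        # No optional build tag present
        dist, version, python_tag, abi_tag, platform_tag = parts
        build = None
    else:
        dist, version, build, python_tag, abi_tag, platform_tag = parts
    return {
        "dist": dist,
        "version": version,
        "build": build,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }

info = parse_wheel_name("pandas-1.5.0-cp311-cp311-macosx_11_0_arm64.whl")
print(info["platform"])  # macosx_11_0_arm64
```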

The wheel naming convention also supports ‘wildcard’ tags to optimize package bundles. For example, ‘chardet-3.0.4-py2.py3-none-any.whl’ supports both Python 2 and Python 3, has no dependency on the ABI, and can be installed on any platform and CPU architecture. Many Python libraries use these wildcard options to minimize the number of package bundles they publish. For more information on Python ‘wheel’ and PIP, please refer to What Are Python Wheels and Why Should You Care?
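The wildcard behavior can be sketched as a simple membership check: a wheel’s platform tag matches either a tag the machine actually supports or the ‘any’ wildcard, which matches everything. The is_compatible helper below is hypothetical, written only to illustrate why an ‘any’-tagged wheel installs on an arm64 Mac even when its binaries are not actually compatible:

```python
def is_compatible(wheel_platform_tag, machine_tags):
    # 'any' is a wildcard that matches every platform, which is exactly
    # why a mislabeled binary wheel can slip onto an unsupported CPU.
    return wheel_platform_tag == "any" or wheel_platform_tag in machine_tags

arm64_mac = {"macosx_11_0_arm64"}
print(is_compatible("any", arm64_mac))                 # True
print(is_compatible("macosx_10_9_x86_64", arm64_mac))  # False
```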

Why does PIP install fail?

Most of the time, a PIP installation fails for one of two primary reasons. First, if a pre-built binary is not available in the repository, PIP compiles the native extension source code on the host system. For that, it expects the build tools and other dependent libraries to be available on the host. Installing these build dependencies locally can become a challenge, especially when the dependency tree grows deep.

Second, due to ‘wildcard’ tags in wheel package names. Apple introduced the arm-based M1/M2 CPU architecture relatively recently. Some older wheel packages for macOS were listed with ‘any’ as the CPU architecture because x86 was the only supported architecture at the time. If PIP resolves a package dependency to one of these older versions, it will install the package on the newer CPU architecture, assuming it will run. An example of this issue is the package ‘azure-eventhub’, which depends on another library called ‘uamqp’. ‘uamqp’ lists a universal/wildcard package for macOS that is incompatible with the M1/M2 arm64 processor. If you install ‘azure-eventhub’ on an M1/M2 machine, the package installs successfully, but it throws a runtime exception when imported.
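Because such mismatches surface only at import time, not at install time, a quick smoke test after installation can catch them early. This is a generic sketch (the native_import_ok helper is hypothetical, not part of any library mentioned above):

```python
import importlib

def native_import_ok(module_name):
    # A wheel built for the wrong CPU usually installs cleanly but raises
    # an ImportError the first time its native extension is loaded.
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False

# A pure stdlib module imports fine; an incompatible native wheel would not.
print(native_import_ok("json"))
```

Running such a check for each critical dependency right after environment setup turns a late runtime surprise into an immediate, actionable failure.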

Conda Multi-Arch Support

Conda goes one step further in ensuring platform portability. Conda packages not only Python libraries but also the dependent libraries, binaries, compilers, and the Python interpreter itself for different operating systems and CPU architectures. This way, it ensures the entire toolchain is portable across environments. Since all dependent binaries are packaged along with the Python libraries, it expects no dependencies on the local system except the standard C libraries. So, if conda provides better portability and fixes the shortcomings of PIP, what could go wrong? The issue is that not all Python packages are available in conda. It’s common to use pip within a conda environment to install packages that are not available in conda; hence, one is still exposed to the shortcomings of PIP. Again (not to nitpick), the ‘azure-eventhub’ package is an example of the same.

If one runs into such a platform compatibility issue and searches forums for solutions, one comes across options like installing a specific version of Python or of the library, installing the library via another packaging system like ‘brew’, installing alternate packages, etc. Many such fixes are not reliable for production and may not be replicable across other systems. Curated below are three options that are simpler, more reliable, and replicable ways to get over Python platform compatibility issues. They are:

  • Pip Install from Source
  • Conda & Rosetta
  • Docker Multi-Arch Builds

Pip Install from Source

If the build dependencies for the native code of a package are minimal, one can recompile it on the host system. When a Python toolchain (set of libraries) fails to install, most likely it is not the top-level package that breaks but a dependent package nested in the dependency tree. The following command instructs pip not to install the binary version of a specific package. For example, it skips the pre-built binary of ‘uamqp’ and compiles it from source while installing ‘azure-eventhub’.

pip install --no-binary uamqp azure-eventhub
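The same exclusion can be pinned in a requirements file so the fix travels with the project instead of living in someone’s shell history. This is a sketch assuming the uamqp/azure-eventhub pairing discussed above; ‘--no-binary’ is a global option that pip accepts on its own line in a requirements file:

```text
# requirements.txt: force uamqp to be compiled from source on this platform
--no-binary uamqp
azure-eventhub
```

Installing with ‘pip install -r requirements.txt’ then reproduces the source build on every machine that uses the file.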

Conda & Rosetta

Another approach to get over this issue is to take advantage of ‘Rosetta’, Apple’s x86 emulation layer. The simplest option to run the x86 version of Python over Rosetta is the conda platform override option. Example:

CONDA_SUBDIR=osx-64 conda create -n myenv_x86 python=3.10
conda activate myenv_x86
conda config --env --set subdir osx-64
# Confirm it's using the x86_64 platform version of python
python -c "import platform;print(platform.machine())"

The “CONDA_SUBDIR” environment variable overrides conda’s CPU architecture while executing the conda environment create command. The conda config command then persists the override in the new environment, so that one need not set “CONDA_SUBDIR” for all further commands in that environment. After creating a new environment with the conda platform overridden to x86, it behaves like any other conda environment. One can run pip install in this environment, and it installs the x86 versions of Python libraries. Switching between multiple conda environments is seamless in the same terminal, and even tools like VS Code work without any issues.

Docker Multi-Arch

The third option again takes advantage of Rosetta, but via ‘docker’. This is the most portable and seamless option for working across multiple environments and users. Docker’s multi-platform feature can be used to force-build x86 docker images on M1/M2 MacBooks. When docker run is presented with an x86 docker image, it internally employs Rosetta to run the image. The following are the steps to build an x86 cross-platform docker image.

Sample Dockerfile:

FROM ubuntu
RUN apt-get update
RUN apt-get install -y python3
CMD ["/usr/bin/python3", "-c", "import platform; print(\"Platform:\", platform.machine())"]

Build x86 docker image:

$ docker build --platform linux/amd64 . -t img_x86

Run x86 docker image:

$ docker run --platform linux/amd64 -it img_x86
Platform: x86_64

For better portability across multiple users and environments, the “--platform” option of the FROM command in the Dockerfile can be used. This ensures the x86 image is used even if the “--platform” option of the build command is not specified by the user.

Sample Dockerfile:

FROM --platform=linux/amd64 ubuntu
RUN apt-get update
RUN apt-get install -y python3
CMD ["/usr/bin/python3", "-c", "import platform; print(\"Platform:\", platform.machine())"]

This Dockerfile builds an x86 docker image without the “--platform” docker build option.

$ docker build . -t img_x86
$ docker run -it img_x86
Platform: x86_64

Conclusion

The options mentioned above may not be the only ways to fix Python platform compatibility issues reliably, but I believe they serve as a generic, no-brainer approach for many of us to quickly get past such issues, avoid frustration, and save the time spent looking for a custom solution. Hopefully, in the near future, the Python ecosystem will evolve and mature further to handle multiple CPUs and platforms seamlessly, without any additional involvement from the user.

