Python and Multi-CPU-Arch
Last Updated on October 8, 2022 by Editorial Team
Author(s): Murli Sivashanmugam
Originally published on Towards AI.
Fixing Python toolchain breakages on Mac M1/M2 reliably
Introduction
One of the design goals of Python is to be platform agnostic and to run scripts without modification across environments. Python does a pretty good job of ensuring this compatibility as long as applications and libraries are written in pure Python. As Python's popularity and adoption grew, more and more libraries started using Python extensions to include natively compiled code to boost performance and efficiency. Popular Python libraries like pandas and NumPy can handle large amounts of data efficiently thanks to their native extensions. As more and more Python libraries started shipping natively compiled code, platform compatibility issues started showing up in Python environments. If one is using an Apple-silicon (M1/M2) MacBook, it is very likely one has witnessed such compatibility issues in a Python environment. This article covers how Python package managers like pip and conda manage platform-dependent binaries, why they break, and how to get over breakages on Mac M1/M2 setups in a simpler, more reliable, and replicable way.
Python binaries, like any other binaries, are compiled separately for different CPU architectures and platforms (Windows/Mac/Linux). When a Python library is installed on a system, the package manager collects and processes metadata that is specific to the local CPU and platform. If the library includes any native extensions, they need to be compiled into CPU- and platform-specific binaries to run locally. A brief understanding of how Python package managers like pip and conda manage CPU and platform dependencies will help in understanding their shortcomings and how to get over them.
PIP Multi-Arch Support
PIP uses the "wheel" packaging standard for distributing Python packages. A wheel is a built-package format that can hold either pure Python code or platform-specific compiled extensions; source code is distributed separately as a source distribution ("sdist"). To get a better perspective on how PIP maintains different distribution formats, visit the PyPI page of a library of your choice and click on "Download Files". On that page, the "Source Distribution" section lists the packages available in source format, and the "Built Distributions" section lists all the packages available as pre-built binaries. For example, the PyPI downloads page for the pandas library shows both its source and pre-built distributions. These distribution packages follow the naming convention specified below.
{dist}-{version}(-{build})?-{python}-{abi}-{platform}.whl
Each section in {brackets} is a tag, a component of the wheel name that carries some meaning about what the wheel contains and where the wheel will or will not work. Example:
pandas-1.5.0-cp311-cp311-macosx_11_0_arm64.whl
pandas – the library name
1.5.0 – the library version
cp311 – the Python tag: this wheel targets CPython 3.11
cp311 – the application binary interface (ABI) tag: built against the CPython 3.11 ABI
macosx_11_0_arm64 – the platform tag, which is further subdivided into:
– macosx – operating system
– 11_0 – minimum required macOS SDK version
– arm64 – CPU architecture
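The naming convention above is mechanical enough to parse by hand. Below is a minimal, illustrative parser for the pandas wheel name from the example; it assumes the distribution name itself contains no hyphens, which a production parser would have to handle:

```python
def parse_wheel_filename(filename):
    """Minimal parser for the wheel naming convention
    {dist}-{version}(-{build})?-{python}-{abi}-{platform}.whl.
    Assumes the dist name contains no hyphens (illustration only)."""
    stem = filename.removesuffix(".whl")
    parts = stem.split("-")
    if len(parts) == 6:
        dist, version, build, python, abi, platform = parts
    elif len(parts) == 5:
        dist, version, python, abi, platform = parts
        build = None
    else:
        raise ValueError(f"unexpected wheel name: {filename}")
    return {"dist": dist, "version": version, "build": build,
            "python": python, "abi": abi, "platform": platform}

print(parse_wheel_filename("pandas-1.5.0-cp311-cp311-macosx_11_0_arm64.whl"))
```

For real-world use, the third-party `packaging` library provides a robust `parse_wheel_filename` that handles the edge cases this sketch ignores.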
The wheel naming convention also supports wildcards to reduce the number of package bundles a project must publish. For example, "chardet-3.0.4-py2.py3-none-any.whl" supports both Python 2 and Python 3, has no dependency on a specific ABI, and can be installed on any platform and CPU architecture. Many Python libraries use these wildcard options to optimize the number of package bundles. For more information on Python wheels and PIP, please refer to What Are Python Wheels and Why Should You Care?
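A dotted tag like py2.py3 is really a compressed set: the installer expands it into every individual (python, abi, platform) combination and checks whether any of them matches the local interpreter, as described in PEP 425. A small sketch of that expansion:

```python
from itertools import product

def expand_tags(compressed):
    """Expand a compressed wheel tag set such as 'py2.py3-none-any'
    into the individual (python, abi, platform) combinations it
    covers, per PEP 425's compressed tag set rules."""
    python, abi, plat = compressed.split("-")
    return list(product(python.split("."), abi.split("."), plat.split(".")))

print(expand_tags("py2.py3-none-any"))
```

A wheel is installable if at least one expanded combination is among the tags the local interpreter supports.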
Why does PIP install fail?
Most of the time, PIP installation fails for two primary reasons. First, if a pre-built binary is not available in the repository, PIP will compile the native extension source code on the host system. To do that, it expects the build tools and other dependent libraries to be available on the host. Installing these build dependencies locally can become a challenge, especially when the dependency tree grows deep.
Second, wildcards in wheel package names can mislead PIP. Apple introduced the arm-based M1/M2 CPU architecture only recently, and some older wheel packages for macOS were listed with "any" as the CPU architecture because x86 was the only supported architecture at the time. If PIP resolves a package dependency to one of these older versions, it will install that package on the newer CPU architecture, assuming it will run. An example of this issue is the package "azure-eventhub", which depends on another library called "uamqp". uamqp lists a universal/wildcard package for macOS that is incompatible with the M1/M2 arm64 processor. If one installs "azure-eventhub" on M1/M2, the package installs successfully but throws a runtime exception when imported.
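One way to diagnose this mismatch is to look at the compiled extension itself: on macOS, a `.so` file is a Mach-O binary whose header records the CPU type. The sketch below decodes the first 8 bytes of such a header; the magic numbers and CPU-type constants are from Apple's Mach-O format, but this is an illustration, not a full parser (it ignores 32-bit and big-endian variants):

```python
def macho_arch(header: bytes) -> str:
    """Best-effort guess of the CPU architecture of a macOS native
    extension (.so) from the first 8 bytes of its Mach-O header.
    Illustrative sketch only."""
    if header[:4] == b"\xca\xfe\xba\xbe":       # FAT_MAGIC: universal binary
        return "universal (fat) binary"
    if header[:4] == b"\xcf\xfa\xed\xfe":       # MH_MAGIC_64, little-endian
        cputype = int.from_bytes(header[4:8], "little")
        #   0x01000007 = CPU_TYPE_X86_64, 0x0100000C = CPU_TYPE_ARM64
        return {0x01000007: "x86_64", 0x0100000C: "arm64"}.get(cputype, "unknown")
    return "not a 64-bit Mach-O file"
```

To check an installed extension, read the first 8 bytes of its `.so` file under `site-packages` and pass them to this function; an "x86_64" result inside a native arm64 interpreter explains the import-time crash. (The `file` command gives the same answer from the shell.)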
Conda Multi-Arch Support
Conda goes one step further in ensuring platform portability. Conda packages not only Python libraries but also the dependent libraries, binaries, compilers, and the Python interpreter itself for different operating systems and CPU architectures. This way, it ensures the entire toolchain is portable across environments. Since all the dependent binaries are packaged along with the Python libraries, conda does not expect any dependencies on the local system except for the standard C libraries. So, if conda provides better portability and fixes the shortcomings of PIP, what could go wrong? The issue is that not all Python packages are available in conda. It is common to use pip within a conda environment to install Python packages that are not available in conda, and hence one is still exposed to the shortcomings of PIP. Again, the "azure-eventhub" package is an example of this.
If one runs into such a platform compatibility issue and searches for solutions in forums, one comes across different options: installing a specific version of Python or of the library, installing the library via another packaging system like brew, installing an alternate package, and so on. Many such fixes are not reliable for production and may not be replicable across other systems. Curated below are three options that are simpler, more reliable, and replicable ways to get over Python platform compatibility issues. They are:
- Pip Install from Source
- Conda & Rosetta
- Docker Multi-Arch Builds
Pip Install from Source
If the build dependencies for a package's native code are minimal, one can recompile it on the host system. When a Python toolchain (set of libraries) fails to install, it is most likely not the top-level package that breaks but a dependent package nested in the dependency tree. One can instruct pip not to install the binary version of a specific package. For example, the following command skips the pre-built binary of "uamqp" and compiles it from source:
pip install --no-binary uamqp azure-eventhub
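For a replicable setup, the same option can live in the project's requirements file, so every developer and CI job builds the problematic package from source without remembering the flag. A minimal sketch (package names are from the example above; adjust to your own dependency tree):

```text
# requirements.txt
# Force uamqp to build from source; all other packages still use wheels.
--no-binary uamqp
azure-eventhub
```

Note that building from source requires a local compiler toolchain (e.g., the Xcode command line tools on macOS).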
Conda & Rosetta
Another approach to get over this issue is to take advantage of Rosetta 2, Apple's translation layer that lets x86_64 binaries run on Apple silicon. The simplest option to run the x86 version of Python over Rosetta is to use the conda platform override option. Example:
CONDA_SUBDIR=osx-64 conda create -n myenv_x86 python=3.10
conda activate myenv_x86
conda config --env --set subdir osx-64
# Confirm the environment is using the x86_64 version of Python
python -c "import platform;print(platform.machine())"
The CONDA_SUBDIR environment variable overrides conda's CPU architecture while the conda environment create command executes. The conda config command then makes the override permanent for the new environment, so one need not set CONDA_SUBDIR for every further command in that environment. After creating a new environment with the conda platform overridden to x86, it behaves like any other conda environment. One can run pip install in this environment, and it will install the x86 versions of Python libraries. Switching between multiple conda environments is seamless in the same terminal, and even other tools like VS Code work without any issues.
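To verify which mode an interpreter is actually running in, macOS exposes a sysctl flag, `sysctl.proc_translated`, that is 1 for processes being translated by Rosetta 2. A small hedged helper (it simply reports False on non-macOS systems or when the flag is absent):

```python
import platform
import subprocess
import sys

def running_under_rosetta() -> bool:
    """Return True when this process is an x86_64 binary being
    translated by Rosetta 2 on Apple silicon. On non-macOS systems,
    or if the sysctl flag is unavailable, returns False."""
    if sys.platform != "darwin":
        return False
    try:
        out = subprocess.run(
            ["sysctl", "-n", "sysctl.proc_translated"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return False
    return out == "1"

print(platform.machine(), "| translated by Rosetta:", running_under_rosetta())
```

Inside an osx-64 conda environment on an M1/M2 machine, `platform.machine()` reports x86_64 and the Rosetta check reports True; in a native environment, both report the arm64 side.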
Docker Multi-Arch
The third option again takes advantage of binary translation, but via Docker. This is the most portable and seamless option for working across multiple environments and users. Docker's multi-platform feature can be used to force-build x86 docker images on M1/M2 MacBooks. When docker run is presented with an x86 image on Apple silicon, it internally employs binary translation (Rosetta 2 or QEMU, depending on the Docker Desktop configuration) to run the image. The following are the steps to build an x86 cross-platform docker image.
Sample Dockerfile:
FROM ubuntu
RUN apt-get update
RUN apt-get install -y python3
CMD ["/usr/bin/python3", "-c", "import platform; print(\"Platform:\", platform.machine())"]
Build x86 docker image:
$ docker build --platform linux/amd64 . -t img_x86
Run x86 docker image:
$ docker run --platform linux/amd64 -it img_x86
Platform: x86_64
For better portability across multiple users and environments, the --platform option can be set in the FROM command of the Dockerfile. This ensures the x86 image is used even if the user does not pass the --platform option to the build command.
Sample Dockerfile:
FROM --platform=linux/amd64 ubuntu
RUN apt-get update
RUN apt-get install -y python3
CMD ["/usr/bin/python3", "-c", "import platform; print(\"Platform:\", platform.machine())"]
This Dockerfile will build an x86 docker image without the --platform docker build option:
$ docker build . -t img_x86
$ docker run -it img_x86
Platform: x86_64
Conclusion
The options mentioned above may not be the only ways to fix Python platform compatibility issues reliably, but I believe they serve as a generic, no-brainer approach for many of us to quickly get over such issues, avoid frustration, and save time looking for a custom solution. Hopefully, in the near future, the Python ecosystem will further evolve and mature to handle multiple CPU architectures and platforms seamlessly, without any additional involvement from the user.