

Summarising 3 Years of Google Colab Usage — The Good, the Bad, and The Ugly


Last Updated on July 17, 2023 by Editorial Team

Author(s): Ori Abramovsky

Originally published on Towards AI.

Meme generated by the author using imgflip.com

I first met Google’s Colab when we were searching for a serverless solution to train our models. Until that point, our models were small enough to train on our local machines. But once we encountered a use case that required much bigger models, a GPU became mandatory and a different solution was required. We started searching, and very quickly Colab outshone all the competitors. The main advantage was the ease of use: just add a notebook file to your G-Drive (which most of us are using anyway) and you’re ready to roll, with (almost) no need for any extra configuration.

Later on, what locked us into the Colab platform was the seamless TPU support. At that point our GPU training cycles were quite long, and as we experimented with hyperparameter tuning, the need to shorten them was acute. Colab enabled us to move our training process from GPUs to TPUs by modifying only a few lines of code, which significantly reduced the time per training cycle. It was too good to be true. From that point our binding began: starting from the free offering, we soon moved to Colab Pro and later to Colab Pro+, shifting more and more of our research efforts into that ecosystem.

Unfortunately, it didn’t take long for the enthusiasm to start fading. At first it was due to a lack of important features (which we managed to solve using workarounds), but in the end the service support was the straw that broke the camel’s back. This column aims to summarise the journey we had with Google’s Colab. The target audience is new Colab users, newbies, or those still experimenting with it to decide whether it’s worth using at all. Spoiler alert: the bottom line is that Colab is a unique tool with almost no competitors for specific phases of the research/development lifecycle. But once research reaches a critical point, other solutions should be considered.

But let’s not put the cart before the horse. Let’s start with a brief overview of Colab.

Google Colab

The very first paragraph of Google’s Colab FAQ page gives a good introduction to the service:

“Colaboratory, or “Colab” for short, is a product from Google Research. Colab allows anybody to write and execute arbitrary python code through the browser, and is especially well suited to machine learning, data analysis and education. More technically, Colab is a hosted Jupyter notebook service that requires no setup to use, while providing access free of charge to computing resources including GPUs”.

Colab was first introduced in 2017 as a research project by Google. It was initially aimed at researchers and students who needed a platform to work on machine learning projects without specialized hardware or software, but the platform soon gained wide popularity. Since its launch, Google Colab has undergone several updates and improvements, including new features such as support for more programming languages, improved hardware options, and integration with Google Drive. While there are many competitors in the serverless notebooks domain (such as Azure Notebooks, IBM Watson Studio, AWS SageMaker, and Kaggle Kernels), the very low entry barrier and the ease of use make Google Colab a popular choice for individuals and small teams who want to get started with machine learning and data analysis in a flexible and accessible environment for experimenting and learning. Its main drawbacks are the obvious dependence on Google services, the potential data privacy concerns, and the limited resource allocation, which Google mentions at the very beginning of the Colab FAQ page.

The first paragraphs from the Google Colab faq page

Now that we’re more familiar with Google Colab’s characteristics, let’s drill down into its key properties from the point of view of extensive usage experience. We will look at 3 main sections: the good (why to consider it), the bad (why to give it a second thought), and the ugly (why to reconsider).

The Good — Ease of use

The key differentiator of Google Colab is its ease of use; the distance from starting a Colab notebook to utilizing a fully working TPU cluster is super short. Colab’s common usage flow relies heavily on G-Drive integration, making complicated actions like authorization almost seamless. For example, the following 3 lines of code are the only ones needed to gain access to Google services such as G-Drive and BigQuery. As simple as that.

Authentication code snippet, made by the author
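
The snippet above survives only as an image caption. As a minimal sketch, assuming the standard `google.colab` helpers (`auth.authenticate_user()` and `drive.mount()`), such a flow typically looks something like the following. The wrapper name `connect_google_services` and the mount path are illustrative choices rather than the author's code, and the import guard is only there so the function degrades gracefully outside of Colab:

```python
def connect_google_services(mount_point="/content/drive"):
    """Authenticate the Colab user and mount G-Drive.

    Returns True inside a Colab runtime, False when google.colab is unavailable.
    """
    try:
        # These modules only exist inside a Colab runtime.
        from google.colab import auth, drive
    except ImportError:
        return False

    auth.authenticate_user()   # triggers the Google sign-in / consent flow
    drive.mount(mount_point)   # exposes G-Drive files under mount_point
    return True
```

Calling `connect_google_services()` once at the top of a notebook is usually enough; Google client libraries used later in the session can then generally pick up the authenticated credentials.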

Colab’s only entry barrier is having a notebook file on your drive. There is no need for notebook server instances, hardware provisioning, or user access management. Moreover, the notebook is always available on the drive, making it easy to share its content or just to review it offline (like any other document on G-Drive). A truly serverless notebook. Colab also offers a seamless configuration experience; the only options to choose from are no GPU, GPU, or TPU, and small or big RAM. This provides a real abstraction, removing the need to explicitly describe the required resources. The authentication flow is just as seamless; when a Google service is called (such as mounting G-Drive into a notebook session), Colab pops up a request to authorize the run. There is no need to define roles, authorizations, permissions, or any other entity commonly required by similar services.

Permit access popup from Colab

To utilize TPUs, we only need to adjust a few lines of the network declaration code. The fact that Colab runs as part of the G-Suite ecosystem makes it super easy to collaborate on the notebooks’ output (sharing results and gaining feedback using Google Sheets, collecting user input using Google Forms, or just generating graphs as images and publishing them to the team’s Shared Drives). The bottom line is that the super low barrier between having an idea and exploring it, prototyping, starting a feedback loop, and finally publishing an MVP is just too good to be true.
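
To make the "few code lines" concrete, here is a hedged sketch of what the switch can look like. It assumes TensorFlow 2.x, which the article does not actually name, and the helper names `pick_strategy` and `build_model_under` are hypothetical. The idea is that only the place where the network is declared changes: the model is built inside a distribution strategy's scope, and the rest of the training code stays as it was.

```python
def pick_strategy():
    """Return a TPUStrategy when a TPU is configured, else the default strategy."""
    import tensorflow as tf  # assumption: TensorFlow 2.x is installed

    try:
        resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
        tf.config.experimental_connect_to_cluster(resolver)
        tf.tpu.experimental.initialize_tpu_system(resolver)
        return tf.distribute.TPUStrategy(resolver)
    except ValueError:
        # Raised when no TPU name can be resolved for this runtime;
        # other failure modes may need their own handling.
        return tf.distribute.get_strategy()


def build_model_under(strategy, build_fn):
    """Declare the network inside the strategy scope so its variables
    are placed on the devices that the strategy manages."""
    with strategy.scope():
        return build_fn()
```

In a notebook this would be used as `model = build_model_under(pick_strategy(), make_model)`, where `make_model` stands for whatever function already declares and compiles the network.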

The Bad — Lack of important features

The issues begin to appear once we start expanding our Colab use. The first main drawback is the limited session duration or, more specifically, the ambiguity regarding the total resources one can consume. Theoretically, Colab is always available; just start your notebook, and you’re ready to roll. But in reality, Colab limits the duration of the session, especially for free users, especially at peak times, and especially for the pricier resources. Free-tier users will commonly face a pop-up message verifying that they are real users within the very first minutes of their run. Moreover, Colab will try to verify that the user is interactive and that the session is not just a long processing task (which is quite problematic, given that AI applications commonly include long processing parts).

Popup from Google Colab verifying it’s a real user

Colab commonly suggests buying pricier licenses in order to gain a smoother experience; Colab Pro and Pro+ provide more resources without the risk of them being taken away in the middle of a run. The next main drawback is the lack of some critical features, though usually with a possible workaround. One such example is flow pipelining; commonly, we would like to split our processing (especially for non-trivial cases) into a set of sub-tasks. Colab doesn’t directly support such a need. A workaround is to rely on notebook pipelining instead (a single notebook orchestrating the run and calling the other notebooks in a synchronous flow). The main issue is that all the triggered notebooks will use the run configuration of the main notebook. If a single notebook is GPU based, all of them will have to use a GPU backend, regardless of whether they truly need it.
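
The notebook pipelining workaround can be sketched roughly as follows. This is a minimal illustration, not the author's actual setup: it assumes the sub-notebooks are `.ipynb` files on the mounted drive and that the `jupyter nbconvert` command is available in the runtime; the function names are hypothetical. Because every step executes inside the orchestrating notebook's runtime, all of them share its backend, which is the limitation described above.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable, List


def build_command(notebook: Path, output_dir: Path) -> List[str]:
    """Command line that executes one notebook and saves the executed copy."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute", str(notebook),
        "--output-dir", str(output_dir),
    ]


def run_pipeline(
    notebooks: Iterable[str],
    output_dir: str,
    runner: Callable[..., object] = subprocess.run,
) -> List[str]:
    """Run the notebooks one after another, stopping at the first failure."""
    executed = []
    for notebook in notebooks:
        # check=True makes subprocess.run raise CalledProcessError when a
        # notebook fails, so later steps never run on incomplete results.
        runner(build_command(Path(notebook), Path(output_dir)), check=True)
        executed.append(str(notebook))
    return executed
```

From the orchestrating notebook, a call such as `run_pipeline(["prepare.ipynb", "train.ipynb"], "runs")` (placeholder file names) would execute the two steps in order. The `runner` parameter exists only so the flow can be dry-run or tested without actually launching Jupyter.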

Popup suggesting upgrading to the pro account from Google Colab.

It’s important to note that Colab is a work-in-progress project; new features are constantly being added. The issue is that the overall experience is that of a work-in-progress project, not of a real solution. The run scheduler is a great example: available only for Pro+ users, it theoretically enables scheduling runs. But as it currently lacks the ability to define run parameters or to pre-authorize (both of which are quite critical for auto-scheduling), it doesn’t really answer that need and requires one to come up with workarounds.

Notebook scheduling configuration from Google Colab

The Ugly — Support

As annoying as they might sound, the issues we mentioned so far are not deal breakers; each has a workaround, enabling one to stick with the platform if one wants to. Generally speaking, the main reason to switch away from a service provider is that it loses our confidence. Commonly this is due to a lack of transparency or, more specifically, a lack of support. Colab’s currently advised ways to get support are to submit feedback in the app or to open an issue on its GitHub project. Neither is truly suited to that need. Moreover, looking at the GitHub issues page, many issues are closed as not project related (which makes sense, given that ‘it doesn’t work for me’ requests don’t belong on a project’s issues page; it’s not meant for that).

The support options popup from Google Colab

Our own critical point came when our Colab account was suddenly blocked. What should we do next? Following the advised ways to get support didn’t work. Looking at the project issues, many others seem to face the same scenario without knowing who should assist, whom to talk to, or how. This is when we finally understood it was time to say goodbye and started to re-evaluate the available competitors.

Blocked account popup from Google Colab

What's next

Google Colab is probably still the best serverless notebook solution out there. Nevertheless, in many cases, it’s just not good enough. My advice for new users is to try it yourselves. Keep in mind, though, the limitations we mentioned, and keep checking whether it’s time to move elsewhere.


Published via Towards AI
