Use Google Colab Like A Pro
Author(s): Wing Poon
15 tips to supercharge your productivity
Regardless of whether you're a Free, Pro, or Pro+ user, we all love Colab for the resources and ease of sharing it makes available to all of us. As much as we hate its restrictions, Google is to be commended for democratizing Deep Learning, especially for students from countries which would otherwise have no opportunity to participate in this "new electrification" of entire industries.
Have you spent countless hours working in Colab but have yet to invest a few minutes to really get to know the tool and be at your most productive?
"Time is not refundable; use it with intention."
1. Check and Report on your GPU Allocation
"Life is like a box of chocolates. You never know what you're gonna get."
- Forrest Gump
I'm so sorry, we're out of V100s, let's set you up with our trusty old K80 with 8GB of RAM. You can try again in an hour if you're feeling lucky! Uhm, yeah Google, thanks but no thanks.
gpu = !nvidia-smi -L
print(gpu[0])
assert any(x in gpu[0] for x in ['P100', 'V100'])
Jokes aside, I'm sure you've been burned by going through a lengthy pip install, mounting GDrive, downloading, and preparing your data, only to find out three-quarters of the way through your notebook that you forgot to set your runtime to "GPU". Put this assertion in the first cell of your notebook, you can thank me later.
2. No More Authenticating drive.mount()
Not all Python notebooks are the same: Colab treats Jupyter Notebooks as second-class citizens. When you upload your Jupyter Notebook to GDrive, you'll see that it appears with a blue folder icon. When you create a new Colab notebook from scratch in GDrive (+ New → More → Google Colaboratory), it'll appear as an orange icon with the Colab logo.
This "native" Colab notebook has special powers: namely, Google can automatically mount your GDrive on the Colab VM for you as soon as you open the notebook. If you open a Jupyter notebook and click on the Files icon (left sidebar), and then the Mount Drive icon (pop-out panel, top row), Colab will insert the following new cell into your Jupyter notebook:
from google.colab import drive
drive.mount('/content/drive')
And when you execute that cell, it'll pop up a new browser tab and you have to go through that annoying (i) select your account, (ii) Allow, (iii) copy-and-paste, (iv) switch and close tabs rigamarole.
If you have a Jupyter notebook that you frequently open and it needs GDrive access, invest thirty seconds and save yourself that constant hassle. Simply create a new (native) Colab notebook (as described above), then open your existing Jupyter notebook (with Colab) in another browser tab. Click: Edit → Clear all outputs. Then, making sure you're in Command mode, press <SHIFT> + <CMD|CTRL> + A, then <CMD|CTRL> + C. Go to your new Colab notebook and press <CMD|CTRL> + V. Mount your GDrive using the Mount Drive icon method and delete the old Jupyter notebook to avoid confusion.
3. Fastest way to Copy your Data to Colab
One way is to copy it from Google Cloud Storage. You do have to sign up for a Google Cloud Platform account, but Google offers a Free Tier, which gives you 5GB of storage, as long as you select US-WEST1, US-CENTRAL1, or US-EAST1 as your region.
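As a rough sketch, pulling a dataset down from a bucket looks something like this (the bucket and folder names are placeholders):
from google.colab import auth
auth.authenticate_user()  # grant this Colab VM access to your GCS buckets
# Copy the bucket contents onto the VM's fast local disk (hypothetical paths)
!gsutil -m cp -r gs://my-bucket/my_data ./my_data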
Besides security, another benefit of using Google Storage buckets is that if you're using TensorFlow Datasets for your own data (if you aren't, you really should … I'll be writing an article about why, so follow me to be notified), you can bypass the copying and load your datasets directly using tfds.load().
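A minimal sketch, assuming your dataset has already been prepared into a bucket (the dataset name and bucket path are placeholders):
import tensorflow_datasets as tfds
# Point data_dir at the bucket holding the prepared dataset -- no copy to the VM needed
ds_train, ds_test = tfds.load(
    'my_dataset',                                   # hypothetical TFDS dataset name
    split=['train', 'test'],
    data_dir='gs://my-bucket/tensorflow_datasets',  # hypothetical bucket path
)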
Another way is to use the wonderful gdown utility. Note that as of this writing (Mar 2022), even though gdown is pre-installed on Colab instances, you have to upgrade the package for it to work. You'll need the file ID, which you can get by right-clicking on the file in Drive → Get link → Anyone with link, and then plucking out the ID (the part between /d/ and /view) from the provided URL:
https://drive.google.com/file/d/1sk…IzO/view?usp=sharing
NOTE: Since anyone with the link can access your file with this file-sharing permission, use this method only for your personal projects!
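With the file ID in hand, the upgrade-and-download step looks roughly like this (<FILE_ID> and the output filename are placeholders):
# Upgrade the pre-installed gdown first (required as of Mar 2022)
!pip install -q --upgrade gdown
# Download the shared file by its Drive file ID
!gdown <FILE_ID> -O data.zip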
4. Bypass "pip install …" on Runtime (Kernel) Restarts
If you're like me and are always restarting and re-running the Python kernel (usually because you've got non-idempotent code like dataset = dataset.map().batch()), save precious seconds and keep your notebooks readable and concise by not invoking pip install every time you restart the runtime. Instead, create a dummy file and test for its presence:
![ ! -f "pip_installed" ] && pip -q install tensorflow-datasets==4.4.0 tensorflow-addons && touch pip_installed
In fact, use this same technique to avoid unnecessary copying and downloading of those large data files whenever you rerun your notebook:
![ ! -d "my_data" ] && unzip /content/drive/MyDrive/my_data.zip -d my_data
5. Import your own Python Modules/Packages
If you find yourself constantly using a helper.py, or your own private Python package, e.g. to display a grid of images or chart your training losses/metrics, put them all on your GDrive in a folder called /packages and then:
import sys
sys.path.append('/content/drive/MyDrive/packages/')
from helper import *
6. Copy Files to a Google Storage Bucket
If you're training on Google's TPUs, your data has to be stored in a Google Cloud Storage Bucket. You can use the pre-installed gsutil utility to transfer files from the Colab VM's local drive to your storage bucket if you need to run some pre-processing on data prior to kicking off each TPU training run.
!gsutil -m cp -r /root/tensorflow_datasets/my_ds/ gs://my-bucket/
7. Ensure all Files have been Completely Copied to GDrive
As I'm sure you're well aware, Google can and will terminate your session due to inactivity. There are good reasons not to store (huge) model checkpoints on your mounted GDrive during training, e.g. so as not to exceed Colab I/O limits, or because you're running low on your GDrive storage quota and need to make use of the ample local disk storage on your Colab instance. If you're paranoid and want to ensure that your final model and data are completely transferred to GDrive, call drive.flush_and_unmount() at the very end of your notebook:
from google.colab import drive
model.fit(...) # I'm going to take a nap now, <yawn>
model.save('/content/drive/MyDrive/...')
drive.flush_and_unmount()
Note that the completion of copying/writing files to /content/drive/MyDrive/ does not mean all files are safely on GDrive and that you can immediately terminate your Colab instance; the transfer of data between the VM and Google's infrastructure happens asynchronously, so performing this flush helps ensure it's indeed safe to disconnect.
8. Quickly open your local Jupyter Notebook
There's no need to copy your .ipynb to GDrive and then double-click on it. Simply go to https://colab.research.google.com/ and then click on the Upload tab. Your uploaded notebook will reside in the Colab Notebooks/ folder on your GDrive.
9. Use Shell Commands with Python Variables
OUT_DIR = './models_ckpt/'
...
model.save(OUT_DIR + 'model1')
...
model.save(OUT_DIR + 'model2')
...
!rm -rf {OUT_DIR}*  # {curly braces} interpolate Python variables into shell commands
Make use of the powerful Linux commands available to you … why bother with importing the zipfile and requests libraries and all that attendant code? In fact, get the best of both worlds by piping the output of a Linux command into a Python variable:
!wget -O data.zip https://github.com/ixig/archive/data_042020.zip
!unzip -q data.zip -d ./tmp
# 'wc': handy linux word and line count utility
result = !wc ./tmp/tweets.txt
lines, words, *_ = result[0].split()
10. Am I running in Colab or Jupyter?
If you switch between running your notebooks on your local machine and training on Colab, you need a way to tell where the notebook is running, e.g. so you don't pip install when running on your local machine. You can do this using:
COLAB = 'google.colab' in str(get_ipython())
if COLAB:
    !pip install ...
11. Message Me, Baby!
No need to sit around waiting for your training to complete; have Colab send you a notification on your phone! First, you'll need to follow the instructions to allow CallMeBot to message you on your Signal/FB/WhatsApp/Telegram app. It takes all of one minute: a very simple, quick, and safe signup. Then, you can:
import requests
from urllib.parse import quote_plus
number = '...'
api_key = '...'
message = quote_plus('Done Baby!')
requests.get(f'https://api.callmebot.com/signal/send.php?phone={number}&apikey={api_key}&text={message}')
12. Use IPython Cell Magics
Okay, this one is not specific to Colab Notebooks as it applies to Jupyter Notebooks as well, but here are the most useful ones to know about.
%%capture: Silence the copious, annoying outputs from executing statements in a cell. Useful for those "pip install …", "tfds.load(…)", and innumerable TensorFlow deprecation warnings.
%%writefile <filename>: Writes the text contained in the rest of the cell into a file. Useful for creating a YAML, JSON, or simple text file on the fly for testing (see the sketch after this list).
%tensorflow_version 1.x: If you're still stuck in the past (not judging … well, maybe just a little!), don't "pip install tensorflow==1.0", it'll make a mess of dependencies; use this line magic instead, before importing TensorFlow.
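For example, here's a quick %%writefile sketch for knocking out a throwaway config file (the filename and contents are purely illustrative):
%%writefile config.yaml
# hypothetical settings written straight from a notebook cell
learning_rate: 0.001
batch_size: 32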
13. Dock the Terminal
If you're a Pro/Pro+ user, you have access to the VM via the Terminal. You can do some super-powerful things like run your own Jupyter server on it (so you can have back the familiar Jupyter Notebook UI if that's your preference). The Terminal is also invaluable for wrangling files.
When you click on the Terminal icon on the left sidebar, the Terminal panel pops out on the right of the page, but that's really difficult to use given how narrow it is, and you constantly have to close it so you can see your code! Solution: dock the Terminal as a separate tab. After opening the Terminal, click on the Ellipsis (…) → Change page layout → Single tabbed view.
14. Change those Shortcuts
Who can remember the obscure shortcuts that Google assigns?! Go to Tools → Keyboard shortcuts, and make sure you assign the following shortcuts to your own memorable two-key combinations:
- Restart runtime and run all cells in the notebook
- Restart runtime
- Run cells before the current
- Run selected cell and all cells after
The nice thing is that Colab remembers your changes (be sure to click on the Save button at the bottom of the pop-out), so you only need to do this once.
15. GitHub Integration
You can launch any notebook hosted on GitHub directly in Colab using the following URL:
https://colab.research.google.com/github/<org>/<repo>/…/<xx.ipynb>
Or you can just bookmark your favorite org/repo with the following URL and you'll be prompted by a file browser whenever you click on the bookmark:
https://colab.research.google.com/github/<org>/<repo>