How Google made “Hum to Search”?
Last Updated on May 7, 2021 by Editorial Team
Author(s): Daksh Trehan
A perfect tool to drive earworms away.
Table of Contents:
- How to use Google Hum feature?
- How is Google using ML in the “Hum” feature?
Have you ever sat in an exam hall or a conference room while all you could think of was that odd tune that was playing in the disco last night?
Don’t worry, we’ve all been there. This phenomenon is known as an earworm. And, to drive the earworm away and ease your mind, the only trick is to sing/listen to that tune.
But what if you don’t know the song and all you have is a “hum” looping inside your head? Don’t worry, Google is here.
Google’s “Hum to Search” goes a step beyond the usual music recognition systems.
Shazam and Pixel’s Sound Search work well, but they can only recognize a recording that matches the original track, with its pitch, tempo, and instruments in place. Google took it to another level and introduced “Hum”, which can recognize a song’s name if you “hum” at Google for about 15 seconds, as long as your hum roughly matches the melody of that song.
How to use Google Hum feature?
It is easy: open Google Search, tap the “mic” icon, and “hum” the tune/song.
How is Google using ML in the “Hum” feature?
In a typical music recognition system, the audio sample is converted to a spectrogram, which is then used to find an exact audio match. This can’t be done with a hummed tune, because a hum carries none of the extras found in the studio recording, such as instruments, lyrics, or the original singer’s voice, and its pitch and tempo are often off. All it has is a bare melody that our model has to match to the closest song.
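To make the spectrogram step concrete, here is a minimal sketch in plain Python. It is only an illustration under assumed values: the frame size, hop length, sample rate, and the test tone are made up for this example, and a naive DFT stands in for the optimized FFT a real system would use.

```python
import cmath
import math

def spectrogram(samples, frame_size=64, hop=32):
    """Return a list of magnitude spectra, one per overlapping frame."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        # A Hann window reduces leakage at the frame edges.
        windowed = [
            x * 0.5 * (1 - math.cos(2 * math.pi * n / (frame_size - 1)))
            for n, x in enumerate(frame)
        ]
        # Naive DFT; keep only the non-negative frequency bins.
        spectrum = []
        for k in range(frame_size // 2 + 1):
            acc = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(windowed)
            )
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames

# Toy input (assumed values): a pure 1000 Hz tone sampled at 8000 Hz.
SAMPLE_RATE = 8000
TONE_HZ = 1000
tone = [math.sin(2 * math.pi * TONE_HZ * n / SAMPLE_RATE) for n in range(256)]

spec = spectrogram(tone)
# The strongest bin of the first frame tells us the dominant frequency.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / 64
```

Each row of `spec` describes which frequencies are present in one short slice of time, which is why an exact-match system can line it up against a stored recording but struggles with a hum that has different frequencies and timing.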
To achieve this, our model needs to be very robust and must ignore everything except the melody in the voice note. To make it work, we need to modify existing sound recognition models.
The humming sound is transformed into a number-based sequence for easy computation. The modified neural network is then trained on pairs of hummed and studio-recorded audio and produces an embedding for each input, creating a fingerprint-like identity for the melody. The model must be robust enough to place two recordings of the same melody close together even when their vocals and instrumentation differ, while keeping different melodies far apart.
The trained model generates an embedding for each input hum and looks for songs/tunes with a similar embedding among the songs it already knows.
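The lookup step can be sketched as a nearest-neighbour search over embeddings. This is a toy illustration, not Google's implementation: the three-dimensional vectors, the song names, and the choice of cosine similarity are all assumptions made for the example.

```python
import math

def cosine_similarity(a, b):
    """Similarity in [-1, 1]; higher means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def closest_songs(query, catalog, top_k=2):
    """Rank catalog entries by similarity to the query embedding."""
    scored = [(cosine_similarity(query, emb), title) for title, emb in catalog.items()]
    scored.sort(reverse=True)
    return [title for _, title in scored[:top_k]]

# Hypothetical reference embeddings (real ones would have many more dimensions).
catalog = {
    "song_a": [0.9, 0.1, 0.0],
    "song_b": [0.1, 0.9, 0.1],
    "song_c": [0.0, 0.2, 0.9],
}

# Hypothetical embedding produced for a user's hum.
hum_embedding = [0.8, 0.3, 0.1]
matches = closest_songs(hum_embedding, catalog)
```

Here the hum lands nearest to `song_a`, so that title is ranked first. A production system would search millions of songs and would likely use an approximate nearest-neighbour index rather than this linear scan.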
The model is made even sturdier by varying the pitch, loudness, bass, and energy of the studio recordings during training. Mixing and matching different recordings of the same melody also helps to achieve higher accuracy.
I just hope Google has trained it on some really poorly hummed versions of songs and tried it out personally.
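As a rough idea of what such variation could look like, the sketch below changes the loudness and playback speed of a clip at random. The functions, the value ranges, and the toy clip are assumptions for illustration only; the article does not describe the actual augmentation pipeline.

```python
import random

def change_loudness(samples, gain):
    """Scale every sample by a constant gain."""
    return [gain * x for x in samples]

def change_speed(samples, factor):
    """Resample by linear interpolation; factor > 1 plays faster and raises pitch."""
    out_len = int(len(samples) / factor)
    out = []
    for i in range(out_len):
        pos = i * factor
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append((1 - frac) * samples[left] + frac * samples[right])
    return out

def random_variant(samples, rng):
    """Produce one randomly perturbed copy of a clip (ranges are assumed)."""
    gain = rng.uniform(0.5, 1.5)
    factor = rng.uniform(0.9, 1.1)
    return change_speed(change_loudness(samples, gain), factor)

rng = random.Random(0)
clip = [0.0, 1.0, 0.0, -1.0] * 25  # 100-sample toy waveform
variant = random_variant(clip, rng)
```

Training on many such variants of the same melody encourages the model to treat them as one tune instead of memorizing a single rendition.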
The training uses triplet loss, which deliberately ignores some training points and so sidesteps some common neural network training problems. When we pass our model a pair of audio clips that correspond to the same melody, triplet loss tends to ignore those parts of the training data that are derived from a different melody, for example when that melody is already far away from the pair and has nothing left to teach. In effect, the model leaves behind the accompanying instruments and generates a number-based sequence for each melody.
In this article, we tried to shed light on how Google Hum works and how machine learning is becoming the core of the new virtual world.
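The standard margin-based form of triplet loss is max(0, d(anchor, positive) - d(anchor, negative) + margin). The sketch below uses that textbook form with Euclidean distance; the margin value and the two-dimensional embeddings are assumptions, and the exact variant Google uses may differ.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Zero once the negative is at least `margin` farther away than the positive."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Hypothetical embeddings: a hum, a studio clip of the same melody,
# and two clips of different melodies.
hum = [0.0, 0.0]
same_melody = [0.3, 0.4]     # distance 0.5 from the hum
easy_negative = [3.0, 4.0]   # distance 5.0, already far away
hard_negative = [0.0, 0.6]   # distance 0.6, uncomfortably close

easy_loss = triplet_loss(hum, same_melody, easy_negative)
hard_loss = triplet_loss(hum, same_melody, hard_negative)
```

The easy negative yields a loss of zero, so it contributes nothing to the update; this is the sense in which some training points are ignored. The hard negative yields a positive loss, which pushes the model to move that different melody farther from the hum.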
Feel free to connect:
Portfolio ~ https://www.dakshtrehan.com
LinkedIn ~ https://www.linkedin.com/in/dakshtrehan
Follow for further Machine Learning/ Deep Learning blogs.
Medium ~ https://medium.com/@dakshtrehan
Published via Towards AI