Two Ways to Learn Audio Embeddings

Author(s): Edward Ma

Speech2Vec with Skip-gram and CBOW

Photo by Álvaro Bernal on Unsplash

Mel-frequency cepstral coefficients (MFCC) and zero-crossing rate are some of the classical features for audio. They can be extracted easily with existing libraries. However, they may not provide a high-quality input signal for today's deep learning models.
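To make one of these classical features concrete, here is a minimal sketch of zero-crossing rate in plain Python. It is an illustration only, not how any particular audio library computes it: the function name `zero_crossing_rate`, the sign convention (zero counts as non-negative), and the 5 Hz test tone are all assumptions made for this example.

```python
import math


def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ.

    Assumption for this sketch: a sample equal to zero counts as non-negative.
    """
    if len(samples) < 2:
        return 0.0
    crossings = sum(
        1
        for prev, cur in zip(samples, samples[1:])
        if (prev >= 0) != (cur >= 0)
    )
    return crossings / (len(samples) - 1)


# Hypothetical test signal: a 5 Hz sine wave sampled at 1000 Hz for one second.
sample_rate = 1000
signal = [math.sin(2 * math.pi * 5 * n / sample_rate) for n in range(sample_rate)]
zcr = zero_crossing_rate(signal)
```

A slow tone like this one changes sign only a few times per second, so its rate is small; noisy or high-pitched audio changes sign far more often and scores higher.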

Two teams of researchers propose a different approach: learning audio embeddings instead of relying on those classical features. Chung and Glass (2018) propose learning word-based embeddings, while Haque et al. (2019) suggest learning sentence-based embeddings.

Chung and Glass take their inspiration from word2vec to learn audio embeddings. word2vec leverages skip-gram or continuous bag-of-words (CBOW)…
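As a rough illustration of the two word2vec training schemes named above, the sketch below builds training examples from a toy token sequence. Skip-gram uses a center item to predict each of its neighbors, while CBOW uses the neighbors together to predict the center item. The function names, the `window` parameter, and the string tokens are assumptions for this example; it does not show how Speech2Vec handles audio segments.

```python
def skipgram_pairs(tokens, window=1):
    """Build (center, context) pairs: the center item predicts each neighbor."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_pairs(tokens, window=1):
    """Build (context, center) pairs: the neighbors together predict the center."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        pairs.append((context, center))
    return pairs


# Hypothetical toy sequence; plain strings stand in for words.
tokens = ["the", "cat", "sat"]
sg = skipgram_pairs(tokens)
cb = cbow_pairs(tokens)
```

With a window of 1, skip-gram yields one pair per (center, neighbor) combination, whereas CBOW yields exactly one example per position, with all neighbors grouped into a single context list.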

