Objects that Sound: DeepMind’s Research Shows How to Combine Vision and Audio in a Single Model

Towards AI Team

3 years ago

Called AVE-Net, the architecture remains a major breakthrough in multi-modal learning.

Published via Towards AI