CogVLM, a Revolutionary Multimodal Model Introducing Deep Fusion
Author(s): Ignacio de Gregorio

Originally published on Towards AI.

Solving the Shallow Alignment Issue

A group of researchers has presented a new model that revolutionizes the current multimodal AI design standards while blowing almost all competition out of the water.

They introduce Deep Fusion, a new design primitive that mitigates the biggest problem faced by Multimodal Large Language Models (MLLMs) today: the “shallow alignment problem”.

If it delivers on its potential, the CogVLM paper could become a seminal piece of research, one that draws the attention of researchers around the world and leads to a new family of MLLMs: deep fusion models.

The actual results? Impressive capabilities like coding math problems from images,…

