DPO, Open-Source’s New Weapon in the AI War
The end of RLHF?

“It is only rarely that, after reading a research paper, I feel like giving the authors a standing ovation.”

When Andrew Ng, one of the most prominent AI researchers in the world, describes a recent paper this way, you know it’s something special.

A group of researchers from Stanford and CZ Biohub has presented DPO (Direct Preference Optimization), a new alignment breakthrough that could give the open-source community back the ability to challenge Big Tech, something thought impossible… until now.
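To give a concrete feel for what DPO optimizes, here is a minimal Python sketch of its loss for a single preference pair, as formulated in the DPO paper: given a prompt, a preferred (“chosen”) response and a dispreferred (“rejected”) one, the model is pushed to raise the chosen response’s likelihood relative to a frozen reference model more than it raises the rejected one’s. There is no separately trained reward model and no reinforcement learning loop, just a classification-style loss. The function name `dpo_loss`, the per-pair (unbatched) form and the default `beta=0.1` are illustrative assumptions, not the authors’ code; in practice the loss is averaged over a dataset of pairs, and the log-probabilities come from the language models themselves.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Sketch of the DPO loss for one preference pair (illustrative only).

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    """
    # Implicit "reward" of each response: how much more likely the policy
    # makes it than the reference model does, scaled by beta.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward

    # Loss is -log(sigmoid(margin)) = log(1 + exp(-margin)),
    # split into two branches so exp() never overflows.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# When the policy still matches the reference model, the loss is log(2).
print(dpo_loss(-5.0, -5.0, -5.0, -5.0))
# Favoring the chosen response relative to the reference lowers the loss.
print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
```

Note that the loss depends only on log-probability differences, which is why DPO can reuse a standard fine-tuning setup instead of the reward-model-plus-RL machinery of RLHF.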

Look at the numbers and it quickly becomes clear that building the best Large Language Models (LLMs), like the ones behind ChatGPT, is a rich man’s game.

The current gold standard for building these models is as follows:

Source: Chip Huyen

You first assemble billions of documents with trillions of words and, in a self-supervised manner, you ask the model to predict the next

