


Author(s): Michał Oleszak

Originally published on Towards AI.

AI Pulse #1: DINOv2, All The LLMs & Open-Source AI

A new foundation model for computer vision, making sense of the spree of open-source LLMs, and a look at whether AI should be open-source.

AI Pulse is also available at pulseofai.substack.com.

In this edition:

  • DINOv2, a universal computer vision backbone;
  • A spree of open-source LLMs emerges following LLaMa’s leak;
  • Should AI models be open-sourced?

TL;DR

📢 Meta releases the second version of their self-DIstillation with NO labels (DINO) model, which can be used as a generic computer vision backbone without the need to fine-tune it.
📝 Paper: https://arxiv.org/abs/2304.07193
💻 Code: https://github.com/facebookresearch/dinov2
👀 Demo: https://dinov2.metademolab.com/

The News

DINOv2 is a family of models that learn visual features from unlabeled data. These features can then be used out of the box for a wide range of downstream tasks, including image classification, segmentation, and depth estimation. The models show interesting properties, such as an understanding of object parts and scene geometry, which makes them suitable backbones for even more complex tasks.

The novelty is that the DINOv2 backbone, pre-trained in a self-supervised way, does not require fine-tuning. One can take it as-is and, for example, build a small linear classifier on top of it to solve any image classification task. This contrasts with previous self-supervised architectures, which typically require fine-tuning the entire network’s weights, including the backbone, to perform well on downstream tasks.

Meta open-sourced not only the training code but also trained models in a range of sizes.
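
In practice, the recipe is simple enough to fit in a few lines. Below is a minimal PyTorch sketch of linear probing on frozen DINOv2 features; the torch.hub entry point and the 384-dimensional ViT-S/14 embedding follow the repository’s README, while the optimizer, class count, and preprocessing are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Load a pre-trained DINOv2 backbone from torch.hub.
# "dinov2_vits14" is the ViT-S/14 variant; see the repo README
# for the full list of published model sizes.
backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
backbone.eval()  # the backbone stays frozen throughout

# A small linear head on top of the frozen features.
# 384 is the embedding size of ViT-S/14; num_classes is task-specific.
num_classes = 10
head = nn.Linear(384, num_classes)
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One step of linear probing: only the head is updated.

    `images` is a batch of normalized RGB tensors whose spatial
    size is a multiple of the 14-pixel patch size.
    """
    with torch.no_grad():          # no gradients through the backbone
        feats = backbone(images)   # (batch, 384) image-level features
    logits = head(feats)
    loss = loss_fn(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```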

Delving Deeper

Self-supervised learning (SSL) is a learning paradigm in which the model is trained to learn features from unlabeled data. This is very convenient for use cases in which data annotation is hard or expensive, such as medical diagnosis. But SSL techniques have also yielded performance improvements in other scenarios, since they can learn from larger datasets and are not influenced by biased or incorrect annotations.

DINOv2 builds heavily on top of its first version. Indeed, the authors openly state that most of the technical contributions of v2 aim at accelerating and stabilizing the training. Just like v1, DINOv2 is trained in a self-distillation process with no labels:

  • Two Vision Transformers (ViTs) are instantiated with the same architecture: the Teacher and the Student.
  • A number of random crops are cut out from each training image. Some of them are global crops and contain a large part of the original image, while others are local crops that comprise just a small part.
  • All the crops are passed through the Student network, and only global crops are passed through the Teacher.
  • The output representations from both networks are compared with a cross-entropy loss. The Student’s weights are updated based on this loss, encouraging it to produce outputs more similar to the Teacher’s. The Teacher’s weights, in turn, are updated as an exponential moving average of the Student’s weights (see the sketch below).
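
A compact PyTorch sketch of one such training step is below. It assumes both networks map a batch of crops to per-crop logit vectors and that all crops are resized to a single resolution; DINO details such as output centering and sharpening, the projection heads, and the momentum schedule are omitted:

```python
import copy
import torch
import torch.nn.functional as F

def init_teacher(student: torch.nn.Module) -> torch.nn.Module:
    """The Teacher starts as a copy of the Student and is never
    updated by gradients, only by an exponential moving average."""
    teacher = copy.deepcopy(student)
    for p in teacher.parameters():
        p.requires_grad = False
    return teacher

def distillation_step(student, teacher, global_crops, local_crops,
                      optimizer, ema_momentum=0.996, temp=0.1):
    # The Teacher only sees global crops...
    with torch.no_grad():
        teacher_out = F.softmax(teacher(global_crops) / temp, dim=-1)
    # ...while the Student sees all crops.
    student_out = F.log_softmax(
        student(torch.cat([global_crops, local_crops])) / temp, dim=-1)

    # Cross-entropy between Teacher targets and every Student output.
    # DINO pairs each Teacher view with each Student view of the same
    # image; the Teacher views are averaged here for brevity.
    targets = teacher_out.mean(dim=0, keepdim=True)
    loss = -(targets * student_out).sum(dim=-1).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # The Teacher's weights track an EMA of the Student's weights.
    with torch.no_grad():
        for t, s in zip(teacher.parameters(), student.parameters()):
            t.mul_(ema_momentum).add_(s, alpha=1 - ema_momentum)
    return loss.item()
```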

DINOv2’s main advantage over its predecessor is the dataset used for pre-training. The authors note that most SSL developments so far have been made in the context of pre-training on ImageNet, whose lack of diversity might lead to overfitting to the few dominant modes. To address this, they implement a simple yet effective clustering mechanism that allows them to collect a curated, diverse image set (the resulting LVD-142M dataset contains 142 million images).
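
To give a flavor of the idea, here is a toy version of diversity-driven selection: embed every candidate image (say, with an existing SSL model), cluster the embeddings, and sample evenly across clusters. This is a deliberate simplification of the paper’s pipeline, which also deduplicates images and retrieves neighbors of a curated seed set at scale:

```python
import numpy as np
from sklearn.cluster import KMeans

def diverse_subset(embeddings: np.ndarray, n_clusters: int = 1000,
                   per_cluster: int = 100, seed: int = 0) -> np.ndarray:
    """Cluster image embeddings and take the same number of images
    from each cluster, so that no dominant visual mode overwhelms
    the pre-training set. Returns indices into `embeddings`."""
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(embeddings)
    rng = np.random.default_rng(seed)
    picked = []
    for c in range(n_clusters):
        members = np.flatnonzero(labels == c)
        take = min(per_cluster, members.size)
        picked.extend(rng.choice(members, size=take, replace=False))
    return np.asarray(picked)
```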

Behind the News

Meta has been leading the research on self-supervised methods for computer vision for some time. In 2021, Yann LeCun, Meta’s Chief AI Scientist, published what is now a famous blog post titled Self-supervised learning: The dark matter of intelligence. In it, LeCun argued that SSL is one of the most promising ways to build background knowledge and approximate a form of common sense in AI systems.

Since then, Meta’s researchers have released many successful SSL architectures, including MoCo and DINO. Last week, they summarized their expertise on the topic in The Self-Supervised Learning Cookbook.

A spree of open-source LLMs

TL;DR

📢 At the end of February this year, Meta announced LLaMa, their answer to OpenAI’s GPT models. Initially, LLaMa was not intended to be open-sourced, but a week after its announcement, the model leaked on 4chan, kicking off a spree of other open-source LLMs that build on top of it. This piece will help you make sense of this abundance of Large Language Models and associated projects.

  1. Alpaca
    🌐 https://crfm.stanford.edu/2023/03/13/alpaca.html
    A fine-tuned LLaMa trained to follow instructions. Specifically, Meta’s 7B LLaMa was fine-tuned on 52K instruction-following demonstrations generated with OpenAI’s text-davinci-003, a GPT-3.5-series model. It is noteworthy how the authors took advantage of the synergy created by the abundance of LLMs: they used one LLM to generate training data to fine-tune another (this fine-tuning pattern is sketched after this list).
  2. Vicuna
    🌐 https://vicuna.lmsys.org/
    Another fine-tuned LLaMa, this time trained on conversations between ChatGPT and its users. Specifically, Meta’s LLaMa was fine-tuned on the data shared by ChatGPT users at sharegpt.com, so the model can reasonably be expected to mimic ChatGPT’s behavior. The authors used GPT-4 to assess Vicuna and found that it reaches roughly 90% of ChatGPT’s quality.
  3. Koala
    🌐 https://bair.berkeley.edu/blog/2023/04/03/koala/
    Similar to Vicuna, Koala is a LLaMa fine-tuned on publicly available conversations. On top of ShareGPT conversations, it also uses a set of other datasets. The authors’ main finding is that more data is not always better: a Koala version trained only on high-quality data outperforms a version fine-tuned on additional, uncurated datasets.
  4. GPT4-x-Alpaca
    🌐 https://huggingface.co/chavinlo/gpt4-x-alpaca
    Just like Alpaca was trained by fine-tuning LLaMa to follow instructions, GPT4-x-Alpaca is a LLaMa fine-tuned on the GPTeacher data, a collection of instruction-following datasets generated by GPT-4.
  5. ColossalChat
    🌐 https://github.com/hpcaitech/ColossalAI
    A model based on LLaMa. The authors release not only the chatbot itself but also the entire training pipeline, including the Reinforcement Learning from Human Feedback (RLHF) component.
  6. ChatLLama
    🌐 https://github.com/juncongmoo/chatllama
    A LLaMa fine-tuned with RLHF, just like ChatGPT. The authors publish the training code, allowing everyone to train their own ChatGPT-like model. What’s more, the training runs on a single GPU and is supposedly 15 times faster than ChatGPT’s.
  7. OpenAssistant
    🌐 https://open-assistant.io/
    A project meant to give everyone access to chatbots. As part of the effort, the authors release a large dataset, OpenAssistant Conversations, and ask everyone to contribute by submitting, ranking, and labeling model prompts and responses.
  8. FreedomGPT
    🌐 https://www.freedomgpt.com/
    A version of Alpaca accompanied by a simple UI, allowing users to run the uncensored model locally and privately.
  9. WizardLM
    🌐 https://arxiv.org/abs/2304.12244
    Another LLaMa fine-tuned on instruction-following data. This time, the authors used another LLM to generate instructions of varying complexity: starting with a set of simple instructions, they had a model rewrite them step by step into increasingly complex ones.
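
What most of the projects above share under the hood is supervised fine-tuning of a LLaMa checkpoint on instruction-response pairs, as referenced in item 1. A bare-bones sketch of this pattern with the Hugging Face transformers library follows; the checkpoint path is a placeholder (there is no official public LLaMa download), the prompt template mimics Alpaca’s style, and real projects add batching, low-rank adapters, and tuned hyperparameters:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/llama-7b"  # placeholder: no official public checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# A toy instruction-response pair in the Alpaca prompt style.
examples = [
    {"instruction": "Name three primary colors.",
     "response": "Red, yellow, and blue."},
]

model.train()
for example in examples:
    prompt = (f"### Instruction:\n{example['instruction']}\n\n"
              f"### Response:\n{example['response']}")
    batch = tokenizer(prompt, return_tensors="pt")
    # For causal-LM fine-tuning, the labels are the input ids themselves;
    # the model shifts them internally to predict the next token.
    out = model(**batch, labels=batch["input_ids"])
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```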

Should AI models be open-sourced?

TL;DR

📢 The explosion of generative models we have witnessed in the past months has sparked a discussion about their accessibility. Nature makes an important contribution to the debate, speaking in favor of open-sourcing AI.

Debating AI access

Generative AI has been around for a while, but the Cambrian explosion we are witnessing these days took off quite recently, when end users were given the opportunity to interact with the technology directly.

It all started with image-generating models such as DALL-E 2, Stable Diffusion, and Midjourney. Then, the time of Large Language Models came with the release of ChatGPT, followed by a number of similar chatbots. Some of them, including OpenAI’s GPT-based models and Google’s Bard, are closed and proprietary, while others, most notably many of the LLMs built on top of Meta’s LLaMa model, are freely available.

There are as many advocates of open-sourcing AI models as there are critics of the idea. The former often point out that wide access to new algorithms accelerates both research progress, as scientists build on each other’s work, and market adoption, as companies can easily build AI-based products. The critics, on the other hand, warn against bad actors using open-source technology for unethical or dangerous ventures.

Nature’s Voice for Open-Source AI

A new voice joins the debate in the form of an article published on the website of the journal Nature. In it, the author advocates for open access to AI models for everyone, presenting the following arguments.

  • Unrestricted access to AI models lets researchers examine their inner workings, adjust their code, and identify bugs. Active involvement and oversight by the scientific community can help keep such models secure over time.
  • Open-source AI models are crucial for reproducibility in science: the owners of closed AI systems can modify their product or the data used to train it at any time, causing its outputs to change unpredictably.
  • Using proprietary AI in scientific research raises ethical concerns, as the texts and images used to train these models are often undisclosed and could include private information exchanged between social media users or material generated by children who cannot consent to sharing their data.

The article goes on to call on scientists to move away from using proprietary AI in their own work where possible and switch to open models. It also urges governments to increase funding for projects oriented toward producing open-source models for research.

Our take on it

Perhaps there is a place for both proprietary and open-source AI, just as with other forms of software. Some proponents of open-source AI speak of the “Linux moment” of generative models, referring to how the Linux operating system popularized free access to source code. Yet despite Linux’s popularity among developers, the proprietary Microsoft Windows is still the number one desktop OS on the market, followed by macOS.

Closed models can provide a ton of value for society and their creators at the same time, thus not eliminating the incentive to innovate. They just need to be properly validated and approved for safety. Is the AI Certification Engineer a job of the not-so-distant future?

Thanks for reading! AI Pulse is also available as a free newsletter on Substack. If you liked it, help me improve by subscribing and sharing it with colleagues and friends.
