

Basics of Foundation Models

Last Updated on January 10, 2024 by Editorial Team

Author(s): manish kumar

Originally published on Towards AI.

Generative AI is taking the world by storm. Everybody is talking about ChatGPT, Bard, and Large Language Models (LLMs). Every day, new research and new information flood our technical newsletter subscriptions and our favorite technical blogs. This is good; in fact, it has opened the world’s eyes to many use cases that can now be achieved with far less effort than before. Personally, I like this state of affairs. Nonetheless, one thing keeps bothering me: the fundamentals behind it. By fundamentals, I mean understanding the whole landscape of Generative AI, which is not just a synonym for Large Language Models (LLMs). How is what we call Machine Learning different from today’s talk of the town, Generative AI? I set out to get these fundamentals straight, and eventually realized I should share the journey with my fellow community members. This article shares some of those fundamental learnings. I will start with my understanding of two approaches to Machine Learning.

Data Orientation vs. Task Orientation

How were we doing machine learning a year ago, and how are many of us still doing it today? You have a use case and you are given data. You decide between supervised and unsupervised learning, do exploratory data analysis, do feature engineering or data preprocessing, build different models, test them, adjust their parameters, and finally choose the best model for your use case. There is nothing wrong with that; in fact, I have followed the same process and delivered fantastic business use cases. I am just pointing out the approach that we take. Here, we are focused on data and models. Our whole aim is to refine the data and choose the best model for the use case. In other words, if benchmark results for your use case are not up to standard, you investigate the errors and either improve the data or tune the model parameters. This is the traditional data-oriented approach to machine learning.
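To make that loop concrete, here is a minimal sketch of it in plain Python. Everything in it is an assumption made for illustration: the synthetic one-feature data set, the hand-written k-nearest-neighbours model, and the candidate values of `k`. In a real project each piece would be replaced by your own data and your library of choice.

```python
import random

def make_data(n, seed):
    """Synthetic 1-D binary task: label is 1 when x > 0.5, with 10% label noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < 0.1:  # flip some labels so the task is not trivial
            y = 1 - y
        data.append((x, y))
    return data

def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return int(votes * 2 > k)

def accuracy(train, held_out, k):
    """Fraction of held-out points the k-NN model labels correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in held_out)
    return hits / len(held_out)

# 1. You are given data (here: generated) and split it.
train_set = make_data(200, seed=0)
validation_set = make_data(100, seed=1)
test_set = make_data(100, seed=2)

# 2. Build several candidate models and score each on validation data.
candidates = [1, 5, 15]
scores = {k: accuracy(train_set, validation_set, k) for k in candidates}

# 3. Choose the best one and report it on data it has never seen.
best_k = max(scores, key=scores.get)
test_accuracy = accuracy(train_set, test_set, best_k)
print("validation scores:", scores)
print("chosen k:", best_k, "test accuracy:", test_accuracy)
```

Notice where the effort goes: all of it is spent on this one data set and on selecting among models trained from scratch for this one task.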

The other approach is a task-oriented approach to machine learning. Here, the focus is on picking a model that has already been pre-trained on a large and varied collection of data sets and is generic enough to handle many tasks. You then adapt the chosen model to the task you are aiming to achieve. This adaptation is also called fine-tuning. These models are called foundation models; many of today’s best-known ones are generative models. In this approach, the focus is on using a foundation model for your specific tasks with fewer training examples.

So, data volume requirements are lower here. That means most of your effort goes into building a limited but high-quality data set that best represents the tasks you want to achieve. This is a paradigm shift from voluminous data to just enough accurate data. It is equally important to choose the right foundation model. That choice depends heavily on the broad set of data the foundation model was built upon. This approach calls for a totally different mindset from the one used in traditional approaches.

Understanding The Foundation Models

I will get a bit technical in this section so that we have a better understanding of foundation models. As discussed briefly earlier, foundation models are models trained on a broad set of data that can be adapted (fine-tuned) to a wide range of tasks specific to your business needs. They are called foundation models because, with that wide set of data, you build a foundation that need not change every time you adapt it to a specific business use case. Foundation models are built with deep neural networks and self-supervised learning techniques. In other words, you can think of foundation models as neural networks that perform generic tasks without having been trained for one specific task. Depending on the model, they can handle different types of data (images, text, video, and audio).

Another important thing is worth understanding. Foundation models are feasible to use in our day-to-day activities and for business use cases because adaptability is the core principle behind them. They support Transfer Learning. What is Transfer Learning? It is the mechanism through which knowledge learned from one activity is applied to another. In a deep learning sense, the model is first trained (also called pre-training) on generic, abstract tasks. It is then adapted to other tasks specific to your use case via fine-tuning. For example, ChatGPT is fine-tuned from a base Generative Pre-trained Transformer (GPT) model.
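The pre-train-then-fine-tune mechanics can be sketched at toy scale. The example below is only an illustration under stated assumptions: a three-parameter logistic regression stands in for a deep network, and the two tasks, the data, and the hyperparameters are all invented. The point is the shape of the workflow: train on plenty of data for a generic source task, then continue training the same weights on a handful of examples for a related target task.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def make_points(n, threshold, seed):
    """Points in [-1, 1]^2, labelled 1 when x1 + x2 exceeds the threshold."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        points.append(((x1, x2), int(x1 + x2 > threshold)))
    return points

def train(weights, data, epochs, lr):
    """Plain SGD on the logistic loss, starting from the given weights."""
    w1, w2, b = weights
    for _ in range(epochs):
        for (x1, x2), y in data:
            error = sigmoid(w1 * x1 + w2 * x2 + b) - y
            w1 -= lr * error * x1
            w2 -= lr * error * x2
            b -= lr * error
    return (w1, w2, b)

def accuracy(weights, data):
    w1, w2, b = weights
    hits = sum(int(w1 * x1 + w2 * x2 + b > 0) == y for (x1, x2), y in data)
    return hits / len(data)

# Pre-training: plenty of data for a generic source task.
source = make_points(500, threshold=0.0, seed=0)
pretrained = train((0.0, 0.0, 0.0), source, epochs=20, lr=0.5)

# Fine-tuning: only ten labelled examples for a related target task.
target_train = make_points(10, threshold=0.2, seed=1)
target_test = make_points(500, threshold=0.2, seed=2)
fine_tuned = train(pretrained, target_train, epochs=5, lr=0.1)
from_scratch = train((0.0, 0.0, 0.0), target_train, epochs=5, lr=0.1)

print("source accuracy:", accuracy(pretrained, source))
print("fine-tuned on target:", accuracy(fine_tuned, target_test))
print("from scratch on target:", accuracy(from_scratch, target_test))
```

The only difference between `fine_tuned` and `from_scratch` is the starting point. How much the pre-trained start helps depends on how closely the two tasks are related and on how few target examples you have; the sketch shows the mechanics, not a benchmark.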

Dimensions of Foundation Models

Based on the literature I have read, I think of foundation models along three major dimensions: purpose, model architecture, and modality.


Purpose

This dimension defines the purpose of the model. The purpose determines the approach taken to pre-training or building it. For example, if the purpose of the model is to generate new content like images, music, code, or contextually appropriate text, then it is generative in nature and is called a Generative Model. Such models give you outputs that need not appear anywhere in the training data.

There are some Foundation Models that make predictions based on input data. This can include tasks like classification (classifying emails as spam or not), regression (predicting house prices), and even more complex tasks like language translation. They look into the context and give you the predictions.
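The difference in purpose shows up in what a model returns. The toy sketch below is not a foundation model in any sense; the four-sentence corpus, the word-overlap classifier, and the bigram sampler are all assumptions chosen to keep the example small. It only contrasts the two kinds of output: a predictive model picks from a fixed set of answers, while a generative model composes a new sequence.

```python
import random
from collections import Counter, defaultdict

corpus = [
    ("win a free prize now", "spam"),
    ("free prize inside claim now", "spam"),
    ("meeting moved to monday morning", "ham"),
    ("see you at the monday meeting", "ham"),
]

# Predictive use: map an input to one of a fixed set of labels.
word_counts = {"spam": Counter(), "ham": Counter()}
for text, label in corpus:
    word_counts[label].update(text.split())

def predict(text):
    """Return the label whose training vocabulary overlaps the input most."""
    scores = {label: sum(counts[w] for w in text.split())
              for label, counts in word_counts.items()}
    return max(scores, key=scores.get)

# Generative use: sample a new word sequence from learned bigram transitions.
transitions = defaultdict(list)
for text, _ in corpus:
    words = ["<s>"] + text.split() + ["</s>"]
    for current, following in zip(words, words[1:]):
        transitions[current].append(following)

def generate(seed, max_words=12):
    """Walk the bigram table from the start marker until the end marker."""
    rng = random.Random(seed)
    word, output = "<s>", []
    while len(output) < max_words:
        word = rng.choice(transitions[word])
        if word == "</s>":
            break
        output.append(word)
    return " ".join(output)

print(predict("claim your free prize"))
print(generate(seed=3))
```

`predict` can only ever answer "spam" or "ham". `generate` stitches together word pairs it has seen, so it can emit sentences that never occurred in the corpus, which is the property the Purpose dimension is pointing at.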

Model Architecture

Another thing you need to understand is that each foundation model has a design and architecture optimized for its purpose and the data it uses. Transformer-based models are used for handling sequential data such as text. They are designed to handle long-range dependencies in textual data. Similarly, diffusion-based models are used particularly in image generation. These models are inspired by physical diffusion: they gradually turn a sample of random noise into an image (or other form of data) over many small denoising steps.
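To give a feel for the diffusion idea, here is a small sketch of the forward (noising) half of the process, which is the part a diffusion model learns to undo when it generates. I am assuming the common DDPM-style formulation with a linear noise schedule; the schedule values, the step count, and the five-number "image" are illustrative choices, not details of any particular model.

```python
import math
import random

def linear_beta_schedule(steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, increasing linearly (values are illustrative)."""
    return [beta_start + (beta_end - beta_start) * t / (steps - 1)
            for t in range(steps)]

def cumulative_signal(betas):
    """alpha_bar[t] = product of (1 - beta) up to step t: how much signal survives."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0, alpha_bar, rng):
    """Sample x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * gaussian noise."""
    return [math.sqrt(alpha_bar) * v + math.sqrt(1.0 - alpha_bar) * rng.gauss(0.0, 1.0)
            for v in x0]

rng = random.Random(0)
image = [0.0, 0.25, 0.5, 0.75, 1.0]  # a 5-"pixel" stand-in for an image
betas = linear_beta_schedule(steps=1000)
alpha_bars = cumulative_signal(betas)

# Early steps keep most of the image; by the last step almost none is left.
for t in (0, 499, 999):
    noisy = add_noise(image, alpha_bars[t], rng)
    print(t, round(alpha_bars[t], 4), [round(v, 2) for v in noisy])
```

Generation runs this in the opposite direction: starting from pure noise, a trained network removes a little noise at each step. That learned denoiser is the expensive part and is not shown here.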


Modality

Modality refers to the type of data that the model can handle. Some models focus on a single data type or modality. For instance, a model might be specifically designed for processing text, analyzing images, or interpreting audio signals. Each of these models would be optimized for performance in its respective domain. Modality also helps in deciding the model architecture. Multimodal models, in contrast, are capable of understanding and processing multiple types of data. For example, a multimodal model might be able to both see images and read text and use information from both modalities to make decisions or generate outputs. This type of model is particularly exciting because it mimics the human ability to process and integrate multiple types of information simultaneously. Gemini Pro Vision is a great example of a multimodal model.

Not all Foundation Models are Large Language Models (LLMs)

I am writing this article because we have a wave of information coming our way with the advent of ChatGPT and Bard. However, people seeking the fundamentals behind these applications need to understand two things. First and foremost, foundation models have been around for years; BERT, for example, dates back to 2018. Number two, they have only recently become popular because these foundation models are now accessible to non-researchers and non-data scientists in the form of Large Language Model applications. Because of that, Large Language Models are becoming very popular. However, just understand one thing: the world of Foundation Models is much bigger than that of Large Language Models. The diagram below will give you an idea of what I am talking about. It shows some examples of different types of Foundation Models. There can be more categories than the ones shown in the diagram.

Special Note about Question and Answer (Q&A) systems

Question and Answer systems over unstructured text are one of the most widely adopted use cases in the industry today. Since LLMs became easily accessible, enterprises have increasingly adopted this use case. This raises the question: why did it not see the same surge before the advent of ChatGPT? Earlier, I talked about the dimensions of Foundation Models, including the dimension of purpose. This is the main difference between earlier Q&A systems and LLM-powered Q&A systems. Earlier Q&A models, such as those based on BERT, were predictive in nature. These models interpret a question from its words and their sequence and then identify, or predict, the span of a given document text (like a paragraph or article) that best answers it. You predict the best match and give that as the answer. LLM-powered Q&A, on the other hand, is generative in nature. These systems understand the question not only from the text sequence but also from the intent behind it. Based on that, and given the relevant text as context, they generate the response. This means they can produce an answer that is not explicitly stated in the text but is inferred or composed from the model’s understanding of the question, its extensive training data, and the context given. Users connect more readily with LLM-powered Q&A systems. Hence, the increased adoption.
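The two styles can be contrasted in a few lines. In the sketch below, `extractive_answer` is a deliberately crude stand-in for the predictive style: it scores sentences by word overlap, whereas real extractive models such as BERT score candidate spans with a neural network. `build_prompt` shows the generative style only up to the point where the text would be handed to an LLM; no model is called, and the function names, the prompt wording, and the sample document are all my own assumptions.

```python
import re

def tokenize(text):
    """Lower-cased set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def extractive_answer(question, document):
    """Predictive style: return the existing sentence that best matches the question."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    question_words = tokenize(question)
    return max(sentences, key=lambda s: len(question_words & tokenize(s)))

def build_prompt(question, context):
    """Generative style: give context and question to a model that writes the answer."""
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

document = (
    "The warranty covers parts for two years. "
    "Labor is covered for ninety days. "
    "Claims must be filed through the support portal."
)
question = "How long is labor covered?"

answer = extractive_answer(question, document)
prompt = build_prompt(question, document)
print(answer)
print(prompt)
```

The extractive function can only hand back text that is already in the document. The prompt, once sent to a generative model, leaves the model free to phrase the answer in its own words.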


Conclusion

I hope that this article conveys some fundamental information about Foundation Models. In this article, we talked about the changing approach to Machine Learning. We covered a basic understanding of Foundation Models and introduced dimensional thinking to them. We clarified that foundation models are much more than today’s buzzword of large language models. We also looked at the difference between predictive Q&A systems and generative Q&A systems. Before I conclude, I want you all to think about the factors that should lead you to choose a Foundation Model-based business solution. Sometimes, traditional ML models are the better fit, for example where budgets are constrained or little is known up front about the actual data. Foundation models are more suitable where abundant computing power and vast datasets are available. These models offer a way to gain a long-term strategic advantage, are scalable, and can handle complex tasks. Thanks. Keep reading and coding!!!!


Published via Towards AI
