Foundation Models: Scaling Large Language Models

Last Updated on July 25, 2023 by Editorial Team

Author(s): Luhui Hu

Originally published on Towards AI.

Inside generative AI and LLMs such as ChatGPT, GPT-4, Claude, Bard, LLaMA, ToolFormer, Google USM, PaLM, NeMo, Dolly, etc.

Foundation: Tree of Life (photo courtesy of the author)

The thrilling AI journey took off a decade ago with the advent of deep learning (DL), and it continues to forge ahead. Recently, ChatGPT emerged, making an impact as significant as AlphaGo did back in 2016. DL became a familiar term to many at that time; now the spotlight is on a single architecture, the Transformer.

Following AlphaGo’s triumph, computer vision (CV) ascended to the forefront of the AI industry, serving as the cornerstone for numerous startup ventures. Presently, the influence of ChatGPT is paving the way for Large Language Models (LLMs) to become the driving force behind the next wave of AI-focused startups.

New Moore’s Laws Achieving Zettascale Computing

As the traditional Moore’s Law reaches its twilight, new laws are emerging to define the evolution of computing performance. The exponential growth of GPU performance and supercomputer systems has accelerated AI’s advancements, with LLMs as a prime example. Despite their extensive training times, these LLMs are benefiting from the rapid growth of computational power.

Moore’s Law (source: ISSCC 2023 Plenary)

Moore’s Law, which famously predicted that the number of transistors on a microchip would double approximately every two years, is now being replaced by new performance-based laws: GPU performance doubles every 2.2 years, while supercomputer system performance doubles every 1.2 years. These trends shape the pace at which AI/ML technologies progress.
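To make these doubling periods concrete, the short sketch below converts each one into a cumulative gain over a fixed horizon using the standard relation gain = 2^(years / doubling period). The 10-year horizon and the function name are illustrative assumptions, not figures from the ISSCC plenary.

```python
def growth_factor(years: float, doubling_period_years: float) -> float:
    """Multiplicative gain after `years` if a quantity doubles
    every `doubling_period_years`."""
    if doubling_period_years <= 0:
        raise ValueError("doubling period must be positive")
    return 2.0 ** (years / doubling_period_years)


# Doubling periods quoted in the article, in years.
DOUBLING_PERIODS = {
    "transistor count (classic Moore's Law)": 2.0,
    "GPU performance": 2.2,
    "supercomputer system performance": 1.2,
}

HORIZON_YEARS = 10  # illustrative horizon, not taken from the source

for name, period in DOUBLING_PERIODS.items():
    gain = growth_factor(HORIZON_YEARS, period)
    print(f"{name}: ~{gain:.0f}x over {HORIZON_YEARS} years")
```

Under these assumptions, a 1.2-year doubling period compounds to roughly 320x in a decade, against about 32x for a 2-year period and about 23x for a 2.2-year period, so the system-level trend is the one that stands out.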

GPU Performance Trend (source: ISSCC 2023 Plenary)
Supercomputer System Performance Trend (source: ISSCC 2023 Plenary)

Despite the rapidly increasing performance, the training of LLMs still takes anywhere from days to months. This extended duration speaks to the complexity and vast potential of these models. As computational power continues to soar, it will unlock new possibilities for AI research and development.

Time-to-Train Large AI Models (source: ISSCC 2023 Plenary)

In the coming decade, the AI landscape is set to enter the era of Zettascale Computing. As a result of the new Moore’s Laws, AI performance is expected to dramatically outpace other computing advancements. This shift to Zettascale Computing will provide unprecedented processing capabilities, enabling further breakthroughs in AI and other domains.

AI Performance Dramatically Increasing (source: ISSCC 2023 Plenary)

The new Moore’s Laws, focusing on GPU and supercomputer performance, herald a new era for AI research and development. With the advent of Zettascale Computing, we can expect even more rapid growth in AI capabilities, impacting various industries and shaping the future of technology.

Generative AI Journey with State-of-the-Art LLMs

Generative AI (GAI) has advanced rapidly in text-to-image, text-to-video, and text-to-3D generation.

But it was ChatGPT and GPT-4 that took the world by storm. Both are LLM-based GAI systems, like other state-of-the-art LLMs: Claude, Bard, LLaMA, ToolFormer, Google USM, PaLM, NeMo, Databricks Dolly, etc. These models have revolutionized NLP, enabling a myriad of applications once thought to be unattainable.

Despite their impressive capabilities and increasing computing power, LLMs face common challenges such as scalability, training efficiency, and the need for high-quality training data.

It reportedly required over 3 million GPU hours across 3072 GPUs to train GPT-3’s 175 billion parameters over a period of several months.
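As a rough sanity check on figures like these, GPU hours can be converted into an idealized wall-clock duration. The sketch below assumes every GPU runs concurrently for the whole job with no idle time, which is a simplification. It also uses exactly 3 million GPU hours, whereas the reported figure is "over" that, so the result is a floor rather than an estimate of the actual schedule.

```python
def wall_clock_days(gpu_hours: float, num_gpus: int) -> float:
    """Idealized wall-clock days, assuming all GPUs run concurrently
    for the entire job (no restarts, queueing, or idle time)."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    hours_per_gpu = gpu_hours / num_gpus
    return hours_per_gpu / 24.0


# Figures as reported in the article; 3,000,000 is used as a lower bound.
days = wall_clock_days(gpu_hours=3_000_000, num_gpus=3072)
print(f"At least ~{days:.0f} days of continuous training")
```

With these inputs the floor comes out at about 41 days. A longer calendar duration would be consistent with a GPU-hour total well above 3 million or with time spent outside the main training run, but the source does not break this down.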

To address these common challenges, foundation models are emerging as a potential solution. These models aim to provide a solid base for AI development, enabling researchers and developers to build upon them and adapt them for various tasks more efficiently.

By focusing on foundation models, the AI community can tackle the limitations posed by scalability, performance, training efficiency, and data quality, ultimately unlocking the full potential of LLMs and other large-scale models (LSMs) in diverse applications.

The Era of Foundation Models

Foundation models are pre-trained AI models serving as a basis for building diverse applications and tasks. Designed to be versatile, adaptable, and robust, they offer strong leverage across a wide range of use cases.

The concept of foundation models was introduced by researchers at Stanford, who highlighted two defining properties: emergence and homogenization.

  1. Emergence: A system’s behavior is implicitly induced rather than explicitly constructed, which is a source of both scientific excitement and concern about unforeseen consequences. Foundation models learn from vast amounts of data, picking up intricate patterns and relationships that can produce surprising behaviors.
  2. Homogenization: Foundation models consolidate methodologies for building ML systems across various applications. While this homogenization provides strong leverage for many tasks, it also creates single points of failure, raising concerns about resilience and reliability.

The astounding success of GAI and the human-like ChatGPT has ushered in a new era of foundation models, laying the groundwork for large-scale models and the rise of artificial general intelligence (AGI).

Digital Evolution (by author)

Foundation models have emerged to transform the digital world. Their impact is comparable to other milestones in digital evolution, such as the invention of electricity, the advent of the internet, and the rise of cloud computing.

By bridging the gap between narrow AI and AGI, foundation models are shaping the future of AI research and development, opening up new possibilities and opportunities in the rapidly evolving digital landscape.

Key Characteristics of Foundation Models

Foundation models have rapidly become the core of AI. They share several key characteristics, highlighting their potential and significance in shaping the future of AI.

  1. Pre-trained and Adaptable: A defining characteristic of foundation models is their pre-trained nature, allowing them to serve as a starting point for various applications and tasks. Through transfer learning and fine-tuning, these models can be adapted to address specific challenges and requirements, significantly reducing development time and resources.
  2. Scalability: Designed to be scalable, foundation models can handle vast amounts of data and grow in complexity as required. This scalability enables them to tackle a broad range of tasks and accommodate the ever-increasing demands of the AI landscape.
  3. Versatility: Foundation models boast remarkable versatility, as they can be employed across multiple domains and industries. From language and vision to healthcare and finance, these models serve as a basis for a wide range of applications.
  4. Self-Supervised Learning: A key aspect of foundation models is their ability to utilize self-supervised learning techniques. By leveraging large-scale, unlabeled data, these models can learn complex representations and features, greatly improving their performance on various tasks and reducing dependence on labeled data.
  5. Robustness: Foundation models are known for their robustness, demonstrating resilience in the face of noisy, incomplete, or even adversarial data. This robustness allows them to maintain high levels of performance and accuracy across different contexts and challenges.
  6. Interoperability: Interoperability is another critical characteristic of foundation models, as they can be easily integrated with existing systems and frameworks. This seamless integration facilitates collaboration between different AI models and components, streamlining the development process and fostering innovation.
  7. Generalization: The ability to generalize is a hallmark of foundation models, enabling them to perform well on unseen data and novel tasks. This characteristic allows them to adapt to a variety of challenges, making them an invaluable asset in AI research and development.
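
The self-supervised learning item above can be illustrated with next-token prediction, one common self-supervised objective in which the training targets come from the raw text itself rather than from human labels. This is a minimal sketch: the whitespace tokenizer, the function name `next_token_pairs`, and the `context_size` parameter are simplifying assumptions for illustration, not how any particular model listed in this article is trained.

```python
from typing import List, Tuple


def next_token_pairs(text: str, context_size: int = 3) -> List[Tuple[List[str], str]]:
    """Turn unlabeled text into (context, target) training pairs.

    Each target is simply the token that follows its context, so no
    manual labeling is needed. Tokenization is a plain whitespace split.
    """
    if context_size < 1:
        raise ValueError("context_size must be at least 1")
    tokens = text.split()
    pairs = []
    for i in range(1, len(tokens)):
        # Keep at most `context_size` tokens immediately before position i.
        context = tokens[max(0, i - context_size):i]
        pairs.append((context, tokens[i]))
    return pairs


pairs = next_token_pairs("foundation models learn from unlabeled data")
for context, target in pairs:
    print(f"{' '.join(context)} -> {target}")
```

A six-token sentence yields five training pairs without any annotation effort, which is why this kind of objective scales to very large unlabeled corpora.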

By understanding the key characteristics of foundation models, such as their pre-trained nature, adaptability, scalability, versatility, self-supervised learning capabilities, robustness, interoperability, and generalization, we can better appreciate their potential and impact on the future of AI.

Capabilities of Foundation Models Beyond LLMs

Foundation models have made a significant impact beyond LLMs, offering a versatile and powerful approach to solving complex problems across various domains in language, vision, robotics, reasoning and search, interaction, and philosophy of understanding.

  1. Language: Foundation models excel in language, demonstrating human-like comprehension and generation of text. From machine translation and sentiment analysis to summarization and question-answering, these models are unlocking new possibilities in language-related applications and enhancing communication between humans and machines.
  2. Vision: In the realm of computer vision (CV), foundation models are transforming the way we analyze and interpret visual data. By effectively recognizing objects, detecting patterns, and segmenting images, these models are enabling advancements in fields such as autonomous vehicles, medical imaging, and surveillance systems.
  3. Robotics: By incorporating self-supervised learning and reinforcement learning techniques, foundation models are empowering robots to learn from their environments, adapt to new tasks, and interact more effectively with humans.
  4. Reasoning and Search: Foundation models are enhancing our ability to reason and search through vast amounts of data, extracting valuable insights and uncovering hidden connections. Their capabilities extend to logical reasoning, pattern recognition, and knowledge graph exploration, enabling more informed decision-making and efficient problem-solving across numerous industries.
  5. Interaction: The interactive capabilities of foundation models facilitate more natural and intuitive communication between humans and machines. By understanding and generating human-like responses, these models pave the way for seamless collaboration and improved user experiences in applications such as chatbots, virtual assistants, and customer support systems.
  6. Philosophy of Understanding: At the core of foundation models lies the philosophy of understanding, aiming to uncover the underlying principles and mechanisms that enable machines to comprehend and interpret complex data.

Foundation Models for Vision by Harnessing Self-Supervision (source: Stanford Foundation Models paper)

The capabilities of foundation models span across language, vision, robotics, reasoning and search, interaction, and philosophy of understanding, highlighting their potential to reshape the AI landscape. By exploring these capabilities, we can foster responsible innovation and unlock the full potential of foundation models in addressing the world’s most pressing challenges.

AI Engineering

AI engineering is a burgeoning discipline combining software engineering principles with AI techniques to design, build, and scale intelligent systems.

As large-scale foundation models continue to revolutionize the AI landscape, AI engineering plays a pivotal role in their development and deployment.

AI engineering offers the tools and techniques necessary to scale out large-scale models while maintaining their performance and adaptability. Some aspects of scaling out these models through AI engineering include:

  1. Distributed Training: AI engineers harness the power of distributed computing to train large-scale models on vast amounts of data, accelerating the training process and improving model performance.
  2. Data Management: AI engineers ensure that the data used for training and fine-tuning foundation models is well-organized, clean, and representative of the target domain.
  3. Resource Management: AI engineers optimize the use of computational resources, such as GPUs and TPUs, ensuring that large-scale models can be trained and deployed efficiently and cost-effectively.
  4. Model Compression and Pruning: AI engineers employ model compression and pruning techniques to reduce the size and complexity of large-scale models, making them more accessible and deployable across various platforms.
  5. Monitoring and Maintenance: AI engineers continuously monitor the performance of large-scale models, identifying potential issues and implementing necessary updates and improvements to ensure their ongoing success.
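
The distributed training item above can be sketched with data parallelism, one common strategy: each worker computes a gradient on its own shard of the batch, and the gradients are averaged before a shared weight update. The example below simulates the workers in a single process with a toy one-parameter model. The helper names and the toy data are hypothetical, and the plain average stands in for the all-reduce step that a real multi-GPU framework would perform.

```python
from typing import List, Sequence, Tuple

Example = Tuple[float, float]  # (x, y)


def mse_gradient(w: float, batch: Sequence[Example]) -> float:
    """Gradient of mean squared error for the 1-D model y_hat = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def shard(data: Sequence[Example], num_workers: int) -> List[Sequence[Example]]:
    """Split data into equal-size contiguous shards, one per worker."""
    if num_workers <= 0 or len(data) % num_workers != 0:
        raise ValueError("this sketch assumes equal-size shards")
    size = len(data) // num_workers
    return [data[i * size:(i + 1) * size] for i in range(num_workers)]


def data_parallel_step(w: float, data: Sequence[Example],
                       num_workers: int, lr: float) -> float:
    """One simulated data-parallel SGD step."""
    # Each simulated worker computes a gradient on its own shard.
    local_grads = [mse_gradient(w, s) for s in shard(data, num_workers)]
    # Averaging stands in for an all-reduce across workers.
    avg_grad = sum(local_grads) / num_workers
    return w - lr * avg_grad


# Toy data generated by y = 2x, so the ideal weight is 2.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w = 0.0
for _ in range(50):
    w = data_parallel_step(w, data, num_workers=2, lr=0.05)
print(f"learned w = {w:.3f}")
```

With equal-size shards, the averaged gradient equals the full-batch gradient, so splitting the work changes where the computation happens but not the update itself.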

AI engineering is an essential discipline for building and scaling foundation models, providing the necessary expertise and techniques to ensure their robustness, efficiency, and adaptability.

As we continue to push AI boundaries, AI engineering will play a crucial role in unlocking the full potential of foundation models and shaping the future of AI research and development.


In closing, foundation models represent a critical milestone in the advancement of AI, providing a versatile and adaptable approach to solving complex problems across multiple domains. From language and vision to robotics and reasoning, these models are unlocking new possibilities and driving innovation across various industries.

As we continue to explore the full potential of foundation models and their role in the evolution towards AGI, it is crucial to foster responsible and ethical AI development, ensuring these models are used to benefit humanity and address the most pressing challenges of our time. With foundation models as a solid basis, we can accelerate AI research and development, unlocking new frontiers and shaping the future of intelligent systems.

LLMs Papers

  1. GPT-4 Technical Report
  2. GPT-3: Language Models are Few-Shot Learners
  3. Toolformer: Language Models Can Teach Themselves to Use Tools
  4. LLaMA: Open and Efficient Foundation Language Models
  5. Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages
  6. Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model

Foundation Models Resources

  1. Reflections on Foundation Models
  2. On the Opportunities and Risks of Foundation Models

