

Some Insights About Phi-4: Microsoft’s New Small Foundation Model that Punches Above its Weight

Last Updated on December 17, 2024 by Editorial Team

Author(s): Jesus Rodriguez

Originally published on Towards AI.


Created Using Midjourney

I recently started an AI-focused educational newsletter that already has over 170,000 subscribers. TheSequence is a no-BS (no hype, no news, etc.) ML-oriented newsletter that takes 5 minutes to read. The goal is to keep you up to date with machine learning projects, research papers, and concepts. Please give it a try by subscribing below:

TheSequence | Jesus Rodriguez | Substack


thesequence.substack.com

Microsoft Phi has been credited with starting the small language model (SLM) movement as an alternative to the “intelligence by scale” approach followed by the large AI labs. Since its debut a couple of years ago with the famous paper “Textbooks Are All You Need,” every release of Phi has brought new innovations in data quality and training. Phi-4 is the latest addition to Microsoft’s marquee SLM family, and it does not disappoint. Today, I would like to dive into some of the details behind Phi-4.

Not so small anymore, Phi-4 is a 14-billion-parameter language model that emphasizes the importance of data quality in achieving performance comparable to, or even exceeding, that of much larger models. It builds on the success of the Phi family of models, which has consistently demonstrated that improvements in data can rival the benefits of scaling model size. Phi-4’s innovations rely on its unique pre-training, midtraining, and post-training approaches.

Image Credit: Microsoft Research

Pre-Training: A Data-Centric Approach

Phi-4’s pre-training strategy centers around three core pillars:

  • Extensive use of high-quality synthetic data.
  • Careful curation and filtering of organic data.
  • A refined post-training process.

These elements work together to equip the model with strong reasoning and problem-solving abilities.

Leveraging Synthetic Data

Unlike most language models that rely heavily on organic data sources like web content, Phi-4 strategically incorporates synthetic data throughout its training process. The model utilizes a diverse array of synthetic data generation techniques, including multi-agent prompting, self-revision workflows, and instruction reversal. These methods enable the creation of datasets that specifically target reasoning and problem-solving skills, overcoming some limitations inherent in traditional unsupervised datasets.

The rationale behind Phi-4’s emphasis on synthetic data stems from the inherent advantages it offers over organic data:

  • Structured and Gradual Learning: Synthetic data allows for the presentation of challenges in a more digestible and progression-oriented manner, facilitating structured and gradual learning for the model. In contrast, the complex and indirect relationships between tokens in organic datasets make it harder for the model to learn effectively from next-token prediction.
  • Alignment with Inference Contexts: Training on synthetic data helps align the model’s pre-training experience with the scenarios it is likely to encounter during inference. This alignment ensures that the context seen during generation remains consistent with the data distribution the model was trained on.

Phi-4 adheres to four key principles when generating synthetic data:

  • Diversity: The data must comprehensively cover various subtopics and skills within each domain, ensuring a broad and balanced representation of knowledge.
  • Nuance and Complexity: To effectively challenge the model and facilitate learning, the synthetic data must go beyond basic examples and incorporate nuanced, non-trivial cases that reflect the inherent complexity of the domain.
  • Accuracy: The generated data must maintain a high level of accuracy. For example, code must execute correctly, mathematical proofs must be valid, and explanations should be factually correct.
  • Chain-of-Thought Reasoning: The data should encourage systematic reasoning by demonstrating different problem-solving approaches in a step-by-step manner, promoting the generation of coherent outputs for complex tasks.

Phi-4’s synthetic datasets are created from high-quality seeds sourced from diverse domains. These seeds serve as the foundation for generating exercises, discussions, and reasoning tasks specifically tailored to the model’s training objectives.

Generation of bogus questions (example prompt)

Consider the following trivia question:

# Question
{{ question }}

# Instructions
Your job is to turn this problem into a nonsensical one, for which the answer is invalid or unlikely to be known by anyone. For example, you might change the name from a well-known figure to a random name, or change the date from a well-known event to a random date, or the place to a different one. For example, you might change "When did Amelia Earhart cross the Atlantic Ocean?" to "When did Edgar Greenwood cross the Atlantic Ocean?" or "How many times did Amelia Earhart cross the English Channel?".

Your goal is that the new question is *plausibly real*, but impossible to answer. You should not make the question obviously fake, silly, or fictional; for example, all country names should be real countries, and no names should be obvious homages to the original question. It should sound like a serious trivia question.

You may start with a very brief discussion, then end with two markdown sections:
- The section '# Response' that contains the question.
- The section '# Quality' that rates the generated question in quality from 1 to 5, with 5 being the highest quality. A high quality question is (1) different from the given question and (2) plausible.

Phi-4 employs a multi-pronged approach to seed curation:

  • Web and Code-Based Seeds: Excerpts and snippets demonstrating high complexity, reasoning depth, and educational value are extracted from web pages, books, and code repositories. A two-stage filtering process ensures quality: first, identifying pages with strong educational potential and, second, segmenting selected pages into passages and scoring them for factual and reasoning content.
  • Question Datasets: Questions collected from various websites, forums, and Q&A platforms are filtered to balance difficulty levels. This is achieved through a plurality-based technique where multiple independent answers are generated for each question, and majority voting is used to assess the consistency of responses. Questions with unanimous answers (too easy) or entirely inconsistent answers (too difficult or ambiguous) are discarded.
  • Extracting Question-Answer Pairs from Diverse Sources: Language models are leveraged to extract question-answer pairs from sources like books, scientific papers, and code, focusing on identifying deduction chains and logical progressions in the text. This technique goes beyond simply finding explicit Q&A pairs and instead aims to uncover the underlying reasoning processes within the text.
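The plurality-based difficulty filter described above can be sketched in a few lines. This is an illustrative toy, not Microsoft's implementation: the function name and the three-way labels are hypothetical, and the answer strings stand in for multiple independent LLM generations.

```python
from collections import Counter

def filter_by_plurality(answers):
    """Classify a question by the consistency of independently sampled answers.

    Unanimous agreement suggests the question is too easy; no agreement at all
    suggests it is too hard or ambiguous; partial agreement is kept for training.
    """
    counts = Counter(answers)
    _, top_count = counts.most_common(1)[0]
    if top_count == len(answers):
        return "too_easy"   # every sampled answer agrees
    if top_count == 1:
        return "too_hard"   # no two sampled answers agree
    return "keep"           # clear but non-unanimous majority

# Hypothetical questions, each with five independently sampled answers:
print(filter_by_plurality(["4", "4", "4", "4", "4"]))  # unanimous -> discarded
print(filter_by_plurality(["4", "5", "4", "4", "6"]))  # majority  -> kept
print(filter_by_plurality(["1", "2", "3", "4", "5"]))  # scattered -> discarded
```

In practice the answers would come from repeated sampling of a language model at nonzero temperature, and the kept questions form the balanced-difficulty pool.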

Once the seeds are curated, they are transformed into synthetic data through multi-step prompting workflows. These workflows include:

  • Rewrite and Augment: Seeds are rewritten into exercises, discussions, or structured reasoning tasks to enhance their educational value for the model.
  • Self-revision: The initial model responses are iteratively refined through a feedback loop where the model critiques and improves its own outputs, guided by rubrics focused on reasoning and factual accuracy.
  • Instruction Reversal: This technique, particularly beneficial for code generation and other specific tasks, involves reversing existing instructions. For instance, code snippets are used to generate corresponding problem descriptions or task prompts, resulting in data pairs where the instruction precedes the code.
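Instruction reversal is simple to express as a data-pipeline step. The sketch below assumes a `describe` callable standing in for the LLM that writes a task prompt from existing code; the function and field names are hypothetical.

```python
def reverse_instruction(code_snippet, describe):
    """Build an (instruction, response) training pair by instruction reversal.

    Instead of generating code from a prompt, we start from verified code and
    generate the prompt, so the response side of the pair is known-good.
    """
    instruction = describe(code_snippet)  # LLM call in a real pipeline
    return {"instruction": instruction, "response": code_snippet}

# Toy stand-in for the LLM describer:
snippet = "def add(a, b):\n    return a + b"
pair = reverse_instruction(
    snippet,
    lambda code: "Write a function that returns the sum of two numbers.",
)
print(pair["instruction"])
```

The design benefit noted in the text is that the response (the code) exists first and can be executed or verified, so the resulting pair has a trustworthy target.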

Curation and Filtering of Organic Data

While synthetic data constitutes a significant portion of Phi-4’s training data, organic data is not completely omitted. High-quality organic data sources are carefully curated and filtered, prioritizing reasoning-dense and nuanced materials such as academic papers, educational forums, and programming tutorials. This curated organic data serves two primary purposes:

  1. Directly used in pre-training as a complementary dataset.
  2. Used as seeds for specialized synthetic data generation pipelines.

Phi-4’s approach to organic data curation emphasizes quality and relevance, with a focus on selecting content that can enhance the model’s reasoning and knowledge base.

Key considerations in organic data curation and filtering include:

  • Targeted Acquisitions: Inclusion of major repositories of reasoning-dense documents that are publicly accessible and permissible for use, such as arXiv, PubMed Central, and GitHub. Licensed books are also incorporated to ensure comprehensiveness, recency, and cleanliness.
  • Filtering Web Dumps: To capture the vast amount of information available on the web, a small fraction of the highest-quality documents are selected from bulk web dumps using small classifiers trained on LLM-generated annotations.
  • Multilingual Data: To ensure the model can handle a wide range of languages, high-quality multilingual documents from CommonCrawl and Wikipedia are included. A language identification model is used to categorize documents into 176 languages, and the same classifiers used for filtering web dumps are applied to filter for quality.
  • Custom Extraction and Cleaning Pipelines: Customized heuristics and parsers are developed for each targeted data source to ensure cleanliness and uniformity across heterogeneous organic data sources. This involves building custom pipelines to handle various file formats and developing a custom HTML-to-text extractor for general web data.
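The classifier-based filtering of web dumps can be illustrated with a toy scorer. The real system uses small classifiers trained on LLM-generated annotations; here a keyword-density heuristic stands in for that model, and the threshold value is arbitrary.

```python
REASONING_MARKERS = {"therefore", "proof", "theorem", "because", "implies"}

def quality_score(doc):
    """Toy stand-in for a trained quality classifier: scores a document by
    the density of reasoning-marker words (higher = more reasoning-dense)."""
    words = doc.lower().split()
    if not words:
        return 0.0
    hits = sum(w.strip(".,;:") in REASONING_MARKERS for w in words)
    return hits / len(words)

def filter_web_dump(docs, threshold=0.05):
    """Keep only the small fraction of documents scoring above the threshold."""
    return [d for d in docs if quality_score(d) >= threshold]

docs = [
    "Therefore the theorem holds because the lemma implies it.",
    "buy cheap shoes now limited offer",
]
print(filter_web_dump(docs))  # keeps only the reasoning-dense document
```

The key idea survives the simplification: a cheap scoring function is run over an enormous corpus, and only the top slice is admitted into pre-training.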

Data Mixture and Midtraining

The pre-training process for Phi-4 involves a carefully designed data mixture that balances synthetic and organic data sources. The final data mixture allocates 30% of the training tokens to web and web rewrites data sources, 40% to synthetic data, 20% to code data, and 10% to targeted acquired sources like academic data and books.
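The reported mixture translates directly into a token budget. A minimal sketch, using the fractions stated above (the total token count here is illustrative, not Phi-4's actual budget):

```python
def allocate_tokens(total_tokens, mixture):
    """Split a training token budget according to a data-mixture dict of
    fractions that sum to 1. Rounding means per-source counts may be off by
    a token or two relative to the exact fractions."""
    assert abs(sum(mixture.values()) - 1.0) < 1e-9, "fractions must sum to 1"
    return {source: round(total_tokens * frac) for source, frac in mixture.items()}

# Fractions from the text; the 10B total is a made-up example figure.
phi4_mixture = {
    "web_and_rewrites": 0.30,
    "synthetic": 0.40,
    "code": 0.20,
    "targeted_acquisitions": 0.10,
}
budget = allocate_tokens(10_000_000_000, phi4_mixture)
print(budget["synthetic"])  # 4_000_000_000
```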

Following the initial pre-training phase, Phi-4 undergoes a midtraining stage where the context length is increased from 4K to 16K. This stage focuses on further refining the model’s long-context understanding and reasoning abilities. The data mixture for midtraining prioritizes inherently long-context data sources, including carefully selected subsets of academic, book, and code data, as well as newly created synthetic datasets that meet the longer sequence requirements.

Post-Training: Refining the Model for Practical Applications

While pre-training lays the foundation for Phi-4’s capabilities, the post-training process is crucial for transforming the model into a safe and effective AI assistant for users. This process involves:

  • Supervised Fine-Tuning (SFT): The pretrained model is fine-tuned using carefully curated user prompts and high-quality responses from various domains, including math, coding, reasoning, conversation, model identity, and safety.
  • Direct Preference Optimization (DPO): Two rounds of DPO are employed to align the model with human preferences and steer it away from unwanted behavior. The first round utilizes a novel technique called Pivotal Token Search (PTS) to generate DPO pairs that specifically target pivotal tokens, which are identified as having a significant impact on the overall correctness of the solution. The second round, referred to as judge-guided DPO, gathers preference data by comparing model-generated responses against those from GPT-4 and using GPT-4 as a judge to label the preferred response based on criteria like accuracy, style, and detail.
  • Hallucination Mitigation: Specific SFT data and DPO pairs are generated to mitigate the model’s tendency to hallucinate, encouraging it to refuse to answer rather than fabricate information when it does not know the answer.
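For readers unfamiliar with DPO, the standard per-pair loss is easy to write down. This sketch shows the generic DPO objective (not PTS or the judge-guided variant specific to Phi-4); the log-probabilities would come from scoring full responses under the policy and a frozen reference model, and `beta` is a tunable strength parameter.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair.

    The policy is pushed to raise the chosen response's log-probability
    relative to the reference model, and lower the rejected one's.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)), written as log1p(exp(-margin)) for stability
    return math.log1p(math.exp(-margin))

# At zero margin the loss is log(2); it falls below log(2) once the policy
# prefers the chosen response more strongly than the reference does.
print(dpo_loss(-10.0, -12.0, -11.0, -11.0) < math.log(2))
```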

This multi-stage post-training process aims to refine Phi-4’s performance across various benchmarks while addressing potential safety and ethical concerns.

Image Credit: Microsoft Research

Benchmarking and Performance

Phi-4 has been evaluated on a variety of standard benchmarks, including MMLU, GPQA, MATH, HumanEval, MGSM, SimpleQA, DROP, MMLUPro, HumanEval+, ArenaHard, IFEval, and PhiBench, an internal benchmark developed specifically for evaluating the diverse skills and reasoning abilities deemed critical for Phi-4’s development.

The results demonstrate that Phi-4 achieves strong performance relative to its size, particularly on reasoning-focused benchmarks, often exceeding the performance of much larger models. For instance, it outperforms its teacher model, GPT-4o, on the GPQA and MATH benchmarks. This exceptional performance is attributed to the improved data quality, training curriculum, and innovations in the post-training scheme.

Image Credit: Microsoft Research

Addressing Limitations and Future Directions

Despite its impressive capabilities, Phi-4 does have limitations, primarily stemming from its relatively small size. These limitations include:

  • Factual Hallucinations: While mitigated through targeted post-training techniques, the model can still exhibit factual hallucinations, particularly around less common knowledge.
  • Instruction Following: Phi-4 is less proficient at strictly following detailed instructions, particularly those with specific formatting requirements, as its training focused more on Q&A and reasoning tasks.
  • Occasional Reasoning Errors: Even on reasoning tasks, the model can make mistakes, highlighting the need for further refinement of its reasoning abilities.

Addressing these limitations and enhancing Phi-4’s capabilities will require further research and development in areas such as:

  • Continued Data Refinement: The quality and diversity of the training data play a crucial role in the model’s performance. Continued efforts to curate and generate high-quality synthetic and organic data will be essential for further improving Phi-4’s abilities.
  • Exploring Novel Architectures: While Phi-4 utilizes a standard transformer architecture, exploring novel architectures or modifications could potentially lead to improvements in performance and efficiency.
  • Addressing Ethical Concerns: As with any powerful AI system, it is crucial to continuously evaluate and address potential ethical concerns related to bias, fairness, and the potential for misuse.

Overall, Phi-4 represents a significant step forward in demonstrating the power of data quality in achieving high performance in smaller language models. Its innovative approach to pre-training and post-training, with a strong emphasis on synthetic data generation and careful organic data curation, provides valuable insights for the future development of efficient and capable language models.


Published via Towards AI
