
In-Context Learning Explained Like Never Before

Last Updated on April 14, 2025 by Editorial Team

Author(s): Allohvk

Originally published on Towards AI.

As the Ocean (of knowledge) was stirred in search of the elixir of life, something unexpected happened. Beautiful and magical things spontaneously started to emerge during the process…

– From the episode of Samudra Manthan (The churning of the ocean)

Emergere in Latin means “to come forth”. Emergent behaviour therefore refers to capabilities that are not explicitly built in, but instead “come forth” spontaneously. This behaviour has long been known to occur in complex systems. Imagine a flock of birds. Each individual bird is programmed to follow some simple rules: stick close to your neighbors, avoid collisions etc. Yet, amazing patterns are observed when the flock’s flight is viewed as a whole.

Image by Unachicalinda from Pixabay

This collective outcome is an emergent property of the flock behaviour and cannot be predicted by examining any single bird’s actions. Likewise, market prices emerge from numerous individual interactions in an economy. Indeed, life itself is a result of emergent behaviour. At some point in evolution, a certain combination of lifeless atoms gave rise to “life”. In LLMs, emergent behaviour refers to the spontaneous appearance of unexpected capabilities as model size and training data scale, a behaviour neatly documented in Emergent Abilities of Large Language Models.

In-Context Learning (ICL) — A notable emergent behaviour

In particular, we focus on in-context learning (ICL), which was observed in GPT-2 and further confirmed in GPT-3. ICL basically refers to the capability whereby LLMs learn a new task from training data provided without any fine-tuning. The data, in the form of training examples, is provided as part of the prompt-context and consists of multiple input-label pairs called demonstrations. It appears that the model learns from these demonstrations directly at inference time. Since very few demonstrations suffice for a model to learn, this phenomenon is also called few-shot learning.

ICL was referred to as a surprising ability by Xie et al. It is indeed surprising since we are not fine-tuning the model & hence the model weights are unchanged. Yet, somehow the model performs well at tasks it has not been trained or tuned for. In fact, under certain conditions it performs better than LLMs fine-tuned for those tasks. How is this possible? Does the LLM learn something from the demonstrations at inference time & use that information (without changing its weights) to deliver good output?

Consider a simple example with just 2 training samples.

  • “Albert Einstein was German”
  • “Mahatma Gandhi was Indian”

Let us build a prompt by concatenating these two demonstrations and appending a test example: “Marie Curie was ”. Let us feed this entire prompt to GPT-3. It is likely that the result would be “Polish” instead of the more common “a great scientist” or “a Nobel prize winner”. The LLM is thus able to infer the custom task from the demonstrations — which is to identify the country of origin. The surprising bit is that (a) LLMs are NOT explicitly pre-trained to learn from examples and (b) you will never see such prompts (which concatenate multiple independent training examples) in the natural language texts primarily used for pre-training.
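
To make this concrete, here is a minimal Python sketch of how such a few-shot prompt is assembled. The complete() call at the end is a hypothetical stand-in for whichever LLM completion API you have access to, not a specific library.

```python
# Assemble the country-of-origin prompt from the two demonstrations above.
demonstrations = [
    ("Albert Einstein was", "German"),
    ("Mahatma Gandhi was", "Indian"),
]
test_input = "Marie Curie was"

# Concatenate the input-label pairs, then append the unanswered test example.
prompt = "\n".join(f"{q} {a}" for q, a in demonstrations) + f"\n{test_input}"
print(prompt)
# Albert Einstein was German
# Mahatma Gandhi was Indian
# Marie Curie was

# answer = complete(prompt, max_tokens=1)  # hypothetical API call; expected: "Polish"
```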

This is powerful stuff! Imagine you have some custom libraries & code which the LLM has never seen before. You also collect some data samples but just can’t fine-tune as you don’t have GPUs. Well, you could leverage ICL! Emergent behaviour need not happen with scale alone. Better-quality data or better prompts could also induce emergent behaviour. Indeed, the Chain-of-Thought prompting technique raises the quality of an LLM’s output significantly, inducing emergent behaviour.

No “magic”, please! We are in the 21st century

A paper by Schaeffer et al. tried to take the magic out of emergent behaviour by suggesting that emergent abilities are simply an artefact of the discontinuous metrics commonly used for evaluation. They suggest that emergent behaviour is not something spontaneous at all. Rather, the behaviour was there all along, hidden out of sight (improving steadily as models became larger), till at some point the right output was consistently generated.

For example, consider the task of adding numbers. For 220 + 330, the answer 520 or 595 is better than (say) -9250. Instead of a pass/fail evaluation, can we give partial credit if the answer is close? For example, assume the evaluation system gave partial credit (a) if the sign was accurate, (b) more credit if the scale was accurate, and (c) even more credit for guessing the first digit accurately. With such an evaluation in place, they show that there was no spontaneous emergent behaviour; all that happened was that the metrics steadily improved as models scaled. In other words, there was no magic & the behaviour could simply be explained as “models get marginally better & better at tasks as they scale”. Specifically, they state that non-linear or discontinuous metrics produce apparent emergent abilities & suggest using linear metrics, which produce predictable changes in LLM behaviour.
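
To see how the choice of metric changes the picture, here is a small Python sketch of a graded scoring rule in the spirit of (a)–(c) above. The specific weights are arbitrary illustrations, not the exact metrics analysed by Schaeffer et al.

```python
import math

def exact_match(pred: int, target: int) -> float:
    """Discontinuous metric: all-or-nothing."""
    return 1.0 if pred == target else 0.0

def partial_credit(pred: int, target: int) -> float:
    """Smoother metric: credit for sign, scale and first digit (weights arbitrary)."""
    if pred == target:
        return 1.0
    score = 0.0
    if (pred >= 0) == (target >= 0):                       # (a) correct sign
        score += 0.2
    if pred != 0 and target != 0 and \
       int(math.log10(abs(pred))) == int(math.log10(abs(target))):
        score += 0.3                                       # (b) correct scale
    if str(abs(pred))[0] == str(abs(target))[0]:           # (c) correct first digit
        score += 0.3
    return score

# 220 + 330 = 550. Exact match scores both wrong answers 0; the graded metric
# rewards the near-miss, so measured ability improves smoothly with scale.
print(exact_match(520, 550), partial_credit(520, 550))      # 0.0 vs 0.8
print(exact_match(-9250, 550), partial_credit(-9250, 550))  # 0.0 vs 0.0
```

Under the first metric, accuracy sits at zero until the model gets everything exactly right and then jumps; under the second, it creeps up steadily, which is precisely the difference between “emergence” and incremental improvement.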

Interestingly enough, the results of this paper were not unforeseen by the authors of the Emergent Abilities paper, who acknowledge that certain (discontinuous) metrics “may disguise compounding incremental improvements as emergence” but still go on to say that at best this is a “partial explanation”. To be fair, inventing a new set of metrics to explain an observed behaviour is much easier than predicting a new emergent behaviour itself. Moreover, on certain tasks it has been observed that LLM performance does remain near-random until a certain threshold, beyond which there is a marked increase in accuracy. Lastly, emergent behaviour is a natural consequence of scale and complexity, as empirically observed in the wild. So we should actually be surprised if there were no emergent behaviour in LLMs.

This is an area that will see great churn (excuse the pun) as theories come and go. What magical behaviour will suddenly emerge when model sizes cross (say) 100 trillion parameters? Maybe a model that suddenly gets every coding task right? Maybe a model that can accurately predict a stock price? For sure, its sudden emergence will cause a lot of chaos & disruption! Let us try to understand how ICL happens. Maybe then, we would be better prepared…

Fine-tuning & ICL both provide demonstrations to the LLM. In fine-tuning, we use these demonstrations as training data and apply gradient descent to modify the LLM weights. In ICL, we feed the demonstrations to the model via the prompt. The model looks at the demonstrations, learns the patterns & predicts the correct output (without changing its weights). The model learns the pattern on the fly! Wow, ICL is neat in the sense that we don’t have to fine-tune for every task we want. Instead, we just feed the right demonstrations to the model at run-time. How can we explain this?

Is In-Context Learning a Complete-the-Pattern exercise?

To start with, we can view the demonstration examples as a complete-the-pattern exercise. We have the prompt structured as:

<q1, ans1>, <q2, ans2>, … , <q_n, ans_n>

We now append <q_n+1> to this and feed it to the model. The model then leverages its predict-the-next-word objective to complete the pattern. Min et al. say that ICL acts as a pattern-recognition procedure, rather than as an actual “learning” procedure. They underplay the role of the input-label mapping in the demonstrations & claim that the model relies more on the information gained during pre-training to generate outputs. Hmmm.

Maybe a Copy-Paste job? Hello, Induction heads!

Olsson et al. dig a little deeper and find that transformers have induction heads that refer to abstract patterns in the prompt sequence to help predict the next token. These are different from the regular attention heads that pay attention to different aspects of sentence grammar during training. Induction heads (a) search for token(s) similar to the current token that have occurred earlier in the sequence and (b) copy the token that followed & paste it as the next output token. They hypothesised that as you increase the model size, this behaviour becomes more complex — the model becomes capable of copying not just tokens but concepts & latent meanings. An example may help:

  • Simple token copy: Say during training, the model has NOT come across the word Samudra-Manthan. But assume we use this word plenty of times in our prompt-context. Say our prompt to the LLM is: “Samudra-Manthan was written thousands of years ago. The best of the world’s resources were pooled to conduct Samudra-Manthan. Samudra-Manthan means churning of the ocean. During the Samudra-”. When the LLM is asked to generate the next token, the induction heads likely conclude that “Manthan” is going to be the next word. They force the probability distribution across the tokens in the vocabulary to allocate the highest probability to the token “Manthan”, even though this particular combination was never observed during training!
  • More abstract token copy: Say the demonstration examples are translations of words from English to Kannada. So the prompt-context is something like: {EN: <query1> KA: <translation of query1>; EN: <query2> KA: <translation of query2>; EN: <query3>}. Now what is the LLM’s prediction? The induction heads in the LLM realise that they need to copy the token query3 and paste it as the next token, but only after translating it to Kannada. This is still a copy-paste job, but one involving a small transformation prior to the pasting!
  • Even more abstract token copy: Say the prompt is something like this: {Rat: 1; Duck: 4; Bison: 6; Elephant: 7; Whale: }. Now what could the LLM generate? The induction heads realise that they need to copy Whale, but only after translating it using some latent concept. What could that latent concept be? Maybe the model reasons that it has to do with size or lifespan and it should return a 10 or so. Or maybe, just to tease humans, it may return a 6 with the justification that the latent concept has something to do with the length of the token. Whatever the output, it is still a copy-paste job with a transformation thrown in. By giving enough in-context examples, the model is able to guess the latent concept accurately… the final effect being similar to fine-tuning.

Induction heads are implemented by a pair of attention heads in different layers. The first head reads the prompt & copies (some) info from every question-token in the prompt to the answer-token following it. Another downstream head keeps matching the current token (the query) against all the previous token keys in the usual attention methodology. Because the keys are shifted right by one token by the first head, the attention gets focused on the relevant answer-token(s) from the past. So together, these heads make the model search the entire prompt for past tokens that are similar to the present token and attend strongly to the token that came next, increasing its softmax probability. Induction heads are so named because they attend to tokens that would be predicted by induction (from the examples in the prompt).
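
As a rough illustration of the match-and-copy behaviour (at the level of tokens, not of the actual two-head circuitry), here is a toy Python sketch. The whitespace tokenisation and the induction_guess helper are made up for this example.

```python
from collections import Counter

def induction_guess(tokens):
    """Guess the next token by finding earlier occurrences of the current token
    and 'copying' whatever followed them most often (the induction pattern)."""
    current = tokens[-1]
    followers = Counter(
        tokens[i + 1]
        for i in range(len(tokens) - 1)
        if tokens[i] == current          # earlier occurrence of the current token
    )
    return followers.most_common(1)[0][0] if followers else None

prompt = ("Samudra - Manthan was written long ago . Resources were pooled to "
          "conduct Samudra - Manthan . Samudra - Manthan means churning of the "
          "ocean . During the Samudra -").split()
print(induction_guess(prompt))  # -> 'Manthan'
```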

Or is In-Context Learning due to Nearest-Neighbour search?

This is not a far-fetched interpretation. When the temperature hyper-parameter T is set very low, the attention weights, i.e. softmax(QKᵀ/T), converge to a one-hot vector. This means the model is attending to the single most similar token — a nearest-neighbour behaviour. Essentially, we are finding the closest match to the query from the demonstrations. This gives us a fresh perspective on ICL — we can view it as implementing a nearest-neighbour algorithm over our input-output demonstration pairs, through the mechanics of attention! Now, imagine the model directly tinkering with the attention mechanism to enforce this behaviour naturally… without us setting the temperature parameter.
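
Here is a small numpy sketch of that collapse: as the temperature T shrinks, softmax attention over the demonstration keys turns into a one-hot selection of the most similar key, i.e. a 1-nearest-neighbour lookup. The keys, values and query below are toy stand-ins for encoded demonstrations.

```python
import numpy as np

def attention_readout(q, K, V, T):
    """softmax(q·Kᵀ / T) V — one query attending over demonstration keys."""
    scores = K @ q / T
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w, w @ V

# 5 demonstrations; keys kept orthogonal so the nearest neighbour is unambiguous.
K = np.eye(5)                                   # one key per demonstration input
V = np.arange(5, dtype=float).reshape(-1, 1)    # stand-in for the labels
q = K[2] + 0.05 * np.random.default_rng(0).normal(size=5)   # query close to demo #2

for T in (1.0, 0.1, 0.01):
    w, out = attention_readout(q, K, V, T)
    print(f"T={T:<4} weights={np.round(w, 3)} readout={out.round(3)}")
# As T shrinks, the weights collapse to a one-hot vector on demonstration #2 and
# the readout approaches that demonstration's label: a nearest-neighbour lookup.
```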

Something for Bayesian fans too!

Another view is to look at ICL through a Bayesian lens. The Bayesian inference framework explains how the model sharpens the posterior distribution over concepts based on the prompt given, effectively learning the concept. This occurs despite demonstrations being unnatural sequences that concatenate independent examples, which don’t occur in the natural language datasets used to pre-train models. Xie et al. explain that in-context learning emerges when language models can infer the “shared latent concept” common to the bunch of demonstration examples and use it to “locate” information acquired during pre-training to generate the output. They also underplay the role of input-label mapping in ICL by showing that ICL is robust to label randomization. They suggest that other aspects of the prompt, such as the input & output distributions, contribute to the final result.

Apparently, In-Context Learning allows the model to learn any function!

Garg et al., in Can we train a model to ‘in-context learn’ a certain function class, show that irrespective of what objective the original model was trained on, it can (under certain conditions) acquire a behaviour which makes it possible to in-context learn any (linear) function. So the model can approximate f(x_query) by conditioning on a prompt sequence of (x, f(x)) examples followed by the query. So basically, while the model is being pre-trained on an objective like masked attention, it is silently picking up the ability to also do ICL. This is meta-learning, a paradigm in which the model learns how to learn from data.
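
A minimal numpy sketch of that setup follows: it builds the in-context sequence of (x, f(x)) pairs for a random linear function and computes the ordinary-least-squares answer that, per Garg et al., a transformer trained on such sequences ends up approximating. The sketch does not include the transformer itself, and the dimensions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 20
w_true = rng.normal(size=d)        # the "task": a linear function never seen before
X = rng.normal(size=(n, d))        # in-context inputs x_1 .. x_n
y = X @ w_true                     # their labels f(x_i)
x_query = rng.normal(size=d)       # the query appended after the demonstrations

# The "prompt" in Garg et al.: (x_1, f(x_1)), ..., (x_n, f(x_n)), x_query.
# A transformer trained from scratch on such sequences predicts f(x_query)
# roughly as well as the least-squares solution computed directly below.
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print("OLS prediction :", x_query @ w_ols)
print("True f(x_query):", x_query @ w_true)
```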

The model somehow develops an internal learning machinery that can handle a much wider range of unseen tasks by searching over an implicit parameter space to optimize some function f which is not the model’s own loss function. In other words, its pre-trained weights somehow have the ability to ensure that the model “trains” on context data at runtime to learn functions that it has never been exposed to during its training. The Induction Heads paper by the Anthropic folks discussed earlier provides some of the strongest evidence of how ICL may possibly occur. We now shift focus to a paper by von Oswald et al., who generalize this further and boldly claim that “Transformers Learn In-Context by Gradient Descent”!

Attention as Gradient Descent!

Let us try to understand how they explain ICL. I start by talking about 3–4 smaller concepts (not necessarily from their paper). Later on, we tie these concepts together to understand their explanation of ICL.

  • Concept 1 — Prompts have the power to change the attention weights & therefore change the model behaviour: LLMs predict the next word based on a probability distribution across all the tokens in the vocabulary. A prompt has the power to change this probability distribution. It can make the model favour certain words more than it would have done by default. If the prompt is: “I work as a”, the LLM-generated next word could be any of the several hundred professions in the world. If the prompt were changed to “I am in the Information Technology sector. I work as a”, then the probability distribution drastically narrows to the dozen or so professions related to computers. The change in prompt forced a change in behaviour even though the model weights themselves have not changed. This is the immense power of prompt engineering! Prompts change the activations generated by the attention mechanism and therefore control the attention weights which are used to predict the next word! Theoretically, you can engineer a prompt to hijack any LLM’s output!
  • Concept 2 — Imagine fine-tuning an LLM with a set of 100 training data points. We know what happens during gradient descent — the weights are nudged slightly in each iteration towards the direction that produces a better output. Can we view this process of gradient descent as an attention mechanism? Basically, the model is attending to a set of 100 examples. The attention mechanism decides the contribution of each training sample in nudging these weights. So gradient descent can be expressed as an attention mechanism! (Note: a more accurate statement would be — the forward pass of a linear layer trained by gradient descent can be expressed as an attention operation where the keys and values are training datapoints and the query is generated from the test input. Link.) A small sketch of this equivalence appears right after this list.
  • Concept 3 — If gradient descent can be expressed as an attention mechanism, can the opposite be true? Can attention be expressed as a gradient-descent mechanism? Before we go there, let us get an intuition of how LLMs work, specifically w.r.t. the attention mechanism. Take a transformer with just one attention layer. We know that the next-token prediction is based on the attention scores of all tokens prior to (& including) the token in question. You have multiple such attention blocks in an LLM. The attention scores get better & better as we move from block to block! The last block has the best attention scores.
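
Here is the Concept 2 equivalence as a minimal numpy sketch, in the spirit of the construction in the linked note and in von Oswald et al.: one gradient-descent step on a linear layer, evaluated at a test input, gives exactly the same answer as a (linear, unnormalised) attention readout whose keys are the training inputs, whose values are the prediction errors, and whose query is the test input. This is a toy check of the identity, not their full transformer construction.

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_out, n, eta = 4, 2, 8, 0.1
X = rng.normal(size=(n, d_in))          # training inputs  -> keys
Y = rng.normal(size=(n, d_out))         # training targets
W0 = rng.normal(size=(d_out, d_in))     # initial weights of the linear layer
x_test = rng.normal(size=d_in)          # test input       -> query

# Route 1: one explicit gradient-descent step on the mean squared error, then predict.
grad = (X @ W0.T - Y).T @ X / n         # dL/dW for the squared loss
W1 = W0 - eta * grad
pred_gd = W1 @ x_test

# Route 2: never touch the weights; express the same step as linear attention,
# with keys = x_i, values = prediction errors, query = x_test.
errors = X @ W0.T - Y                   # values: (W0 x_i - y_i) for each datapoint
attn = X @ x_test                       # unnormalised attention scores k_i · q
pred_attn = W0 @ x_test - eta / n * errors.T @ attn

print(np.allclose(pred_gd, pred_attn))  # True: the GD step is an attention readout
```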

Ok, let us try to put these intuitions together to see where we are going. Let us consider Fine-tuning first:

  • You have the original model weights
  • You have 100 data samples available
  • You use gradient descent to update the model weights.
  • We do gradient descent 50 times & get a nicely tuned model!

Now, let us say we don’t have the resources to do fine-tuning 🙁

  • We do have the original model weights. We can do inference on them. Basically, this means the data flows through all the transformer blocks in a forward pass till an output is generated.
  • We have 100 data samples available. Let us say we feed this as a prompt to the model hoping for ICL to happen!
  • We have attention operations being performed in each block (on the input data samples in the prompt) and we keep doing that iteratively (block after block), improving the attention scores till we get a final output.
  • Let us say there are 50 attention blocks in the model. So the above process is repeated 50 times. The final activations are really good!
Source: https://www.lesswrong.com/posts/HHSuvG2hqAnGT5Wzp/no-convincing-evidence-for-gradient-descent-in-activation

Aren’t the two processes strikingly similar! Isn’t it logical to investigate whether attention forward passes can be explained as being equivalent to gradient descent? Yes, the authors do precisely that. They show that gradient descent happens via the attention mechanism in the LLM during the forward pass. One attention block = 1 gradient-descent step, with the training data being the in-context examples. They even show that the gradients (tiny nudges taken towards generating the desired output) at each step of the 2 different architectures are numerically comparable to one another.

But hey, aren’t model weights frozen during inference? How can the model take tiny nudges towards generating the desired output? This is where a small leap of imagination comes in. In ICL, the nudges are not happening to the weights. It is the attention scores that get nudged towards producing better & better output as the input moves from block to block inside the LLM. Attention scores get better if there are better activations. So the gradients in ICL are not changes to the weights but changes to the activations as we move from one block to another. The effect is numerically comparable to a gradient-descent step. So it is almost as if the model is training itself via the attention mechanism instead of the regular gradient-descent mechanism! Wow! Do read this healthy criticism.

Enough intuitions. We can now summarize von Oswald’s paper & one more (Dai et al.) that builds further on it as follows — ICL produces the equivalent of meta-gradients in its forward pass. These meta-gradients can be computed by comparing activations between ICL & non-ICL forward passes. They can then be compared with the gradients generated during fine-tuning. The authors find strong similarities. This is where it appears magical! Imagine the model actually learning weights during pre-training that allow it to do this — basically, take any set of examples from any domain that is fed to it in the future & generate activations that help it converge, layer by layer, to fit those training examples — while silently obeying the loss function and the original objectives (like masked attention) that it is being trained on! Fantastic, isn’t it!

Maybe somewhere among the model’s billions of parameters is a small subset of parameters that gets activated when the model encounters ICL-style input data, producing this kind of behaviour. As of today, deep transformers have been found to match OLS (Ordinary Least Squares) solutions on simple linear problems. As models get deeper, more complex forms of training might emerge. A recent study claims that it is not 1st-order gradient descent that is emulated but a 2nd-order convergence. Whatever the case, we now have a possible explanation of how ICL happens.

But is this the real reason? Just because (under certain conditions) LLMs can learn via the forward pass does not mean that they actually learn that way in reality. The jury is still out on this. Secondly, even if we do find out how ICL happens, it is difficult to explain why a model picks up that ability. We can (for example) explain practically how life originated on earth, but explaining why it happened is tricky. Did it happen simply because there was a small mathematical possibility that it could? Did our churners in the epic of Samudra-Manthan find the source of the origin of life? 🙂 Now, that is a story for another day…

This is the 7th article in a 12-part series titled My LLM diaries.

  1. Quantization in plain English
  2. LoRA & its newer variants explained like never before
  3. In-Context learning: The greatest magic show in the kingdom of LLMs
  4. RAG in plain English — Summary of 1000+ papers
  5. HNSW — Small World, Yes! But how in the world is it Navigable?
  6. VectorDB origins, Vamana & on-disk vector search algorithms
  7. LLMs on the laptop — A peek into the Silicon
  8. Taming LLMs — A study of few popular techniques
  9. Agents in plain English
  10. LLMops in plain English — Operationalizing trained models
  11. Look Ma, LLMs without Prompt Engineering
  12. Taking a step back — On model sentience, conscientiousness & other philosophical aspects


Published via Towards AI
