
Choosing the Best Embedding Model For Your RAG Pipeline

Author(s): Nilesh Raghuvanshi

Originally published on Towards AI.

Improving Retrieval Augmented Generation (RAG) Systematically

Choosing the right option — AI generated image

Introduction

Through my experience building an extractive question-answering system using Google’s QANet and BERT back in 2018, I quickly realized the significant impact that high-quality retrieval has on the overall performance of the system. With the advent of generative models (LLMs), the importance of effective retrieval has only grown. Generative models are prone to “hallucination”, meaning they can produce incorrect or misleading information if they lack the correct context or are fed noisy data.

Simply put, the retrieval component (the “R” in RAG) is the backbone of Retrieval Augmented Generation. However, it is also one of the most challenging aspects to get right. Achieving high-quality retrieval requires constant iteration and refinement.

To improve your retrieval, it’s essential to focus on the individual components within your retrieval pipeline. Moreover, having a clear methodology for evaluating their performance — both individually and as part of the larger system — is key to driving improvements.

This series is not intended to be an exhaustive guide on improving RAG-based applications, but rather a reflection on key insights I’ve gained, such as the importance of iterative evaluation and the role of high-quality retrieval, while working on real-world projects. I hope these insights resonate with you and provide valuable perspectives for your own RAG endeavors.

Case Study: Code Generation for SimTalk

The project aimed to generate code for a proprietary programming language called SimTalk. SimTalk is the scripting language used in Siemens’ Tecnomatix Plant Simulation software, a tool designed for modeling, simulating, and optimizing manufacturing systems and processes. By utilizing SimTalk, users can customize and extend the behavior of standard simulation objects, enabling the creation of more realistic and complex system models.

Since SimTalk is unfamiliar to LLMs due to its proprietary nature and limited training data, the out-of-the-box code generation quality is quite poor compared to more popular programming languages like Python, which have extensive publicly available datasets and broader community support. However, when provided with the right context through a well-augmented prompt — such as including relevant code examples, detailed descriptions of SimTalk functions, and explanations of expected behavior — the generated code quality becomes acceptable and useful, even if not perfect. This significantly enhances user productivity, which aligns well with our business objectives.

Our only knowledge source is high-quality documentation of SimTalk, consisting of approximately 10,000 pages, covering detailed explanations of language syntax, functions, use cases, and best practices, along with some code snippets. This comprehensive documentation serves as the foundational knowledge base for code generation by providing the LLM with the necessary context to understand and generate SimTalk code.

There are several critical components in our pipeline, each designed to provide the LLM with precise context. For instance, we use query rewriting techniques such as expansion, relaxation, and segmentation, and we extract metadata from queries to dynamically build filters for more targeted searches. Rather than diving into these pipeline-specific components, I will focus on the general aspects applicable to any RAG-based project. In this series, we will cover:

  • How to evaluate the performance of multiple embedding models on your custom domain data
  • How to fine-tune an embedding model on your custom domain data
  • How to evaluate the retrieval pipeline
  • How to evaluate the generation pipeline

In general, the goal is to make data-driven decisions based on evaluation results, such as precision, recall, and relevance metrics, to optimize your RAG applications, rather than relying on intuition or assumptions.

Evaluating Embedding Models for Domain-Specific Retrieval

Embedding models are a critical component of any RAG application today, as they enable semantic search, which involves understanding the meaning behind user queries to find the most relevant information. This is valuable in the context of RAG because it ensures that the generative model has access to high-quality, contextually appropriate information. However, not all applications require semantic search — full-text search can often be sufficient or at least a good starting point. Establishing a solid baseline with full-text search is often a practical first step in improving retrieval.
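Such a baseline takes only a few lines to stand up. Below is a minimal sketch using the rank_bm25 package; the corpus and query are made-up placeholders, not the SimTalk data:

```python
# Minimal full-text retrieval baseline with BM25 (pip install rank-bm25).
# The corpus and query below are illustrative placeholders.
from rank_bm25 import BM25Okapi

corpus = [
    "SimTalk method syntax and examples",
    "Configuring conveyor objects in Plant Simulation",
    "Best practices for simulation experiments",
]
tokenized_corpus = [doc.lower().split() for doc in corpus]
bm25 = BM25Okapi(tokenized_corpus)

query = "how do I define a SimTalk method".lower().split()
scores = bm25.get_scores(query)          # one relevance score per document
top_docs = bm25.get_top_n(query, corpus, n=2)
print(top_docs)
```

A lexical baseline like this also gives you a reference point: if a semantic model cannot beat it on your own benchmark, the added cost of embeddings is hard to justify.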

The embedding model landscape is as dynamic and competitive as the LLM space, with numerous options from a wide range of vendors. Key differentiators among these models include embedding dimensions, maximum token limit, model size, memory requirements, model architecture, fine-tuning capabilities, multilingual support, and task-specific optimization. Here, we will focus on enterprise-friendly choices like Azure OpenAI, AWS Bedrock, and open-source models from Hugging Face 🤗. It is essential to evaluate and identify the most suitable embedding model for your application in order to optimize accuracy, latency, storage, memory, and cost.

To effectively evaluate and compare the performance of multiple embedding models, it is necessary to establish a benchmarking dataset. If such a dataset is not readily available, a scalable solution is to use LLMs to create one based on your domain-specific data. For example, LLMs can generate a variety of realistic queries and corresponding relevant content by using existing domain-specific documents as input, which can then be used as a benchmarking dataset.

Generating a Synthetic Dataset Based on Domain-Specific Data

Generating a synthetic dataset presented a unique challenge, especially with the goal of keeping costs low. We aimed to create a diverse and effective dataset using a practical, resource-efficient approach. To achieve this, we used quantized small language models (SLMs) running locally on a desktop with a consumer-grade GPU. We wanted a certain level of variety and randomness in the dataset and did not want to spend excessive time selecting the ‘right’ LLM. Therefore, we decided to use a diverse set of SLMs, including Phi, Gemma, Mistral, Llama, Qwen, and DeepSeek. Additionally, we used a mix of code-specific and language-specific models.

Since we wanted the solution to be general-purpose, we developed a custom implementation that allows potential users to specify a list of LLMs they wish to use (including those provided by Azure OpenAI and AWS Bedrock). Users can also provide a custom system prompt tailored to their specific needs. This flexibility makes the solution adaptable to a wide range of use cases. We also extracted our domain-specific data (SimTalk documentation) into JSON format, enriched with useful metadata. The availability of rich metadata provided the flexibility to filter specific sections of the SimTalk documentation for quick tests.

For each context chunk from the domain-specific dataset, the LLM was tasked with generating a question or query that could be answered based on that context. The system prompt was relatively simple but required iterative adjustments, such as adding domain-specific terminology and refining the structure of the prompt, to better capture the nuances of our application needs and improve the quality of the generated questions. The implementation ensured that each LLM in the list had an equal chance of being selected for generation, with tasks processed in parallel to improve efficiency.
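To make the loop concrete, here is a minimal sketch of what such a generator might look like. It assumes an OpenAI-compatible chat endpoint (as exposed by many local SLM runtimes); the endpoint URL, model names, system prompt, and chunk file are all illustrative assumptions, not our actual implementation:

```python
# Sketch of the synthetic-query generation loop described above.
# Assumes an OpenAI-compatible endpoint; all names are illustrative.
import json
import random
from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")  # hypothetical local endpoint
MODELS = ["phi3", "gemma2", "mistral", "llama3.1", "qwen2.5"]  # illustrative SLM pool

SYSTEM_PROMPT = (
    "Write one realistic user question that the given documentation "
    "chunk fully answers. Use the domain's own terminology."
)

def generate_query(chunk: dict) -> dict:
    model = random.choice(MODELS)  # each SLM has an equal chance of being selected
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": chunk["text"]},
        ],
    )
    return {"query": resp.choices[0].message.content,
            "chunk_id": chunk["id"], "model": model}

chunks = json.load(open("simtalk_docs.json"))   # chunks with metadata, as described above
with ThreadPoolExecutor(max_workers=8) as pool:  # parallel generation for throughput
    samples = list(pool.map(generate_query, chunks))
```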

In the end, we managed to generate about 13,000 samples in less than 30 minutes. While reviewing a few samples, we noticed that they were not perfect — some lacked specificity, while others contained minor inaccuracies — but they provided a solid starting point. These issues could be addressed in future iterations by refining the prompt further, adding more domain-specific details, and using feedback loops to enhance the quality of generated queries.

Distribution of LLMs used for synthetic dataset generation (Image by author)

This dataset provides approximately thirteen thousand examples of potential queries. Each query is paired with its corresponding top-ranked context chunk, which we expect our retrieval system to fetch. However, it is important to note that this dataset is limited, as it only includes single chunks as context. This limitation affects our ability to comprehensively evaluate the retrieval system, particularly in real-world scenarios where some queries require answers spanning multiple chunks. For example, in a code generation scenario, generating a complete piece of code might require information from multiple sections of the documentation, such as syntax definitions, function descriptions, and best practices, which are spread across multiple chunks. As a result, the evaluation may not fully capture the system’s performance on more complex queries needing information from several sources. Nevertheless, despite this limitation, the dataset is a solid starting point for understanding the system’s capabilities, and its simplicity makes it easier to iterate and improve upon in future evaluations.

Evaluating Embedding Models on Your Dataset

Next, we aimed to evaluate the performance of multiple embedding models on this dataset to determine which one performs best for the domain-specific data. To achieve this, we developed a multi-embedding model loader capable of interacting with any embedding model. The loader caches the embeddings to avoid redundant computations and speed up evaluation. It also supports batching, as well as truncating and re-normalizing embeddings to specific dimensions for models that support Matryoshka Representation Learning (MRL).
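A loader along these lines might look as follows. This is a sketch under stated assumptions: the cache is keyed on a hash of the model name and texts, Hugging Face models are loaded via sentence-transformers, and MRL support is approximated by truncating vectors and re-normalizing; the helper name and file layout are ours, not the project's actual code:

```python
# Sketch of a multi-model embedding loader with on-disk caching and
# optional MRL-style truncation; names and layout are illustrative.
import hashlib
from pathlib import Path

import numpy as np
from sentence_transformers import SentenceTransformer

CACHE_DIR = Path("emb_cache")
CACHE_DIR.mkdir(exist_ok=True)

def embed(model_name: str, texts: list[str], dim: int | None = None) -> np.ndarray:
    key = hashlib.sha256((model_name + "".join(texts)).encode()).hexdigest()
    cache_file = CACHE_DIR / f"{key}.npy"
    if cache_file.exists():                      # reuse cached embeddings
        emb = np.load(cache_file)
    else:
        model = SentenceTransformer(model_name)
        emb = model.encode(texts, batch_size=64, normalize_embeddings=True)
        np.save(cache_file, emb)
    if dim is not None:                          # MRL: truncate, then re-normalize
        emb = emb[:, :dim]
        emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return emb
```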

The main script allows users to specify a list of embedding models to evaluate, a test dataset (20% of the dataset we created), and a list of cutoff (k) values for metrics such as Precision, Recall, NDCG, MRR, and MAP. These metrics and cutoff values help comprehensively assess different aspects of model performance. We used pytrec_eval’s RelevanceEvaluator to calculate these metrics at multiple cutoff values (1, 3, 5, 10).
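The pytrec_eval step looks roughly like this; the qrels (ground-truth relevance) and run (model scores) dictionaries below are toy placeholders for the actual test split:

```python
# Sketch of metric computation with pytrec_eval; qrels/run are toy data.
import pytrec_eval

# Ground truth: for each query, the chunk(s) it should retrieve.
qrels = {"q1": {"chunk_42": 1}, "q2": {"chunk_7": 1}}

# System output: similarity scores per query from one embedding model.
run = {
    "q1": {"chunk_42": 0.91, "chunk_3": 0.55},
    "q2": {"chunk_9": 0.80, "chunk_7": 0.78},
}

measures = {"ndcg_cut.1,3,5,10", "recip_rank", "map_cut.1,3,5,10",
            "recall.1,3,5,10", "P.1,3,5,10"}
evaluator = pytrec_eval.RelevanceEvaluator(qrels, measures)
results = evaluator.evaluate(run)  # per-query metric values

# Average each metric over all queries for the model-level report.
ndcg_at_10 = sum(r["ndcg_cut_10"] for r in results.values()) / len(results)
print(f"NDCG@10: {ndcg_at_10:.3f}")
```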

For each embedding model, we first generate the document and query embeddings (if they are not already available in the cache), and then calculate similarities. We then calculate each metric at the specified cutoff values and log the results for comparison. Finally, we visualize the results for each metric at each cutoff and generate an easy-to-read report using an LLM to provide insights, such as identifying the top-performing models and optimal cutoff.

Below is a brief explanation of the metrics used for evaluation.

NDCG (Normalized Discounted Cumulative Gain) evaluates the quality of a ranked list of items by considering both their relevance and their positions in the ranking. A higher NDCG score indicates better ranking performance.

MRR (Mean Reciprocal Rank) measures the position of the first relevant item in a ranked list of results, with higher ranks leading to a higher MRR score.

MAP (Mean Average Precision) is calculated as the mean of the Average Precision (AP) scores for each query. It is particularly useful for comparing performance across multiple queries, especially when there are varied relevance judgments. It provides a single-figure measure of quality across multiple queries, considering both precision and recall.

Recall is the proportion of relevant items that were retrieved out of the total number of relevant items available. For example, if there are 10 relevant documents and 8 are retrieved, the recall is 0.8.

Precision is the proportion of retrieved documents that are relevant out of the total number retrieved. There is often a trade-off between precision and recall, where optimizing one may lead to a decrease in the other. For example, increasing recall may result in retrieving more irrelevant items, thereby reducing precision. Understanding this trade-off is crucial for determining which metric to prioritize based on the application’s needs. For example, if 10 documents are retrieved and 7 are relevant, the precision is 0.7.
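To tie these definitions together, here is a small worked example computing precision, recall, and reciprocal rank by hand for one toy query (the ranking, relevant set, and cutoff are made up):

```python
# Hand-computed metrics for one toy query.
# Ranked list returned by the retriever (best first); relevant set is known.
ranked = ["c3", "c7", "c1", "c9", "c4"]
relevant = {"c7", "c4", "c8"}          # 3 relevant chunks exist in total
k = 5

retrieved_relevant = [c for c in ranked[:k] if c in relevant]
precision_at_k = len(retrieved_relevant) / k           # 2/5 = 0.4
recall_at_k = len(retrieved_relevant) / len(relevant)  # 2/3 ≈ 0.67

# Reciprocal rank: the first relevant item, "c7", sits at position 2.
rr = 1 / (ranked.index("c7") + 1)                      # 0.5

print(precision_at_k, recall_at_k, rr)
```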

NDCG@k (Image by author)
MRR@k (Image by author)
MAP@k (Image by author)
Recall@k (Image by author)

For our domain-specific dataset, azure/text-embedding-3-large (3072 dimensions) emerged as the best performer across all metrics, with azure/text-embedding-3-small (1536 dimensions) and huggingface/BAAI/bge-large-en-v1.5 (1024 dimensions) showing similar performance. Interestingly, cohere.embed-english-v3 (1024 dimensions) performed the worst on this dataset.

So, where do we go from here? You can either choose the top-performing model and focus on optimizing other components of your retrieval pipeline, or you could continue exploring to identify the most suitable model for your needs.

We chose to explore further, so in the next article we will see how we fine-tuned an embedding model on our domain-specific data.


Published via Towards AI
