
The Evolution of Tabular Data: From Analysis to AI

Last Updated on August 16, 2023 by Editorial Team

Author(s): Abid Ali Awan

Originally published on Towards AI.

Discover how the tabular data space is being transformed by Kaggle competitions, the open-source community, and generative AI.

Image by Author

Introduction

Tabular data refers to data organized into rows and columns. It encompasses everything from CSV files and spreadsheets to relational databases. Tabular data has been around for decades and is one of the most common data types used in data analysis and machine learning.

Traditionally, tabular data was used simply for organizing and reporting information. Over the past decade, however, its usage has evolved significantly due to several key factors:

  • Kaggle Competitions: Kaggle emerged in 2010 [1] and popularized data science and machine learning competitions using real-world tabular datasets. This exposed many data scientists and machine learning engineers to the power of analyzing and building models on tabular data.
  • Open-source Contributions: Thanks to major open-source libraries like Pandas, DuckDB, SDV, and Scikit-learn, manipulating, preprocessing, and building predictive models on tabular data is now incredibly easy. Additionally, open-source datasets provide beginners with easy access to practice on real-world datasets.
  • Generative AI: Recent advances in generative AI, especially large language models, now enable the generation of realistic tabular data and make it possible for virtually anyone to conduct data analysis and build machine learning applications.

In this essay, we will discuss each of these factors in more detail and look at examples of how companies and researchers are using tabular data in innovative ways today. The main takeaway is the importance of analyzing and preparing tabular data in the right way to reap the benefits of machine learning and AI.

This essay is part of the 2023 Kaggle AI Report, a competition in which participants write an essay on one of seven topics. The prompt asks them to describe what the community has learned from working and experimenting over the past two years.

Tabular Data Kaggle Competitions

Kaggle competitions have profoundly impacted the fields of data science and machine learning engineering. Tabular competitions in particular have introduced new techniques, tools, and a wide variety of tabular tasks.

In addition to learning and knowledge development, winning a competition often comes with a cash prize, providing further motivation to participate. For example:

  • On average, Kaggle competitions offer prize money of around $21,246 and have approximately 1,498 participating teams.
  • The largest cash prizes have been as high as $125,000, giving the winners a significant incentive to go the extra mile and push the boundaries of what’s possible with tabular data.

Note: We will be using the Meta Kaggle dataset for our analysis and code examples. The dataset is released under the Apache 2.0 license and is updated daily.

import pandas as pd

# Load the relevant Meta Kaggle tables.
comptags = pd.read_csv("/kaggle/input/meta-kaggle/CompetitionTags.csv")
tags = pd.read_csv("/kaggle/input/meta-kaggle/Tags.csv")
comps = pd.read_csv("/kaggle/input/meta-kaggle/Competitions.csv")

# Keep only competitions carrying TagId 14101, the tag used for tabular competitions.
tabular_competition_ids = comptags.query("TagId == 14101")['CompetitionId']
tabular_competitions = comps.set_index('Id').loc[tabular_competition_ids]

# Summary statistics for prize money and team counts.
tabular_competitions.describe()[["RewardQuantity", "TotalTeams"]]
Code output

Over the last decade, Kaggle has hosted numerous competitions centered around tabular data, with several since 2015 offering cash prizes of up to $100,000 for the winning team.

import plotly.express as px

tabular_competitions["EnabledDate"] = pd.to_datetime(
    tabular_competitions["EnabledDate"], format="%m/%d/%Y %H:%M:%S"
)
tabular_competitions["EnabledDate"] = tabular_competitions["EnabledDate"].dt.year
tabular_competitions.sort_values(by="EnabledDate", inplace=True)

fig = px.bar(
    tabular_competitions,
    x="EnabledDate",
    y="RewardQuantity",
    title="Reward Distribution of Tabular Competitions over the Years",
    labels={"RewardQuantity": "Prize Money($)", "EnabledDate": "Year"},
)

fig.show()
Reward Distribution of Tabular Competitions over the Years

The number of tabular data competitions has grown significantly over this period, with particularly high activity in 2015 and 2022.

fig = px.histogram(
    tabular_competitions,
    x="EnabledDate",
    nbins=20,
    title="Number of Tabular Competitions over the Years",
    labels={"EnabledDate": "Year"},
)
fig.show()
Number of Tabular Competitions over the Years

Tabular Playground Series

Due to the large demand for tabular data problems, the Kaggle staff started an experiment in 2021 [2] by launching a monthly contest called the Tabular Playground Series. These competitions aimed to provide a consistent platform for competitors to hone their skills on tabular data.

The Tabular Playground Series contests were based on synthetic datasets that replicated the structure of public data or data from previous Kaggle competitions. The synthetic datasets were created using a deep-learning generative network called CTGAN.[3]
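To make the synthetic-data idea concrete, here is a minimal sketch of generating a synthetic tabular dataset with the open-source ctgan library [3]. It follows the usage pattern from the CTGAN README; the bundled demo dataset, column list, and epoch count are illustrative assumptions rather than the exact setup Kaggle used.

from ctgan import CTGAN, load_demo

# load_demo() ships with the library and returns the Adult census table.
real_data = load_demo()

# CTGAN must be told which columns are categorical.
discrete_columns = [
    "workclass", "education", "marital-status", "occupation",
    "relationship", "race", "sex", "native-country", "income",
]

synthesizer = CTGAN(epochs=10)  # a handful of epochs, just for the sketch
synthesizer.fit(real_data, discrete_columns)

# Sample a synthetic table with the same schema and similar statistics.
synthetic_data = synthesizer.sample(1000)
print(synthetic_data.head())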

  • Exposure: Many machine learning practitioners got their first exposure to working with tabular data through the Tabular Playground Series. This helped familiarize them with concepts like data loading, feature engineering, and model tuning.
  • Techniques: Kaggle competitions showcased techniques like feature engineering, data augmentation, and ensemble modeling that are particularly useful for tabular data. Competitors used these techniques to achieve higher scores, setting examples for others.
  • Community: The discussions within Kaggle competitions provided fertile ground for sharing techniques and ideas on how best to handle tabular data. This helped form a community of practice around tabular data.
  • Democratization: Kaggle competitions have made machine learning on tabular data accessible to a wider audience, not just data experts. Participants get free access to both CPUs and GPUs, as well as to large datasets, and anyone is welcome to compete.
Image from playground-series-s3e18

The Tabular Playground Series is still ongoing, currently in Season 3, Episode 18. This demonstrates that cash prizes are not the only motivation for participants, as these competitions offer no monetary prizes or points. Rather, the series caters to data enthusiasts who want to hone their skills by practicing on various types of tabular data.

Competition Solutions

Examination of winning solutions has revealed that fancy tools or deep learning models are not necessary to place highly. Even simple models like linear regression, combined with careful feature engineering, can win prizes. The key is finding simple yet effective techniques for the problem at hand.

For example, the winner [4] of the GoDaddy — Microbusiness Density Forecasting competition [5] used Linear Regression. This is unsurprising as winning solutions are often based on simple models but involve extensive feature selection, cross-validation, data augmentation, and ensemble techniques.
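As a rough illustration of that recipe (not the winner's actual code), here is a minimal scikit-learn sketch combining simple feature engineering, a plain linear regression, and cross-validation; the dataset, column names, and numbers below are entirely hypothetical.

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical tabular data: one numeric feature, one categorical feature, numeric target.
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "area": rng.uniform(30, 200, 500),
    "city": rng.choice(["NY", "SF", "Austin"], 500),
})
df["price"] = 3000 * df["area"] + df["city"].map({"NY": 5e5, "SF": 6e5, "Austin": 2e5})
df["price"] += rng.normal(0, 2e4, 500)

# Feature engineering and the model wrapped in one pipeline.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["area"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])
model = Pipeline([("prep", preprocess), ("reg", LinearRegression())])

# Cross-validation gives a more reliable score than a single train/test split.
scores = cross_val_score(model, df[["area", "city"]], df["price"], cv=5, scoring="r2")
print(f"R^2: {scores.mean():.3f} +/- {scores.std():.3f}")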

Image from Farid

Tabular Data Open-source Contributions

Open-source contributions related to tabular data have been invaluable for advancing the field and enabling real-world applications. Contributions fall into two main categories:

  1. Open-source Datasets
  2. Open-source Tools

Open-source Datasets

Kaggle owes much of its success to open-source contributors who generously share real-world tabular datasets for machine learning problems. These datasets, covering various domains and use cases, provide valuable training and benchmarking data for the machine learning community. Numerous companies and organizations have openly contributed their proprietary tabular data to advance the field. The remarkable number and diversity of datasets available on Kaggle have been an essential driver of innovation in working with tabular data.

Kaggle Datasets [6] is the go-to place for beginners and experts alike who are looking for specific datasets. Its vast collection of tabular datasets helps hundreds of community members practice new techniques and handle new types of data every day.

Image from Kaggle

Open-Source Tools

Several major open-source tools for analyzing, manipulating, and modeling tabular data have been made possible by the contributions of developer communities. Libraries like Pandas, NumPy, scikit-learn, TensorFlow, XGBoost, and many others have been crucial enablers for working with tabular data at scale. They provide a comprehensive set of functionalities that has made machine learning on tabular data accessible to a wide audience. Ongoing community contributions ensure the tools continue to improve and keep pace with new requirements.

Additionally, efficient tools such as DuckDB and PySpark now offer a user-friendly yet powerful way to analyze and process large tabular datasets (a brief PySpark sketch follows the DuckDB examples below).

%pip install duckdb -q

With DuckDB, you can easily import a CSV file and run SQL queries in just seconds.

import duckdb

duckdb.sql('SELECT * FROM "/kaggle/input/meta-kaggle/Competitions.csv" LIMIT 5')
┌───────┬────────────────┬──────────────────────┬───┬──────────────────────┬──────────┬───────────────────┐
│  Id   │      Slug      │        Title         │ … │ EnableSubmissionMo…  │ HostName │ CompetitionTypeId │
│ int64 │    varchar     │       varchar        │   │       boolean        │ varchar  │       int64       │
├───────┼────────────────┼──────────────────────┼───┼──────────────────────┼──────────┼───────────────────┤
│  2408 │ Eurovision2010 │ Forecast Eurovisio…  │ … │ false                │ NULL     │                 1 │
│  2435 │ hivprogression │ Predict HIV Progre…  │ … │ false                │ NULL     │                 1 │
│  2438 │ worldcup2010   │ World Cup 2010 - T…  │ … │ false                │ NULL     │                 1 │
│  2439 │ informs2010    │ INFORMS Data Minin…  │ … │ false                │ NULL     │                 1 │
│  2442 │ worldcupconf   │ World Cup 2010 - C…  │ … │ false                │ NULL     │                 1 │
├───────┴────────────────┴──────────────────────┴───┴──────────────────────┴──────────┴───────────────────┤
│ 5 rows                                                                             42 columns (6 shown) │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────┘

You can also perform quick, chained operations on tabular data using DuckDB's Python Relational API. Its syntax is similar to pandas, making it easy to use.

rel = duckdb.read_csv('/kaggle/input/meta-kaggle/Competitions.csv')
rel.filter("RewardQuantity > 100000").project(
    "EnabledDate,RewardQuantity"
).order("RewardQuantity").limit(5)
┌─────────────────────┬────────────────┐
│     EnabledDate     │ RewardQuantity │
│       varchar       │     double     │
├─────────────────────┼────────────────┤
│ 07/25/2019 21:10:14 │       120000.0 │
│ 11/02/2021 16:00:27 │       125000.0 │
│ 11/14/2016 08:02:32 │       150000.0 │
│ 11/22/2021 18:53:57 │       150000.0 │
│ 05/11/2022 18:46:43 │       150000.0 │
└─────────────────────┴────────────────┘
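As a point of comparison with the DuckDB queries above, here is how the same filter could be expressed with PySpark, the other large-scale tool mentioned earlier. This is a minimal sketch, assuming a local Spark session and the same Meta Kaggle CSV path; it is not part of the original notebook.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("meta-kaggle").getOrCreate()

# Read the Competitions table, letting Spark infer the column types.
comps_df = spark.read.csv(
    "/kaggle/input/meta-kaggle/Competitions.csv", header=True, inferSchema=True
)

# Competitions with more than $100,000 in prize money, smallest first.
(
    comps_df.filter(F.col("RewardQuantity") > 100000)
    .select("EnabledDate", "RewardQuantity")
    .orderBy("RewardQuantity")
    .show(5)
)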

Generative AI for Tabular Data

Generative AI is a subfield of artificial intelligence powered by neural networks such as variational autoencoders and generative adversarial networks (GANs) that can generate photorealistic images, compose original pieces of music, write news articles and stories, and even design objects. These models are trained on large datasets, which allows them to discover the underlying patterns, structures, and statistical distributions present in the data.

Generative AI models have significantly advanced the way we work with tabular data. Capabilities like data augmentation, anomaly detection, and synthetic data generation have helped tackle issues such as data scarcity, privacy, and bias.

Moreover, recent advances like ChatGPT and other large language models (LLMs) are now also being used as assistants for tabular data tasks. Some of the ways generative AI is transforming our workflows include:

  • Code assistance: LLMs like ChatGPT can help with coding tasks like feature engineering, preprocessing, modeling, and evaluating machine learning pipelines for tabular data. They can suggest code snippets, functions, and entire scripts (see the sketch after this list).
  • Data understanding: Generative AI can provide insights into data distributions, correlations, missing values, outliers, target variables, and more.
  • Deep analysis: It can perform statistical tests, create visualizations, and derive summary metrics that give practitioners a thorough view of tabular datasets to inform modeling decisions.
  • Web scraping: Generative AI tools can help you scrape new tabular data from websites and applications, assisting with data acquisition tasks.
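As a small illustration of the code-assistance point above, the snippet below asks gpt-3.5-turbo [10] to propose feature-engineering ideas for a tabular dataset, using the OpenAI Python client as it worked at the time of writing; the prompt, column names, and placeholder API key are made-up assumptions for this sketch.

import openai

openai.api_key = "YOUR_OPENAI_API_KEY"  # placeholder; store real keys in a secret manager

# Describe the table and ask for concrete, code-level suggestions.
prompt = (
    "I have a tabular dataset with columns EnabledDate (datetime), "
    "RewardQuantity (float), and TotalTeams (int). "
    "Suggest three feature-engineering ideas and show pandas code for each."
)

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful data science assistant."},
        {"role": "user", "content": prompt},
    ],
)

print(response["choices"][0]["message"]["content"])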

While issues like safety, bias, and narrow capability remain, large language models are beginning to transform how data scientists and machine learning engineers work with tabular data on a day-to-day basis. They are increasingly becoming assistants that handle various analytical and coding tasks, freeing up practitioners to focus on higher-level work.

ChatGPT for Tabular Data

ChatGPT [7] has rapidly become an invaluable assistant for nearly every stage of working with tabular data, from helping with data cleaning and feature engineering to generating complex model code, interpreting metrics, producing data analysis reports, and even aiding in synthetic data generation for tasks like data augmentation and anomaly detection.

With ChatGPT, you can easily build and train a machine learning model by simply typing a detailed prompt. Additionally, you can use plugins to automate complex tasks such as running code and accessing the internet.

Check out “A Guide to Using ChatGPT For Data Science Projects”[8] to learn how to use ChatGPT in a real-life end-to-end data science project.

Image by Author

Generative AI Tools for Tabular Data

Generative AI tools, such as PandasAI [9], have made data analysis, dataset cleaning, and data visualization extremely easy for anyone. These tools use large language models like gpt-3.5-turbo [10] to generate insightful results. You can also connect to open-source models hosted on Hugging Face to perform the same kind of AI-assisted analysis.

%pip install pandasai -q

from kaggle_secrets import UserSecretsClient
from pandasai import PandasAI
from pandasai.llm.openai import OpenAI

# Retrieve the OpenAI API key stored as a Kaggle secret.
user_secrets = UserSecretsClient()
secret_value_0 = user_secrets.get_secret("OPENAI_API_KEY")

# Point PandasAI at the OpenAI LLM.
llm = OpenAI(api_token=secret_value_0)

pandas_ai = PandasAI(llm)

To see the top five competitions with the highest RewardQuantity, we ask the model (through PandasAI) to display them with a natural-language prompt.

pandas_ai.run(
    tabular_competitions,
    prompt='Can you provide a list of the top five competitions with the highest RewardQuantity? Please only show the name of the competition, Date, and the corresponding reward.'
)
Code output

You can even ask it to perform complex tasks or generate visualizations.

pandas_ai.run(tabular_competitions, prompt='Please list all the competition which has "Market" in it.')
Code output

This is just the beginning; we will see many new AI tools that make data scientists' and developers' lives easier by automating tasks and providing assistance.

Conclusion

While significant progress has been made in leveraging tabular data for machine learning and AI applications, we have likely only seen the beginning. In the future, we can expect new powerful tools driven by advanced AI agents [11] that will automate the entire workflow for tabular machine learning tasks — from data ingestion and cleaning to feature engineering, model training, evaluation, and deployment in web applications. With continued advances in generative AI and natural language processing, these agents will be able to take high-level prompts to complete entire tabular data science projects, from data to insights.

This essay highlights the significant impact of Kaggle competitions, open-source communities, and generative AI on our approach to working with tabular data for tasks such as data analysis and machine learning. To delve deeper into the topic, you can read the winning essays from the 2023 Kaggle AI Report competition.[12]

References

[1] Wikipedia contributors, “Kaggle,” Wikipedia, Jun. 2023. [Online]. Available: https://en.wikipedia.org/wiki/Kaggle

[2] “Tabular Playground Series — Jan 2021 | Kaggle.” https://www.kaggle.com/competitions/tabular-playground-series-jan-2021

[3] sdv-dev, “CTGAN: Conditional GAN for generating synthetic tabular data,” GitHub. https://github.com/sdv-dev/CTGAN

[4] KAGGLEQRDL, “#1 solution — generalization with linear regression,” Mar. 16, 2023. https://www.kaggle.com/competitions/godaddy-microbusiness-density-forecasting/discussion/395131

[5] “GoDaddy — Microbusiness Density Forecasting | Kaggle,” Dec. 15, 2022. https://www.kaggle.com/competitions/godaddy-microbusiness-density-forecasting/overview

[6] “Find Open Datasets and Machine Learning Projects | Kaggle.” https://www.kaggle.com/datasets

[7] “Introducing ChatGPT,” OpenAI, Nov. 30, 2022. https://openai.com/blog/chatgpt

[8] A. A. Awan, “A Guide to Using ChatGPT For Data Science Projects,” Mar. 2023. [Online]. Available: https://www.datacamp.com/tutorial/chatgpt-data-science-projects

[9] gventuri, “pandas-ai: Pandas AI is a Python library that integrates generative artificial intelligence capabilities into Pandas, making dataframes conversational,” GitHub. https://github.com/gventuri/pandas-ai

[10] “GPT-3.5,” OpenAI. https://platform.openai.com/docs/models/gpt-3-5

[11] R. Cotton, “Introduction to AI Agents: Getting Started With Auto-GPT, AgentGPT, and BabyAGI,” May 2023. [Online]. Available: https://www.datacamp.com/tutorial/introduction-to-ai-agents-autogpt-agentgpt-babyagi

[12] “2023 Kaggle AI Report,” May 2023. [Online]. Available: https://www.kaggle.com/competitions/2023-kaggle-ai-report/leaderboard
