GANs for Synthetic Data Generation

Last Updated on January 6, 2023 by Editorial Team

Last Updated on August 2, 2022 by Editorial Team

Author(s): Varatharajah Vaseekaran

Originally published on Towards AI.

A practical guide to generating synthetic data using open-source GAN implementations

Advances in technology have made it possible to generate millions of gigabytes of real-world data every minute, which is valuable to any organization or individual able to use it. However, cleaning, processing, and extracting vital information from these mounds of data consumes a large amount of time and resources.

One answer to this problem is to generate synthetic data.

Photo by Vackground on Unsplash.

Contents

  1. What is Synthetic Data?
  2. A Brief Introduction to GANs
  3. Mode Collapse
  4. Wasserstein GAN (WGAN)
  5. Implementing a GAN for Synthetic Data Generation
  6. The Dataset
  7. Designing and Training the Synthesizer
  8. Final Words
  9. References

What is Synthetic Data?

Photo generated using DALL-E by author.

The definition of synthetic data is quite straightforward: artificially generated data that mimics real-world data. Organizations and individuals can generate as much synthetic data as they require, tailored to their own specifications.

Synthetic data is highly beneficial for preserving privacy in information-sensitive domains. Patients’ medical data and banking customers’ transaction details are two examples where synthetic data can mask the real data, which makes it possible to share sensitive data among organizations.

A small amount of well-labeled data can be used to generate a large amount of synthetic data, which reduces the time and effort needed to process massive real-world datasets.

There are many ways of generating synthetic data: SMOTE, ADASYN, Variational AutoEncoders, and Generative Adversarial Networks are a few techniques for synthetic data generation.

This article focuses on Generative Adversarial Networks and gives a practical demonstration of generating synthetic data with open-source libraries.

A Brief Introduction to GANs

Generating photorealistic faces using GANs based on StyleGAN3 research. Image from [1].

Many machine learning and deep learning models are prone to adversarial manipulation: they fail when fed data that differs from the data they were trained on. Generative Adversarial Networks (GANs), introduced by Ian Goodfellow et al. [2], turn the adversarial idea into a training method in which a generative model learns by competing against a second model. GANs are now very popular for generating synthetic data.

A typical GAN consists of two components, a generator and a discriminator, and the two networks compete with each other.

The generator is the heart of the GAN: it attempts to generate fake data that looks real by learning the features of the real data.

The discriminator compares the generated data with the real data, classifies whether the generated data looks real, and provides feedback that the generator uses to improve its output.

The goal of the generator is to generate data that can trick the discriminator.

A Vanilla GAN architecture. Image from [3].

Mode Collapse

Mode collapse is a common problem that GAN-based architectures face during adversarial training, where the generator repeatedly generates one specific type of data. It occurs when the generator discovers that it can fool the discriminator with one type of data and then keeps generating that same data.

This problem can easily go undetected: the training metrics may suggest that training is running smoothly while the generated results show otherwise.
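
One simple check, sketched here under the assumption that samples can be bucketed into discrete categories, is to measure how many of the categories seen in the real data also appear in the generated data. The function name `mode_coverage` and the toy data are illustrative and not part of the original article.

```python
from collections import Counter

def mode_coverage(real, generated):
    """Return the fraction of distinct real values that also occur in generated."""
    real_modes = set(real)
    if not real_modes:
        raise ValueError("real must not be empty")
    generated_modes = set(generated)
    return len(real_modes & generated_modes) / len(real_modes)

# Toy example: the real data has four categories,
# but a collapsed generator keeps emitting only one of them.
real = ["a", "b", "c", "d", "a", "b"]
collapsed = ["a"] * 6
healthy = ["a", "c", "b", "d", "d", "a"]

print(mode_coverage(real, collapsed))     # 0.25
print(mode_coverage(real, healthy))       # 1.0
print(Counter(collapsed).most_common(1))  # [('a', 6)]
```

A coverage far below 1.0 alongside healthy-looking loss curves is a hint that the generator has collapsed onto a few modes.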

An example of mode collapse in image-based GANs. Image from [4].

Wasserstein GAN (WGAN)

The main problem in a standard GAN is the mismatch in complexity between what the generator must produce and what the discriminator reports back.

A standard Vanilla GAN uses the Binary Cross Entropy (BCE) loss function [5] to evaluate whether the generated data looks real, with the discriminator outputting a single value between 0 and 1. The generator, meanwhile, has to produce synthetic data that may have many features and values. A single real-or-fake signal is not enough for the generator to learn from, and without more guidance it can easily fall into mode collapse.
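
To make the lack of guidance concrete, the sketch below computes the BCE loss on a single fake sample and the gradient the generator would receive under the original minimax objective, log(1 - D(G(z))). It assumes the discriminator's output is a sigmoid of a raw score (logit); the helper names are illustrative and not from the original article.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def bce(prediction, target):
    """Binary cross-entropy for one prediction in (0, 1) and a 0/1 target."""
    return -(target * math.log(prediction) + (1 - target) * math.log(1 - prediction))

def saturating_generator_grad(logit):
    """d/d(logit) of log(1 - sigmoid(logit)), the original minimax generator loss."""
    return -sigmoid(logit)

# The more confidently the discriminator rejects a fake sample (lower logit),
# the smaller the gradient that flows back to the generator.
for logit in (0.0, -2.0, -6.0):
    d_out = sigmoid(logit)  # discriminator's probability that the fake sample is real
    print(f"D(fake)={d_out:.4f}  D loss on fake={bce(d_out, 0):.4f}  "
          f"generator grad={saturating_generator_grad(logit):.4f}")
```

As the discriminator becomes more confident, the generator's gradient shrinks towards zero, which is exactly when the generator most needs a useful signal.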

WGAN [6] alleviates the problem by replacing the discriminator with a critic. The critic compares the distribution of the real data with the distribution of the generated data and outputs a score for how real the generated data looks. The Wasserstein loss function used in WGAN measures the difference between the real and generated distributions based on the Earth Mover’s Distance.

Visualization of Earth Mover’s Distance. Image from Coursera course [8].

Earth Mover’s Distance measures the effort needed to make the distribution of the generated data look like the real data’s distribution, so its value is not bounded. When the two distributions are far apart, the Earth Mover’s Distance still returns a meaningful positive value, whereas the BCE loss produces gradients close to zero. The Wasserstein loss therefore helps avoid the vanishing gradient problem during training.
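
For intuition, the Earth Mover's Distance between two one-dimensional samples of equal size can be computed by sorting both samples and averaging the absolute differences. The sketch below is a toy illustration with made-up numbers; a WGAN does not compute the distance this way but estimates it through the critic.

```python
def earth_movers_distance_1d(xs, ys):
    """Wasserstein-1 distance between two equal-sized 1-D samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    # Matching the i-th smallest x with the i-th smallest y is the cheapest transport plan in 1-D.
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)

real = [1.0, 2.0, 3.0, 4.0]
near = [1.5, 2.5, 3.5, 4.5]
far = [101.0, 102.0, 103.0, 104.0]

print(earth_movers_distance_1d(real, near))  # 0.5
print(earth_movers_distance_1d(real, far))   # 100.0
```

The distance keeps growing as the samples move apart, so it still tells the generator how far off it is even when the two distributions do not overlap.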

The expression of Wasserstein Loss. Image from Coursera course [8].

The above picture shows the equation for the Wasserstein loss, which is relatively simple compared to the BCE loss. The first term is the expected value of the critic’s prediction on the real data, and the second term is the expected value of the critic’s prediction on the generated data. The goal of the critic is to maximize the difference between the two terms, and the goal of the generator is to minimize it.
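
In symbols, the loss described above can be written as follows. This is the standard WGAN objective from [6]; the notation is chosen for this sketch, with $c$ the critic, $g$ the generator, $x$ a real sample, and $z$ a noise vector.

```latex
\min_{g} \max_{c} \;
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\, c(x) \,\big]
  \;-\;
  \mathbb{E}_{z \sim p_{z}}\big[\, c(g(z)) \,\big]
```

The first expectation is the critic's average score on real data and the second is its average score on generated data. The critic pushes the two apart, while the generator pulls them together by raising the critic's score on its own samples.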

Difference between BCE (left) and Wasserstein (right) losses. Image from Coursera course [8].

WGANs are prone to exploding gradients because the Wasserstein loss is unbounded: the critic’s output can be any real value, so the gradients can grow uncontrollably when the distribution of the generated data differs from the real data’s distribution. This is solved by adding a regularisation term, the gradient penalty [7], which keeps the norm of the critic’s gradient close to 1 and leads to better optimization of the WGAN model.
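
As a rough sketch of how the penalty works: [7] adds the term λ(‖∇c(x̂)‖₂ − 1)², evaluated at points x̂ interpolated between real and generated samples, with λ = 10. The example below uses a hypothetical linear critic c(x) = w · x, chosen only because its gradient with respect to x is simply w. A real critic is a neural network whose gradient comes from automatic differentiation.

```python
import math
import random

def interpolate(real, fake, eps):
    """Point on the line between a real and a generated sample."""
    return [eps * r + (1 - eps) * f for r, f in zip(real, fake)]

def linear_critic_grad(weights, x):
    """Gradient of c(x) = w . x with respect to x; constant for a linear critic."""
    return list(weights)

def gradient_penalty(weights, real, fake, lam=10.0, rng=random):
    """lam * (||grad c(x_hat)||_2 - 1)^2 at a random interpolate x_hat."""
    x_hat = interpolate(real, fake, rng.random())
    grad = linear_critic_grad(weights, x_hat)
    norm = math.sqrt(sum(g * g for g in grad))
    return lam * (norm - 1.0) ** 2

real, fake = [1.0, 2.0], [4.0, 6.0]
print(gradient_penalty([1.0, 0.0], real, fake))  # 0.0: the gradient norm is exactly 1
print(gradient_penalty([3.0, 4.0], real, fake))  # 160.0: a norm of 5 is heavily penalized
```

Adding this term to the critic's loss discourages steep critics, which is what keeps the gradients passed to the generator contained.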

To learn in-depth about GANs, the Coursera specialization on GANs, by deeplearning.ai and Sharon Zhou, is highly recommended.

Implementing a GAN for Synthetic Data Generation

The ydata-synthetic library helps immensely in building GANs to generate synthetic data for tabular datasets, which otherwise would have been challenging and tedious.

Many types of GANs can be used to generate synthetic data: the standard Vanilla GAN and Conditional GAN; the more advanced WGAN, WGAN with gradient penalty, Deep Regret Analytic GAN, CramerGAN, and Conditional WGAN; and TimeGAN for time series.

The models can be used out of the box with few modifications on almost any tabular dataset.

The library can be installed in a Python environment with a single command:

pip install ydata-synthetic

The GitHub repository of ydata-synthetic. Screenshot captured by the author from ydata-synthetic.

The Dataset

The Diabetes Health Indicators dataset, which is CC0 licensed, is used here. It contains sensitive data (patients’ medical records), and more data on patients diagnosed with diabetes is needed, so synthetic data is highly beneficial. The dataset consists of 21 features, and the target is whether the patient is diagnosed with diabetes.

Photo by Towfiqu barbhuiya on Unsplash.

Another open-source tool, pandas-profiling, performs exploratory data analysis and visualizes features and their relationships with just two lines of code. It is used here on the Diabetes Health Indicators dataset. The tool generates a detailed report on every variable, raises alerts for abnormalities, displays the relationships (correlations) between variables, and shows the missing values in each column and the duplicate rows in the data.

The GitHub repository for pandas-profiling. Screenshot captured by the author from pandas-profiling.

There are more than 218,000 examples of non-diabetic patients and more than 35,000 examples of patients diagnosed with diabetes.

The number of patients with and without diabetes. Image from the author.

All the variables are of float type, as the dataset has undergone pre-processing. With the help of the pandas-profiling library, it was discovered that there are 3 numerical variables (BMI, MentHlth, and PhysHlth), and the other variables are categorical. The pandas-profiling tool highlighted that about 4% of the dataset comprises duplicates.

A general overview of the data. Image from the author.

The pandas-profiling tool also highlighted highly correlated relationships between certain variables, which can be considered later for feature engineering tasks.

Abnormalities detected by the pandas-profiling library. Image from the author.

There are options to view the relationship between variables, and as can be seen in the above diagram, the pandas-profiling tool has detected six numerical variables and has plotted scatter plots depicting the relationship between the variables.

Viewing interactions between the numerical variables. GIF from the author.

The tool is also highly useful in conducting statistical correlation tests between variables, as it provides options to conduct numerical correlation tests as well as categorical correlation tests.

Different correlation tests are available in the pandas-profiling library. GIF from the author.

The pandas-profiling tool indicates that there are no missing values in the dataset.

No missing values are present in the data. Image from the author.

Since 4.5% of the data consists of duplicates, it is possible to check the exact duplicate rows and how many times such rows are repeated.
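
Outside the profiling report, the same check can be sketched in a few lines of plain Python: count each row and keep the ones seen more than once. The rows below are hypothetical values, not taken from the real dataset, and `duplicate_rows` is an illustrative helper.

```python
from collections import Counter

def duplicate_rows(rows):
    """Map each row that occurs more than once to its number of occurrences."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}

# Hypothetical rows of (HighBP, BMI, MentHlth).
rows = [
    (1.0, 27.0, 0.0),
    (0.0, 24.0, 0.0),
    (1.0, 27.0, 0.0),
    (1.0, 27.0, 0.0),
    (0.0, 31.0, 5.0),
]

dupes = duplicate_rows(rows)
print(dupes)  # {(1.0, 27.0, 0.0): 3}

# Rows beyond the first occurrence are the redundant ones.
extra = sum(n - 1 for n in dupes.values())
print(f"{extra / len(rows):.0%} of rows are repeats")  # 40% of rows are repeats
```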

The duplicates that are present in the data.

The entire pandas-profiling report can be viewed here.

Designing and Training the Synthesizer

First, the numerical and categorical columns are separated, because the GAN model needs to know the type of each variable. Based on the data analysis done earlier, there are three numerical variables, and the rest are categorical.
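
A minimal sketch of that separation is shown below. The three numerical column names come from the analysis above; the target name `Diabetes_binary` and the shortened header are assumptions made for illustration.

```python
NUMERICAL = ["BMI", "MentHlth", "PhysHlth"]

def split_columns(columns, numerical=NUMERICAL, target=None):
    """Return (num_cols, cat_cols), leaving out the target column if given."""
    num_cols = [c for c in columns if c in numerical]
    cat_cols = [c for c in columns if c not in numerical and c != target]
    return num_cols, cat_cols

# Hypothetical subset of the header; the real dataset has 21 features.
columns = ["Diabetes_binary", "HighBP", "BMI", "Smoker", "MentHlth", "PhysHlth", "Sex"]
num_cols, cat_cols = split_columns(columns, target="Diabetes_binary")

print(num_cols)  # ['BMI', 'MentHlth', 'PhysHlth']
print(cat_cols)  # ['HighBP', 'Smoker', 'Sex']
```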

To begin generating synthetic data, the patients are separated by their diabetic status. A GAN is first trained to generate synthetic data for the diabetic patients.
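
The split by diabetic status might look like the following sketch, which groups records by their label value. The label name `Diabetes_binary`, the 0.0/1.0 encoding, and the sample records are assumptions for illustration.

```python
def split_by_label(records, label_key="Diabetes_binary"):
    """Group records into lists keyed by the value of their label column."""
    groups = {}
    for record in records:
        groups.setdefault(record[label_key], []).append(record)
    return groups

# Hypothetical records; 1.0 marks a diabetic patient in this sketch.
records = [
    {"Diabetes_binary": 1.0, "BMI": 33.0},
    {"Diabetes_binary": 0.0, "BMI": 24.0},
    {"Diabetes_binary": 0.0, "BMI": 27.0},
]

groups = split_by_label(records)
diabetic, non_diabetic = groups[1.0], groups[0.0]
print(len(diabetic), len(non_diabetic))  # 1 2
```

Each group can then be handed to its own synthesizer, one for the minority class and one for the majority class.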

The next step is to select the GAN model, and as discussed earlier, the Wasserstein GAN with Gradient Penalty is chosen. It is quite easy to initialize the GAN using the ydata-synthetic library.

Once the GAN is initialized, the training process is initiated.

The training time depends on the machine that is used for training. With an NVIDIA GPU, the training is much faster compared to training on the CPU. However, the library is also optimized to train well on the CPU.

Once the training is completed, the next step is to generate synthetic data. The GAN model is trained to understand the distribution of the data of diabetic patients, and 100,000 rows of data representing the diabetic patients are generated.

The original dataset contains about 35,000 rows of diabetic patients; with the GAN-based model, 100,000 rows of synthetic data for diabetic patients are now available. Another advantage of ydata-synthetic is that the synthetic data is returned in the same form as the input data, with all the columns intact.

The steps for training the GAN on the majority class are similar to the previous steps taken to generate data for the minority class. The number of epochs to train the GAN for the non-diabetic patients’ data is set to 100 to reduce the training time.

A simple function is used to merge the synthetic samples of the majority and the minority classes.
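
The original function is not shown here, so the following is only a sketch of what such a merge could look like: label each set, concatenate, and shuffle. The label name `Diabetes_binary`, its 0.0/1.0 encoding, and the sample rows are assumptions.

```python
import random

def merge_synthetic(majority, minority, label_key="Diabetes_binary", seed=0):
    """Label, concatenate and shuffle two lists of synthetic records."""
    merged = [dict(row, **{label_key: 0.0}) for row in majority]
    merged += [dict(row, **{label_key: 1.0}) for row in minority]
    # A fixed seed keeps the shuffle reproducible between runs.
    random.Random(seed).shuffle(merged)
    return merged

# Hypothetical synthetic samples for each class.
majority = [{"BMI": 24.0}, {"BMI": 26.5}, {"BMI": 22.1}]
minority = [{"BMI": 33.4}, {"BMI": 30.2}]

merged = merge_synthetic(majority, minority)
print(len(merged))                                    # 5
print(sum(row["Diabetes_binary"] for row in merged))  # 2.0
```

Shuffling matters because a model trained on the merged data should not see all rows of one class before the other.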

The pandas-profiling tool is used to obtain a quick exploratory data analysis of the synthetic data.

The distributions of the numerical columns from the real dataset and the generated dataset are evaluated.

Considering the BMI feature, the synthetic data has an acceptable representation of the real dataset. The range of values of the synthetic data is within the range of values of the real data. The mean of the synthetic data has shifted to the right, but overall, the BMI column of the synthetic data is comparable with the real data.
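
The same comparison can be expressed numerically. The sketch below checks two of the properties discussed above, the shift in the mean and whether the synthetic values stay within the real range. The BMI values are invented for illustration and `compare_column` is a hypothetical helper.

```python
from statistics import mean

def compare_column(real, synthetic):
    """Summarize how a synthetic column relates to the real one."""
    return {
        "real_mean": mean(real),
        "synthetic_mean": mean(synthetic),
        # Positive means the synthetic mean has shifted to the right.
        "mean_shift": mean(synthetic) - mean(real),
        "within_real_range": min(real) <= min(synthetic) and max(synthetic) <= max(real),
    }

# Hypothetical BMI samples, not taken from the article's data.
real_bmi = [22.0, 25.0, 28.0, 31.0, 40.0]
synthetic_bmi = [24.0, 27.0, 30.0, 33.0, 36.0]

summary = compare_column(real_bmi, synthetic_bmi)
for name, value in summary.items():
    print(name, value)
```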

Comparison of BMI column with real and synthetic data. Image by the author.

Comparing the “MentHlth” and “PhysHlth” columns is trickier. According to the column definitions, they represent the number of days of poor mental health in the past 30 days and the number of days of a physical injury in the past 30 days, respectively. Many values in these columns are therefore zero, and the data is numerical because it counts days.

The synthetic data contains negative values in both columns, so some feature processing is required before modeling: the negative values need to be converted to zeros.
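
That clean-up step can be sketched as a simple clamp. Replacing negatives with zero follows the text above; the upper bound of 30 is an added assumption based on the columns counting days in the past 30 days.

```python
def clip_days(values, low=0.0, high=30.0):
    """Clamp day counts into [low, high]."""
    return [min(max(v, low), high) for v in values]

# Hypothetical generated values for a day-count column.
synthetic_ment_hlth = [-3.2, 0.0, 4.7, 31.5]
print(clip_days(synthetic_ment_hlth))  # [0.0, 0.0, 4.7, 30.0]
```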

When considering the feature “MentHlth”, the synthetic data has generated an acceptable distribution, as all the generated values are under 30. The mean of the synthetic data has shifted to the left.

Comparison of MentHlth column with real and synthetic data. Image by the author.

However, the mean of the synthetic data for the “PhysHlth” column has shifted significantly to the right. The generated values are still below 30, and replacing the negative values with zero might slightly improve the results.

Comparison of PhysHlth column with real and synthetic data. Image by the author.

The full pandas-profiling HTML report for the synthetic data can be viewed here, and the repository for the entire workings can be found here.

Final Words

The paradigm of AI is being transformed from model-centric to data-centric approaches, and the usage of synthetic data accelerates that transformation.

Synthetic data provides a low-cost, privacy-preserving alternative to collecting and labeling real-world data, and anyone can use powerful open-source tools to generate data for their specific use cases.

Many open-source tools are available for generating quality synthetic data, and as discussed in this article, it is relatively easy to generate synthetic data by leveraging the power of GANs with the help of ydata-synthetic.

I hope you have learned how simple it is to generate synthetic data for tabular datasets, and I look forward to seeing how you use these powerful tools to create synthetic data. Cheers!

References

[1] T. Karras, S. Laine, and T. Aila, “A Style-Based Generator Architecture for Generative Adversarial Networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1–1, 2020, doi: 10.1109/tpami.2020.2970919.

[2] I. J. Goodfellow et al., “Generative Adversarial Networks,” arXiv.org, 2014. https://arxiv.org/abs/1406.2661

[3] G. H. de Rosa and J. P. Papa, “A survey on text generation using generative adversarial networks,” Pattern Recognition, vol. 119, p. 108098, Nov. 2021, doi: 10.1016/j.patcog.2021.108098.

[4] M. Pasini, “10 Lessons I Learned Training GANs for a Year,” Medium, Jul. 28, 2019. https://towardsdatascience.com/10-lessons-i-learned-training-generative-adversarial-networks-gans-for-a-year-c9071159628

[5] U. R. Dr A, “Binary cross entropy with deep learning technique for Image classification,” International Journal of Advanced Trends in Computer Science and Engineering, vol. 9, no. 4, pp. 5393–5397, Aug. 2020, doi: 10.30534/ijatcse/2020/175942020.

[6] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein GAN,” arXiv.org, 2017. https://arxiv.org/abs/1701.07875

[7] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville, “Improved Training of Wasserstein GANs,” arXiv:1704.00028 [cs, stat], Dec. 2017, [Online]. Available: https://arxiv.org/abs/1704.00028

[8] “Build Basic Generative Adversarial Networks (GANs),” Coursera. https://www.coursera.org/learn/build-basic-generative-adversarial-networks-gans?specialization=generative-adversarial-networks-gans (accessed Jul. 28, 2022).


GANs for Synthetic Data Generation was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Join thousands of data leaders on the AI newsletter. It’s free, we don’t spam, and we never share your email address. Keep up to date with the latest work in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.

Published via Towards AI

Feedback ↓

Sign Up for the Course
`; } else { console.error('Element with id="subscribe" not found within the page with class "home".'); } } }); // Remove duplicate text from articles /* Backup: 09/11/24 function removeDuplicateText() { const elements = document.querySelectorAll('h1, h2, h3, h4, h5, strong'); // Select the desired elements const seenTexts = new Set(); // A set to keep track of seen texts const tagCounters = {}; // Object to track instances of each tag elements.forEach(el => { const tagName = el.tagName.toLowerCase(); // Get the tag name (e.g., 'h1', 'h2', etc.) // Initialize a counter for each tag if not already done if (!tagCounters[tagName]) { tagCounters[tagName] = 0; } // Only process the first 10 elements of each tag type if (tagCounters[tagName] >= 2) { return; // Skip if the number of elements exceeds 10 } const text = el.textContent.trim(); // Get the text content const words = text.split(/\s+/); // Split the text into words if (words.length >= 4) { // Ensure at least 4 words const significantPart = words.slice(0, 5).join(' '); // Get first 5 words for matching // Check if the text (not the tag) has been seen before if (seenTexts.has(significantPart)) { // console.log('Duplicate found, removing:', el); // Log duplicate el.remove(); // Remove duplicate element } else { seenTexts.add(significantPart); // Add the text to the set } } tagCounters[tagName]++; // Increment the counter for this tag }); } removeDuplicateText(); */ // Remove duplicate text from articles function removeDuplicateText() { const elements = document.querySelectorAll('h1, h2, h3, h4, h5, strong'); // Select the desired elements const seenTexts = new Set(); // A set to keep track of seen texts const tagCounters = {}; // Object to track instances of each tag // List of classes to be excluded const excludedClasses = ['medium-author', 'post-widget-title']; elements.forEach(el => { // Skip elements with any of the excluded classes if (excludedClasses.some(cls => el.classList.contains(cls))) { return; // Skip 
this element if it has any of the excluded classes } const tagName = el.tagName.toLowerCase(); // Get the tag name (e.g., 'h1', 'h2', etc.) // Initialize a counter for each tag if not already done if (!tagCounters[tagName]) { tagCounters[tagName] = 0; } // Only process the first 10 elements of each tag type if (tagCounters[tagName] >= 10) { return; // Skip if the number of elements exceeds 10 } const text = el.textContent.trim(); // Get the text content const words = text.split(/\s+/); // Split the text into words if (words.length >= 4) { // Ensure at least 4 words const significantPart = words.slice(0, 5).join(' '); // Get first 5 words for matching // Check if the text (not the tag) has been seen before if (seenTexts.has(significantPart)) { // console.log('Duplicate found, removing:', el); // Log duplicate el.remove(); // Remove duplicate element } else { seenTexts.add(significantPart); // Add the text to the set } } tagCounters[tagName]++; // Increment the counter for this tag }); } removeDuplicateText(); //Remove unnecessary text in blog excerpts document.querySelectorAll('.blog p').forEach(function(paragraph) { // Replace the unwanted text pattern for each paragraph paragraph.innerHTML = paragraph.innerHTML .replace(/Author\(s\): [\w\s]+ Originally published on Towards AI\.?/g, '') // Removes 'Author(s): XYZ Originally published on Towards AI' .replace(/This member-only story is on us\. Upgrade to access all of Medium\./g, ''); // Removes 'This member-only story...' 
}); //Load ionic icons and cache them if ('localStorage' in window && window['localStorage'] !== null) { const cssLink = 'https://code.ionicframework.com/ionicons/2.0.1/css/ionicons.min.css'; const storedCss = localStorage.getItem('ionicons'); if (storedCss) { loadCSS(storedCss); } else { fetch(cssLink).then(response => response.text()).then(css => { localStorage.setItem('ionicons', css); loadCSS(css); }); } } function loadCSS(css) { const style = document.createElement('style'); style.innerHTML = css; document.head.appendChild(style); } //Remove elements from imported content automatically function removeStrongFromHeadings() { const elements = document.querySelectorAll('h1, h2, h3, h4, h5, h6, span'); elements.forEach(el => { const strongTags = el.querySelectorAll('strong'); strongTags.forEach(strongTag => { while (strongTag.firstChild) { strongTag.parentNode.insertBefore(strongTag.firstChild, strongTag); } strongTag.remove(); }); }); } removeStrongFromHeadings(); "use strict"; window.onload = () => { /* //This is an object for each category of subjects and in that there are kewords and link to the keywods let keywordsAndLinks = { //you can add more categories and define their keywords and add a link ds: { keywords: [ //you can add more keywords here they are detected and replaced with achor tag automatically 'data science', 'Data science', 'Data Science', 'data Science', 'DATA SCIENCE', ], //we will replace the linktext with the keyword later on in the code //you can easily change links for each category here //(include class="ml-link" and linktext) link: 'linktext', }, ml: { keywords: [ //Add more keywords 'machine learning', 'Machine learning', 'Machine Learning', 'machine Learning', 'MACHINE LEARNING', ], //Change your article link (include class="ml-link" and linktext) link: 'linktext', }, ai: { keywords: [ 'artificial intelligence', 'Artificial intelligence', 'Artificial Intelligence', 'artificial Intelligence', 'ARTIFICIAL INTELLIGENCE', ], //Change your 
article link (include class="ml-link" and linktext) link: 'linktext', }, nl: { keywords: [ 'NLP', 'nlp', 'natural language processing', 'Natural Language Processing', 'NATURAL LANGUAGE PROCESSING', ], //Change your article link (include class="ml-link" and linktext) link: 'linktext', }, des: { keywords: [ 'data engineering services', 'Data Engineering Services', 'DATA ENGINEERING SERVICES', ], //Change your article link (include class="ml-link" and linktext) link: 'linktext', }, td: { keywords: [ 'training data', 'Training Data', 'training Data', 'TRAINING DATA', ], //Change your article link (include class="ml-link" and linktext) link: 'linktext', }, ias: { keywords: [ 'image annotation services', 'Image annotation services', 'image Annotation services', 'image annotation Services', 'Image Annotation Services', 'IMAGE ANNOTATION SERVICES', ], //Change your article link (include class="ml-link" and linktext) link: 'linktext', }, l: { keywords: [ 'labeling', 'labelling', ], //Change your article link (include class="ml-link" and linktext) link: 'linktext', }, pbp: { keywords: [ 'previous blog posts', 'previous blog post', 'latest', ], //Change your article link (include class="ml-link" and linktext) link: 'linktext', }, mlc: { keywords: [ 'machine learning course', 'machine learning class', ], //Change your article link (include class="ml-link" and linktext) link: 'linktext', }, }; //Articles to skip let articleIdsToSkip = ['post-2651', 'post-3414', 'post-3540']; //keyword with its related achortag is recieved here along with article id function searchAndReplace(keyword, anchorTag, articleId) { //selects the h3 h4 and p tags that are inside of the article let content = document.querySelector(`#${articleId} .entry-content`); //replaces the "linktext" in achor tag with the keyword that will be searched and replaced let newLink = anchorTag.replace('linktext', keyword); //regular expression to search keyword var re = new RegExp('(' + keyword + ')', 'g'); //this replaces 
the keywords in h3 h4 and p tags content with achor tag content.innerHTML = content.innerHTML.replace(re, newLink); } function articleFilter(keyword, anchorTag) { //gets all the articles var articles = document.querySelectorAll('article'); //if its zero or less then there are no articles if (articles.length > 0) { for (let x = 0; x < articles.length; x++) { //articles to skip is an array in which there are ids of articles which should not get effected //if the current article's id is also in that array then do not call search and replace with its data if (!articleIdsToSkip.includes(articles[x].id)) { //search and replace is called on articles which should get effected searchAndReplace(keyword, anchorTag, articles[x].id, key); } else { console.log( `Cannot replace the keywords in article with id ${articles[x].id}` ); } } } else { console.log('No articles found.'); } } let key; //not part of script, added for (key in keywordsAndLinks) { //key is the object in keywords and links object i.e ds, ml, ai for (let i = 0; i < keywordsAndLinks[key].keywords.length; i++) { //keywordsAndLinks[key].keywords is the array of keywords for key (ds, ml, ai) //keywordsAndLinks[key].keywords[i] is the keyword and keywordsAndLinks[key].link is the link //keyword and link is sent to searchreplace where it is then replaced using regular expression and replace function articleFilter( keywordsAndLinks[key].keywords[i], keywordsAndLinks[key].link ); } } function cleanLinks() { // (making smal functions is for DRY) this function gets the links and only keeps the first 2 and from the rest removes the anchor tag and replaces it with its text function removeLinks(links) { if (links.length > 1) { for (let i = 2; i < links.length; i++) { links[i].outerHTML = links[i].textContent; } } } //arrays which will contain all the achor tags found with the class (ds-link, ml-link, ailink) in each article inserted using search and replace let dslinks; let mllinks; let ailinks; let nllinks; let deslinks; let 
tdlinks; let iaslinks; let llinks; let pbplinks; let mlclinks; const content = document.querySelectorAll('article'); //all articles content.forEach((c) => { //to skip the articles with specific ids if (!articleIdsToSkip.includes(c.id)) { //getting all the anchor tags in each article one by one dslinks = document.querySelectorAll(`#${c.id} .entry-content a.ds-link`); mllinks = document.querySelectorAll(`#${c.id} .entry-content a.ml-link`); ailinks = document.querySelectorAll(`#${c.id} .entry-content a.ai-link`); nllinks = document.querySelectorAll(`#${c.id} .entry-content a.ntrl-link`); deslinks = document.querySelectorAll(`#${c.id} .entry-content a.des-link`); tdlinks = document.querySelectorAll(`#${c.id} .entry-content a.td-link`); iaslinks = document.querySelectorAll(`#${c.id} .entry-content a.ias-link`); mlclinks = document.querySelectorAll(`#${c.id} .entry-content a.mlc-link`); llinks = document.querySelectorAll(`#${c.id} .entry-content a.l-link`); pbplinks = document.querySelectorAll(`#${c.id} .entry-content a.pbp-link`); //sending the anchor tags list of each article one by one to remove extra anchor tags removeLinks(dslinks); removeLinks(mllinks); removeLinks(ailinks); removeLinks(nllinks); removeLinks(deslinks); removeLinks(tdlinks); removeLinks(iaslinks); removeLinks(mlclinks); removeLinks(llinks); removeLinks(pbplinks); } }); } //To remove extra achor tags of each category (ds, ml, ai) and only have 2 of each category per article cleanLinks(); */ //Recommended Articles var ctaLinks = [ /* ' ' + '

Subscribe to our AI newsletter!

' + */ '

Take our 85+ lesson From Beginner to Advanced LLM Developer Certification: From choosing a project to deploying a working product this is the most comprehensive and practical LLM course out there!

'+ '

Towards AI has published Building LLMs for Production—our 470+ page guide to mastering LLMs with practical projects and expert insights!

' + '
' + '' + '' + '

Note: Content contains the views of the contributing authors and not Towards AI.
Disclosure: This website may contain sponsored content and affiliate links.

' + 'Discover Your Dream AI Career at Towards AI Jobs' + '

Towards AI has built a jobs board tailored specifically to Machine Learning and Data Science Jobs and Skills. Our software searches for live AI jobs each hour, labels and categorises them and makes them easily searchable. Explore over 10,000 live jobs today with Towards AI Jobs!

' + '
' + '
