
YOLOV5(m): Implementation From Scratch With PyTorch

Last Updated on July 25, 2023 by Editorial Team

Author(s): Alessandro Mondin

Originally published on Towards AI.

FLIR dataset

The prerequisites to understand this article are a good understanding of PyTorch and a basic comprehension of YOLO architectures.

⚠️ Since You Only Live Once, think twice before implementing a YOLO algorithm from scratch. It might hurt your mental health.

Important assumptions:

  • You should not read this article if you are looking for an in-depth explanation of Ultralytics’ autoanchor algorithm since I am giving a high-level description.
  • I am writing this article in December 2022, and the version of YOLOv5 that I am going to describe is YOLOv5 v6.0. In fact, YOLOv5 shouldn’t be considered a single algorithm but rather an object detection and segmentation repository that is continuously updated and improved (the algorithm, the augmentations, the loss functions, etc.).

Among all the files present in my YOLOV5 GitHub repo, in this article, I am focusing on model.py. Let’s start!

Note: as of today, the indentation of long Gists is broken; if you face this problem, click on “view raw”!

Considering that YOLOv5 isn’t published in any arXiv paper (as of December 2022), the information available online is scattered and often not up-to-date with the latest releases.

After a lot of research, the best visualization that I’ve found is the following:

Source image updated by me (I’ve highlighted the backbone, the neck, and the heads)

Firstly, what arises from this visualization is that, as with all YOLO algorithms, the architecture can be conceptually separated into three chunks:

  1. The backbone, whose task is to extract deep-level features (feature-extractor).
  2. The neck combines information from layers of different depths (feature aggregator).
  3. The heads are responsible for the predictions.

Secondly, the image shows that most of the architecture is composed of the repetition of a few convolutional blocks:

  • C3
  • Bottlenecks
  • SPPF

“But where can I find the details on each layer?”

After some research, I identified two ways:

  1. Read Ultralytics .yaml files and, with the help of the visualization above, attempt to translate the blocks into PyTorch classes.
  2. Export the model to .onnx format and visualize it on Netron.

The first approach seemed too complex and mechanical since .yaml files are understandable once you’ve already comprehended YOLOv5 architecture. Therefore I opted to use Netron.

I exported YOLOv5 to .onnx format with Ultralytics export.py, and then I uploaded the .onnx file to Netron, a tool that “translates” the model architecture into an easy-to-follow visualization.

Below are the YOLOv5(m) input and output layers visualized with Netron.

Input layer (screenshot from Netron)
Output layer (screenshot from Netron)

At this point, I started translating the visualization into PyTorch.

The first step was creating a CBL class that is used throughout the whole architecture:

CBL = Convolution + Batch Norm + SiLU activation. (screenshot from Netron)

But why SiLU? The answer is experimentation: they started with ReLU, then LeakyReLU, and eventually settled on SiLU.
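A minimal sketch of such a CBL block (the class name and default arguments are mine and may differ slightly from the repository):

import torch
import torch.nn as nn

class CBL(nn.Module):
    # Convolution + Batch Norm + SiLU: the basic block reused throughout the architecture
    def __init__(self, in_channels, out_channels, kernel_size, stride, padding):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = nn.SiLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))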

Backbone

Source image modified by me

As shown in the images, the backbone is composed of the following CNN blocks:

  • ConvBNSiLU (the CBL defined above)
  • BottleNeck 1
  • C3
  • SPPF

BottleNeck 1 is used to build the C3 block (it’s embedded), so let’s start with it.

Bottleneck 1 is a residual block with a skip connection that, besides improving gradient back-propagation, comes with a width_multiple parameter. This parameter, usually referred to as the expansion factor, controls how much we squeeze and re-expand the number of channels between c1 and c2. This logic of squeezing & expanding channels was initially introduced by this paper.
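As a rough sketch (a simplified version reusing the CBL block above; treat the exact signature as illustrative):

class Bottleneck(nn.Module):
    # A 1x1 conv squeezes the channels by width_multiple, a 3x3 conv re-expands them,
    # and the input is added back through the skip connection.
    def __init__(self, in_channels, out_channels, width_multiple=0.5):
        super().__init__()
        c_ = int(out_channels * width_multiple)   # squeezed channels
        self.cv1 = CBL(in_channels, c_, kernel_size=1, stride=1, padding=0)
        self.cv2 = CBL(c_, out_channels, kernel_size=3, stride=1, padding=1)

    def forward(self, x):
        # residual add: requires in_channels == out_channels
        return x + self.cv2(self.cv1(x))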

Once the Bottleneck was defined, I created the C3 class, which is the bulk of the YOLOv5 backbone.

Source image cropped by me

The C3 block is a simplified variant of CSP blocks, which was created to increase the diversity of back-propagated gradients.

A comparison of gradient composition during back-propagation. The source image is taken from this paper

In fact, the gradients flowing backward through a CSPNet-style block are less duplicated than those of a DenseNet (image above), which translates into better results.

As shown in the top-right of the architecture, there are two kinds of bottlenecks: “bottleneck 1” is used in the backbone, while “bottleneck 2” is used in the neck. The backbone parameter controls this behavior.

I invite you to check the code above. In any case, the aspects worth highlighting are (a condensed sketch follows the list):

  • The presence of the width_multiple (expansion factor).
  • A depth parameter that controls the number of sequential repetitions of the Bottleneck blocks.
  • The output of the sequence of “Bottleneck 1” blocks and the output of the skip connection are concatenated channel-wise (along dim=1), and the resulting tensor is fed into the last convolution c_out (check the forward method).
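A condensed sketch of the C3 logic just described (simplified from my repository; names and defaults may differ):

class C3(nn.Module):
    # Two branches: one runs through `depth` Bottlenecks, the other is a plain 1x1 conv.
    # Their outputs are concatenated channel-wise and fused by a final 1x1 conv (c_out).
    def __init__(self, in_channels, out_channels, width_multiple=0.5, depth=1, backbone=True):
        super().__init__()
        c_ = int(out_channels * width_multiple)
        self.c1 = CBL(in_channels, c_, kernel_size=1, stride=1, padding=0)         # main branch
        self.c_skipped = CBL(in_channels, c_, kernel_size=1, stride=1, padding=0)  # skip branch
        if backbone:
            # "bottleneck 1": residual bottlenecks
            self.seq = nn.Sequential(*[Bottleneck(c_, c_, width_multiple=1.0) for _ in range(depth)])
        else:
            # "bottleneck 2": plain sequence of two convolutions, no residual add
            self.seq = nn.Sequential(*[nn.Sequential(CBL(c_, c_, 1, 1, 0), CBL(c_, c_, 3, 1, 1)) for _ in range(depth)])
        self.c_out = CBL(c_ * 2, out_channels, kernel_size=1, stride=1, padding=0)

    def forward(self, x):
        return self.c_out(torch.cat([self.seq(self.c1(x)), self.c_skipped(x)], dim=1))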

The last CNN block used to build the backbone is the SPPF (Spatial Pyramid Pooling Fast), a faster variant of the SPP layer introduced in 2015.

Source image cropped by me

Although the original purpose of an SPP was to create a fixed-size vector from any input size, after testing Ultralytics’ SPPF, I noticed that this behavior couldn’t be replicated (others noticed it too). However, as the YOLOv5 creator replied in the link above, SPPF is mathematically identical to the original SPP layer but faster.

Therefore, I concluded that in YOLO architectures, SPP layers are not used for their original purpose (a fixed-size output); instead, they are exploited as feature re-aggregators through sequences of max pooling and concatenation.
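A minimal sketch of the SPPF logic (simplified; the kernel size and channel split follow the common implementation but should be treated as illustrative):

class SPPF(nn.Module):
    # Three consecutive 5x5 max-pools re-aggregate features at growing receptive fields;
    # the input and the three pooled maps are concatenated and fused by a 1x1 conv.
    def __init__(self, in_channels, out_channels):
        super().__init__()
        c_ = in_channels // 2
        self.c1 = CBL(in_channels, c_, kernel_size=1, stride=1, padding=0)
        self.pool = nn.MaxPool2d(kernel_size=5, stride=1, padding=2)
        self.c_out = CBL(c_ * 4, out_channels, kernel_size=1, stride=1, padding=0)

    def forward(self, x):
        x = self.c1(x)
        p1 = self.pool(x)
        p2 = self.pool(p1)
        p3 = self.pool(p2)
        return self.c_out(torch.cat([x, p1, p2, p3], dim=1))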

Since we have covered all the distinct blocks that compose the backbone, we can now put them together to create the YOLOv5 backbone.

The backbone is part of our YOLOv5 class
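As a sketch, the backbone can be assembled as an nn.ModuleList of the blocks defined above. The channel counts and depths below follow the YOLOv5m scaling but are meant as an illustration, not as the exact repository code:

self.backbone = nn.ModuleList([
    CBL(3, 48, kernel_size=6, stride=2, padding=2),      # 0
    CBL(48, 96, kernel_size=3, stride=2, padding=1),     # 1
    C3(96, 96, depth=2),                                 # 2
    CBL(96, 192, kernel_size=3, stride=2, padding=1),    # 3
    C3(192, 192, depth=4),                                # 4  -> route connection to the neck
    CBL(192, 384, kernel_size=3, stride=2, padding=1),   # 5
    C3(384, 384, depth=6),                                # 6  -> route connection to the neck
    CBL(384, 768, kernel_size=3, stride=2, padding=1),   # 7
    C3(768, 768, depth=2),                                # 8
    SPPF(768, 768),                                       # 9
])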

Neck

The YOLOv5 neck is PANet-style (Path Aggregation Network).

Source image modified by me

PANet stands for “Path Aggregation Network”, and its core functionality is to enhance the flow of information between the lower layers and the topmost features via route connections, marked with “R” in the image above. Looking at the image, you can in fact notice:

  • Features from different depths of the backbone are combined (concatenated) into the neck (the two “R” on the left).
  • Features from the beginning of the neck are concatenated with features in the last part of the neck that precede the heads (the four “R” in the middle).

In terms of CNN blocks, the neck is composed of these operations:

  • ConvBNSiLU
  • BottleNeck 2
  • C3
  • Upsample
  • Concat

As you may remember, inside the backbone’s C3 blocks we were using “bottleneck 1”, which is characterized by the skip connection. “Bottleneck 2”, on the other hand, is just a sequence of two convolutions (lines 31 and 32 below).

The C3 class is the same one used previously in the backbone, with the difference that here we set its backbone parameter to False.

So by putting the pieces together, we create the neck.

While Ultralytics YOLOv5 wraps upsampling and concatenation in dedicated modules (to facilitate model definition via .yaml files), in my implementation I’ve used torch.nn.functional, so upsampling and concatenation are performed directly in the forward method.
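For reference, a sketch of the neck as an nn.ModuleList (again with YOLOv5m-like channel counts used purely as an illustration; upsampling and concatenation are left to the forward method):

self.neck = nn.ModuleList([
    CBL(768, 384, kernel_size=1, stride=1, padding=0),   # 0 -> neck route connection, then upsample + concat with backbone idx 6
    C3(768, 384, depth=2, backbone=False),                # 1
    CBL(384, 192, kernel_size=1, stride=1, padding=0),    # 2 -> neck route connection, then upsample + concat with backbone idx 4
    C3(384, 192, depth=2, backbone=False),                 # 3 -> output to head 1
    CBL(192, 192, kernel_size=3, stride=2, padding=1),     # 4 -> downsample, then concat with neck idx 2
    C3(384, 384, depth=2, backbone=False),                  # 5 -> output to head 2
    CBL(384, 384, kernel_size=3, stride=2, padding=1),      # 6 -> downsample, then concat with neck idx 0
    C3(768, 768, depth=2, backbone=False),                   # 7 -> output to head 3
])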

Source image modified by me

In this image of the YOLOv5 architecture, I marked the position (idx) of the distinct blocks inside self.backbone and self.neck.

In my YOLOV5 forward() shown below, you can notice that:

  • If a backbone layer is a C3 and its idx is 4 or 6, we append its output to a list used to store the backbone route connections.
  • If a neck layer has idx 0 or 2, we store its output in a list containing the neck route connections, then we resize the tensor (same as Upsample) and concatenate it with the last element of the backbone route connections (which is removed with .pop()).
  • If a neck layer has idx 4 or 6, its output is concatenated with the last element of the neck route connections (which is removed with .pop()).
  • Lastly, if a neck layer is a C3 and its idx is greater than 2, we store its output tensor in a list that will then be fed to the model heads to perform a prediction.
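Putting that routing logic into code, a simplified version of the forward pass looks roughly like this (a sketch of my implementation, not the Ultralytics code; F is torch.nn.functional):

def forward(self, x):
    backbone_routes = []
    neck_routes = []
    outputs = []

    for idx, layer in enumerate(self.backbone):
        x = layer(x)
        if isinstance(layer, C3) and idx in (4, 6):
            backbone_routes.append(x)                         # stored for the neck route connections

    for idx, layer in enumerate(self.neck):
        if idx in (0, 2):
            x = layer(x)
            neck_routes.append(x)                             # stored for the second half of the neck
            x = F.interpolate(x, scale_factor=2)              # same role as nn.Upsample
            x = torch.cat([x, backbone_routes.pop()], dim=1)  # route connection from the backbone
        elif idx in (4, 6):
            x = layer(x)
            x = torch.cat([x, neck_routes.pop()], dim=1)      # route connection from the neck itself
        else:
            x = layer(x)
            if isinstance(layer, C3) and idx > 2:
                outputs.append(x)                             # fed to the detection heads
    return self.heads(outputs)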

The last step is to define YOLOv5 heads.

Heads

If you’re not already familiar with YOLO heads and you’re struggling to follow what has been explained so far, I encourage you to take a break, because the heads are by far the most complex part of the architecture.

NB: all the (red) dimensions assume an input image of size (3, 640, 640). Source image modified by me

To understand YOLO(v5) heads, we need to clarify these key concepts:

  1. Grid-cells
  2. Detection layers
  3. Predictions per scale
  4. Anchor boxes
  5. From grid cells to bounding boxes

1. Grid-cells

YOLO algorithms localize objects through coordinates expressed w.r.t. the grid-cell that contains the object’s center.

Photo by Jijo Varghese on Pexels

Reminder: in YOLO algorithms, each grid-cell can detect at most one object.

The image above, for example, contains a 7 × 10 grid of cells (h × w), and the one responsible for detecting the panther is highlighted in green.

Each grid-cell is a vector of 5 + num_classes values which contains the following information:

  1. Objectness_score represents the likelihood of the presence of (the centre of) an object within the grid-cell.
  2. Xc represents the horizontal distance between the origin of the grid-cell (top-left) and the center of the object.
  3. Yc represents the vertical distance between the origin of the grid-cell (top-left) and the center of the object.
  4. Width is measured w.r.t. the width of the grid-cell: considering the panther above, the width is ~6.5 (details later in 2) Detection layers).
  5. Height is measured w.r.t. the height of the grid-cell: considering the panther above, the height is ~2.5 (details later in 2) Detection layers).
  6. Classes is a vector of length n_classes whose values represent the likelihood of the object belonging to each class.

NB: the objectness score is crucial in YOLO algorithms. For example, in the image above, among the 70 grid-cells, only the one highlighted in green has an objectness_score > confidence_threshold, which indicates the possible presence of an object (we enforce this behavior during YOLOv5 training).

2. Detection layers

len(yolo_output) == num_detection_layers

In YOLO algorithms, a detection layer is a synonym for a head. The default YOLOv5 architecture uses 3 detection layers (first image of this chapter), and each one specializes in detecting objects of a given size. Precisely:

  • head 1 (80 × 80 grid cells) is suited to detecting small objects
  • head 2 (40 × 40 grid cells) is suited to detecting medium-sized objects
  • head 3 (20 × 20 grid cells) is suited to detecting large objects
Photo by Andrea Lightfoot on Unsplash

The reason why each layer is suitable for detecting certain object-sizes is related to grid-cells: for example, considering that each cell can detect at most one object, the upper grid could detect at most 4 sheep, while the second one up to 64 sheep.

Another crucial aspect is that for each detection layer, the outputted grid is composed of grid-cells that represent a certain width and height of the original image:

  1. The first detection layer outputs a grid where each grid cell represents a width and a height of 8 pixels of the original input image (80×8 = 640).
  2. The second detection layer outputs a grid where each grid_cell represents a width and a height of 16 pixels of the original input image (40×16 = 640).
  3. The third detection layer outputs a grid where each grid_cell represents a width and a height of 32 pixels of the original input image (20×32 = 640).

80×80, 40×40, and 20×20 are the grid sizes, assuming an input image of [3, 640, 640]

N.B.: the width and height represented by grid cells are scale-invariant: if the input size increases, the dimension of the output grid increases in turn, but the pixels represented by each grid_cell are always going to be 8, 16, and 32.

In YOLOv5, [8, 16, 32] are referred to as the “strides” of the model.
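For instance, assuming a square input, the grid size of each detection layer is simply the input size divided by the stride:

img_size = 640
strides = [8, 16, 32]
grid_sizes = [img_size // s for s in strides]
print(grid_sizes)   # [80, 40, 20]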

3. Predictions per scale

Each detection layer (head) is, in turn, specialized to detect different sub-scales (aspect-ratios), by default 3.

Source image modified by me

For example, the first detection-layer shown here will specialize in detecting 3 different sub-scales of small objects, such as

  1. small rectangular-horizontal objects
  2. small squared objects
  3. small rectangular-vertical objects
Photo by Tyler Lastovich on Pexels

NB: These three aspect-ratios (rect-vertical, squared and rect-horiz) are just an example. They depend entirely on the dataset and, in YOLOv5, are identified through the “autoanchor algorithm”. Details in 4) Anchors.

To summarize what has been explained up to this point:

The whole image represents one detection-layer / head. Illustration created by me

Therefore, by feeding a [3, 640, 640] image to a default YOLOv5m (3 detection-layers and 3 predictions-per-scale), the number of grid-cells predicted is:

tot_grid_cells = (3 x 80 x 80) + (3 x 40 x 40) + (3 x 20 x 20) = 25200
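Or, computed directly:

predictions_per_scale = 3
tot_grid_cells = sum(predictions_per_scale * g * g for g in (80, 40, 20))
print(tot_grid_cells)   # 25200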

4. Anchor Boxes

An anchor-box is a pair of integers indicating a width and a height expressed in pixels. In some cases, you may find them expressed as floats in [0, 1] that represent W_box/default_image_size and H_box/default_image_size.

(10, 13) means 10 pixels of width and 13 pixels of height

Anchor boxes (also referred to as anchors) are obtained by running K-Means clustering across the object labels, and the resulting K centroids represent the main aspect-ratios (W, H) of the objects in the target dataset. Since a default YOLOv5 architecture has 3 detection layers and 3 predictions per scale, the standard number of anchors is 9.

Practical example. Consider:

– a dataset composed of 1000 images with 1 object per image

– 1000 object labels in YOLO format (class_idx, Xc, Yc, W, H)

– a default YOLOv5 model (3 heads and 3 scale-predictions per head) → 9 anchors

We subset all the object labels and consider only [W, H], because we are interested only in the aspect-ratios of the objects and not in their location within the images. On our [1000, 2] dataset, we run K-Means with n_centroids=9. The resulting 9 centroids are 9 pairs of (W, H) that represent the average width and height of each scale (a rough sketch follows).
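A rough sketch of this step, assuming scikit-learn is available and labels is a NumPy array of shape (1000, 2) holding the (W, H) pairs in pixels (sorting the centroids by area, so that the smallest anchors go to the 80×80 head, is my own convention here):

import numpy as np
from sklearn.cluster import KMeans

# labels: (N, 2) array of object widths and heights in pixels
kmeans = KMeans(n_clusters=9, n_init=10, random_state=0).fit(labels)
anchors = kmeans.cluster_centers_                     # 9 (W, H) centroids
anchors = anchors[np.argsort(anchors.prod(axis=1))]   # sort by area: small -> large
anchors = anchors.reshape(3, 3, 2)                    # 3 anchors per detection layer
print(anchors.round(1))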

Besides K-Means, Ultralytics adopts further techniques to compute anchors through the autoanchor algorithm. Glenn Jocher, the YOLOv5 creator, explains that autoanchor “uses kmeans centroids as initial conditions for a Genetic Evolution (GE) algorithm. The GE algorithm will evolve all anchors for 1000 generations under default settings, using CIoU loss (same regression loss used during training) combined with Best Possible Recall (BPR) as its fitness function.” 🫠

Although describing autoanchor is beyond the scope of this article, if you run it before training YOLOv5 on a custom dataset, it will check that the default MS COCO anchors (in the image below) suit the objects present in your dataset. If the MS COCO anchors don’t suit your objects, autoanchor will suggest an alternative set of anchors.

Last but not least, in YOLOv5 anchors are not updated during training. As explained here, they noticed empirically that making anchors learnable parameters didn’t improve the results.

* rect-horizontal, *squared and *rect-vertical are just examples! In reality, the aspect-ratios depend on the dataset and are determined by the autoanchor algorithm (K-Means + GE). Illustration created by me

As Aladdin Persson aptly says:

Using Anchor boxes is a way to encode previous knowledge into the model before starting the training process

In other words, before the training process, we are already telling the model what the (average) aspect-ratios of the objects it has to detect are. The great advantage of using anchor-boxes is that they tremendously reduce the amount of training data needed to obtain good detections.

Good! We have defined what anchors are and how they are computed. But how do we incorporate them with the output of the model?

We achieve this by using the formulas explained in this last chapter, which serve two purposes.

5. From grid cells to bounding boxes

To complete our YOLOv5, we need to solve these last two problems:

  1. Integrate the output of the model with anchors
  2. Transform the output coordinates from being w.r.t. grid cells to being w.r.t. the image (bounding boxes).

These two tasks, despite being related to different problems, are commonly grouped together and are solved through the formulas shown below:

Green rectangle: the transformation of the output coordinates from being w.r.t. grid cells to being w.r.t. the image. Orange rectangle: integration of the output of the model with anchors by ensuring that f(0)=1 (explained later). Illustration created by me

As we said in the grid cells chapter, YOLOv5 predicts 25200 grid_cells when fed a (3, 640, 640) image. Each grid_cell is a vector composed of (5 + num_classes) values, where the 5 values are [objectness_score, Xc, Yc, W, H].

Bw and Bh are responsible for integrating the anchors with the YOLO outputs by ensuring that when the model’s outputs W(=tw) and H(=th) are equal to 0, the default anchors Pw and Ph are used (if an anchor is (10, 13), then Pw=10 and Ph=13). This property is referred to as f(0)=1.

Secondly, Bx and By are responsible for translating the output of the model from grid-cell coordinates to bounding boxes: first by “sigmoiding” the outputs X(=tx) and Y(=ty) into [0, 1] (coordinates w.r.t. grid_cells), and then by adding the grid offsets Cx and Cy, which translate grid-cell coordinates into image coordinates.
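Written out, with σ the sigmoid, (Cx, Cy) the grid-cell offsets and (Pw, Ph) the anchor dimensions, the YOLOv5 equations are:

Bx = 2·σ(tx) − 0.5 + Cx
By = 2·σ(ty) − 0.5 + Cy
Bw = Pw · (2·σ(tw))²
Bh = Ph · (2·σ(th))²

Note how the f(0)=1 property falls out of the last two: when tw = 0, σ(0) = 0.5, so (2·0.5)² = 1 and Bw = Pw.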

And why does YOLOv5 introduce a new formula?

According to Glenn Jocher, the main reason was to remove the “exponential” from the equation, which created too many downsides. In addition, they wanted to maintain the f(0)=1 property, which guarantees that for an output equal to 0, the default anchors are used. That’s it!

So, we have covered everything about YOLOv5 heads!

To summarise:

  • YOLOv5 has, by default, 3 detection layers specialized in detecting objects of different sizes.
  • Each detection layer makes, by default, three scale predictions, and each of these scale predictions is specialized in detecting objects of a specific aspect-ratio.
  • Each scale-prediction predicts a grid of grid-cells.
  • Each grid-cell is composed of 5 + n_classes values, namely [objectness_score, Xc, Yc, W, H, prob_class_1, .. , prob_class_n].
  • The output of the model finally uses the equations explained above to incorporate anchors and to transform the output from grid_cells coordinates to bounding boxes.

Example: after feeding a (3, 640, 640) input image to a YOLOv5 model created to detect 80 classes, the final output is a list whose length equals the number of detection layers. Each element of the output is going to be:

output[0].shape = (batch_size, 3, 80, 80, 85)

output[1].shape = (batch_size, 3, 40, 40, 85)

output[2].shape = (batch_size, 3, 20, 20, 85)

Here, 3 is n_predictions_per_scale; [80, 80], [40, 40], and [20, 20] are the dimensions of the “grids of grid_cells”; and 85 (the result of 5 + n_classes) is the number of values present in each grid_cell.

Let’s finally implement it with PyTorch!

For the sake of clarity, I’ve separated the model prediction from a function responsible for combining the model’s output with the formulas above.

This is the YOLOv5 head class, and its forward() takes as input the output list of the YOLOv5 neck and, for each tensor-element of the list, performs a scale prediction. The details worth mentioning (a trimmed-down sketch follows the list) are:

  • In line 12, register_buffer() is used to store the parameters of the model “which should be saved and restored in the state_dict, but are not trained by the optimizer”. Source here.
  • Anchors are strided. In other words, for each detection layer, anchors are divided by the corresponding stride: (10, 13) is stored as (10/8, 13/8) → (1.25, 1.625).
  • The ch parameter is a list that contains the number of out_channels of each element of the YOLOv5 neck’s output.
  • Lastly, in line 23, the reason why (5 + num_classes) is set as the third dimension and then permuted to the last dimension is just “legacy from YOLOv3,” as explained by Glenn Jocher here.
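A trimmed-down sketch of the head class (argument names such as ch and naxs follow my repository, but treat the exact signature as illustrative):

class Heads(nn.Module):
    def __init__(self, nc=80, anchors=(), ch=()):
        super().__init__()
        self.nc = nc                    # number of classes
        self.nl = len(anchors)          # number of detection layers
        self.naxs = len(anchors[0])     # number of predictions per scale
        self.stride = [8, 16, 32]
        # anchors divided by the stride of their detection layer,
        # saved in the state_dict but not trained by the optimizer
        strided_anchors = torch.tensor(anchors).float().view(self.nl, -1, 2) / torch.tensor(self.stride).view(-1, 1, 1)
        self.register_buffer("anchors", strided_anchors)
        # one 1x1 conv per neck output, predicting naxs * (5 + nc) channels
        self.out_convs = nn.ModuleList(
            nn.Conv2d(in_ch, (5 + self.nc) * self.naxs, kernel_size=1) for in_ch in ch
        )

    def forward(self, x):
        for i in range(self.nl):
            x[i] = self.out_convs[i](x[i])
            bs, _, ny, nx = x[i].shape
            # (bs, naxs*(5+nc), ny, nx) -> (bs, naxs, 5+nc, ny, nx) -> (bs, naxs, ny, nx, 5+nc)
            x[i] = x[i].view(bs, self.naxs, 5 + self.nc, ny, nx).permute(0, 1, 3, 4, 2).contiguous()
        return x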

Lastly, we incorporate anchors in the output, and we transform grid cells into bounding boxes:

Design is taken from cells_to_bboxes by Aladdin Persson

In my repository, this function comes with more functionality used for experimenting. Here, for clarity, I’m showing only the key part used for model inference.

This function takes as input the output of the YOLOv5 (heads) and performs a series of operations:

  1. Creates the Cx and Cy grid.
  2. Creates an anchor grid.
  3. Applies the sigmoid to the whole YOLOv5 output and then applies the formulas at lines 14 and 15.
  4. Lastly, filters the vector of classes by selecting the class with the maximum likelihood (we select it by index, and the value is discarded).

Let’s go in order.

  1. XY_GRIDS (CX and CY)

Cx and Cy are stacked together and expressed by the variable xy_grid, which creates a grid that looks like this one:

For the sake of simplicity, this grid is 10×10, but as we said previously, YOLOv5 produces 3 grids: (80×80), (40×40) and (20×20). Illustration created by me
# Cx: column indices repeated along the rows
x_grid = torch.arange(nx)
x_grid = x_grid.repeat(ny).reshape(ny, nx)
# Cy: row indices repeated along the columns
y_grid = torch.arange(ny).unsqueeze(0)
y_grid = y_grid.T.repeat(1, nx).reshape(ny, nx)
# stack into (ny, nx, 2) and broadcast to (1, naxs, ny, nx, 2)
xy_grid = torch.stack([x_grid, y_grid], dim=-1)
xy_grid = xy_grid.expand(1, naxs, ny, nx, 2)

For example, given:

  • model strides of [8, 16, 32]
  • a YOLOv5 output that is a list of len(n_detection_layer)
  • output[0] of shape (bs, predictions_x_scale, 10, 10, 85) where 10 and 10 are the dimensions of the grid of grid cells
  • 1 object is detected in (3, 2), as in the image above, and its values are [obj_score=0.55, xc=0.2, yc=0.1, w=1.22, h=1.36, …n_class_prob]

By summing 3 + 0.2 (grid x coord and xc) and 2 + 0.1 (grid y coord and yc), we get (3.2, 2.1), and by multiplying it by the width of the grid cells (same as strides[0]), we get (3.2, 2.1) * 8 = (25.6, 16.8), which is the final center of our bounding box!

In reality, you have to compute the coordinates with YOLOv5 formulas explained previously, but the logic is the exact same.

  2. ANCHOR GRID
Illustration created by me
anchor_grid = (anchors[i]*stride).reshape((1, naxs, 1, 1, 2)).expand(1, naxs, ny, nx, 2)

The anchor grid makes sure that, for each detection layer, the W and H of the grid_cells ([obj_score=0.55, xc=0.2, yc=0.1, w=1.22, h=1.36, …n_class_prob]) are multiplied by their respective anchor boxes. For example, as shown in the image, all the w and h of the grid cells in the first grid (the darker one) are multiplied by 30 and 61.

Notice that since anchors are stored relative to the width of the cells (the stride), we need to scale them back by multiplying them by the respective stride.

Lastly, in lines 14 and 15, we put everything together by incorporating the output tensor with the equation, and then, with the help of xy_grid and anchor_grid, we transform the cell prediction into bounding boxes!

xy = (2 * (layer_prediction[..., 0:2]) + grid[i] - 0.5) * stride   # Bx, By: from grid-cell coordinates to image pixels
wh = ((2*layer_prediction[..., 2:4])**2) * anchor_grid[i]          # Bw, Bh: scaled by the respective anchor boxes
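Putting the pieces together, a simplified version of this inference-time conversion could look roughly like this (a sketch; my repository version has extra options, and the slicing below follows the two lines above, i.e. an [xc, yc, w, h, obj, classes] ordering; adjust the indices if your head emits the objectness score first):

def cells_to_bboxes(predictions, anchors, strides):
    # predictions: list of head outputs, each of shape (bs, naxs, ny, nx, 5 + nc)
    # anchors: (n_layers, naxs, 2) strided anchors; strides: e.g. [8, 16, 32]
    all_bboxes = []
    for i, layer_prediction in enumerate(predictions):
        bs, naxs, ny, nx, _ = layer_prediction.shape
        layer_prediction = torch.sigmoid(layer_prediction)

        # Cx/Cy grid and anchor grid for this detection layer
        x_grid = torch.arange(nx).repeat(ny).reshape(ny, nx)
        y_grid = torch.arange(ny).unsqueeze(0).T.repeat(1, nx).reshape(ny, nx)
        xy_grid = torch.stack([x_grid, y_grid], dim=-1).expand(1, naxs, ny, nx, 2)
        anchor_grid = (anchors[i] * strides[i]).reshape(1, naxs, 1, 1, 2).expand(1, naxs, ny, nx, 2)

        # decode: grid-cell coordinates -> image pixels, anchor-relative sizes -> pixel sizes
        xy = (2 * layer_prediction[..., 0:2] + xy_grid - 0.5) * strides[i]
        wh = ((2 * layer_prediction[..., 2:4]) ** 2) * anchor_grid
        obj = layer_prediction[..., 4:5]
        best_class = torch.argmax(layer_prediction[..., 5:], dim=-1, keepdim=True)

        boxes = torch.cat([best_class.float(), obj, xy, wh], dim=-1).reshape(bs, -1, 6)
        all_bboxes.append(boxes)
    return torch.cat(all_bboxes, dim=1)   # (bs, 25200, 6) for a 640x640 input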

That’s it: we have built everything needed to perform detection with YOLOv5!

After loading the official Ultralytics weights onto the architecture, the magic happens! 🚀

Photo by Daniel Semenov on Pexels

Thanks a lot for reading! 🙏

Please let me know in the comments if you find any mistakes, and I will correct them!


Published via Towards AI
