
Understanding Convolution

Last Updated on September 29, 2024 by Editorial Team

Author(s): Ayo Akinkugbe

Originally published on Towards AI.

Photo by David Becker on Unsplash

To better understand what convolution is, it helps to first see why dense neural networks (DNNs) don’t work well for images. If you train a DNN and a CNN (convolutional neural network) on the same image dataset, you will almost always get higher accuracy and lower loss from the CNN. Here are some reasons why:

1. High Dimensionality and Computational Complexity

Images typically have a large number of pixels. For example, a 200×200 image has 40,000 pixels, and a dense neural network would need to treat each pixel as an independent input. A fully connected layer with 40,000 inputs would require an enormous number of connections to the next layer, leading to:

  • High memory usage: Storing the weights for every pixel connection in large images becomes impractical.
  • Increased computational cost: Processing becomes slow and inefficient because dense layers don’t take advantage of the spatial structure of images.

In contrast, convolutional layers in CNNs use small filters that share weights across the image, drastically reducing the number of parameters and making computations more efficient.
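
To make the parameter gap concrete, here is a minimal back-of-the-envelope sketch in Python for the 200×200 example above. The layer sizes (a 512-unit dense layer versus 32 filters of size 3×3) are illustrative assumptions, not values from the article:

```python
# Rough parameter counts for a 200x200 grayscale input.
# Assumptions (illustrative): a 512-unit dense layer vs. 32 filters of 3x3.

height, width = 200, 200
pixels = height * width                       # 40,000 inputs when flattened

# Fully connected: every pixel connects to every hidden unit (+ biases).
hidden_units = 512
dense_params = pixels * hidden_units + hidden_units
print(f"Dense layer parameters: {dense_params:,}")    # 20,480,512

# Convolutional: each 3x3 filter is shared across all spatial positions.
filters, kernel_size, in_channels = 32, 3, 1
conv_params = filters * (kernel_size * kernel_size * in_channels + 1)
print(f"Conv layer parameters:  {conv_params:,}")     # 320
```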

2. Loss of Spatial Hierarchy

DNNs treat all pixels as independent features, ignoring the fact that neighboring pixels in an image are closely related. This means that in a DNN:

  • Spatial relationships are not considered: Dense layers don’t account for spatial patterns like edges, textures, or shapes present in nearby pixels. Images have local features (e.g., eyes or the corners of objects) that need to be preserved.
  • No translation invariance: Dense networks struggle to recognize a pattern, such as an object in an image, if it appears in a different position. Convolutional layers, on the other hand, apply the same filters across the entire image, making them good at recognizing objects regardless of their location (a short sketch of this follows the list).
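
The translation-invariance point can be demonstrated directly. The sketch below uses a toy 8×8 image and a hand-written edge filter (both illustrative assumptions); scipy.signal.correlate2d computes the same sliding cross-correlation a convolutional layer applies:

```python
import numpy as np
from scipy.signal import correlate2d

def bright_square(col):
    """An 8x8 dark image with a bright 3x3 patch starting at column `col`."""
    img = np.zeros((8, 8))
    img[2:5, col:col + 3] = 1.0
    return img

# One fixed filter that responds to the right-hand edge of a bright region.
edge_filter = np.array([[1, 0, -1],
                        [1, 0, -1],
                        [1, 0, -1]], dtype=float)

for col in (1, 4):  # the same patch, in two different positions
    response = correlate2d(bright_square(col), edge_filter, mode="valid")
    row, c = np.unravel_index(response.argmax(), response.shape)
    print(f"patch at column {col} -> strongest response at ({row}, {c})")
# The peak shifts with the patch: one shared filter detects it anywhere.
```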

3. Inefficient Feature Learning

In DNNs, each layer needs to learn global patterns from scratch. This makes it difficult to detect complex hierarchical features in images, such as edges in earlier layers and entire objects in later layers.

In contrast, CNNs can learn hierarchical features. Early layers in a CNN focus on low-level features (like edges and textures), while deeper layers learn more abstract concepts (like parts of objects or even whole objects). Dense layers do not efficiently capture this hierarchical structure, leading to poor performance on image data.
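
One way to make this hierarchy concrete is receptive-field arithmetic: each extra convolutional layer lets a unit “see” a larger patch of the original input. A minimal sketch, assuming stride-1 convolutions and no pooling:

```python
def receptive_field(num_layers, kernel_size=3):
    """Receptive field of stacked stride-1 convolutions:
    each layer adds (kernel_size - 1) pixels of context."""
    return 1 + num_layers * (kernel_size - 1)

# Deeper layers see larger input patches, so they can respond to
# larger, more abstract structures than the early layers can.
for depth in (1, 2, 3, 5):
    rf = receptive_field(depth)
    print(f"{depth} stacked 3x3 conv layer(s) -> sees a {rf}x{rf} input patch")
```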

4. Overfitting

With a large number of parameters in fully connected layers, a dense network is more prone to overfitting, especially on smaller datasets. Images usually contain a lot of redundant information, and fully connected networks have no mechanism to exploit this redundancy. Convolutional layers combat overfitting through weight sharing (the same filter slides over different parts of the image), which greatly reduces the number of parameters and leads to more generalizable models.

How Then Does Convolution Work?

Imagine sweeping a magnifying glass across an image to look for specific patterns (like lines or shapes). Convolution in CNNs works the same way: a small filter slides across the image (or other data), focusing on one local region at a time. Each filter looks for a different kind of pattern, and the CNN uses many of them to understand the image, layer by layer, from simple features to complex ones. (A from-scratch sketch of the sliding process follows the list below.)

  • Filter as a Pattern Detector: Imagine you have an image of a cat. A filter (or kernel) in a CNN is a small matrix (e.g., 3×3 or 5×5) that scans across this image. Each filter looks for a specific feature like edges, textures, or shapes. For example, one filter might detect horizontal lines, another might detect vertical lines, and yet another could find corners.
  • Sliding Across the Image: The filter moves over the image (convolves) in small steps. At each step, it performs a dot product between the values in the filter and the corresponding region of the image. This helps the CNN extract local information about the image (such as edges or texture patterns) without looking at the entire image at once.
  • Feature Map: The result of this sliding process is a new matrix called a feature map. The values in the feature map represent how strongly the feature (pattern) the filter is looking for is present in different parts of the image. For example, if the filter is detecting vertical edges, the feature map will have high values where vertical edges appear in the image.
  • Multiple Filters, Rich Features: A CNN uses many different filters to capture various features. Early layers typically learn simple features like edges, while deeper layers learn more complex patterns (e.g., eyes, faces, or even abstract shapes).
  • Receptive Field: The filter’s size limits how much of the image it “sees” at once, which is called its receptive field. As you go deeper in the network, the filters “see” larger parts of the image, which allows the network to detect higher-level features, like objects or parts of objects.
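
Putting these pieces together, below is a from-scratch sketch of the sliding dot product in pure NumPy. The 6×6 toy image and the vertical-edge kernel are illustrative assumptions:

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide `kernel` over `image` and take a dot product at each step
    (technically cross-correlation, which is what CNN layers compute)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            region = image[i:i + kh, j:j + kw]           # local receptive field
            feature_map[i, j] = np.sum(region * kernel)  # dot product
    return feature_map

# Toy image: dark on the left, bright on the right -> one vertical edge.
image = np.array([[0, 0, 0, 1, 1, 1]] * 6, dtype=float)

# A filter that responds strongly to vertical edges.
vertical_edge = np.array([[-1, 0, 1],
                          [-1, 0, 1],
                          [-1, 0, 1]], dtype=float)

# High values in the feature map appear only where the edge sits.
print(convolve2d(image, vertical_edge))
```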

Conclusion

Convolution improves image prediction because its filters drastically reduce the number of trainable parameters while preserving spatial hierarchy. These properties are why CNNs deliver higher accuracy and lower loss than dense networks on image data.


Published via Towards AI
