

Towards Identification of Breast Cancer in Mammogram Images using Deep Learning: Part 1

Last Updated on May 1, 2024 by Editorial Team

Author(s): Aminul Huq

Originally published on Towards AI.

Photo by National Cancer Institute on Unsplash

Breast cancer is one of the most common cancers worldwide. You may have noticed its awareness campaigns, marked by pink ribbons, which gives a sense of how serious the problem is. Regular screening is necessary to catch cancerous cell growth in breast tissue early. Women are not the only ones affected; men can develop breast cancer as well.

In this post, we will start by discussing the problem and why it is so difficult, and then explore some popular datasets that are available. This is the first part of a series; the next part will focus on deep learning approaches and how much accuracy they can achieve in general.

Two kinds of abnormality in breast tissue most commonly indicate the presence of cancer cells: masses and calcifications. Depending on their location, size, shape, and other characteristics, these abnormalities can be either benign or malignant.

One reason masses and calcifications are hard to identify is that breast density varies from person to person. Breast density is generally grouped into 4 types, which the figure below illustrates. Breast tissue usually consists mostly of fat. In some cases, however, glandular and fibrous tissue can cover the entire breast region, and this can make abnormalities very hard to find for both humans and computers.

Source: Image collected from blog post.

Another reason is that different machines may capture mammograms in different ways, which leads to disparities among datasets. That is why a dataset from one source cannot simply be reused as a resource for transfer learning on data from another.

Developing a machine learning or deep learning model requires a lot of data. However, acquiring data for this task is quite hard, and annotating it is expensive and difficult because it cannot be done without the expertise of a professional. Additionally, because glandular and fibrous tissue can obscure abnormalities, some image processing may be needed as well.
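As a rough illustration of the kind of image processing mentioned above, the sketch below clips a mammogram to a percentile range and rescales it to [0, 1]. This is one simple way to reduce intensity differences between scans. The function name `normalize_intensity` and the percentile defaults are assumptions made for illustration; none of the datasets discussed in this post prescribe them.

```python
import numpy as np


def normalize_intensity(image, low_pct=1.0, high_pct=99.0):
    """Clip an image to a percentile range and rescale it to [0, 1].

    `low_pct` and `high_pct` are illustrative defaults, not values
    taken from any dataset documentation.
    """
    image = np.asarray(image, dtype=np.float64)

    # Estimate a robust intensity range so a few extreme pixels
    # (for example burned-in labels) do not dominate the scaling.
    low, high = np.percentile(image, [low_pct, high_pct])

    # A constant image has no usable range; return zeros instead
    # of dividing by zero.
    if high <= low:
        return np.zeros_like(image)

    clipped = np.clip(image, low, high)
    return (clipped - low) / (high - low)


if __name__ == "__main__":
    # Synthetic 12-bit image standing in for a real mammogram.
    rng = np.random.default_rng(0)
    fake_scan = rng.integers(0, 4096, size=(64, 64))
    scaled = normalize_intensity(fake_scan)
    print(scaled.min(), scaled.max())
```

Whether percentile clipping is appropriate depends on the scanner and the dataset, so treat it as a starting point to compare against other options such as histogram equalization.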

I will give a short description of a few datasets here. In the next part of this series, we will pick one of the datasets below and apply deep learning models to see how they perform in general.

Let’s start:

1. Mini-MIAS [1]

This dataset contains 322 labeled mammograms, each with a spatial dimension of 1024×1024 pixels. Every image is labeled with one of 7 abnormality classes (mass, calcification, asymmetry, etc.). The dataset also records the severity of each abnormality, namely benign or malignant, as well as the density of the breast. It does not provide an exact segmentation mask. Instead, it gives the x and y coordinates and the radius of a circle that encloses the abnormality.

Link :
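Because Mini-MIAS describes each abnormality with a circle rather than a pixel mask, an approximate binary mask can be derived from the x, y, and radius values. The sketch below is one way to do that with NumPy. The function name `circle_to_mask` and the example coordinates are hypothetical, and the `origin_bottom_left` flag reflects my assumption that the annotation origin is the bottom-left corner of the image; please check this against the dataset's own documentation before relying on it.

```python
import numpy as np


def circle_to_mask(x, y, radius, image_size=1024, origin_bottom_left=True):
    """Build a boolean mask that is True inside the annotated circle.

    x, y, radius: circle annotation in pixels.
    image_size: side length of the square image (1024 for Mini-MIAS).
    origin_bottom_left: assumption about the annotation coordinate
        system; set to False if y is already measured from the top.
    """
    # Open grids of row and column indices; they broadcast to 2D.
    rows, cols = np.ogrid[:image_size, :image_size]

    # Array row 0 is the top of the image, so flip y when the
    # annotation measures it from the bottom edge.
    row_center = (image_size - 1 - y) if origin_bottom_left else y

    dist_sq = (rows - row_center) ** 2 + (cols - x) ** 2
    return dist_sq <= radius ** 2


if __name__ == "__main__":
    # Hypothetical annotation values, not taken from the dataset.
    mask = circle_to_mask(x=100, y=200, radius=10)
    print(mask.shape, int(mask.sum()))
```

A mask built this way over-covers the lesion, since the circle only encloses the abnormality, so it is closer to a coarse region of interest than to a true segmentation label.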

2. InBreast [2]

One of the most widely used datasets in this field is InBreast. Its images have a higher resolution and were acquired with a digital machine. With 410 images, it is slightly larger than Mini-MIAS. The resolution is not the same for every image, but it is approximately 3000×4000 pixels. Each image is labeled for mass and calcification, and some images contain both. The dataset also indicates whether each abnormality is malignant or benign. Additionally, it provides exact segmentation masks for both masses and calcifications.

Link :

3. CBIS-DDSM [3]

Of the three datasets mentioned here, this one has the most images, 2620. That is not a lot considering how much data deep learning tasks require, but it is one of the larger datasets in this field. Each image is marked according to whether it contains a mass or a calcification. As with the others, there is an additional label of normal, benign, or malignant. It also provides proper segmentation masks, just like InBreast.

Link :

A few other datasets exist, but these three have been widely used in the research literature.

In summary, we have looked at the problem of identifying breast cancer in mammography images and the challenges associated with it. We also learned a bit about three popular datasets that are widely used in this line of research. In the next part, we will get hands-on with one of these datasets and try to solve the classification problem with deep learning tools.

Feel free to contact me to discuss any issue. See you next time!


[1] J. Suckling et al. (1994): The Mammographic Image Analysis Society Digital Mammogram Database. Excerpta Medica, International Congress Series 1069, pp. 375–378.

[2] I. C. Moreira et al. (2012): INbreast: toward a full-field digital mammographic database. Academic Radiology, 19(2), 236–248.

[3] R. Sawyer-Lee et al. (2016): Curated Breast Imaging Subset of Digital Database for Screening Mammography (CBIS-DDSM) [Data set]. The Cancer Imaging Archive.


Published via Towards AI
