

Stacking Ensemble Method for Brain Tumor Classification: Performance Analysis


Last Updated on May 13, 2024 by Editorial Team

Author(s): Cristian Rodríguez

Originally published on Towards AI.

Photo by National Cancer Institute on Unsplash

This article explores medical image analysis, focusing on the classification of brain tumors. It presents an approach that combines a stacking ensemble of machine learning models with statistical and texture-based image feature extraction. Comparative evaluations provide insight into the effectiveness and potential applications of this approach in medical imaging and diagnosis.


A brain tumor, also known as an intracranial tumor, is an abnormal mass of tissue in which cells grow and multiply uncontrollably, seemingly unchecked by the mechanisms that control normal cells. More than 150 types of brain tumors have been identified to date; however, they fall into two main groups: primary and metastatic [1].

The incidence of brain tumors has been increasing across all age groups in recent decades. Metastatic brain tumors affect nearly one in four patients with cancer, an estimated 150,000 people a year.

Various techniques are used to obtain information about tumors. Magnetic resonance imaging (MRI) is the most widely used, producing many 2D images. Detecting and classifying brain tumors manually is costly in both effort and time. It is therefore worthwhile to develop automatic detection and classification procedures that enable an early diagnosis and a faster treatment response, improving patients' survival rate [2].

Stacking Ensemble Method

An ensemble method is a machine learning technique that combines several base models to produce one optimal predictive model. By combining the outputs of different models, ensemble modeling helps build a consensus on the meaning of the data. In classification, a simple way to consolidate multiple models into a single prediction is a frequency-based (majority) voting system. Ensemble models can be generated from a single algorithm with numerous variations, known as a homogeneous ensemble, or from different techniques, known as a heterogeneous ensemble [3].
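Frequency-based voting can be sketched in a few lines. The example below is a minimal illustration, not part of the article's implementation: `majority_vote` is a hypothetical helper, and the three prediction lists are made-up outputs from three hypothetical classifiers over the article's four categories.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label per sample by frequency-based voting.

    predictions_per_model[m][i] is model m's predicted label for sample i.
    Ties go to the label Counter encountered first (the earliest model's vote).
    """
    combined = []
    for votes in zip(*predictions_per_model):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Hypothetical predictions from three classifiers for four samples.
model_a = ["glioma", "no_tumor", "pituitary", "meningioma"]
model_b = ["glioma", "meningioma", "pituitary", "no_tumor"]
model_c = ["meningioma", "no_tumor", "pituitary", "no_tumor"]
print(majority_vote([model_a, model_b, model_c]))
# ['glioma', 'no_tumor', 'pituitary', 'no_tumor']
```

Stacking, described next, replaces this fixed voting rule with a meta-model that learns how to combine the base predictions.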

As shown in Figure 1, the stacking method aims to train several different weak learners and combine them by training a meta-model to output predictions based on the multiple predictions returned by these weak models [4].

Figure 1. Stacking Model Representation Diagram. [4]


The dataset comes from Kaggle [5] and contains 3206 brain MRI images, separated into four categories: no tumor, glioma tumor, meningioma tumor, and pituitary tumor. Figure 2 shows a sample image for each category.

Figure 2. Sample Images for Each Category. [Image by Author]

Image Features Extraction

Image preprocessing was necessary to obtain the final dataset used to train the models. Computers store images as matrices of numbers whose size depends on the number of pixels in the image. Each pixel value denotes intensity or brightness: smaller values are darker (toward black), and larger values are brighter (toward white). For grayscale images, as in this case, the matrices are two-dimensional.

After obtaining the pixel matrices, five first-order and seven second-order features were extracted from each image, giving 12 features in total. The first-order features are basic descriptive statistics computed over each image's pixel matrix:

  • Mean: the average pixel intensity.
  • Variance: the average squared deviation of the pixel intensities from the mean.
  • Standard Deviation: the square root of the variance, expressing the spread around the mean in the same units as the intensities.
  • Skewness: measures the asymmetry of the intensity distribution around the mean.
  • Kurtosis: describes how heavy the tails of the distribution are compared with those of a normal distribution.
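As a minimal sketch, the five first-order statistics can be computed directly from a pixel matrix stored as a list of lists. This assumes population (biased) moments and reports kurtosis as excess kurtosis (0 for a normal distribution, the default convention of `scipy.stats.kurtosis`); the article does not say which conventions it used. The tiny 4x4 image is hypothetical.

```python
import math


def first_order_features(image):
    """Five first-order statistics over all pixel intensities of a 2D grayscale image."""
    pixels = [p for row in image for p in row]
    n = len(pixels)
    mean = sum(pixels) / n
    deviations = [p - mean for p in pixels]
    variance = sum(d ** 2 for d in deviations) / n  # population variance
    std = math.sqrt(variance)
    if variance == 0:
        # A constant image has no spread, so skewness and kurtosis are undefined.
        skewness = kurtosis = float("nan")
    else:
        skewness = sum(d ** 3 for d in deviations) / n / std ** 3
        # Excess kurtosis: subtract 3 so that a normal distribution scores 0.
        kurtosis = sum(d ** 4 for d in deviations) / n / variance ** 2 - 3
    return {"mean": mean, "variance": variance, "std": std,
            "skewness": skewness, "kurtosis": kurtosis}


# Hypothetical 8-bit image: a bright 2x2 square on a black background.
image = [[0, 0, 0, 0],
         [0, 255, 255, 0],
         [0, 255, 255, 0],
         [0, 0, 0, 0]]
print(first_order_features(image))
```

Because only a quarter of the pixels are bright, the distribution has a long right tail, which shows up as positive skewness.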

The grey-level co-occurrence matrix (GLCM) was used to obtain the second-order features. A GLCM records the relative frequencies with which pairs of grey levels occur at a given distance and a particular angle from each other. Here, a distance of one pixel and angles of 0°, 45°, 90°, and 135° were used. Figure 3 shows how the GLCM is determined.

Figure 3. GLCM Calculation Example. [6]
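The counting step can be sketched as follows. This is a minimal illustration under stated assumptions: `glcm` is a hypothetical helper that counts ordered (non-symmetric) pairs, and the offset convention (0° = right neighbour, 45° = upper right, 90° = above, 135° = upper left) is one common choice; libraries such as scikit-image may orient the angles differently. The 4x4 image with grey levels 0 to 3 is made up.

```python
# Assumed (row_step, col_step) offsets for each angle at a distance of 1 pixel.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def glcm(image, angle, levels, distance=1):
    """Count how often grey level i has grey level j as its neighbour at (distance, angle)."""
    dr, dc = (step * distance for step in OFFSETS[angle])
    rows, cols = len(image), len(image[0])
    matrix = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            nr, nc = r + dr, c + dc
            # Skip pairs whose neighbour falls outside the image.
            if 0 <= nr < rows and 0 <= nc < cols:
                matrix[image[r][c]][image[nr][nc]] += 1
    return matrix


# Hypothetical 4x4 image with 4 grey levels (0-3).
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]
for angle in OFFSETS:
    print(angle, glcm(image, angle, levels=4))
```

Dividing each matrix by its total count turns the raw counts into the relative frequencies described above. Some implementations also add the transposed matrix to make the GLCM symmetric.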

The following second-order features were computed from the GLCM (greycomatrix):

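  • Contrast: measures the local intensity variation between a pixel and its neighbor; it is large when neighboring grey levels differ strongly.
  • Entropy: measures the randomness of the grey-level pair distribution; it is high when the GLCM entries are spread evenly.
  • Dissimilarity: similar to contrast, but weights grey-level differences linearly instead of quadratically.
  • Homogeneity: measures how close the GLCM entries are to its diagonal, i.e., how similar neighboring pixels are.
  • ASM: the angular second moment, the sum of squared GLCM entries; a measure of the textural uniformity of an image.
  • Energy: the square root of the ASM (as defined in scikit-image), so it is also a measure of uniformity.
  • Correlation: measures how linearly a pixel's grey level depends on that of its neighboring pixel.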
Figure 4. General Overview of the Image Features Extraction. [Image by Author]
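A minimal sketch of the seven texture features, computed from a GLCM of raw counts for a single angle. The formulas follow the usual GLCM definitions (homogeneity as Σ P/(1 + (i − j)²) and energy as √ASM, matching scikit-image's `graycoprops`); using base-2 logarithms for entropy and returning 1.0 for correlation when a grey-level distribution is constant are assumptions. How the article aggregated the four angles into one value per feature is not stated, so the example uses one hypothetical matrix.

```python
import math


def glcm_features(counts):
    """Seven texture features from a GLCM given as raw pair counts (a list of lists)."""
    total = sum(sum(row) for row in counts)
    # Normalize counts to relative frequencies P(i, j) that sum to 1.
    cells = [(i, j, c / total) for i, row in enumerate(counts) for j, c in enumerate(row)]

    contrast = sum(p * (i - j) ** 2 for i, j, p in cells)
    dissimilarity = sum(p * abs(i - j) for i, j, p in cells)
    homogeneity = sum(p / (1 + (i - j) ** 2) for i, j, p in cells)
    asm = sum(p ** 2 for _, _, p in cells)
    energy = math.sqrt(asm)
    # Shannon entropy in bits; empty cells contribute nothing (0 * log 0 taken as 0).
    entropy = -sum(p * math.log2(p) for _, _, p in cells if p > 0)

    # Pearson correlation between the reference (i) and neighbour (j) grey levels.
    mu_i = sum(i * p for i, _, p in cells)
    mu_j = sum(j * p for _, j, p in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * p for i, _, p in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * p for _, j, p in cells))
    if sd_i == 0 or sd_j == 0:
        correlation = 1.0  # assumption: a constant distribution counts as perfectly correlated
    else:
        correlation = sum((i - mu_i) * (j - mu_j) * p for i, j, p in cells) / (sd_i * sd_j)

    return {"contrast": contrast, "entropy": entropy, "dissimilarity": dissimilarity,
            "homogeneity": homogeneity, "asm": asm, "energy": energy,
            "correlation": correlation}


# Hypothetical raw GLCM (4 grey levels, one angle, distance 1).
counts = [[2, 2, 1, 0],
          [0, 2, 0, 0],
          [0, 0, 3, 1],
          [0, 0, 0, 1]]
for name, value in glcm_features(counts).items():
    print(f"{name:>13}: {value:.4f}")
```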

Model Proposal

As mentioned, stacking trains multiple base models on the same data and combines their predictions to produce a final model. Figure 5 illustrates this scheme.

Figure 5. Stacking Model Implementation Example. [7]

The general idea of how the model works is as follows [7]:

  1. The initial training data has 2565 observations and 12 features.
  2. Three different weak learners are trained on the training data.
  3. Each weak learner predicts the outcome, and these predictions are combined into second-level training data of size 2565 x 3.
  4. A meta-model is trained on this second-level training data to produce the final predictions.

The three weak learners used in this implementation were k-nearest neighbors, decision trees, and naive Bayes; k-nearest neighbors was used again for the meta-model.
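The four steps can be sketched with plain Python functions. This is a minimal illustration, not the author's code: to keep it short, two k-NN learners and a nearest-centroid classifier stand in for the article's KNN, decision tree, and naive Bayes, and a 5-NN meta-model is trained on the level-2 data as the article describes. All function names and the toy data are hypothetical, and labels are assumed to be integers so the meta-model can measure distances between prediction vectors.

```python
import math
from collections import Counter


def fit_knn(X, y, k=5):
    """k-nearest neighbors: keep the data, vote among the k closest points."""
    def predict(rows):
        preds = []
        for row in rows:
            order = sorted(range(len(X)), key=lambda i: math.dist(X[i], row))
            preds.append(Counter(y[i] for i in order[:k]).most_common(1)[0][0])
        return preds
    return predict


def fit_centroid(X, y):
    """Nearest-centroid classifier (hypothetical stand-in for another base learner)."""
    centroids = {}
    for label in set(y):
        members = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return lambda rows: [min(centroids, key=lambda c: math.dist(centroids[c], r)) for r in rows]


def level2_features(base_models, rows):
    """Step 3: one column per base model, holding that model's predicted label."""
    columns = [model(rows) for model in base_models]
    return [list(r) for r in zip(*columns)]


def fit_stacking(X, y, base_fitters, meta_fitter):
    # Step 2: train every weak learner on the full training data.
    base_models = [fit(X, y) for fit in base_fitters]
    # Steps 3-4: build the n x len(base_models) level-2 data and train the meta-model on it.
    meta_model = meta_fitter(level2_features(base_models, X), y)
    return (lambda rows: meta_model(level2_features(base_models, rows))), base_models


# Hypothetical toy data: two well-separated classes with integer labels.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5], [6, 6]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
stack, bases = fit_stacking(
    X, y,
    base_fitters=[lambda X, y: fit_knn(X, y, k=1),
                  lambda X, y: fit_knn(X, y, k=3),
                  fit_centroid],
    meta_fitter=lambda X, y: fit_knn(X, y, k=5),
)
print(stack([[0.5, 0.5], [5.5, 5.5]]))  # [0, 1]
```

Building the level-2 data from predictions on the same rows the base models were trained on, as in step 3, lets overfit learners (like 1-NN here) look perfect to the meta-model. scikit-learn's `StackingClassifier` avoids this by training its final estimator on cross-validated predictions. Treating predicted labels as coordinates also imposes an arbitrary ordering on the classes; one-hot encoding the predictions is a common alternative.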

K-Nearest Neighbors

The KNN algorithm assumes that similar things exist in close proximity, so it classifies a new data point according to the classes of the training points nearest to it.

In Figure 6, the data points have been classified into two classes, and a new data point of unknown class is added to the plot. The KNN algorithm predicts the category of the new point from its position relative to the existing data points. For example, if k is set to 3, the three nearest data points are two of class B and one of class A, so the new point is predicted to be class B. If k is set to 6, however, the prediction becomes class A. The choice of k is therefore crucial in determining the results [3].

Figure 6. Visual Explanation of Classification Using KNN Algorithm. [8]
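The k = 3 versus k = 6 example can be reproduced with a minimal KNN sketch. The helper `knn_predict` and the coordinates below are hypothetical, chosen so that the two nearest points are class B but the next four are class A; Euclidean distance is assumed.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    """Classify `query` by majority vote among its k nearest training points (Euclidean distance)."""
    by_distance = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical points around a new data point at the origin.
train_X = [(1, 0), (0, 1.1), (-1.2, 0), (0, -2), (-2.1, 0), (1.6, 1.6), (5, 5), (6, 5)]
train_y = ["B", "B", "A", "A", "A", "A", "B", "B"]

print(knn_predict(train_X, train_y, (0, 0), k=3))  # B (two B votes, one A)
print(knn_predict(train_X, train_y, (0, 0), k=6))  # A (four A votes, two B)
```

KNN has no real training phase: all the work happens at prediction time, which is one reason library implementations use spatial indexes instead of sorting every distance.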

Decision Trees

Decision trees are transparent and easy to interpret. Classification trees predict categorical outcomes from numeric and categorical input variables. A tree starts at a root node, which is split into branches leading to new nodes; the splitting is then repeated on the data points that reach each new node. When a node is no longer split, it becomes a terminal node (a leaf), which gives the final classification [3]. Figure 7 shows a decision tree that predicts how to make the journey to work.

Figure 7. Visual Explanation of Classification Using Decision Trees Algorithm. [8]
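A minimal sketch of growing a classification tree with a depth limit. The article only states a maximum depth of 7; the split criterion used here (Gini impurity over every feature/threshold pair, CART-style) is an assumption, and the toy commute data (distance in km, raining flag) loosely echoes the journey-to-work example with made-up values.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(X, y):
    """Try every feature/threshold pair; return (score, feature, threshold) with the lowest weighted Gini."""
    best = None
    for f in range(len(X[0])):
        for threshold in sorted(set(row[f] for row in X)):
            left = [t for row, t in zip(X, y) if row[f] <= threshold]
            right = [t for row, t in zip(X, y) if row[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, threshold)
    return best


def build_tree(X, y, max_depth, depth=0):
    """A node becomes a leaf (a plain label) when it is pure, cannot be split, or hits max_depth."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == max_depth or len(set(y)) == 1:
        return majority
    split = best_split(X, y)
    if split is None:
        return majority
    _, f, threshold = split
    left = [(row, t) for row, t in zip(X, y) if row[f] <= threshold]
    right = [(row, t) for row, t in zip(X, y) if row[f] > threshold]
    return {"feature": f, "threshold": threshold,
            "left": build_tree([r for r, _ in left], [t for _, t in left], max_depth, depth + 1),
            "right": build_tree([r for r, _ in right], [t for _, t in right], max_depth, depth + 1)}


def predict_one(node, row):
    # Follow the branches from the root until a leaf label is reached.
    while isinstance(node, dict):
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node


# Hypothetical data: [distance_km, raining (0/1)] -> how to get to work.
X = [[1, 0], [2, 0], [1, 1], [2, 1], [10, 0], [12, 1]]
y = ["walk", "walk", "bus", "bus", "car", "car"]
tree = build_tree(X, y, max_depth=7)
print(predict_one(tree, [1.5, 0]), predict_one(tree, [1.5, 1]), predict_one(tree, [11, 0]))
```

The root split here is on distance (threshold 2 km), and the short-distance branch is then split on the rain flag; the depth limit stops growth early on larger, noisier data.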

Naive Bayes

A Naive Bayes classifier is a probabilistic machine learning model for classification tasks, based on Bayes' theorem.

Bayes' theorem gives the probability of y given that X has occurred: P(y | X) = P(X | y) · P(y) / P(X), where X is the evidence and y is the hypothesis. The "naive" assumption is that the features are independent of each other given the class, meaning that the presence of one feature does not affect another; under this assumption, P(X | y) is simply the product of the individual feature likelihoods P(x_i | y).
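Since the image features are continuous, a common choice is Gaussian naive Bayes, sketched minimally below. The Gaussian likelihood, the small variance floor, and the helper names are assumptions; the article does not say which likelihood its from-scratch model used. Log-probabilities are summed instead of multiplying probabilities to avoid numerical underflow, and P(X) is dropped because it is the same for every class.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate each class prior P(y) and a per-feature Gaussian (mean, variance) for P(x_i | y)."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            # Small variance floor (an assumption) keeps the Gaussian defined for constant features.
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[label] = (len(rows) / len(X), stats)
    return model


def predict_gaussian_nb(model, row):
    """Return the class maximizing log P(y) + sum_i log P(x_i | y)."""
    def log_posterior(label):
        prior, stats = model[label]
        total = math.log(prior)
        for x, (mean, var) in zip(row, stats):
            total += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        return total
    return max(model, key=log_posterior)


# Hypothetical two-feature data for two classes.
X = [[1.0, 20.0], [1.2, 22.0], [0.8, 19.0], [3.0, 40.0], [3.2, 42.0], [2.9, 38.0]]
y = ["low", "low", "low", "high", "high", "high"]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [1.1, 21.0]), predict_gaussian_nb(model, [3.1, 41.0]))
```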

Test & Validation

The initial dataset was divided into training and testing data, 80% for training and 20% for testing. The models were created from scratch for this implementation.
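A minimal sketch of an 80/20 split, assuming a plain (unstratified) shuffle with a fixed seed; the article does not say how its split was drawn. `split_train_test` is a hypothetical helper, not scikit-learn's `train_test_split`, and the integer rows are placeholders for the 3206 feature vectors.

```python
import random


def split_train_test(rows, labels, train_fraction=0.8, seed=42):
    """Shuffle indices with a fixed seed, then cut them into train and test parts."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = round(train_fraction * len(rows))
    train_idx, test_idx = indices[:cut], indices[cut:]
    return ([rows[i] for i in train_idx], [labels[i] for i in train_idx],
            [rows[i] for i in test_idx], [labels[i] for i in test_idx])


rows = list(range(3206))        # placeholders for the 3206 feature vectors
labels = [r % 4 for r in rows]  # placeholder labels for the four classes
X_train, y_train, X_test, y_test = split_train_test(rows, labels)
print(len(X_train), len(X_test))  # 2565 641
```

Rounding 80% of 3206 gives 2565 training rows, which happens to match the count in step 1 of the model proposal; a stratified split, which keeps the class proportions equal in both parts, is a common refinement.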

The first step was to train the weak learners; for the k-nearest neighbor model, a k of 5 was used, and for the decision tree, the maximum depth assigned was 7.

After the weak learners were trained, their predictions were used to create the second dataset, on which the meta-model was trained. Figure 8 shows the first five rows of this second dataset.

Figure 8. First 5 Rows of the Meta Model Training Dataset. [Image by Author]

Finally, the meta-model was trained, leaving the stacking model ready to make new predictions.

Five-fold cross-validation was used to evaluate the models implemented by hand: the dataset was divided into five folds, and the scores recorded across the folds were averaged. The following table shows the accuracy of each model:

Table 1. Model Accuracies Comparison. [Image by Author]
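Five-fold cross-validation can be sketched minimally as below. The helpers are hypothetical: any `fit(X, y)` function that returns a `predict(rows)` function can be plugged in, and a toy nearest-mean classifier on made-up one-feature data is used for illustration. Folds are drawn from a seeded shuffle without stratification, which is an assumption.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and deal the indices into k nearly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_val_accuracy(X, y, fit, k=5, seed=0):
    """Average accuracy over k rounds, each holding out one fold for validation."""
    scores = []
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        predictions = predict([X[i] for i in fold])
        correct = sum(p == y[i] for p, i in zip(predictions, fold))
        scores.append(correct / len(fold))
    return sum(scores) / len(scores)


def fit_nearest_mean(X, y):
    """Hypothetical toy classifier: predict the class whose mean feature value is closest."""
    means = {}
    for label in set(y):
        values = [x[0] for x, t in zip(X, y) if t == label]
        means[label] = sum(values) / len(values)
    return lambda rows: [min(means, key=lambda c: abs(means[c] - row[0])) for row in rows]


X = [[0], [1], [2], [3], [4], [100], [101], [102], [103], [104]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(cross_val_accuracy(X, y, fit_nearest_mean))  # 1.0
```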

Finally, equivalent models from the scikit-learn library were trained and compared with those built from scratch. The comparison is shown in the following table:

Table 2. From Scratch vs. Scikit-Learn Model Comparison. [Image by Author]


The final stacking model achieved an accuracy of 67%, while the decision tree alone achieved 73%. This is likely because the other two weak learners feeding the meta-model have very low accuracy, so the meta-model ends up with roughly the average accuracy of the three weak learners. A Random Forest would probably have been a better choice.

Although the models implemented from scratch achieved slightly better accuracy, the library models are preferable because they take much less time to train and to make predictions.

Deep learning could be a better approach for brain tumor detection and classification, as such models generally achieve higher accuracy and could also help identify the tumor's location.

Thanks for reading!

GitHub Code

GitHub — crisdanrodriguez/brain_tumor_classification: Implementation and performance analysis of the use of a stacking ensemble machine learning algorithm and image features extraction with a brain tumor classification problem.


[1] AANS. (n.d.). Brain Tumors.

[2] Díaz, F., Martínez, M., Antón, M., & González, D. (2021). A Deep Learning Approach for Brain Tumor Classification and Segmentation Using a Multiscale Convolutional Neural Network.

[3] Theobald, O. (2017). Machine Learning for Absolute Beginners (2nd ed.). Oliver Theobald.

[4] Rocca, J. (2019). Ensemble methods: bagging, boosting, and stacking.

[5] Sartaj. (n.d.). Brain Tumor Classification (MRI) [Dataset]. Kaggle.

[6] Singh, S., Agarwal, S., & Srivastava, D. (2017). GLCM and Its Application in Pattern Recognition.

[7] KDnuggets. (n.d.). Stacking Models for Improved Predictions.

[8] Beisel, A. (2020). KNN (K-Nearest Neighbors) Classifier from Scratch.


Published via Towards AI
