
Quantile Random Forests: Predicting Beyond the Mean

Last Updated on November 1, 2024 by Editorial Team

Author(s): Sanjay Nandakumar

Originally published on Towards AI.

Photo by Marcel Eberle on Unsplash

"An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem."

John Tukey

Table of contents

  1. Introduction
  2. What is Quantile Random Forest?
  3. How does QRF work?
  4. Mathematical Foundation of QRF
  5. The key difference between QRF and Random Forests
  6. Python Implementation: Quantile Random Forests
  7. Explanation of Code
  8. Case Study: Utilizing Quantile Random Forests in Healthcare
  9. Advantages of Quantile Random Forests
  10. Limitations of Quantile Random Forests
  11. Conclusion

Introduction

Conventional machine learning models typically predict a single value, such as the average outcome. That tells you little about the range of possible outcomes, which matters greatly in fields like finance, healthcare, and weather forecasting. In such situations, predicting the mean does not suffice; you also need the lower and upper bounds to accurately assess risks and opportunities. Quantile Random Forests (QRF) address this by producing quantile estimates, giving a fuller picture of each prediction in the form of an estimated value along with its corresponding interval. This article walks through the details of Quantile Random Forests: how they work, how to implement them in Python, and how they apply to a real-life case study. We will also explore why QRF is a valuable addition to your machine learning toolbox and how it can help you predict under uncertainty.

What is Quantile Random Forest?

Quantile Random Forests (QRF) are an extension of the Random Forest algorithm that predicts specified quantiles of a target variable instead of only its average value. Quantiles are cutoff points in a distribution; common examples are the 10th, 50th, and 90th percentiles. Estimating these quantiles helps unpack the range of possible outputs for a given set of input features.

For example, when predicting housing prices, QRF can return a range such as:

  • 10th percentile: $250,000
  • 50th percentile (the median): $300,000
  • 90th percentile: $350,000

With this range in hand, a decision-maker can assess potential risks and opportunities and gain a better sense of prediction variability.

How does QRF work?

QRF is trained in much the same way as a traditional Random Forest. The chief difference is that Quantile Random Forests predict a set of quantiles from the conditional distribution of the target, rather than returning a single mean prediction. The output is built from the target values that fall inside the leaf nodes that each decision tree in the forest assigns to a given input.
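To make this concrete, here is a simplified sketch. In practice QRF weights all training observations by how often they share leaves with the query point, but the idea is the same: given the target values associated with the leaves a new observation falls into, a standard Random Forest would report their mean, while a QRF reports empirical quantiles. The leaf values below are made up for illustration.

```python
import numpy as np

# Hypothetical target values pooled from the leaf nodes that a single new
# observation reaches across the trees of the forest (illustrative only).
leaf_values = np.array([212, 245, 260, 280, 295, 310, 330, 355, 390, 420])

mean_prediction = leaf_values.mean()                       # what a standard RF reports
q10, q50, q90 = np.quantile(leaf_values, [0.1, 0.5, 0.9])  # what a QRF reports

print(f"Mean prediction: {mean_prediction:.0f}")
print(f"10th / 50th / 90th percentiles: {q10:.0f} / {q50:.0f} / {q90:.0f}")
```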

Mathematical Foundation of QRF

In ordinary Random Forest regression, the prediction is the mean of the target variable averaged over several decision trees. This prediction is expressed as

ŷ = (1 / T) × Σ(y_t), where t = 1 to T

where T is the number of trees, and y_t is the predicted value from the t-th tree.

In Quantile Random Forests, the algorithm goes a step further by predicting a quantile Q for the target variable, such that:

ŷ_q = min{ y such that F(y | X) ≥ q }

where F(y | X) represents the cumulative distribution function (CDF) of the target variable Y given the input features X. For example, if q = 0.1, this formula predicts the 10th percentile of the distribution.
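As a hypothetical worked example of this formula: suppose the estimated conditional distribution for a given X places equal weight of 0.2 on the five target values {2, 4, 7, 9, 15}. Then F(4 | X) = 0.4 and F(7 | X) = 0.6, so ŷ_0.5 = 7 (the smallest value whose cumulative probability reaches 0.5), while ŷ_0.1 = 2 and ŷ_0.9 = 15.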

The key difference between QRF and Random Forests

  • Random Forests provide a prediction of the mean of the target variable, averaging the values in the leaf nodes reached by each tree.
  • Quantile Random Forests predict a specific quantile by computing the distribution of the target values that fall within each leaf node.

Python Implementation: Quantile Random Forests

To implement Quantile Random Forests in Python, we use the quantile_forest package, which extends classical Random Forests to predict arbitrary quantiles.
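Below is a minimal sketch of the workflow explained in the next section. It assumes the open-source quantile-forest package (installable with pip install quantile-forest), whose RandomForestQuantileRegressor accepts a quantiles argument at prediction time; the exact signature may differ slightly between package versions.

```python
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from quantile_forest import RandomForestQuantileRegressor

# Load the California Housing dataset (target is the median house value).
data = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

# Fit a Quantile Random Forest.
qrf = RandomForestQuantileRegressor(n_estimators=100, random_state=42)
qrf.fit(X_train, y_train)

# Predict the 10th, 50th (median), and 90th percentiles for each test point.
quantiles = [0.1, 0.5, 0.9]
y_pred = qrf.predict(X_test, quantiles=quantiles)  # shape: (n_samples, 3)

# Show the predicted range for the first few houses.
for i in range(3):
    low, median, high = y_pred[i]
    print(f"House {i}: 10th={low:.2f}, 50th={median:.2f}, 90th={high:.2f}")

# Evaluate point accuracy using the median prediction.
mse = mean_squared_error(y_test, y_pred[:, 1])
print(f"MSE of median prediction: {mse:.3f}")
```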

Explanation of Code

  • We load the California Housing dataset to predict house prices from various features.
  • We split it into a training set and a testing set.
  • We fit a Quantile Random Forest model that predicts three quantiles: the 10th, 50th (median), and 90th percentiles.
  • We display the predicted quantiles, which describe different ranges of house prices.
  • We evaluate the model via mean squared error on the median prediction.

Case Study: Utilizing Quantile Random Forests in Healthcare

Let us now look at how Quantile Random Forests can be applied to a real-world problem in the healthcare industry.

The Problem

A hospital wants to estimate a patient's length of stay from their case history, symptoms, and other clinical features. While it can predict a mean length of stay, the hospital also wants a range of possible lengths of stay so that it can make better decisions about allocating resources.

Solution Using QRF

  • Step 1: Gather patient data, including clinical features such as the patient's age, symptoms, pre-existing conditions, and test results.
  • Step 2: Train a Quantile Random Forest model that predicts the 10th, 50th, and 90th percentiles of the length of stay (a minimal sketch follows this list).
  • Step 3: Review the predicted quantiles to understand the range of days a patient can be expected to stay.
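The sketch below illustrates Step 2, assuming the same quantile-forest interface used earlier; the feature names and patient records are entirely hypothetical.

```python
import pandas as pd
from quantile_forest import RandomForestQuantileRegressor

# Hypothetical patient records; feature names and values are illustrative only.
patients = pd.DataFrame({
    "age":           [67, 45, 72, 38, 55, 29],
    "num_symptoms":  [4, 2, 5, 1, 3, 1],
    "comorbidities": [2, 0, 3, 0, 1, 0],
    "lab_score":     [7.8, 3.1, 9.2, 2.4, 5.6, 2.0],
})
length_of_stay = [8, 3, 12, 2, 6, 2]  # days (made-up training labels)

model = RandomForestQuantileRegressor(n_estimators=200, random_state=0)
model.fit(patients, length_of_stay)

# Predicted 10th, 50th, and 90th percentiles of length of stay for a new patient.
new_patient = pd.DataFrame({
    "age": [59], "num_symptoms": [3], "comorbidities": [1], "lab_score": [6.5],
})
print(model.predict(new_patient, quantiles=[0.1, 0.5, 0.9]))
```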

Key Insights from the Case Study

  • Resource Optimization: By predicting various quantiles, the hospital can allocate resources according to the variability in patients' lengths of stay, improving efficiency.
  • Risk Assessment: Knowing the range of possible stays helps identify patients at risk of an extended stay who require closer attention.

Advantages of Quantile Random Forests

  • Uncertainty quantification: QRF provides a full distribution of possible outcomes, which is important for decision-making under uncertainty.
  • Non-parametric: QRF makes no a priori assumptions about the data distribution, giving it flexibility across a wide range of applications.
  • Handles complex interactions: Like ordinary Random Forests, QRF can capture complex interactions between variables without explicit feature engineering.

Limitations of Quantile Random Forests

  • Computational complexity: Quantile Random Forest training can become computationally heavy for larger dataset sizes and greater numbers of trees.
  • Interpretability: Although QRF yields quantile estimates, its results are harder to interpret than those of traditional regression models that return a single point estimate.

Conclusion

Quantile Random Forests are a powerful extension of traditional Random Forests, offering a richer picture of what may happen through quantile predictions. This capability makes QRF especially valuable in sectors such as finance, medicine, and risk management, where uncertainty around predictions matters greatly.

In this article, we covered the theoretical basics of QRF, demonstrated it in Python, and examined a case study from healthcare. Incorporating QRF into your machine learning toolbox will sharpen how you understand and model uncertainty in your predictions, improving decisions in real-life applications.

Quantile Random Forests give you the tools to handle uncertainty and gain deeper insight into your predictions, whether you are forecasting financial returns, patient outcomes, or weather patterns.

I hope you now have an intuitive understanding of Quantile Random Forest, and these concepts will help you in building valuable and insightful projects.

You can connect with me via the following platforms:

  1. LinkedIn
  2. Gmail: [email protected]


Published via Towards AI
