Quantile Random Forests: Predicting Beyond the Mean
Last Updated on November 1, 2024 by Editorial Team
Author(s): Sanjay Nandakumar
Originally published on Towards AI.
"An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem."
John Tukey
Table of contents
- Introduction
- What is Quantile Random Forest?
- How does the method of QRF work?
- Mathematical Foundation of QRF
- The key difference between QRF and Random Forests
- Python Implementation: Quantile Random Forests
- Explanation of Code
- Case Study: Utilizing Quantile Random Forests in Healthcare
- Advantages of Quantile Random Forests
- Limitations of Quantile Random Forests
- Conclusion
Introduction
Conventional machine learning models typically predict a single summary of the outcome, such as its average. That average, however, reveals little about the range of possible outcomes, which matters especially in finance, healthcare, and weather forecasting. In such settings, predicting the mean does not suffice; you also need lower and upper bounds to assess risks and opportunities accurately. Quantile Random Forests (QRF) address this by producing quantile estimates, giving a more thorough picture of the predictions: estimated values together with their corresponding intervals. This article walks through the details of Quantile Random Forests: how they work, how to implement them in Python, and how they apply to a real-life case study. We will explore why QRF is a valuable addition to your machine learning toolbox and how it can help you predict uncertainty effectively.
What is Quantile Random Forest?
Quantile Random Forests (QRF) are an extension of the Random Forest algorithm that predicts specified quantiles of a target variable instead of only its average value. Quantiles are cutoff points in the distribution; examples include the 10th, 50th, and 90th percentiles. Estimating these quantiles helps unpack the range of possible outputs for a given set of input features.
For example, when predicting housing prices, QRF can give a range such as:
- 10th percentile: $250,000
- 50th percentile (the median): $300,000
- 90th percentile: $350,000
With this range, a decision-maker can assess potential risks and opportunities and better understand the variability of the prediction.
How does the method of QRF work?
QRF is trained in much the same way as a traditional Random Forest. The chief difference is that, instead of returning a single mean prediction for the target variable, a Quantile Random Forest predicts a set of quantiles from the empirical cumulative distribution of the data. The output is built by aggregating the target values that fall inside the leaf nodes of each decision tree in the forest.
Mathematical Foundation of QRF
In ordinary Random Forest regression, the aim is to average the target variable over several decision trees. This prediction is expressed as
ŷ = (1 / T) × Σ y_t, for t = 1 to T
where T is the number of trees and y_t is the predicted value from the t-th tree.
In Quantile Random Forests, the algorithm goes a step further by predicting a quantile q of the target variable, such that:
ŷ_q = min { y : F(y | X) ≥ q }
where F(y | X) is the cumulative distribution function (CDF) of the target variable Y given the input features X. For example, if q = 0.1, this formula yields the 10th percentile of the distribution.
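The formula can be illustrated with a tiny pure-Python sketch (the sample values below are made up for illustration): given the target values collected for an input X, the q-th quantile is the smallest y whose empirical CDF reaches q.

```python
# Empirical version of ŷ_q = min{ y : F(y | X) >= q }.
# Given target values pooled from the leaf nodes for one input X,
# the q-th quantile is the smallest y whose empirical CDF reaches q.

def empirical_quantile(values, q):
    """Return min{ y : F(y) >= q } for the empirical CDF of `values`."""
    ys = sorted(values)
    n = len(ys)
    for i, y in enumerate(ys, start=1):
        if i / n >= q:  # F(y) = (number of values <= y) / n
            return y
    return ys[-1]

leaf_values = [3, 1, 4, 1, 5, 9, 2, 6]  # hypothetical leaf-node targets
print(empirical_quantile(leaf_values, 0.5))  # → 3 (median)
print(empirical_quantile(leaf_values, 0.9))  # → 9
```

Setting q = 0.5 recovers the median, while q = 0.1 and q = 0.9 give the tails of the same distribution, all from one pool of leaf values.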
The key difference between QRF and Random Forests
- Random Forests predict the mean of the target variable by averaging the training values in the leaf nodes that a sample falls into.
- Quantile Random Forests predict a specific quantile by computing the distribution of the target values that fall within each leaf node.
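This difference can be made concrete with a small sketch built on a standard Random Forest (assuming scikit-learn is available; the toy data and parameters are illustrative). Quantiles are computed from the training targets that share a leaf with the query point, pooled across trees. Note that this simple pooling approximates Meinshausen's per-tree weighting rather than reproducing it exactly.

```python
# Sketch of the QRF idea on top of a plain Random Forest (assumes scikit-learn).
# Instead of averaging leaf means, collect the training targets that share a
# leaf with the query point in each tree, then take quantiles of that pool.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=500)  # noisy toy data

rf = RandomForestRegressor(n_estimators=50, min_samples_leaf=10, random_state=0)
rf.fit(X, y)

train_leaves = rf.apply(X)           # (n_samples, n_trees) leaf indices
x_query = np.array([[0.5]])
query_leaves = rf.apply(x_query)[0]  # leaf index of the query in each tree

# Pool the training targets that land in the same leaf as the query, per tree
pooled = np.concatenate([
    y[train_leaves[:, t] == query_leaves[t]] for t in range(rf.n_estimators)
])

q10, q50, q90 = np.quantile(pooled, [0.1, 0.5, 0.9])
print(f"10th: {q10:.2f}  50th: {q50:.2f}  90th: {q90:.2f}")
```

The same fitted forest thus yields an entire conditional distribution for each query point, not just one number.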
Python Implementation: Quantile Random Forests
To implement Quantile Random Forests in Python, we will use the quantile_forest package, which extends classical Random Forests to predict arbitrary quantiles.
Explanation of Code
- We load the California Housing dataset to predict house prices based on various features.
- We divide it into a training set and a testing set.
- We fit a Quantile Random Forest model with three quantile predictions: the 10th, 50th (median), and 90th percentiles.
- We display the predicted quantiles, giving meaning to different ranges of house prices.
- Model evaluation is done via mean squared error using the median prediction.
Case Study: Utilizing Quantile Random Forests in Healthcare
Next, we describe how Quantile Random Forests can be applied to a real-world problem in the healthcare industry.
The Problem
A hospital wants to estimate the length of stay given the patient's case history, symptoms, and other clinical features. While the hospital can predict a mean length of stay, it wants to know the broader range of possible lengths of stay so it can make the best decisions about allocating resources.
Solution Using QRF
- Step 1: Gather patient data, including clinical features such as the patient's age, symptoms, pre-existing conditions, and test results.
- Step 2: Develop a Quantile Random Forest model with predictions for the 10th, 50th, and 90th percentiles of the length of stay.
- Step 3: Review the predicted quantiles to understand the range of days a patient can be expected to stay.
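The steps above might be acted on as follows; the patient records and the 10-day threshold in this sketch are purely hypothetical:

```python
# Hypothetical sketch: acting on predicted length-of-stay quantiles.
# Patient names, quantile values, and the threshold are illustrative only.
predicted_stay = {            # patient -> (10th, 50th, 90th) percentile, in days
    "patient_A": (2, 4, 6),
    "patient_B": (3, 7, 15),
    "patient_C": (1, 2, 3),
}

OVERSTAY_THRESHOLD = 10  # days; stays that may run past this need special handling

for patient, (p10, p50, p90) in predicted_stay.items():
    plan = f"plan bed for ~{p50} days (range {p10}-{p90})"
    if p90 > OVERSTAY_THRESHOLD:
        plan += "  [flag: overstay risk, review case]"
    print(patient, "->", plan)
```

The median drives routine bed planning, while the 90th percentile acts as an early-warning signal for patients whose stay could run long.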
Key Insights from the Case Study
- Resource Optimization: By predicting various quantiles, the hospital can allocate resources according to the variability of patients' lengths of stay, enhancing efficiency.
- Risk Assessment: The upper end of the predicted stay helps identify patients at risk of overstaying who require closer attention.
Advantages of Quantile Random Forests
- Uncertainty quantification: QRF gives a full distribution of possible results, which is important for decision-making under uncertainty.
- Non-parametric: QRF makes no a priori assumptions about the data distribution, giving it flexibility across a wide range of applications.
- Handles complex interactions: Like ordinary Random Forests, QRF can capture complex interactions between variables without explicit feature engineering.
Limitations of Quantile Random Forests
- Computational complexity: Quantile Random Forest training can become computationally heavy for larger dataset sizes and greater numbers of trees.
- Interpretability: Although QRF yields quantile estimates, interpreting the results is more involved than with traditional regression models that yield a single point estimate.
Conclusion
Quantile Random Forests are a powerful extension of traditional Random Forests, offering a more complete picture of possible outcomes through quantile predictions. This capability makes QRF especially valuable in sectors such as finance, medicine, and risk management, where uncertainty around predictions is essential to quantify.
In this article, we presented the theoretical basics of QRF, demonstrated it in Python, and examined a case study from healthcare. Incorporating QRF into your machine learning toolbox can sharpen how you understand and model uncertainty in predictions, improving decisions in real-life applications.
Whether you are predicting financial returns, patient outcomes, or weather patterns, Quantile Random Forests give you the tools to handle uncertainty and gain deeper insight into your predictions.
I hope you now have an intuitive understanding of Quantile Random Forests, and that these concepts help you build valuable and insightful projects.
You can connect with me via the following platforms:
- Gmail: [email protected]
References
- Original Paper: Meinshausen, Nicolai. "Quantile Regression Forests." Journal of Machine Learning Research 7 (2006): 983–999.
- Python package quantile_forest: Link to GitHub Repository
Published via Towards AI