
Outline a Smaller Class With the Custom Loss Function

Last Updated on January 6, 2023 by Editorial Team

Author(s): Konstantin Pluzhnikov


A short guide to getting the most out of classification when dealing with imbalanced datasets

(Left) Photo by Mike Lawrence on Flickr | (Middle) Photo by Jernej Furman on Flickr | (Right) Photo by Leonard J Matthews on Flickr

Why use a custom loss function?

There may be situations when the accuracy metric is insufficient to get the expected results. We may want to reduce the false negative (FN) or false positive (FP) rate. It may be beneficial when a dataset is imbalanced and the result we seek belongs to a smaller class.

Where to apply

1. Fraud detection. We want to reduce false negatives (e.g., a model treating a doubtful transaction as a usual one). The goal is to minimize the risk of fraudulent transactions going unnoticed (even if the false positive rate grows).

2. Disease identification. The logic is the same: we want to increase the chances of catching the disease, even if we get more FP cases to sort out later.

3. Picking an outsider in a sports match. Let's assume a game with two outcomes (win/loss of the favorite), but we want to find out whether the outsider has a chance. A balanced model will "cherry-pick" most of the time because predicting the favorite's win is the easy route. The way out is to penalize false negatives (cases when the model predicts the victory of the stronger team, but the true label is a loss for the favorite).

We do not want high FP rates in the first two cases because they would increase the resources spent confirming that a sample is not fraud/disease. So model optimization remains necessary.

The bridge between the original and custom logloss

An ordinary gradient boosting algorithm (the algorithm) performs the following steps. It takes the predicted value (let's call it predt, in the form of logit or logitraw depending on the library we use) and the respective true label (let's call it y), then estimates the value of the logloss function. It performs these actions row by row to calculate the cumulative loss function value.

The logloss has the following formula:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log(p_i) + (1 - y_i)\log(1 - p_i)\right]$$

where $N$ is the number of samples, $y_i$ is the true label of the $i$-th sample, and $\log$ is the natural logarithm.

$p_i$ is the probability of the $i$-th sample being 1 (the sigmoid function):

$$p_i = \frac{1}{1 + e^{-z_i}}, \qquad z_i = \sum_{j=1}^{m} w_j x_{ij}$$

where $m$ is the number of features and $z_i$ is the linear combination in which $w_j$ are the feature weights and $x_{ij}$ are the feature values.
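For concreteness, here is a minimal NumPy sketch (not the article's code) of the sigmoid and the cumulative logloss:

```python
import numpy as np

def sigmoid(z):
    """Map a raw score (logit) z to a probability."""
    return 1.0 / (1.0 + np.exp(-z))

def logloss(y, p, eps=1e-15):
    """Cumulative (mean) binary cross-entropy."""
    p = np.clip(p, eps, 1.0 - eps)          # guard against log(0)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

y = np.array([1.0, 0.0, 1.0])               # true labels
z = np.array([2.0, -1.0, 0.5])              # raw scores z_i
print(logloss(y, sigmoid(z)))               # mean logloss over the three samples (~0.30)
```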

The next step is to update the features' weights to minimize the loss. The algorithm calculates a gradient and a hessian to choose a direction and power for changing the weights.

The formula for the gradient:

$$\frac{\partial l_i}{\partial z_i} = p_i - y_i$$

The formula for the hessian:

$$\frac{\partial^2 l_i}{\partial z_i^2} = p_i\,(1 - p_i)$$

We obtain these derivatives by applying the chain rule to the logloss term $l_i$ with respect to a change in the argument $z_i$.
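In code, these per-sample derivatives are one line each (again a sketch, reusing the sigmoid helper from the previous snippet):

```python
def logloss_grad(predt, y):
    """First derivative of the logloss with respect to the raw score z."""
    p = sigmoid(predt)
    return p - y

def logloss_hess(predt, y):
    """Second derivative of the logloss with respect to the raw score z."""
    p = sigmoid(predt)
    return p * (1.0 - p)
```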

We can imagine the process as a game of golf. A golfer (an algorithm) tries to get the ball into the hole (a correct prediction that minimizes loss); to do this, he can choose the direction and the power of the shot (the gradient) and take various clubs to get control over the strike (the hessian).

Photo by David Joyce on Flickr

The main goal of the algorithm is to make the cumulative logloss as small as possible (a strange golfer who chases the number of holed balls instead of achieving the result with the minimum number of strokes). If we have an imbalanced dataset and the features do not provide unambiguous categorization, it tends to predict that a sample belongs to the larger group (a kind of cherry-picking).

Let's imagine a game of golf with changed rules: most holes have a standard value, but there are extras worth more than the ordinary ones. The golfer does not know the value of a target beforehand. If the player holes an extra, he receives a higher reward. If he incorrectly identifies the hole as a standard one and refuses to approach it, he loses time and receives less compensation.

The basic algorithm cannot recognize the change in the rules, so it sticks to the simpler targets and fails to identify and score the extras. Let's introduce beta (a positive number without an upper bound) to turn the tables. Its purpose is to raise the cumulative loss dramatically if the algorithm makes an incorrect cherry-picking prediction. The basic algorithm also has a beta, which equals 1.

The beta-logloss weights the negative-class term by beta and has the following formula:

$$L_\beta = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log(p_i) + \beta\,(1 - y_i)\log(1 - p_i)\right]$$

Formulas for the gradient and hessian:

$$\frac{\partial l_i}{\partial z_i} = p_i\left(\beta + (1 - \beta)\,y_i\right) - y_i, \qquad \frac{\partial^2 l_i}{\partial z_i^2} = \left(\beta + (1 - \beta)\,y_i\right) p_i\,(1 - p_i)$$

The custom loss is implemented by setting the algorithm's beta to less than 1 to penalize FN, or to more than 1 to make FP cost more. We then recalculate the gradient and hessian with the new beta, forcing the algorithm to change its loss calculation policy.

Let's code the custom loss

I implemented the code with the scikit-learn API for XGBoost in Python (XGBoost version 1.0.2). We can use custom loss functions in gradient boosting packages (XGBoost, LightGBM, CatBoost) or in deep learning packages like TensorFlow.

There are four user-defined functions needed to make a custom loss function work. The first calculates the first derivative of the amended logloss function (the gradient), the second calculates the second derivative (the hessian), the third builds the objective function, and the fourth calculates the amended evaluation metric.

Hereafter, the y parameter contains the actual class value, and predt holds the predicted value to evaluate.
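Below is a minimal sketch of how the first three pieces might look with the beta-weighted logloss defined above (the helper names are illustrative, not necessarily those used in the repository):

```python
import numpy as np

def sigmoid(x):
    """Map a raw score (logit) to a probability."""
    return 1.0 / (1.0 + np.exp(-x))

def gradient_beta(predt, y, beta):
    """First derivative of the beta-weighted logloss with respect to the raw score."""
    p = sigmoid(predt)
    return p * (beta + (1.0 - beta) * y) - y

def hessian_beta(predt, y, beta):
    """Second derivative of the beta-weighted logloss with respect to the raw score."""
    p = sigmoid(predt)
    return (beta + (1.0 - beta) * y) * p * (1.0 - p)

def make_logloss_beta_objective(beta):
    """Build an objective for the XGBoost sklearn API: func(y_true, y_pred) -> (grad, hess)."""
    def objective(y, predt):
        return gradient_beta(predt, y, beta), hessian_beta(predt, y, beta)
    return objective
```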

The logreg_err_beta_sklearn function calculates the logloss on the validation dataset; it checks performance against unseen data. Even though we apply the sklearn API, this user-defined function receives a dmat argument of the DMatrix datatype (the internal data structure of XGBoost).
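A possible shape for that evaluation function, assuming the predictions arrive as raw scores and the labels are read from the DMatrix (the factory wrapper is an illustrative addition, reusing NumPy and the sigmoid helper from the sketch above):

```python
def make_logreg_err_beta_sklearn(beta):
    """Build an evaluation metric for fit(): func(predt, dmat) -> (name, value)."""
    def logreg_err_beta_sklearn(predt, dmat):
        y = dmat.get_label()                              # labels live inside the DMatrix
        p = np.clip(sigmoid(predt), 1e-15, 1.0 - 1e-15)   # predt is a raw score with a custom objective
        loss = -(y * np.log(p) + beta * (1.0 - y) * np.log(1.0 - p))
        return 'logreg_err_beta', float(np.mean(loss))
    return logreg_err_beta_sklearn
```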

Example based on the Titanic dataset

The Titanic dataset is a case of an imbalanced dataset where the survival rate is barely 38%.

Let's skip the preprocessing steps made before modeling and go straight to the classification task. You can find the full code in my GitHub repository, referenced at the end of the article.
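A baseline along these lines might look as follows (a sketch with the default binary:logistic objective; X_train, X_test, y_train, y_test and the hyperparameters are placeholders for the preprocessing and tuning done before modeling):

```python
from xgboost import XGBClassifier
from sklearn.metrics import confusion_matrix

# Baseline: built-in binary:logistic objective and the standard logloss metric
model = XGBClassifier(objective='binary:logistic', n_estimators=200, random_state=42)
model.fit(X_train, y_train,
          eval_set=[(X_test, y_test)], eval_metric='logloss', verbose=False)

print(confusion_matrix(y_test, model.predict(X_test)))
```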

The result of the basic algorithm is the following confusion matrix:

Image by author

The basic algorithm gives 38 FN cases. Let's assume that our goal is to reduce the number of FN. Let's change the objective function to the custom one, amend the evaluation metric accordingly, and set beta to 0.4.

The changes from the basic algorithm are disabling the default evaluation metric, explicitly introducing the custom logloss objective, and supplying the respective evaluation metric.
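Put together with the earlier sketches, the beta version might be trained like this (again illustrative; with a custom objective XGBoost returns raw scores, so the sigmoid is applied explicitly before thresholding):

```python
from xgboost import XGBClassifier
from sklearn.metrics import confusion_matrix

beta = 0.4
model_beta = XGBClassifier(objective=make_logloss_beta_objective(beta),
                           disable_default_eval_metric=1,      # switch off the built-in metric
                           n_estimators=200, random_state=42)  # placeholder hyperparameters
model_beta.fit(X_train, y_train,
               eval_set=[(X_test, y_test)],
               eval_metric=make_logreg_err_beta_sklearn(beta),
               verbose=False)

# With a custom objective, predictions are raw scores; transform and threshold them explicitly.
raw_scores = model_beta.predict(X_test, output_margin=True)
y_pred = (sigmoid(raw_scores) > 0.5).astype(int)
print(confusion_matrix(y_test, y_pred))
```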

The confusion matrix for the beta-logloss algorithm:

Image by author

The beta algorithm decreases FN cases to 25 (-34%), increases true positives (TP) to 112 (+13%), increases FP to 53 (+103%), and decreases true negatives (TN) to 167 (-14%). The fall in FN leads to a boost in both TP and FP. Overall accuracy fell.

We can easily apply the same logic to penalize FP by choosing beta higher than 1.
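With the sketches above, that would simply mean building the objective (and the matching metric) with a beta greater than 1, for example:

```python
# Illustrative: weight the negative-class term more heavily so false positives cost more
model_fp = XGBClassifier(objective=make_logloss_beta_objective(2.0),
                         disable_default_eval_metric=1)
```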

Conclusions

The beta-logloss has some distinctive pluses as well as flaws. I have listed the most obvious ones.

Advantages

  • Easy and fast to apply (use four user-defined functions and beta, and that's it).
  • There is no need to manipulate the underlying data before modeling (if the dataset is not highly imbalanced).
  • It may be applied as part of data exploration or as part of model stacking.
  • It can be added to the most popular machine learning packages.

Shortcomings

  • We should tune beta to get the optimal FN-to-FP trade-off.
  • It may not provide meaningful results when a dataset is highly imbalanced (a dataset where the minor class is less than 10% of all samples). Exploratory data analysis is vital to make the model work.
  • Penalizing FN often leads to a huge growth in FP, and vice versa. You may need additional resources to compensate for that growth.

References

  1. Overview of binary cross-entropy (logloss): https://towardsdatascience.com/understanding-binary-cross-entropy-log-loss-a-visual-explanation-a3ac6025181
  2. Overview of the custom loss function for regression tasks: https://towardsdatascience.com/custom-loss-functions-for-gradient-boosting-f79c1b40466d
  3. XGBoost documentation related to custom loss function: https://xgboost.readthedocs.io/en/latest/tutorials/custom_metric_obj.html
  4. Plot_confusion_matrix function source: https://www.kaggle.com/grfiv4/plot-a-confusion-matrix
  5. Link to my GitHub repository: https://github.com/kpluzhnikov/binary_classification_custom_loss

