From Classification to Ordinal Regression

Last Updated on September 13, 2022 by Editorial Team

Author(s): Topaz Gilad

Originally published on Towards AI.

Unlock the Potential of Your Labels

AI-generated (Midjourney): “lion-elephant hybrid, illustration, children’s book” and “elephant with lion face”

“Is a lion closer to being a giraffe or an elephant?”

It is a question no one asked. Ever. Telling lions, elephants, and giraffes apart is a straightforward classification task. As such, it can mostly be addressed with a cross-entropy loss. Should the task of classifying someone as a child/adult/elderly be addressed in the same way?

In this blog post, we will review best-practice approaches and papers for:
(1) Addressing ordered classification.
(2) Turning coarse classification labels into continuous regression predictions.
(3) How to choose.
(4) How to evaluate.
The papers reviewed in this post focus on deep learning models, but the main concepts apply to other ML architectures and may be adapted for them as well.

Discrete Labels in a Continuous World

“The world is continuous, but the mind is discrete.”
David Mumford, ICM 2002

We often define categories when breaking down a real-world problem into an ML-based solution. However, real target values may be continuous or at least ordered. This is something to consider and even leverage in the design of your ML model. Are you facing what seems to be a classification problem? Take a moment to understand the hidden relations between your “classes”.

Let’s list some classification examples:
– Online ratings: 1–5 stars.
– Medical diagnosis index: stage 1/2/3.
– Face pose estimation [1]: 45/90/180 degrees.
– Age estimation and so on.

Left to right: Age prediction: https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/. image credit: https://en.wikipedia.org/wiki/What%27s_My_Age_Again%3F Predict rating (ilmakiage.com), medical severity index of dVRS, Zhu et al. https://www.ahajournals.org/doi/10.1161/strokeaha.110.591586

Know Your Data

When approaching the design of an ML solution, some of the first steps are:
(a) Understand what data you currently have and what data you are likely to obtain in a reasonable time. In some cases, you will have continuous measurements associated with your data, for example, blood measurements or exact age. More often than not, those will not be available. Manual annotation of fine-grained labels is an extremely difficult and time-consuming task. Therefore, in many cases, all you will have are coarse categorical labels, especially where an expert human opinion is required.

(b) Explore the domain. Ask your domain expert to clarify the relations between the target classes, and collect every bit of prior knowledge and every assumption they hold. Question the format of the output they ask for. For example: would a smooth real number between 0 and 1 serve the product better than a failed/passed/excellent student score predictor?

If your classes are indeed independent, this is not the blog post for you!
However, if they are dependent, ask yourself:
(1) Do the labels have an order?
(2) Do you only care about the order, or are some labels closer to their neighbors than others? Are the “steps” between labels uniform in distance?

Regression to the Rescue

Logistic Regression
Most practitioners are familiar with logistic regression as a way to discriminate between classes. While the target labels are discrete, the estimated class confidence is continuous.

Let’s consider the following case: the consequences of wrongly classifying class A as class B are not as severe as those of a mistake between A and C. One approach is to use a weighted loss with a risk coefficient for each type of mistake [2].
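To make the weighted-loss idea concrete, here is a minimal sketch in plain Python, written in the spirit of the class-distance-weighted loss of [2]. The function name, the `alpha` exponent, and the toy probabilities are my own assumptions for illustration, not code from the paper.

```python
import math

def distance_weighted_loss(probs, true_idx, alpha=1.0):
    """Penalize probability mass on wrong classes by its distance from the true class.

    probs: predicted class probabilities, listed in label order.
    true_idx: index of the true class.
    alpha: hypothetical exponent controlling how fast the penalty grows with distance.
    """
    eps = 1e-12  # avoids log(0)
    loss = 0.0
    for i, p in enumerate(probs):
        weight = abs(i - true_idx) ** alpha  # zero for the true class itself
        loss -= weight * math.log(max(1.0 - p, eps))
    return loss

# Same confidence in the true class A (index 0), but the leftover mass differs.
near_miss = distance_weighted_loss([0.6, 0.3, 0.1], true_idx=0)  # leaks mostly to B
far_miss = distance_weighted_loss([0.6, 0.1, 0.3], true_idx=0)   # leaks mostly to C
print(near_miss < far_miss)
```

A plain cross-entropy loss would score both predictions identically, since it only looks at the probability of the true class. Here the prediction that leaks toward the distant class C is penalized more.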

Ordinal Regression
Another approach is to replace the one-hot label vectors with a different encoding, as suggested in the NNRank paper [3]. For a hands-on example, I recommend Gruber’s post and Kotwani’s. Note that when the model is an aggregation of an ensemble of binary classifiers, inconsistencies in class predictions may occur. Cao et al. (2020) propose CORAL as a solution that achieves rank consistency [4].
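As a rough illustration of this kind of encoding, the sketch below turns a label among K ordered classes into K-1 binary targets, each answering “is the rank greater than j?”. This is the extended binary formulation that CORAL [4] builds on, and NNRank [3] uses a closely related cumulative encoding. Labels are 0-indexed here, and all function names and probabilities are assumptions made for the example.

```python
def encode_rank(label, num_classes):
    """Label k becomes K-1 binary targets answering 'is the rank greater than j?'."""
    return [1 if label > j else 0 for j in range(num_classes - 1)]

def decode_by_count(probs, threshold=0.5):
    """Predicted rank = number of binary outputs above the threshold."""
    return sum(p > threshold for p in probs)

def is_rank_consistent(probs):
    """True when the binary probabilities never increase along the rank axis."""
    return all(a >= b for a, b in zip(probs, probs[1:]))

targets = [encode_rank(k, num_classes=4) for k in range(4)]
print(targets)  # [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]

consistent = [0.9, 0.7, 0.2]    # P(rank > 0) >= P(rank > 1) >= P(rank > 2)
inconsistent = [0.9, 0.4, 0.7]  # claims rank > 2 is likelier than rank > 1
print(is_rank_consistent(consistent), is_rank_consistent(inconsistent))
```

Both example outputs decode to the same rank when you simply count the outputs above 0.5, yet the second one contradicts itself. That is the inconsistency that rank-consistent methods are designed to rule out.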

Figure from Cao et al. 2020 CORAL [4]: inconsistent rank when aggregating binary classifiers (left) compared to the desired outcome (right)

Linear Regression: Seeing Beyond the Available Labels

Netflix recently launched a double thumbs up, expanding its three categories (dislike/indifferent/like) into four. This gives a finer distinction inside what used to be the “liked” category: liked vs. liked a lot. Imagine you have many coarse “liked” votes from the past and only a few new “double thumbs” votes. Estimating a user satisfaction score (instead of a class) lets you adapt better to the new user inputs.
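One way to picture that adaptivity: if the model outputs a continuous score, categories are just intervals on the score axis, and adding a category only means adding a cut point. The sketch below is purely illustrative. The cut points, category names, and score value are hypothetical and have nothing to do with how Netflix actually works.

```python
from bisect import bisect_right

def score_to_category(score, cut_points, names):
    """Map a continuous score to the category whose interval contains it."""
    return names[bisect_right(cut_points, score)]

old_names = ["dislike", "indifferent", "like"]
old_cuts = [1 / 3, 2 / 3]        # hypothetical cut points

new_names = ["dislike", "indifferent", "like", "double like"]
new_cuts = [1 / 3, 2 / 3, 0.9]   # hypothetical: only the top range is split

score = 0.95  # hypothetical model output for one user and title
before = score_to_category(score, old_cuts, old_names)
after = score_to_category(score, new_cuts, new_names)
print(before, "->", after)
```

The model output itself does not change between the two schemes; only the interpretation layer does. A classifier with a fixed number of output classes would instead need a new output head and retraining.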

Qin et al. BioeNet 2020 showcases a strategy for leveraging coarse labels in a fine-grained linear regression [5]. One important thing to consider when taking that path is evaluating your inner-class order. We will discuss how next.

Figures from Qin et al. BioeNet 2020: Learning fine-grained estimation from coarse labels [5]

Another possible benefit of transforming your discrete integer target labels into real numbers is the positive effect of soft labels. As shown by the Google Brain team (2020), using soft labels not only reduced over-confidence but also improved the calibration even without temperature scaling [6–8].
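For readers who have not used soft labels before, here is a minimal sketch of uniform label smoothing, the variant studied in [6]: the one-hot target is mixed with a uniform distribution over the classes. The function name and the value of `epsilon` are assumptions chosen for the example.

```python
def smooth_labels(true_idx, num_classes, epsilon=0.1):
    """Uniform label smoothing: (1 - epsilon) * one_hot + epsilon / num_classes."""
    off = epsilon / num_classes
    return [(1.0 - epsilon) + off if i == true_idx else off
            for i in range(num_classes)]

soft = smooth_labels(true_idx=1, num_classes=4, epsilon=0.1)
print(soft)
```

The target still sums to one, but the model is no longer pushed toward assigning probability exactly 1 to a single class.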

Make Sure You Are in the Right Direction

Shape and Location of Clusters
I am a big believer in analyzing your DNN’s embedded feature space. Create a t-SNE plot of your test set feature space and compare it with the illustrations in the figure below. With a cross-entropy loss, you expect each cluster to condense (minimize inner-class variance) and the clusters to move away from one another (maximize inter-class variance). However, in ordinal regression, you expect to see the clusters in the right order of proximity in the feature space. For linear regression with an MSE loss, for example, you expect to see not only the right order but also a continuous progression between classes, with less margin from one cluster to the next. If you still see a noticeable margin, this may also indicate that your test set is lacking border examples.
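A visual inspection can be complemented by a crude numerical check. The sketch below is my own assumption, not a method from the cited papers: it computes one centroid per class in the embedding space and verifies that each centroid’s nearest neighbor belongs to an adjacent label. The toy 2-D vectors are made up. In practice you would pass the model’s embeddings, not the t-SNE coordinates.

```python
import math

def centroid(points):
    """Mean vector of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def nearest_neighbors_are_adjacent(features_by_label):
    """Check that each class centroid's closest other centroid has an adjacent label.

    features_by_label: list indexed by ordinal label, each item a list of vectors.
    """
    centers = [centroid(pts) for pts in features_by_label]
    for i, c in enumerate(centers):
        others = [j for j in range(len(centers)) if j != i]
        nearest = min(others, key=lambda j: math.dist(c, centers[j]))
        if abs(nearest - i) != 1:
            return False
    return True

# Toy 2-D embeddings (made up) for three ordered classes.
ordered = [
    [[0.0, 0.0], [0.2, 0.1]],
    [[1.0, 0.1], [1.2, 0.0]],
    [[2.1, 0.0], [2.3, 0.2]],
]
shuffled = [ordered[0], ordered[2], ordered[1]]  # labels 1 and 2 swapped

print(nearest_neighbors_are_adjacent(ordered))
print(nearest_neighbors_are_adjacent(shuffled))
```

This only tests the order of proximity, which is what you expect from ordinal regression. It says nothing about how tight the clusters are or how wide the margins between them are.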

Illustration by the author

Relations Between Samples from the Same Category
If you can get your hands on a few finer-grained labeled samples, you can use them as a test set to see whether your model generalizes well enough to the right inner-class order. For example, imagine you are trying to estimate the age of a person, but the bulk of your labeled data has only coarse labels (0–3/4–14/teens/20 something/30 something and so on). If you have a few samples of people from the same “age bucket” with an exact age label (say age 4, age 8, age 14), check that their locations in the feature space are ordered correctly and that the output predictions are ordered as you expect.
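One simple way to quantify this check is the fraction of sample pairs whose predicted order matches their true order. The sketch below assumes you have exact values and model outputs for a handful of samples from one bucket. The function name and the output values are hypothetical.

```python
from itertools import combinations

def pairwise_order_agreement(exact_values, predictions):
    """Fraction of sample pairs whose predicted order matches their true order."""
    agree = total = 0
    for (t1, p1), (t2, p2) in combinations(zip(exact_values, predictions), 2):
        if t1 == t2:
            continue  # ties in the ground truth carry no order information
        total += 1
        if (t1 - t2) * (p1 - p2) > 0:
            agree += 1
    return agree / total if total else float("nan")

exact_ages = [4, 8, 14]            # samples from the 4 to 14 bucket
good_outputs = [0.21, 0.26, 0.33]  # hypothetical model outputs
bad_outputs = [0.30, 0.22, 0.27]   # hypothetical model outputs

print(pairwise_order_agreement(exact_ages, good_outputs))
print(pairwise_order_agreement(exact_ages, bad_outputs))
```

A value close to 1 means the model ranks samples inside the bucket correctly even though it never saw exact ages during training. With only a few fine-grained samples the estimate is noisy, so treat it as a sanity check and not as a benchmark.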

Bottom Line

Think about the relations between your target classes. They may be dependent or ordered.

Consider the different types of regression to gain more from your labels.

Visualize your feature space and also test the inner-class order of the predicted outputs.

When you are given a small set of possible classes, classification may seem the obvious direction. Still, considering the relations between the classes is key to boosting your ML and may sometimes be critical for success!

References

[1] Beyer, L., Hermans, A. and Leibe, B., 2015, October. Biternion nets: Continuous head pose regression from discrete training labels. In German Conference on Pattern Recognition (pp. 157–168). Springer, Cham.

[2] Polat, G., Ergenc, I., Kani, H.T., Alahdab, Y.O., Atug, O. and Temizel, A., 2022. Class distance weighted cross-entropy loss for ulcerative colitis severity estimation. arXiv preprint arXiv:2202.05167.

[3] Cheng, J., Wang, Z. and Pollastri, G., 2008, June. A neural network approach to ordinal regression. In 2008 IEEE international joint conference on neural networks (IEEE world congress on computational intelligence) (pp. 1279–1284). IEEE.

[4] Cao, W., Mirjalili, V. and Raschka, S., 2020. Rank consistent ordinal regression for neural networks with application to age estimation. Pattern Recognition Letters, 140, pp.325–331.

[5] Qin, Z., Chen, J., Jiang, Z., Yu, X., Hu, C., Ma, Y., Miao, S. and Zhou, R., 2020. Learning fine-grained estimation of physiological states from coarse-grained labels by distribution restoration. Scientific Reports, 10(1), pp.1–10.

[6] Müller, R., Kornblith, S. and Hinton, G.E., 2019. When does label smoothing help? Advances in Neural Information Processing Systems, 32.

[7] Guo, C., Pleiss, G., Sun, Y. and Weinberger, K.Q., 2017, July. On calibration of modern neural networks. In International conference on machine learning (pp. 1321–1330). PMLR.

[8] Minderer, M., Djolonga, J., Romijnders, R., Hubis, F., Zhai, X., Houlsby, N., Tran, D. and Lucic, M., 2021. Revisiting the calibration of modern neural networks. Advances in Neural Information Processing Systems, 34, pp.15682–15694.


From Classification to Ordinal Regression was originally published in Towards AI on Medium.
