Our terms of service are changing. Learn more.

Publication

Probability

Analyzing Ordinal Data in SAS using the Multinomial Distribution.

Last Updated on March 24, 2022 by Editorial Team

Author(s): Dr. Marc Jacobs

Originally published on Towards AI the World’s Leading AI and Technology News and Media Company. If you are building an AI-related product or service, we invite you to consider becoming an AI sponsor. At Towards AI, we help scale AI and technology startups. Let us help you unleash your technology to the masses.

This post is an extension of an earlier introductory post I made on using Generalized Linear Mixed Models in SAS.

Below, I will use a dataset containing the diarrhea scores of pigs to show how to analyze ordinal data. Before doing so, one has to consider that the assessment of diarrhea scores is done in a subjective way, using raters, who rate diarrhea on a score from 0 to 3 in steps of 1 — 0,1,2,3. A nine equals the absence of feces from which to assess diarrhea. It is not immediately clear if that actually means that there is no diarrhea or just plain absence. It can also have an effect on the analysis itself.

Score distribution of a swine population by five raters. It is not difficult to spot that the results of a trial are dependent on not just the ‘actual’ score but also the person who rated the feces.
Including or excluding a particular score can have an influence, since it affects the denominator and thus the probability estimate.

Ordinal data can be analyzed in multiple ways, of which we show the ordinal and multinomial way in this post. The type of analysis you will use depends on your assumptions, the specific question you would like to answer, and what the model is able to accommodate.

The type of analysis should be based on the specific question you would like to answer, the availability of the data, and the ability of the models.
Model-wise, it definitely has an effect if you model the data across time or per time point. Ordinal and multinomial models, in SAS, cannot accommodate analyses per time-point.

Diarrhea is a discrete outcome and as such should be treated as a proportion and not a continuous distribution with a true mean. Renaming the levels of diarrhea from {0, 1, 2, 3, 9} into {A, B, C, D, Z} would solve the confusion. Diarrhea can be measured in multiple ways the choice of which should depend on the:

  1. the specific question you want to be answered
  2. availability of data

Let's do some actual analysis and use a dataset that looks like this, to start with.

The dataset in which, for each pig, we have the diarrhea score per day up to 42 days.

The multinomial approach is perhaps the easiest, to begin with. Remember, Ordinal and multinomial models cannot handle repeated data. Thus only total periods may be used (f.e. 0–42, 0–14, etc.). So, to use ordinal/multinomial models, specific data management is required:

  1. per Block, Treatment & Fecal Consistency Score, the frequency needs to be calculated and put in the model.
  2. In case of fecal consistency, category ‘9’ (FC=9) is removed, as these are often deemed missing values (see earlier discussion points)

Below you see the PROC SQL command to transform the data in a dataset that is necessary for both multinomial and ordinal modeling. The trick is to get frequencies per category per treatment per block.

Which will make the dataset look like this.
And immediately shows the distribution of scores across treatments.

Below, you see the full code used the model the multinomial. The distribution is set at multinomial, but this is also the case if you want to model using the ordinal approach. Hence, the key is the link function, set here at glogit and will be set to clogit for the ordinal approach — glogit stands for generalized logit and means that we will estimate the proportions separately. Treatment four is set as the reference treatment. For the estimates, you need to use the bycat function to get estimates per category of the feacal consistency score.

Here you can see the full code used to analyze the dataset using a multinomial model.
No treatment differences in the distribution of the feacal scores. No surprise, if you look at the plot based on the observed data.
And the differences. Since we are dealing with a logit function, the comparison has to be done via the Odds Ratio once again. However, what is more important to notice is that the comparison is not done at the treatment level alone — it is also done at the feacal score level. Hence, treatments are compared for their probability of achieving a particular feacal score. Since we are doing it the multinomial way, there is no interdependence.
And the most important plot by far. This plot shows the probability of achieving a particular score given a particular treatment.

Plotting the results is often a better way of getting to terms with your model. This graph shows the probabilities for each of the diarrhea scores across treatments. From the graph you can see that:

  1. Score 3 is almost absent
  2. Score 0 is most present
  3. Treatment difference seems to be most present in score 2

To understand the Odds Ratios from the previous table, you need to get comfortable with proportions. So, let's use another example but to do so, we need to once again transform the data to let PROC GLIMMIX process it.

The score distribution by blocks and treatments.

And below the same code, we saw before.

The code and the probability distribution. This time, one can clearly see that treatment two sticks out.

Now, the plot above is quite easy to interpret but of course, you need to have actual numerical estimates to make sense of the effect size of the differences between treatments. In a multinomial model, the interpretation depends on the choices you made — even more than for any other kind of model.

So, as you can see, the table is a bit different than normal, but we already knew that. Comparisons are made per score category. But, if you look more closely, you can see that score 0 is missing. That is because it is the reference score. If you want to have specific estimates for score=0, then you need to set a different reference score.

The reference score determines all.

For the rest, it is pretty much straightforward. You get an estimate and a mean per an estimate, of which the mean is the probability. Once we go towards direct comparisons and treatment differences, you will have to deal with Odds Ratios again.

The estimates in a table.
The estimates in a graph in the left and middle, and a direct comparison to the right. The comparison to the right is NOT an odds ratio. Also, the two left plots show if the treatments are similar, per feacal score (1 & 2). The middle plot shows that, for feacal score 2, treatment2 two, four, and five are different. This is confirmed by the plot to the right in which treatments two, four, and five crosses.

The influence of the reference score is easiest done by repeating the analysis, but by changing the reference score. Below, you see the probability plots for reference score zero en two. Now, please look closely and see that the actual probability plot does NOT change. That is because it cannot change. However, the estimates do differ, because the scores are compared to a DIFFERENT reference score.

Probability plots stay the same regardless of the reference score. That is because they are reference agnostic and treatment dependent.
Now this is where you can see the difference. If you change the reference score, the treatment estimates and differences change looking at the same score. This is not a true change, but just a change in estimate by using a different anchor point.

Below you can see what changing the reference score does for interpretation. It is quite something, so be careful if you look at these plots. In fact, I would like to advise you to only look at these plots in conjunction with the overall probability plot.

They are all connected. Pieces of the same puzzle. Just look at the numbers.
You need this plot to make sense of the plots above and below.
Differences are linked to the reference used

In summary, when analyzing diarrhea scores, the multinomial model is the most straightforward as it looks into the scores if they are mutually exclusive, and analyzes the treatment effect per score category. Beware, the reference score affects the results so keep thinking in terms of probabilities — graphs help — and in terms of odds ratios which are not intuitive. You could run the model multiple times, changing the reference score and treatment to get the full picture. Another caveat is that you cannot analyze the data in a repeated measures way — only suitable for the total period.

Now, let's move to the ordinal approach.

The ordinal model is built on the same foundation as the multinomial model, however, it does approach the data quite differently. In fact, the ordinal model builds in assumptions to make analyses and comparisons more easy and direct. Let's see if that truly is the case.

The majority of multi-level variables are ordinal which means that categories have a specific ordering although the “distance” between the levels is not clear. However, the categories are man-made and there is no actual underlying theoretical distribution. When using an ordinal model, you however do assume that there is a latent continuous variable underneath its categories. You also assume the Proportional Odds Assumption which is often not met. One way to defend the use of an ordered model is to say that underneath the observed scores there is a latent variable, Y*. This latent variable has thresholds, and crossing a threshold will place you in a different category. This is why an ordinal model is often called a ‘threshold model’. As a result, an ordinal model will come up with a single equation for Y*. This is also directly the difference with the multinomial model which allows for more than one equation since it treats the categories as separate

A distribution on top of observed categories? Or, did we use categories to describe an unobserved distribution? This is the ‘problem’ an ordinal model tries to deal with by incorporating it as an advantage. However, most of the time, ordinal models are difficult to fit because the equal space they assume between categories makes no sense whatsoever.

In an ordinal model, the only parameter that changes is the intercept. The fixed effects remain the same for each category. The intercept defines the boundaries between categories when the linear predictor is zero. Changes in the linear predictor move the boundaries together so that the distance between them remains CONSTANT!! This is the definition of the Proportional Odds assumption

It is the intercept that determines the threshold for the next category. Hence, in a five-category outcome variable, we have four intercepts as you can see in the example below.

An ordinal model will come up with a single equation for Y*. The intercepts themselves have no real meaning — they are placed on a latent distribution which is a mathematical construct as you could see from the pictures above.
The PROC SQL code to transform the data into the necessary structure. It looks exactly the same as the structure needed for the multinomial model. Because it does!
The part of the model that I needed to change to not run an ordinal model instead of a multinomial model — the cumlogit or clogit. The name is not in vain as it means that probability is modeled in a cumulative way, not general/separate as in a multinomial.
The most important table from an ordinal analysis. If you see a table like this, you already know that category three will bring you problems. Perhaps best to combine it with category two?
No differences between treatments.
Advanced codes, but nothing special to be honest. Just ways to fully understand what the model has done by requesting more extensive output.
A short summary on how to interpret the output. The biggest difference is that the comparisons are done on the treatment level, not the treatment * score. This is because the ordinal model assumes proportionality. Hence, the difference between treatments for score one is the same as score two or three. Remember, this was not the case in the multinomial model. Truth be told, the assumption often makes little sense, if it DOES hold you will have a much more global model.
The probabilities are cumulative, meaning that the probability for score one is actually the probability for score zero plus the actual probability for score one.

The best way to deal with complex math is to look at this plot — the cumulative probability plot. It shows the cumulative distribution of the frequencies. Here, it is clearly visible that score 0 is most frequent and that scores 2 and 3 add only a little. Hence, it is perhaps best to combine scores 1, 2 & 3 vs score 0

The key assumption to the proportional odds model is that the relationship BETWEEN categories remains the SAME. Hence, probabilities are not category-specific and treatment effects move the boundaries between categories as a group. As such, the odds ratios of a treatment effect for NOT SEVERE vs SEVERE are the same as for SEVERE vs VERY SEVERE.

Remember, an ordinal model tries to find a linear predictor that will predict the change from one category to the next, but the probabilities are not category-specific. We need to test this assumption using PROC GLIMMIX

The estimate calls are made to test the proportionality assumption. What I want to see are estimates that are similar within a threshold.

If the assumptions of the ordered cumlogit model are met, then all of the corresponding coefficients (except the intercepts) should be the same. The assumptions of the model are therefore sometimes referred to as the parallel lines or parallel regressions assumptions

The estimates are actually quite similar, except for treatments 4 and 5 for the threshold defining the boundary between scores 1 and 2 (c=1). I am also not completely sure about treatments 4 and 5 for c=0. Hence, a multinomial model is perhaps safer to use.

In summary, assuming that the data is ordinal means that there is a certain ordering of the discrete data. The specific ordering is determined by a latent continuous distribution. The data for the ordinal approach needs the same setup as for the nominal approach. Results show the effect of the treatment, regardless of the specific score. Hence, there is no reference to take into account. Checking if the proportional odds assumption was upheld is tricky.

I hope you liked this post! More to come!


Analyzing Ordinal Data in SAS using the Multinomial Distribution. was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Join thousands of data leaders on the AI newsletter. It’s free, we don’t spam, and we never share your email address. Keep up to date with the latest work in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.

Published via Towards AI

Feedback ↓