Growing Teeth

Last Updated on July 26, 2023 by Editorial Team

Author(s): Dr. Marc Jacobs

Originally published on Towards AI.

Bayesian analysis of the Teeth Growth dataset.

To show that Bayesian analysis is really not that difficult, or far from how we humans deal with (new) information, I have already posted several Bayesian analyses on known datasets, such as Iris, ChickWeight, mtcars or rent99. Today's dataset is nothing fancy, but it does not have to be. Bayes’ Theorem works wonders, especially in small samples, and its power comes from beyond the current dataset. That is why using Bayesian analysis will always tell you more than any kind of Maximum Likelihood approach ever can. Remember, it is all about conditional probabilities.

In this post, I will apply what I already applied before, but then to the ToothGrowth dataset, which shows the effect of vitamin C on tooth growth in guinea pigs. It is a dataset that has been used in many posts on various media before, and I am sure there is a Bayesian analysis somewhere, but I will go ahead anyhow. Please read this post in conjunction with the others I made. I believe you will find them quite coherent and therefore helpful in terms of learning how to apply Bayes’ Theorem. Let me know if otherwise!

BTW, I placed the code at the end, so I can tell the story better. Not everything I did is in the post, but it is all in the code. Enjoy!

So, what are we dealing with? Well, as you can see below, we have the length of the teeth, the supplement applied (OJ vs VC), and the dose. It really is not more than this, and the dataset is quite small.

Two plots show the same thing, but one is with and one is without p-values.

R has many functions that can make the most basic of data look like rocket science. Just to show you how ridiculous some of these plots are, and how much number-porn you can add to a straightforward graph, here are some examples. These numbers are not helpful in the least and will make you believe there is much more to extract from such a small sample than is actually the case.

Six plots which tell you much less than meets the eye. Do not use them. Stick to your fundamentals when analyzing data, which is combining prior knowledge with current data to form a new knowledge base. You do not need p-values.
Of course, in R as well, you can p-value everything you see. Please do not. Trust me, I did this for years as well, and probably will have to again in the near future because of what clients or collaborators want, but the p-value on top of each graph adds absolutely nothing beyond what you can also obtain by running a Bayesian analysis.
Obtaining a p-value is child’s play in R.

Below, I have made six plots, showing to the left a boxplot with a p-value and to the right a density plot showing just that, densities. P-values depend on densities as well of course, but I want you to get acquainted with the right plots, not the left. Boxplots are among the greatest plots to make, but in Bayesian analysis, it is all about density plots.

To the left boxplots with p-values and to the right density plots. They show the exact same thing. The only danger one will encounter with density plots is that with small sample sizes, the peaks are anomalies. They are NOT indicative of multimodal data, or mixtures, but just of a small dataset.
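
To make the warning about spurious peaks concrete, here is a minimal sketch. It is written in plain Python instead of the R used for the plots in this post, and it makes some loud assumptions: the kernel is Gaussian, the bandwidth is fixed at a hypothetical 0.4 (R's `density()` picks its own), and the samples are simulated from a single normal distribution, not taken from ToothGrowth. The helper names `gaussian_kde`, `count_peaks` and `peaks_for_sample_size` are made up for this illustration.

```python
import math
import random

def gaussian_kde(grid, sample, bandwidth):
    """Evaluate a Gaussian kernel density estimate at every grid point."""
    norm = 1.0 / (len(sample) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample)
        for x in grid
    ]

def count_peaks(density):
    """Count strict local maxima in a list of density values."""
    return sum(
        1
        for i in range(1, len(density) - 1)
        if density[i - 1] < density[i] > density[i + 1]
    )

def peaks_for_sample_size(n, bandwidth=0.4, seed=1):
    """Draw n values from ONE normal distribution and count the KDE peaks."""
    rng = random.Random(seed)
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    grid = [-5.0 + 0.05 * i for i in range(201)]  # -5 to 5 in steps of 0.05
    return count_peaks(gaussian_kde(grid, sample, bandwidth))

# The data always come from one unimodal distribution; only n changes.
for n in (10, 60, 2000):
    print(n, "observations ->", peaks_for_sample_size(n), "peak(s)")
```

Try a few seeds and sample sizes: any extra bumps you see in the small samples cannot come from a mixture, because the simulated data never contain one.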

Let's move on to the actual modeling part. The dataset is small and not exciting, so this is more of a technical post, showcasing the functions available in R for Bayesian analysis. Below, I am using rstanarm, although my personal favorite is brms. Both use Stan for sampling: rstanarm ships with precompiled Stan models, while brms generates Stan code that is compiled when you fit the model.

The code for looking at the priors I would get automatically if I did not indicate priors myself, and the model I used to analyze the dataset. As you can see, my model has some informative priors, especially in the intercept.
Left: the priors coming from the data if I do nothing, which is nothing more than Likelihood analysis. And the prior summary of what I specified. Right: the distributions of the posterior coming from the model.
Left: summary coming from the posterior model. Notice how the intercept of 7.7 deviates from my prior of 7. It is not a big difference, but it is a difference. Of course, 7.7 is well within the variation I included in the prior, but still. Right: Posterior distributions of each of the parameters sampled.
This is the most important plot of each analysis — the chains showing the sampling procedure. They should converge.
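
Besides looking at the chains, convergence can be summarized in a single number. The sketch below computes the classic Gelman-Rubin statistic (R-hat) in plain Python, purely as an illustration. It is not the code behind the plots in this post, and Stan reports a more refined variant of this diagnostic. The simulated chains and the function name `r_hat` are assumptions made for the example.

```python
import random
from statistics import mean, variance

def r_hat(chains):
    """Classic Gelman-Rubin potential scale reduction factor.

    `chains` is a list of equally long lists of draws for ONE parameter.
    """
    m = len(chains)      # number of chains
    n = len(chains[0])   # draws per chain
    chain_means = [mean(c) for c in chains]
    grand_mean = mean(chain_means)
    # Between-chain variance B and average within-chain variance W.
    b = n / (m - 1) * sum((cm - grand_mean) ** 2 for cm in chain_means)
    w = mean([variance(c) for c in chains])
    # Pooled estimate of the posterior variance.
    var_hat = (n - 1) / n * w + b / n
    return (var_hat / w) ** 0.5

rng = random.Random(42)
# Hypothetical chains: four that explore the same region, four that do not.
mixed = [[rng.gauss(7.7, 1.0) for _ in range(1000)] for _ in range(4)]
stuck = [[rng.gauss(mu, 1.0) for _ in range(1000)] for mu in (5.0, 7.0, 9.0, 11.0)]
print("well mixed :", round(r_hat(mixed), 3))
print("stuck apart:", round(r_hat(stuck), 3))
```

Chains that sample the same region give a value close to 1, while chains that sit in different places push the between-chain variance, and therefore the statistic, well above 1.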

Now, let's do some additional analysis, checking how the sampling went and comparing the likelihood to the posterior. I cannot stress enough how important it is to note that the likelihood and the posterior MAY deviate. Nothing wrong, no problem.
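
Why the likelihood and the posterior may deviate is easiest to see in the simplest possible case. The following Python sketch is not the rstanarm model from this post. It assumes a single normal mean with a known residual standard deviation, a Normal(7, 1) prior, and five made-up measurements that are NOT taken from ToothGrowth. The function name `normal_posterior` is hypothetical.

```python
def normal_posterior(prior_mean, prior_sd, data, sigma):
    """Posterior of a normal mean when the residual SD `sigma` is known.

    Returns (posterior_mean, posterior_sd) for a Normal(prior_mean, prior_sd) prior.
    """
    n = len(data)
    sample_mean = sum(data) / n
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = n / sigma ** 2
    post_precision = prior_precision + data_precision
    # The posterior mean is a precision-weighted average of prior and data.
    post_mean = (
        prior_precision * prior_mean + data_precision * sample_mean
    ) / post_precision
    return post_mean, post_precision ** -0.5

# Hypothetical measurements, invented for this illustration only.
lengths = [8.2, 9.1, 7.4, 10.3, 9.0]
mle = sum(lengths) / len(lengths)
post_mean, post_sd = normal_posterior(
    prior_mean=7.0, prior_sd=1.0, data=lengths, sigma=2.0
)
print("likelihood-only estimate:", round(mle, 2))
print("posterior mean:", round(post_mean, 2), "posterior sd:", round(post_sd, 2))
```

With these numbers the sample mean is 8.8 and the posterior mean is 8.0, pulled toward the prior of 7. The two disagree, and that is exactly what an informative prior on a small sample is supposed to do.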

Looking good. Remember, look for the sampling space, not for overlap between the likelihood and the posterior.
Posterior values overlaid on observed responses.
The relationship between dose, supplement, and length comes from the data (dots) and the model (lines). Looking good. If you believe your prior is justified and this is what you saw in the data then this is what you get. DO NOT CHANGE YOUR PRIOR IF THE POSTERIOR DOES NOT OVERLAY WITH THE DATA. Stick with what you chose. It's science.
Of course, I can ask the model to show me the distribution of the values for specific categories, here what the length is when the dose = 1.8 and supp = OJ (left plot), and what the length is per supp when dose = 1.8 (right plot). I added the difference as well. As you can see, zero is a possibility, so for a dose of 1.8, it is plausible that there is no difference in length between the supplements.
Summary of the predictive capabilities of the model. Should not really concern you.
Cross-validated prediction summary. Should also not concern you. Bayesian modeling is not about being able to predict well. It is about integrating that which we knew, what we collected, and what we know now. Which is always less than expected.

In the previous example, I modeled dose as a numeric variable. You can of course also have a go at that variable as a factor. For this example, I used brms. What you can see is the prior that comes from the model if I specify nothing and the prior I used. My priors state that there are no differences between dose 1 and dose 2 compared to dose 0. Also, there is no difference between the supplements. Hence, my prior knowledge says I should expect no effect at all, and I want to see how that coincides with the latest dataset.

As you can see, the posterior is for sure not agreeing with the prior, which has everything to do with the data. Hence, my prior knowledge, combined with the latest data, will shift the knowledge base.
The posterior distributions for each of the dosages and supplements.

And another model, in which I analyzed dose as a numeric variable and changed the priors. This is all child’s play, but finding the prior is not. Since I know nothing about guinea pigs, my prior can be informative, statistically speaking, but does not have a solid base by itself.

New model.
The prior from the brms model if I specify nothing and my own priors.
The posterior distributions. As you can see, the posterior does not equal the prior, and it does not need to.
The distributions for each of the chains for supp = VC. Looks good!

And another model, in which we place all parameters in the random part of the model. As a result, we have a Bayesian Mixed Model, which is a bit paradoxical. If you know why, you have figured Bayesian analysis out! (hint: in Bayesian analysis, all parameters are considered random and variable).

My priors.
The difference between what I did and what brms would do if I did nothing.
And the results. Look for differences between these posteriors and the priors. The metrics look good. An Rhat of 1 tells me that the chains converged.
Drawing the posterior distributions.
Drawing the differences. If zero is included then no difference is a possibility.
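
The "is zero included" check can be sketched in a few lines. This is a plain Python illustration, not the brms code used for the plots. The posterior draws are simulated with made-up means and standard deviations, the interval is a simple equal-tailed one, and the helper names `credible_interval` and `includes_zero` are hypothetical.

```python
import random

def credible_interval(draws, level=0.95):
    """Equal-tailed interval from posterior draws (index-based quantiles)."""
    ordered = sorted(draws)
    tail = (1.0 - level) / 2.0
    last = len(ordered) - 1
    return ordered[round(tail * last)], ordered[round((1.0 - tail) * last)]

def includes_zero(draws, level=0.95):
    """True when zero lies inside the equal-tailed credible interval."""
    lo, hi = credible_interval(draws, level)
    return lo <= 0.0 <= hi

rng = random.Random(7)
# Hypothetical posterior draws of the mean length per supplement at one dose.
oj = [rng.gauss(25.0, 1.5) for _ in range(4000)]
vc = [rng.gauss(24.0, 1.5) for _ in range(4000)]
diff = [a - b for a, b in zip(oj, vc)]  # one difference per posterior draw

print("95% interval of OJ - VC:", credible_interval(diff))
print("zero inside the interval:", includes_zero(diff))
print("share of draws with OJ > VC:", sum(d > 0 for d in diff) / len(diff))
```

The last line is often the more useful summary: instead of a yes/no on zero, it tells you how much of the posterior sits on one side of it.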

And, last but not least, predictions coming from the Bayesian model. Both the ribbon and the points show the posterior distributions. These plots show you how fickle modeling is in general: just look at the parts of the dose variable for which no data is available. Interpolation rules there, and that is always a weakness.

I hope you enjoyed this post! Let me know if something is amiss!


Published via Towards AI
