

Hallucinations in Healthcare LLMs: Why They Happen and How to Prevent Them

Last Updated on May 13, 2025 by Editorial Team

Author(s): Marie

Originally published on Towards AI.


Building Trustworthy Healthcare LLM Systems — Part 1

Image generated by the author using ChatGPT

TL;DR

LLM hallucinations: AI-generated outputs that sound convincing but contain factual errors or fabricated information — posing serious safety risks in healthcare settings.

Three main types of hallucinations: factual errors (recommending antibiotics for viral infections), fabrications (inventing non-existent studies or guidelines), misinterpretations (drawing incorrect conclusions from real data).

Root causes of hallucinations: probabilistic generation, training that rewards fluency over factual accuracy, lack of real-time verification, and stale or biased data.

Mitigation approaches: Retrieval-Augmented Generation (RAG), domain-specific fine-tuning, advanced prompting, guardrails.

This series: build a hallucination-resistant pipeline for infectious disease knowledge, starting with a PubMed Central corpus.

Hallucinations in medical LLMs aren’t just bugs — they’re safety risks. This series walks through how to ground healthcare language models in real evidence, starting with infectious diseases.

Introduction

LLMs (large language models) are changing how we interact with medical knowledge — summarizing research, answering clinical questions, even offering second opinions. But they still hallucinate — and in medicine that’s a safety risk, not a quirk.

In medical domains, trust is non-negotiable. A hallucinated answer about infectious disease management (e.g., wrong antibiotic, incorrect diagnostic criteria) can directly impact patient safety, so grounding models in verifiable evidence is mandatory.

That’s why this blog series exists. Across four parts, we’ll show you how to build a hallucination-resistant workflow, step by step:

  • Part 1 (this post): What hallucinations are, why they happen, and how to build a domain-specific corpus from open-access medical literature
  • Part 2: Turn that corpus into a RAG pipeline
  • Part 3: Add hallucination detection metrics
  • Part 4: Put it all together and build a transparent interface that shows users the evidence behind the LLM’s responses

What Are Hallucinations in LLMs?

Hallucinations are model-generated outputs that sound coherent and convincing but are not grounded in fact: they are often false, unverifiable, or entirely made up.

Why They Matter in Healthcare

These errors have serious implications in clinical settings, where an improper treatment recommendation could have life-or-death consequences. That is why it is critical to mitigate hallucinations by building transparent, evidence-based systems.

Image generated by the author using ChatGPT

Main Types of Hallucinations

1. Factual Errors

Factual errors happen when LLMs make incorrect claims about verifiable facts. Using our infectious disease example, recommending antibiotics for influenza would be a type of factual error.

2. Fabrications

Fabrications involve LLMs inventing non-existent entities or information. In the context of healthcare, for example, these could be fictional research studies, medical guidelines that don’t exist or made-up technical concepts.

3. Misinterpretations

Misinterpretations happen when an LLM takes real information but misrepresents or mis-contextualizes it. For example, a model might reference a study that exists but draw the wrong conclusions from it.

Why LLMs Hallucinate

Large language models hallucinate because they

  • don’t truly understand facts the way humans do, and
  • simply predict which words should come next, based on patterns observed in their training data.

When these AI systems encounter unfamiliar topics or ambiguous questions, they don’t have the ability to say “I don’t know” and instead generate confident-sounding but potentially incorrect responses. This tendency stems from several factors:

  • Their training prioritizes fluent, human-like text over factual caution.
  • They lack real-time access to verified information sources.
  • They have no inherent understanding of truth versus fiction.
  • Conflicting information in the training data can push the model to average contradictory sources.

The problem is compounded by limitations in training data that may contain outdated, biased, or inaccurate information, as well as the fundamental auto-regressive nature of how these models generate text one piece at a time.
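
To make the auto-regressive point concrete, here is a toy sketch. The distribution below is invented purely for illustration and does not come from any real model; the point is that the model samples a fluent-looking continuation with no mechanism for checking whether it is true.

import random

# Toy next-token distribution for the prompt
# "Amoxicillin is used to treat ___ infections"
# (numbers are invented for illustration only)
next_token_probs = {
    "bacterial": 0.55,   # correct continuation
    "viral": 0.25,       # fluent but factually wrong
    "fungal": 0.15,
    "parasitic": 0.05,
}

tokens = list(next_token_probs)
weights = list(next_token_probs.values())

# The model samples from this distribution; roughly one time in four it would
# confidently assert the wrong class of pathogen.
print(random.choices(tokens, weights=weights, k=1)[0])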

How Can We Address Hallucinations?

There are various methods to mitigate or detect hallucinations.

Mitigation Strategies

  • Fine-tuning with Domain-Specific Data: One major cause of hallucination is knowledge gaps in the model’s training data. Fine-tuning introduces domain-specific knowledge and can be very effective for building models that better understand specialized medical terminology and the nuances of clinical text.
  • Retrieval-Augmented Generation (RAG): This method integrates external knowledge sources by retrieving relevant information before generating an answer, grounding the model’s output in verified sources instead of relying only on its training data. This is the method we will focus on in this series.
  • Other noteworthy strategies: Advanced prompting methods such as Chain-of-Thought or few-shot prompting can help mitigate hallucinations by guiding the model’s answer in the right direction (see the prompt sketch after this list). Rules-based guardrails that screen outputs before they reach users add another safety layer.
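
As a concrete example of the prompting and guardrail ideas above, here is a minimal grounded-prompt sketch; the wording, variable names, and the example question are illustrative choices, not a fixed recipe.

# Minimal grounded-prompt sketch (illustrative wording, not a fixed recipe)
retrieved_context = "...text of a retrieved guideline excerpt goes here..."
question = "What is the first-line treatment for community-acquired pneumonia?"

prompt = (
    "You are a clinical assistant. Answer ONLY from the context below.\n"
    "If the context does not contain the answer, reply exactly: I don't know.\n\n"
    f"Context:\n{retrieved_context}\n\n"
    f"Question: {question}"
)
# A rules-based guardrail could then screen the model's output, for example
# refusing to display answers that mention drugs absent from the context.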

Hallucination Detection

  • Source-Attribution Scoring: This method compares the LLM’s answer against the retrieved documents to measure how much of the answer is grounded in the sources. Beyond flagging hallucinations, it also makes it possible to highlight the sources behind the answer, which helps build trust and transparency.
  • Semantic Entropy Measurement: This method measures uncertainty about the meaning of generated responses and was developed specifically to address the risk of hallucinations in critical areas such as patient safety.
  • Consistency-Based Methods: These rely on a self-consistency check: prompt the model multiple times with the same query and compare the outputs; answers that vary widely are likely hallucinated (a minimal sketch follows this list).
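
Here is a minimal sketch of the consistency-based idea. It assumes a hypothetical ask_llm(question) function that returns a string answer; any LLM client could be plugged in.

from collections import Counter

def consistency_check(ask_llm, question, n_samples=5):
    """Ask the same question several times and measure agreement.
    Low agreement is a signal that the answer may be hallucinated."""
    answers = [ask_llm(question).strip().lower() for _ in range(n_samples)]
    most_common_answer, freq = Counter(answers).most_common(1)[0]
    agreement = freq / n_samples
    return most_common_answer, agreement

# Example usage (ask_llm is your own wrapper around an LLM API):
# answer, agreement = consistency_check(ask_llm, "Which pathogen causes Lyme disease?")
# if agreement < 0.6: flag the answer for human review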

If you’re interested in going a bit further, there are several recent open-access research papers on LLM hallucination worth reading.

Code Walkthrough: Downloading Medical Research from PubMed Central

To reduce hallucinations in healthcare LLMs, grounding them in reliable medical literature is critical. Let’s start by building a corpus from one of the best sources available: PubMed Central (PMC).

This script helps you automate the retrieval of open-access medical papers, making it easy to bootstrap a dataset tailored to your task (e.g., infectious diseases). Here’s how it works:

1. Setup and Environment

import requests
import xml.etree.ElementTree as ET
import json
import os, re, time
from dotenv import load_dotenv

load_dotenv()

api_key = os.getenv("NCBI_API_KEY")
email = os.getenv("EMAIL")

base_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

You’ll need to set your NCBI API key and email in a .env file.

You can still call the NCBI E-utilities without an API key, but registering for one is free and unlocks higher rate limits.
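
For reference, a minimal .env might look like this; the variable names match the os.getenv calls above, and the values are placeholders for your own credentials:

# .env
NCBI_API_KEY=your_ncbi_api_key_here
EMAIL=you@example.com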

2. Search PMC

Because we are interested in full texts to build our knowledge base, we should only download articles that are open access. To do so, we need to fetch the articles from PMC:

# 1. Search PMC
search_url = f"{base_url}esearch.fcgi"
search_params = {
    "db": "pmc",
    "term": query,
    "retmax": max_results,
    "retmode": "json",
    "api_key": api_key,
    "email": email,
}
print("Searching PMC...")
search_resp = requests.get(search_url, params=search_params)
search_resp.raise_for_status()
ids = search_resp.json()["esearchresult"]["idlist"]

This code queries PMC with your search terms (for example “infectious diseases”) and returns a list of document identifiers (PMCIDs).
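
Note that query and max_results here are parameters of the full download function shown further down. The esearch JSON response looks roughly like the abridged, illustrative shape below (the IDs are placeholders); the code only needs the idlist field:

# Abridged, illustrative shape of the esearch JSON response
example_response = {
    "esearchresult": {
        "count": "1532",                              # total matches (placeholder)
        "idlist": ["10000001", "10000002", "10000003"],  # PMCIDs without the "PMC" prefix
    }
}
ids = example_response["esearchresult"]["idlist"]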

3. Fetch and Parse Articles

Now we can fetch the full texts using the PMCIDs:

# 2. Batch fetch
fetch_url = f"{base_url}efetch.fcgi"

for i in range(0, len(ids), batch_size):
    batch_ids = ids[i : i + batch_size]
    fetch_params = {
        "db": "pmc",
        "id": ",".join(batch_ids),
        "retmode": "xml",
        "api_key": api_key,
        "email": email,
    }
    time.sleep(delay)
    r = requests.get(fetch_url, params=fetch_params)
    r.raise_for_status()

Our response is an XML object, so the final step is to parse it and create a dictionary with the relevant sections: pmcid, title, abstract, full_text, publication_date, authors:

root = ET.fromstring(r.content)
for idx, article in enumerate(root.findall(".//article")):
    # Extract article details
    article_data = {
        "pmcid": f"PMC{batch_ids[idx]}",
        "title": "",
        "abstract": "",
        "full_text": "",
        "publication_date": "",
        "authors": [],
    }

    # Extract title
    title_elem = article.find(".//article-title")
    if title_elem is not None:
        article_data["title"] = "".join(title_elem.itertext()).strip()

    # Extract abstract
    abstract_parts = article.findall(".//abstract//p")
    if abstract_parts:
        article_data["abstract"] = " ".join(
            "".join(p.itertext()).strip() for p in abstract_parts
        )

    # Extract publication date
    pub_date = article.find(".//pub-date")
    if pub_date is not None:
        year = pub_date.find("year")
        month = pub_date.find("month")
        day = pub_date.find("day")

        date_parts = []
        if year is not None:
            date_parts.append(year.text)
        if month is not None:
            date_parts.append(month.text)
        if day is not None:
            date_parts.append(day.text)

        article_data["publication_date"] = "-".join(date_parts)

    # Extract authors
    author_elems = article.findall(".//contrib[@contrib-type='author']")
    for author_elem in author_elems:
        surname = author_elem.find(".//surname")
        given_names = author_elem.find(".//given-names")

        author = {}
        if surname is not None:
            author["surname"] = surname.text
        if given_names is not None:
            author["given_names"] = given_names.text

        if author:
            article_data["authors"].append(author)

    # Extract full text (combining all paragraphs)
    body = article.find(".//body")
    if body is not None:
        paragraphs = body.findall(".//p")
        article_data["full_text"] = " ".join(
            "".join(p.itertext()).strip() for p in paragraphs
        )

The data can then be saved to a JSONL file, which we will use in our next step: building our RAG system.

Let’s be mindful of licensing restrictions: open-access literature allows anyone to access and read the content, but that doesn’t mean the authors agreed to redistribution of their work.

This blog post and its content are intended for personal and educational use. If you decide to use this function to build a dataset that will be redistributed or commercialized, you must comply with each article’s license agreement. To do so, let’s define a function that pulls the license data from the downloaded article:

def detect_cc_license(lic_elem):
    """
    Inspect <license> … </license> for Creative Commons URLs or keywords
    and return a normalised string such as 'cc-by', 'cc-by-nc', 'cc0', or 'other'.
    """
    if lic_elem is None:
        return "other"

    # 1) gather candidate strings: any ext-link href + full text
    candidates: list[str] = []
    for link in lic_elem.findall(".//ext-link[@ext-link-type='uri']"):
        href = link.get("{http://www.w3.org/1999/xlink}href") or link.get("href")
        if href:
            candidates.append(href.lower())
    candidates.append("".join(lic_elem.itertext()).lower())

    # 2) search for CC patterns
    for text in candidates:
        if "creativecommons.org" not in text and "publicdomain" not in text:
            continue
        # order matters (most restrictive first)
        if re.search(r"by[-_]nc[-_]nd", text):
            return "cc-by-nc-nd"
        if re.search(r"by[-_]nc[-_]sa", text):
            return "cc-by-nc-sa"
        if re.search(r"by[-_]nc", text):
            return "cc-by-nc"
        if re.search(r"by[-_]sa", text):
            return "cc-by-sa"
        if "/by/" in text:
            return "cc-by"
        if "publicdomain/zero" in text or "cc0" in text or "public domain" in text:
            return "cc0"
    return "other"

Here’s a short breakdown of what the licenses mean:

Image uploaded by author

Here’s the full function for PubMed download:

def download_pmc_articles(query,
                          max_results=100,
                          batch_size=20,
                          delay=0.2,
                          allowed_licenses={"cc-by", "cc-by-sa", "cc0"},
                          out_file="pmc_articles.jsonl"):

    base_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

    # 1. Search PMC
    search_url = f"{base_url}esearch.fcgi"
    search_params = {
        "db": "pmc",
        "term": query,
        "retmax": max_results,
        "retmode": "json",
        "api_key": api_key,
        "email": email,
    }
    print("Searching PMC...")
    search_resp = requests.get(search_url, params=search_params)
    search_resp.raise_for_status()
    ids = search_resp.json()["esearchresult"]["idlist"]

    # 2. Batch fetch
    fetch_url = f"{base_url}efetch.fcgi"
    skipped, saved = 0, 0
    with open(out_file, "w") as f:
        for i in range(0, len(ids), batch_size):
            batch_ids = ids[i : i + batch_size]
            fetch_params = {
                "db": "pmc",
                "id": ",".join(batch_ids),
                "retmode": "xml",
                "api_key": api_key,
                "email": email,
            }
            time.sleep(delay)
            r = requests.get(fetch_url, params=fetch_params)
            r.raise_for_status()
            root = ET.fromstring(r.content)
            for idx, article in enumerate(root.findall(".//article")):
                # Check license
                license = detect_cc_license(article.find(".//license"))
                if license not in allowed_licenses:
                    skipped += 1
                    continue  # skip disallowed license

                # Extract article details
                article_data = {
                    "pmcid": f"PMC{batch_ids[idx]}",
                    "title": "",
                    "abstract": "",
                    "full_text": "",
                    "publication_date": "",
                    "authors": [],
                }

                # Extract title
                title_elem = article.find(".//article-title")
                if title_elem is not None:
                    article_data["title"] = "".join(title_elem.itertext()).strip()

                # Extract abstract
                abstract_parts = article.findall(".//abstract//p")
                if abstract_parts:
                    article_data["abstract"] = " ".join(
                        "".join(p.itertext()).strip() for p in abstract_parts
                    )

                # Extract publication date
                pub_date = article.find(".//pub-date")
                if pub_date is not None:
                    year = pub_date.find("year")
                    month = pub_date.find("month")
                    day = pub_date.find("day")

                    date_parts = []
                    if year is not None:
                        date_parts.append(year.text)
                    if month is not None:
                        date_parts.append(month.text)
                    if day is not None:
                        date_parts.append(day.text)

                    article_data["publication_date"] = "-".join(date_parts)

                # Extract authors
                author_elems = article.findall(".//contrib[@contrib-type='author']")
                for author_elem in author_elems:
                    surname = author_elem.find(".//surname")
                    given_names = author_elem.find(".//given-names")

                    author = {}
                    if surname is not None:
                        author["surname"] = surname.text
                    if given_names is not None:
                        author["given_names"] = given_names.text

                    if author:
                        article_data["authors"].append(author)

                # Extract full text (combining all paragraphs)
                body = article.find(".//body")
                if body is not None:
                    paragraphs = body.findall(".//p")
                    article_data["full_text"] = " ".join(
                        "".join(p.itertext()).strip() for p in paragraphs
                    )

                f.write(json.dumps(article_data) + "\n")
                saved += 1
            print(f"Saved batch {i//batch_size + 1}")

    print(f"Downloaded {saved} articles to {out_file}, {skipped} articles removed by license filter")

Now you can call your function with your query to create your corpus. For example:

# Install the required packages first (run in your terminal):
#   pip install python-dotenv requests

query = "bacterial pneumonia treatment"
max_results = 500
batch_size = 50

download_pmc_articles(query, max_results, batch_size)

And that’s it! All your articles are now saved in a JSONL file and ready to be processed for RAG.
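
As a quick sanity check before Part 2, you can load the corpus back in and inspect a record (the file name matches the out_file default above):

import json

# Load the corpus written by download_pmc_articles
with open("pmc_articles.jsonl") as f:
    articles = [json.loads(line) for line in f]

print(f"Loaded {len(articles)} articles")
print(articles[0]["title"])  # inspect the first record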

What’s Next: Preparing the Data for RAG

In Part 2, we’ll take the domain-specific corpus you just built and use it to power a Retrieval-Augmented Generation (RAG) system — grounding your LLM in real evidence to reduce hallucinations and improve trust.


Published via Towards AI



Note: Content contains the views of the contributing authors and not Towards AI.