
Hallucinations in Healthcare LLMs: Why They Happen and How to Prevent Them
Building Trustworthy Healthcare LLM Systems — Part 1

TL;DR
LLM hallucinations: AI-generated outputs that sound convincing but contain factual errors or fabricated information — posing serious safety risks in healthcare settings.
Three main types of hallucinations: factual errors (recommending antibiotics for viral infections), fabrications (inventing non-existent studies or guidelines), misinterpretations (drawing incorrect conclusions from real data).
Root causes of hallucinations: probabilistic generation, training that rewards fluency over factual accuracy, lack of real-time verification, and stale or biased data.
Mitigation approaches: Retrieval-Augmented Generation (RAG), domain-specific fine-tuning, advanced prompting, guardrails.
This series: build a hallucination-resistant pipeline for infectious disease knowledge, starting with a PubMed Central corpus.
Hallucinations in medical LLMs aren’t just bugs — they’re safety risks. This series walks through how to ground healthcare language models in real evidence, starting with infectious diseases.
Introduction
LLMs (large language models) are changing how we interact with medical knowledge — summarizing research, answering clinical questions, even offering second opinions. But they still hallucinate — and in medicine that’s a safety risk, not a quirk.
In medical domains, trust is non-negotiable. A hallucinated answer about infectious disease management (e.g., wrong antibiotic, incorrect diagnostic criteria) can directly impact patient safety, so grounding models in verifiable evidence is mandatory.
That’s why this blog series exists. Across four parts, it will show you how to build a hallucination-resistant workflow, step by step:
- Part 1 (this post): What hallucinations are, why they happen, and how to build a domain-specific corpus from open-access medical literature
- Part 2: Turn that corpus into a RAG pipeline
- Part 3: Add hallucination detection metrics
- Part 4: Put it all together and build a transparent interface to show users the evidence behind the LLM’s responses
What Are Hallucinations in LLMs?
Hallucinations are model-generated outputs that sound fluent and coherent but are not factually correct: they are convincing, yet often false, unverifiable, or entirely made up.
Why They Matter in Healthcare
These errors can have serious consequences in clinical settings, where they might lead to improper treatment recommendations. A wrong recommendation can be a matter of life or death, which is why it is critical to mitigate hallucinations by building transparent, evidence-based systems.

Main Types of Hallucinations
1. Factual Errors
Factual errors happen when LLMs make incorrect claims about verifiable facts. Using our infectious disease example, recommending antibiotics for influenza would be a type of factual error.
2. Fabrications
Fabrications involve LLMs inventing non-existent entities or information. In the context of healthcare, for example, these could be fictional research studies, medical guidelines that don’t exist or made-up technical concepts.
3. Misinterpretations
Misinterpretations happen when an LLM takes real information but misrepresents it or puts it in the wrong context. For example, a model might reference a study that exists but draw the wrong conclusions from it.
Why LLMs Hallucinate
Large language models hallucinate because they don’t truly understand facts the way humans do; they simply predict which words should come next based on patterns observed in their training data.
When these AI systems encounter unfamiliar topics or ambiguous questions, they don’t have the ability to say “I don’t know” and instead generate confident-sounding but potentially incorrect responses. This tendency stems from several factors:
- Their training prioritizes fluent, human-like text over factual caution
- They lack real-time access to verified information sources
- They have no inherent understanding of truth versus fiction.
- Conflicting information in training data can push the model to average contradictory sources.
The problem is compounded by limitations in training data that may contain outdated, biased, or inaccurate information, as well as the fundamental auto-regressive nature of how these models generate text one piece at a time.
How Can We Address Hallucinations?
There are various methods to mitigate or detect hallucinations.
Mitigation Strategies
- Fine-tuning with Domain-Specific Data: A main cause of hallucination is knowledge gaps in the model’s training data. Fine-tuning introduces domain-specific knowledge and can produce models that better understand specialized medical terminology and the nuances of clinical text.
- Retrieval-Augmented Generation (RAG): This method integrates external knowledge sources by retrieving relevant information before generating the answer. It grounds the model’s outputs in verified external sources instead of relying only on its training data. This is the method we will focus on in this series.
- Other noteworthy strategies: Advanced prompting methods such as Chain-of-Thought or few-shot prompting can mitigate hallucinations by guiding the model’s answer in the right direction, and rules-based guardrails that screen outputs before they reach users add another safety layer (a minimal sketch follows this list).
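To make the prompting and guardrail ideas concrete, here is a minimal sketch. The template wording, the build_prompt and passes_guardrail helpers, and the word-overlap heuristic are illustrative assumptions, not a production safety layer (Part 2 builds the actual RAG grounding):

GROUNDED_PROMPT = """You are a clinical assistant. Answer ONLY from the numbered sources below.
If the sources do not contain the answer, reply exactly: "I don't know."

Sources:
{sources}

Question: {question}
Answer:"""

def build_prompt(question: str, sources: list[str]) -> str:
    # Number the retrieved snippets and drop them into the grounded template.
    numbered = "\n".join(f"[{i + 1}] {s}" for i, s in enumerate(sources))
    return GROUNDED_PROMPT.format(sources=numbered, question=question)

def passes_guardrail(answer: str, sources: list[str]) -> bool:
    # Crude rules-based check: accept explicit refusals, otherwise require some
    # lexical overlap between the answer and the sources before showing it to users.
    if "i don't know" in answer.lower():
        return True
    source_text = " ".join(sources).lower()
    overlap = [w for w in answer.lower().split() if len(w) > 6 and w in source_text]
    return len(overlap) >= 3

# Example with a dummy source snippet
sources = ["Influenza is a viral infection; antibiotics are not indicated for uncomplicated cases."]
prompt = build_prompt("Should I prescribe antibiotics for influenza?", sources)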
Hallucination Detection
- Source-attribution scoring: This method compares the LLM’s answer to the retrieved documents to measure how much of the answer is grounded in the sources. Beyond identifying hallucinations, it also makes it possible to highlight the sources behind the answer, which helps build trust and transparency.
- Semantic Entropy Measurement: This method measures uncertainty about the meaning of generated responses and has been developed specifically to address the risk of hallucinations in safety-critical areas such as patient care.
- Consistency-Based Methods: These methods rely on a self-consistency check: prompt the model multiple times with the same query and compare the outputs; low agreement between answers is a hallucination signal (see the sketch after this list).
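As a taste of the consistency-based idea, here is a minimal sketch. ask_llm is a placeholder for whatever model client you use (it is not defined here), and the token-overlap agreement score is a deliberately simple stand-in; real implementations compare answers with semantic similarity or an entailment model:

def token_overlap(a: str, b: str) -> float:
    # Crude agreement score: Jaccard overlap of lowercased tokens.
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    return len(tokens_a & tokens_b) / max(len(tokens_a | tokens_b), 1)

def looks_consistent(ask_llm, query: str, n: int = 5, threshold: float = 0.5) -> bool:
    # Ask the same question n times and treat low agreement as a hallucination signal.
    # ask_llm is assumed to be a callable you provide: prompt string -> answer string.
    answers = [ask_llm(query) for _ in range(n)]
    scores = [token_overlap(answers[0], other) for other in answers[1:]]
    return sum(scores) / max(len(scores), 1) >= threshold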
If you’re interested in reading recent research on this topic, here are a few open-access papers worth reading:
- A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions
- Hallucinations and Key Information Extraction in Medical Texts: A Comprehensive Assessment of Open-Source Large Language Models
- High Rates of Fabricated and Inaccurate References in ChatGPT-Generated Medical Content
- Large language models encode clinical knowledge
Code Walkthrough: Downloading Medical Research from PubMed Central
To reduce hallucinations in healthcare LLMs, grounding them in reliable medical literature is critical. Let’s start by building a corpus from one of the best sources available: PubMed Central (PMC).
This script helps you automate the retrieval of open-access medical papers, making it easy to bootstrap a dataset tailored to your task (e.g., infectious diseases). Here’s how it works:
1. Setup and Environment
import requests
import xml.etree.ElementTree as ET
import json
import os, re, time
from dotenv import load_dotenv
load_dotenv()
api_key = os.getenv("NCBI_API_KEY")
email = os.getenv("EMAIL")
base_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"
You’ll need to set your NCBI API key and email in a .env file. You can still call the NCBI API without an API key, but a key unlocks higher rate limits, and it’s free.
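For reference, the .env file only needs two lines; the values below are placeholders:

NCBI_API_KEY=your_ncbi_api_key_here
EMAIL=you@example.com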
2. Search PMC
Because we are interested in full texts for our knowledge base, we should only download open-access articles. To do so, we fetch the articles from PMC:
# 1. Search PMC
search_url = f"{base_url}esearch.fcgi"
search_params = {
    "db": "pmc",
    "term": query,
    "retmax": max_results,
    "retmode": "json",
    "api_key": api_key,
    "email": email
}
print("Searching PMC...")
search_resp = requests.get(search_url, params=search_params)
search_resp.raise_for_status()
ids = search_resp.json()["esearchresult"]["idlist"]
This code queries PMC with your search terms (for example “infectious diseases”) and returns a list of document identifiers (PMCIDs).
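For orientation, the JSON returned by esearch looks roughly like this (abridged, with made-up IDs for illustration):

{
  "esearchresult": {
    "count": "15342",
    "retmax": "100",
    "idlist": ["10223344", "10198765", "10154321"]
  }
}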
3. Fetch and Parse Articles
Now we can fetch the full texts using the PMCIDs:
# 2. Batch fetch
fetch_url = f"{base_url}efetch.fcgi"
for i in range(0, len(ids), batch_size):
    batch_ids = ids[i : i + batch_size]
    fetch_params = {
        "db": "pmc",
        "id": ",".join(batch_ids),
        "retmode": "xml",
        "api_key": api_key,
        "email": email,
    }
    time.sleep(delay)
    r = requests.get(fetch_url, params=fetch_params)
    r.raise_for_status()
The response is an XML document, so the final step is to parse it and build a dictionary with the relevant fields (pmcid, title, abstract, full_text, publication_date, authors):
root = ET.fromstring(r.content)
for idx, article in enumerate(root.findall(".//article")):
    # Extract article details
    article_data = {
        "pmcid": f"PMC{batch_ids[idx]}",
        "title": "",
        "abstract": "",
        "full_text": "",
        "publication_date": "",
        "authors": [],
    }
    # Extract title
    title_elem = article.find(".//article-title")
    if title_elem is not None:
        article_data["title"] = "".join(title_elem.itertext()).strip()
    # Extract abstract
    abstract_parts = article.findall(".//abstract//p")
    if abstract_parts:
        article_data["abstract"] = " ".join(
            "".join(p.itertext()).strip() for p in abstract_parts
        )
    # Extract publication date
    pub_date = article.find(".//pub-date")
    if pub_date is not None:
        year = pub_date.find("year")
        month = pub_date.find("month")
        day = pub_date.find("day")
        date_parts = []
        if year is not None:
            date_parts.append(year.text)
        if month is not None:
            date_parts.append(month.text)
        if day is not None:
            date_parts.append(day.text)
        article_data["publication_date"] = "-".join(date_parts)
    # Extract authors
    author_elems = article.findall(".//contrib[@contrib-type='author']")
    for author_elem in author_elems:
        surname = author_elem.find(".//surname")
        given_names = author_elem.find(".//given-names")
        author = {}
        if surname is not None:
            author["surname"] = surname.text
        if given_names is not None:
            author["given_names"] = given_names.text
        if author:
            article_data["authors"].append(author)
    # Extract full text (combining all paragraphs)
    body = article.find(".//body")
    if body is not None:
        paragraphs = body.findall(".//p")
        article_data["full_text"] = " ".join(
            "".join(p.itertext()).strip() for p in paragraphs
        )
Each article’s data can then be saved to a JSONL file that we will use in our next step: building our RAG system.
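Writing those records out is straightforward; here is a minimal sketch, assuming articles is a list holding the article_data dictionaries built above:

import json

# Append each parsed article as one JSON object per line (JSONL).
with open("pmc_articles.jsonl", "w") as f:
    for article_data in articles:
        f.write(json.dumps(article_data) + "\n")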
Let’s be mindful of licensing restrictions: open access means anyone can access and read the content, but it doesn’t mean the authors agreed to redistribution of their work.
This blog post and its code are intended for personal and educational use. If you use this function to build a dataset that will be redistributed or commercialized, you must comply with each article’s license agreement. To do so, let’s define a function that pulls the license data from the downloaded article:
def detect_cc_license(lic_elem):
    """
    Inspect <license> … </license> for Creative Commons URLs or keywords
    and return a normalised string such as 'cc-by', 'cc-by-nc', 'cc0', or 'other'.
    """
    if lic_elem is None:
        return "other"
    # 1) gather candidate strings: any ext-link href + full text
    candidates: list[str] = []
    for link in lic_elem.findall(".//ext-link[@ext-link-type='uri']"):
        href = link.get("{http://www.w3.org/1999/xlink}href") or link.get("href")
        if href:
            candidates.append(href.lower())
    candidates.append("".join(lic_elem.itertext()).lower())
    # 2) search for CC patterns
    for text in candidates:
        if "creativecommons.org" not in text and "publicdomain" not in text:
            continue
        # order matters (most restrictive first)
        if re.search(r"by[-_]nc[-_]nd", text):
            return "cc-by-nc-nd"
        if re.search(r"by[-_]nc[-_]sa", text):
            return "cc-by-nc-sa"
        if re.search(r"by[-_]nc", text):
            return "cc-by-nc"
        if re.search(r"by[-_]sa", text):
            return "cc-by-sa"
        if "/by/" in text:
            return "cc-by"
        if "publicdomain/zero" in text or "cc0" in text or "public domain" in text:
            return "cc0"
    return "other"
Here’s a short breakdown of what the license labels returned by this function mean:
- cc0: public domain dedication; no restrictions on reuse or redistribution.
- cc-by: reuse and redistribution allowed with attribution to the authors.
- cc-by-sa: attribution required; derivative works must be shared under the same license.
- cc-by-nc: attribution required; non-commercial use only.
- cc-by-nc-sa: non-commercial use only, with attribution and share-alike terms.
- cc-by-nc-nd: non-commercial use only, with attribution and no derivative works.
- other: no Creative Commons license detected; safest to treat redistribution as not permitted.
Here’s the full function for the PMC download:
def download_pmc_articles(query,
                          max_results=100,
                          batch_size=20,
                          delay=0.2,
                          allowed_licenses={"cc-by", "cc-by-sa", "cc0"},
                          out_file="pmc_articles.jsonl"):
    base_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"
    # 1. Search PMC
    search_url = f"{base_url}esearch.fcgi"
    search_params = {
        "db": "pmc",
        "term": query,
        "retmax": max_results,
        "retmode": "json",
        "api_key": api_key,
        "email": email
    }
    print("Searching PMC...")
    search_resp = requests.get(search_url, params=search_params)
    search_resp.raise_for_status()
    ids = search_resp.json()["esearchresult"]["idlist"]
    # 2. Batch fetch
    fetch_url = f"{base_url}efetch.fcgi"
    skipped, saved = 0, 0
    with open(out_file, "w") as f:
        for i in range(0, len(ids), batch_size):
            batch_ids = ids[i:i+batch_size]
            fetch_params = {
                "db": "pmc",
                "id": ",".join(batch_ids),
                "retmode": "xml",
                "api_key": api_key,
                "email": email
            }
            time.sleep(delay)
            r = requests.get(fetch_url, params=fetch_params)
            r.raise_for_status()
            root = ET.fromstring(r.content)
            for idx, article in enumerate(root.findall(".//article")):
                # Check license
                license = detect_cc_license(article.find(".//license"))
                if license not in allowed_licenses:
                    skipped += 1
                    continue  # skip disallowed license
                # Extract article details
                article_data = {
                    "pmcid": f"PMC{batch_ids[idx]}",
                    "title": "",
                    "abstract": "",
                    "full_text": "",
                    "publication_date": "",
                    "authors": []
                }
                # Extract title
                title_elem = article.find(".//article-title")
                if title_elem is not None:
                    article_data["title"] = "".join(title_elem.itertext()).strip()
                # Extract abstract
                abstract_parts = article.findall(".//abstract//p")
                if abstract_parts:
                    article_data["abstract"] = " ".join("".join(p.itertext()).strip() for p in abstract_parts)
                # Extract publication date
                pub_date = article.find(".//pub-date")
                if pub_date is not None:
                    year = pub_date.find("year")
                    month = pub_date.find("month")
                    day = pub_date.find("day")
                    date_parts = []
                    if year is not None:
                        date_parts.append(year.text)
                    if month is not None:
                        date_parts.append(month.text)
                    if day is not None:
                        date_parts.append(day.text)
                    article_data["publication_date"] = "-".join(date_parts)
                # Extract authors
                author_elems = article.findall(".//contrib[@contrib-type='author']")
                for author_elem in author_elems:
                    surname = author_elem.find(".//surname")
                    given_names = author_elem.find(".//given-names")
                    author = {}
                    if surname is not None:
                        author["surname"] = surname.text
                    if given_names is not None:
                        author["given_names"] = given_names.text
                    if author:
                        article_data["authors"].append(author)
                # Extract full text (combining all paragraphs)
                body = article.find(".//body")
                if body is not None:
                    paragraphs = body.findall(".//p")
                    article_data["full_text"] = " ".join("".join(p.itertext()).strip() for p in paragraphs)
                f.write(json.dumps(article_data) + "\n")
                saved += 1
            print(f"Saved batch {i//batch_size + 1}")
    print(f"Downloaded {saved} articles to {out_file}, {skipped} articles removed by license filter")
Now you can call the function with your query to create your corpus (if needed, install the dependencies first with pip install python-dotenv requests). For example:
query = 'bacterial pneumonia treatment'
max_results = 500
batch_size = 50
download_pmc_articles(query, max_results, batch_size)
And that’s it! All your articles are now saved in a JSONL file, ready to be processed for RAG.
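As a quick sanity check, you can reload the file and inspect a record; a minimal sketch, assuming the default pmc_articles.jsonl output path:

import json

# Reload the corpus and print a quick summary of the first article.
with open("pmc_articles.jsonl") as f:
    articles = [json.loads(line) for line in f]

print(f"{len(articles)} articles in the corpus")
print(articles[0]["pmcid"], articles[0]["publication_date"])
print(articles[0]["title"])
print(articles[0]["abstract"][:300])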
What’s Next: Preparing the Data for RAG
In Part 2, we’ll take the domain-specific corpus you just built and use it to power a Retrieval-Augmented Generation (RAG) system — grounding your LLM in real evidence to reduce hallucinations and improve trust.