The End of Prompt Engineering? Stanford’s Self-Improving AI Learned Clinical Reasoning on Its Own
Author(s): Marie Humbert-Droz, PhD Originally published on Towards AI. Stanford’s Agentic Context Engineering lets models reflect, learn, and build their own playbook. I tested it on clinical lab data — and watched it teach itself temporal reasoning. As we saw in my …
Can Local AI Keep Up with the Cloud? I Tested 8 Models on Clinical Data
I ran OpenAI, Anthropic, and local models head-to-head on clinical note extraction. Here’s what I found. After setting up my local clinical abstraction agent and testing its ability to call tools, I wanted …
Do AI Agents Really Use the Tools You Build for Them? I Tested It.
Testing tool coverage in local agents and how to improve compliance. I thought my healthcare AI agent would call my lab-checking tool every time it encountered lab values. Instead? Only 1 out of …
I Built a Clinical AI Agent — and It Skipped the Tools I Gave It
An evaluation of tool coverage in local healthcare agents, with a simple fix. I thought my healthcare AI agent would call my lab-checking tool every time it encountered lab values. Instead? Only 1 …
I Built a Local Clinical AI Agent from Scratch — Here’s How
How I wired GPT-OSS with custom tools to make clinical data actually usable. In my last experiment, I ran OpenAI’s new local model on my laptop and it extracted JSON from clinical notes …
From Messy EHRs to 30-Day Readmission Predictions: Benchmarking 4 ML Models
Patient-level splits, imputation hacks, and interpretability tips for real-world healthcare AI. In Part 1, we explored why explainability matters in healthcare AI and introduced our 30-day readmission prediction model. We discussed the critical …
Part 1: Preprocessing MIMIC-IV for Readmission Prediction
Kicking off a hands-on series on building explainable AI models for healthcare. In my last series, we tackled a critical question: How do we detect hallucinations in large language models built for clinical …
From Black Box to Dashboard: How We Built a Transparent Interface for Healthcare AI
Real-Time Trust Signals for Healthcare LLMs — Part 4 of the Building Trustworthy Healthcare LLM Systems series. A system that knows when it’s wrong is a step forward. A system that shows you …
Detecting Hallucinations in Healthcare AI
Three Safety Layers Every Medical RAG System Needs. In our previous posts, we explored why hallucinations occur in healthcare LLMs and built a basic RAG system using medical literature from PubMed Central. While …
How to Build a RAG System for Healthcare: Minimize Hallucinations in LLM Outputs
Building Trustworthy Healthcare LLM Systems — Part 2. In our previous post, we explored why hallucinations occur in Large Language Models and the particular risks they pose in healthcare settings. We also set …
I Ran OpenAI’s New Open Model on My Laptop to Extract Medical Data — Here’s What Happened
Testing privacy-first healthcare AI with OpenAI’s first open-weight models. OpenAI just released its first family of open-weight models, and I couldn’t resist testing them on one of healthcare’s trickiest problems: extracting structured data …