Your Postcode Is Deciding Your Care. I Built a Pipeline to Prove It.
Author(s): Yusuf Ismail Originally published on Towards AI. Picture this. It’s 2 am. You’re on a trolley in a hospital corridor. Not a ward. A corridor. Fluorescent lights, the smell of disinfectant, the sound of a ward that’s full somewhere behind a …
Part 19: Data Manipulation in Statistical Profiling
Author(s): Raj kumar Originally published on Towards AI. Statistical profiling sits at the intersection of data validation and analytical insight. In banking operations, descriptive statistics are not academic exercises. They are diagnostic tools that surface anomalies in payment flows, quantify credit portfolio …
Building ML in the Dark: A Survival Guide for the Solo Practitioner
Author(s): Yuval Mehta Originally published on Towards AI. No GPU cluster. No data team. No ML platform. Here’s what actually ships. Most ML content is written for teams that have things. A labelled dataset. An MLOps platform. …
Part 16: Data Manipulation in Data Validation and Quality Control
Author(s): Raj kumar Originally published on Towards AI. Data quality issues are the silent killers of production systems. A single malformed record can crash your pipeline. A gradual drift in data distributions can slowly degrade model performance. Missing values that sneak through …
Part 9: Data Manipulation in Data Merging and Joins
Author(s): Raj kumar Originally published on Towards AI. Every analysis that combines data from multiple sources faces the same fundamental question: how should these datasets align? Which records match? What happens when they don’t? These aren’t just technical decisions. They shape what …
Part 6: Data Manipulation in String and Text Processing
Author(s): Raj kumar Originally published on Towards AI. If you’ve ever worked with real-world data, you know the struggle. Names come in all caps when they should be title case. Email addresses have trailing spaces. Phone numbers show up in a dozen …
Why AI in CRM Fails Without a Warehouse-First Architecture
Author(s): Clarencer R. Mercer Originally published on Towards AI. When Model Accuracy Is Not Enough. In Part 1 of this series, we explored how a warehouse-first composable CDP restores architectural control to modern CRM systems. In Part 2, we examined the Identity …
What I Learned Building a Job-Matching System in Hebrew: Reversed Text, I/O Psychology, and When to Ditch the LLM
Author(s): Tom Ron Originally published on Towards AI. This is Part 2 of a series on building job-matching systems. Part 1 covered why job matching is fundamentally harder than it looks. This post is the technical deep-dive. In Part 1, I wrote …
What I Learned Today About Apache Spark Architecture
Author(s): Abinaya Subramaniam Originally published on Towards AI. Apache Spark often feels magical when we first start using it. We write a few lines of PySpark code, hit run, and suddenly terabytes of data are being processed in seconds. But behind this …
From raw text to training gold: How to collect and prepare data for custom LLMs
Author(s): Laura Verghote Originally published on Towards AI. Practical guidance for building clean, domain-relevant datasets for fine-tuning, continued pretraining, or training from scratch. If you’ve worked on language models beyond a quick prototype, you already know where the real bottleneck is. It’s …
Debugging Spark at Scale: Slow to Shipped
Author(s): Diogo Santos Originally published on Towards AI. A stepwise playbook to locate the true bottleneck — I/O, shuffle, Python, or memory — and fix it with minimal changes and hard measurements. If you’re here, you’ve got a Spark job that should …
Langfuse: A Technical Guide to Observability in LLM Applications
Author(s): Rachit Originally published on Towards AI. Large Language Models (LLMs) are incredibly powerful, but they’re also stochastic black boxes. You can design the perfect prompt, and yet in production, responses may vary …
Mastering Python Data Pipelines in 2025
Author(s): Code with Margaret Originally published on Towards AI. How I built scalable ETL workflows without losing my sanity. Over the past four years, I’ve built more Python data pipelines than I can count. Some of them ran beautifully; others… well, let’s …
Mastering RAG: Precision from Table-Heavy PDFs
Author(s): Vicky’s Notes Originally published on Towards AI. I just wrapped a customer pilot where “documents” really meant PDFs stuffed with tables, footnotes, and odd layouts. The goal sounded simple: answer two kinds of questions reliably. For semantic questions like “What changed …
AWS Lambda: Serverless Application Is Like Cooking Pasta With a Magic Machine!!!
Author(s): Henry Originally published on Towards AI. How AWS Lambda Powers AI & Data Engineering. AWS Lambda is a serverless compute service that runs your code for you, so you do not need to maintain the servers yourself. It is like …