
The Future is Transparent: 3 Shifts in GenAI Explainability and Self-Justifiability

Last Updated on October 4, 2025 by Editorial Team

Author(s): Mohit Sewak, Ph.D.

Originally published on Towards AI.

We’ve built the most powerful tools in history, but we can’t always see how the magic happens.

The Chef in the Black Box

Picture this. You’re in the world’s most advanced hospital. A patient is critical, and the new AI super-doctor, “GPT-Cure,” analyzes a mountain of data and instantly prescribes a novel, life-saving treatment protocol. The human doctors are stunned. It’s brilliant. But before they administer it, they ask a simple, career-saving question:

“Okay, why this treatment? What’s your reasoning?”

GPT-Cure just… whirs. Its screen remains blank. It has given you the what, but it can’t give you the why.

This, my friend, is the trillion-dollar problem at the heart of the AI revolution. We’ve built the most powerful tools in human history, but they are fundamentally “black boxes.” Their genius comes from a complexity so vast that even their creators can’t fully peek inside to see how the magic happens.

The AI is a brilliant chef, but it’s cooking inside a black box. You can’t trust the dish if you can’t see the kitchen.

Think of it like a brilliant but silent Michelin-starred chef. He creates a masterpiece dish that could win awards. You can taste the result, and it’s divine. But you have no idea what ingredients he used, what steps he followed, or if he washed his hands. You can’t replicate it, you can’t debug it if someone gets sick, and you can’t be sure it’s safe for the person with a deadly peanut allergy.

This opacity isn’t just a quirky feature; it’s a direct barrier to trust, safety, and accountability. And the global response to this challenge — the field of Explainable AI (XAI) — is undergoing a massive transformation. We’re in the middle of three seismic shifts that are changing the game from just explaining a decision after the fact to building AI that is transparent and accountable from the ground up.

Let’s stir the pot and see what’s cooking.

“The great enemy of knowledge is not ignorance, it is the illusion of knowledge.” — Stephen Hawking

The Stakes: The Health Inspector is Coming

So, why is everyone suddenly in a panic about our silent chef? For a while, we were happy just eating the fancy food. But now, the stakes have been raised to the moon.

The regulators have arrived. The Wild West days of AI are over, and accountability is now on the menu.

1. The Regulatory Pressure Cooker: The days of the Wild West of AI are over. The grown-ups have entered the room, and they’re carrying clipboards. Frameworks like the EU AI Act are putting legal teeth into the demand for transparency. They’re not just asking about the model’s logic; they’re demanding to see the entire supply chain — the training data, the intended purpose, the limitations (Gyevnar et al., 2023). An opaque AI is rapidly becoming a compliance and liability nightmare. The health inspector doesn’t care how good your soup is if you can’t prove it’s not poisoned.

2. The High-Stakes Deployment Barrier: We want to use GenAI for the big stuff: diagnosing diseases, drafting legal arguments, managing financial markets, designing critical infrastructure. But deploying a system you don’t understand in these fields isn’t just risky; it’s professionally negligent. Would you trust a bridge designed by an AI that can’t “show its work” on the physics calculations? The lack of clear, defensible reasoning is the single biggest barrier preventing GenAI from becoming a trusted professional tool instead of a fascinating novelty (Ji et al., 2023).

3. The Eroding Trust Ecosystem: We live in an age of deepfakes and rampant misinformation. If we can’t trace the provenance of AI-generated articles, images, or code, how can we trust anything? Our entire information ecosystem is at risk. Without robust accountability, the line between fact and “plausible hallucination” blurs into non-existence, poisoning the well of public trust.

ProTip: When using a GenAI tool for any serious work, always operate with “professional paranoia.” Ask yourself: If this AI is wrong, what is the worst-case scenario? Then, work backward to verify its claims using trusted, independent sources. Never let “the model said so” be your final answer.

Shift 1: From Kitchen Taster to Kitchen Architect

The first attempt to understand our silent chef was, logically, to hire tasters.

The Old Way: Explanations as an Afterthought

The first wave of XAI gave us tools like LIME and SHAP. These are brilliant post-hoc techniques. Essentially, they work from the outside in. After the chef has made the dish, these “tasters” poke and prod it, trying to figure out what ingredients were most important. They might say, “I’m detecting strong notes of saffron and a hint of paprika, which likely contributed to the final flavor profile.”

This is like trying to understand why a car crashed by only looking at the skid marks on the road. It’s clever, it gives you some clues, but you have no idea what was actually happening inside the engine when things went wrong. These methods give you an approximation of the model’s reasoning, but it’s not always a faithful one.
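To make the “taster” idea concrete, here is a minimal post-hoc attribution sketch using the SHAP library on a simple tabular model (a stand-in for illustration, not a GenAI system): the explainer assigns each input feature a contribution to a single prediction, after the fact.

```python
# Post-hoc attribution with SHAP on a simple tabular model -- the "kitchen
# taster" approach: poke the finished dish and estimate what mattered.
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[:5])  # one attribution per feature, per sample

# Each row approximates how much each feature pushed that prediction up or
# down -- an after-the-fact estimate, not a view inside the computation.
print(dict(zip(X.columns, shap_values[0].round(2))))
```

The output is an approximation of the model’s reasoning produced from the outside; nothing in it guarantees the model actually “thought” that way.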

The New Paradigm: Transparency by Design

The new shift is revolutionary. Instead of trying to guess what’s in the dish, we’re now redesigning the kitchen to have glass walls. We’re moving toward building models that are intrinsically interpretable.

We’re moving from AI archaeologists, digging through the ruins of a decision, to AI architects, designing transparent systems from the ground up.

This is where the real fun begins. Researchers are now performing the AI equivalent of neurosurgery. In a landmark study called GAN Dissection, scientists literally went inside an image-generating AI to find the exact “neurons” that corresponded to real-world objects (Bau et al., 2019). They found the cluster of neurons responsible for “trees.” How did they know? Because they turned those neurons off, and poof — the trees vanished from the pictures. They turned them on, and trees appeared. This is a direct, causal link. It’s not guessing; it’s seeing the wiring.
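For a flavor of how that kind of intervention works mechanically, here is a toy PyTorch sketch — not the actual GAN Dissection code — in which we register a forward hook on one layer of a random, untrained “generator” and zero out a few hypothetical channels, then compare outputs with and without the ablation.

```python
import torch
import torch.nn as nn

# A toy, untrained "generator" standing in for a real GAN -- the point is the
# intervention itself, not the model. (GAN Dissection does this on trained GANs.)
generator = nn.Sequential(
    nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(32, 3, kernel_size=4, stride=2, padding=1), nn.Tanh(),
)

units_to_ablate = [3, 7, 12]  # hypothetical channels we suspect encode "trees"

def ablate(module, inputs, output):
    out = output.clone()
    out[:, units_to_ablate] = 0  # "switch off" the suspect units
    return out

z = torch.randn(1, 64, 8, 8)
with torch.no_grad():
    baseline = generator(z)
    handle = generator[0].register_forward_hook(ablate)
    ablated = generator(z)
    handle.remove()

# If those units really encoded a concept, the difference between the two
# outputs should be concentrated where that concept would have appeared.
print((baseline - ablated).abs().mean().item())
```

On a trained generator, this on/off comparison is what turns a correlation (“these units fire when trees appear”) into a causal claim (“remove them and the trees disappear”).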

Another brilliant approach is creating Concept Bottlenecks. This forces the AI to think in human-understandable terms before spitting out an answer. Imagine a medical AI analyzing a skin lesion. Instead of just jumping to “95% chance of malignancy,” a concept bottleneck model is forced to first conclude:

  1. Feature A: Asymmetrical Shape — True
  2. Feature B: Irregular Borders — True
  3. Feature C: Varied Color — True

Therefore, my conclusion is…

This makes its reasoning process transparent and verifiable for a human expert (Yu et al., 2025). We’re moving from being AI archaeologists, digging through the ruins of a decision, to being AI architects, designing transparent systems from the blueprint up.
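Here is a minimal sketch of the idea in PyTorch (illustrative only, not the architecture from Yu et al., 2025): the network is forced to route its prediction through named, human-checkable concepts.

```python
import torch
import torch.nn as nn

# Concept bottleneck sketch: the final prediction may only depend on a small
# set of named, human-readable concepts the model must predict first.
CONCEPTS = ["asymmetrical_shape", "irregular_borders", "varied_color"]

class ConceptBottleneck(nn.Module):
    def __init__(self, n_features: int):
        super().__init__()
        self.concept_head = nn.Linear(n_features, len(CONCEPTS))  # inputs -> concepts
        self.task_head = nn.Linear(len(CONCEPTS), 1)              # concepts -> diagnosis

    def forward(self, x):
        concepts = torch.sigmoid(self.concept_head(x))  # each in [0, 1], inspectable
        risk = torch.sigmoid(self.task_head(concepts))  # depends only on the concepts
        return concepts, risk

model = ConceptBottleneck(n_features=16)
x = torch.randn(1, 16)  # stand-in for features extracted from a lesion image
concepts, risk = model(x)
for name, value in zip(CONCEPTS, concepts.squeeze().tolist()):
    print(f"{name}: {value:.2f}")
print(f"malignancy risk: {risk.item():.2f}")
```

Because the task head never sees the raw pixels, a clinician can audit (and even correct) the intermediate concepts before trusting the final score.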

Trivia: The term “neuron” in a neural network is just a metaphor! It’s a mathematical function, not a biological cell. But researchers in “mechanistic interpretability” are finding that these functions can sometimes organize themselves to represent concepts in a way that’s eerily similar to how we think brains might work.

Shift 2: From “Here’s My Brain Scan” to “Here’s My Homework”

So, we built the glass kitchen. We can now see every neuron fire, every calculation whir. We give this incredibly detailed printout to the human doctor. Problem solved, right?

Wrong. Dangerously wrong.

The Sobering Reality: Explanations Can Backfire

A groundbreaking study delivered a gut punch to the XAI community. Researchers found that giving users detailed technical explanations often didn’t help them make better decisions. In fact, it frequently made things worse by creating “automation bias” (Bansal et al., 2020). People saw the complex, sci-fi-looking chart and thought, “Wow, this thing is smart!” and then proceeded to over-trust the AI, even when its advice was dead wrong.

It’s like this: if you’re trying to check a mathematician’s proof, a neuroscientist showing you an fMRI of the mathematician’s brain isn’t helpful. What you need is for the mathematician to show you their work on the blackboard, step-by-step.

The New Goal: Justifiability over Explainability

This brings us to the most important shift in mindset. For high-stakes domains, the goal is no longer just explainability; it’s justifiability.

We don’t need to see the AI’s brain scan. We need it to show us its homework.

We don’t need a printout of the AI’s “brain activity.” We need the AI to defend its conclusion in a language we can all understand and scrutinize. As one brilliant paper argues, a legal AI shouldn’t just say, “The defendant is liable.” It must justify this conclusion by citing specific case law, pointing to relevant statutes, and quoting evidence from the provided documents (Wehnert, 2023).

It needs to show its homework.

This reframes everything around the user. It aligns AI accountability with the standards of evidence that have governed law, medicine, and science for centuries. In my old cybersecurity days, we had a saying: “In God we trust; all others must bring data.” For AI, the new mantra is: “In models we test; all others must justify their claims.”
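What might “showing the homework” look like as a software contract? Here is a hypothetical sketch — the schema and checker are illustrations I’ve invented, not a standard API: every claim must carry checkable sources, and a simple gate rejects any answer that arrives without them.

```python
from dataclasses import dataclass

# Justifiability as a contract: no claim without evidence a human can check.
# The schema and checker are hypothetical illustrations.

@dataclass
class Claim:
    statement: str
    sources: list[str]  # e.g. case citations, document IDs, quoted passages

@dataclass
class JustifiedAnswer:
    conclusion: str
    claims: list[Claim]

def is_justified(answer: JustifiedAnswer) -> bool:
    # Reject any answer that has no claims, or any claim with no sources.
    return bool(answer.claims) and all(claim.sources for claim in answer.claims)

answer = JustifiedAnswer(
    conclusion="The defendant is liable.",
    claims=[Claim("The contract required delivery by March 1.", ["Exhibit B, §4.2"])],
)
print(is_justified(answer))  # True only if every claim cites its evidence
```

The point is not the code; it is the standard of evidence it encodes, which is exactly the standard we already apply to human professionals.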

“The first principle is that you must not fool yourself — and you are the easiest person to fool.” — Richard P. Feynman

Shift 3: From Inspecting the Chef to Auditing the Entire Supply Chain

For years, we’ve been obsessed with the chef — the model itself. We analyzed its every move in the kitchen. But we missed the most important question:

Where did the ingredients come from?

The Old Blind Spot: The Model in a Vacuum

An AI model is a product of its training data. A model trained on a biased, toxic, or factually incorrect dataset will, unsurprisingly, produce biased, toxic, or incorrect outputs. Focusing only on the model’s logic at the moment of decision-making is like blaming the oven for a cake that tastes terrible when you used salt instead of sugar.

The New Frontier: Auditing the Entire AI Lifecycle

The final and most expansive shift is to zoom out and apply transparency to the entire ecosystem, especially the data supply chain.

A model is only as good as its training data. The new frontier is auditing the entire data supply chain, from source to synthesis.

This is where things get really “Inception”-like. We are now using GenAI to create massive amounts of synthetic data to train other AIs. This creates a terrifying risk of a feedback loop, where biases and errors are amplified with each generation. It’s like making photocopies of photocopies — the quality degrades until you’re left with a distorted mess.

Groundbreaking new research is developing methods to audit an AI model or a dataset to determine if it was trained on AI-generated data, even without seeing its internal code (Wu et al., 2025). This is a “data forensics” capability. It’s like creating a test to tell if your “farm-to-table” vegetables were grown in a field or 3D-printed in a lab. Knowing the provenance of your data is fundamental to maintaining information integrity.
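As a toy illustration of the data-forensics mindset — emphatically not the auditing method of Wu et al. (2025) — here is the simplest possible provenance detector: a classifier trained on a handful of invented texts with known origin, used to flag suspiciously machine-flavored samples.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy provenance detector. The example texts are invented for illustration;
# real audits use far larger corpora and far more robust signals.
human_texts = [
    "Grabbed a terrible coffee, the standup ran long, shipped the fix at 6pm.",
    "My grandmother's recipe never measured anything, you just knew by smell.",
    "The bus broke down twice so I walked the last mile in the rain.",
]
synthetic_texts = [
    "In today's fast-paced world, effective communication is more important than ever.",
    "Leveraging cutting-edge technology can unlock significant value for stakeholders.",
    "In conclusion, there are many factors to consider when making this decision.",
]

detector = make_pipeline(TfidfVectorizer(), LogisticRegression())
detector.fit(
    human_texts + synthetic_texts,
    [0] * len(human_texts) + [1] * len(synthetic_texts),  # 0 = human, 1 = synthetic
)

suspect = "Harnessing innovative solutions is crucial in today's rapidly evolving landscape."
print(detector.predict_proba([suspect])[0][1])  # estimated probability it is synthetic
```

Scaled up, this is the “farm-to-table” test: before a dataset feeds the next generation of models, you want some evidence of where it actually came from.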

ProTip: Before you trust a new AI model, look for its “Model Card” or “Datasheet” (Mitchell et al., 2019; Gebru et al., 2021). These are transparency documents that should describe what data the model was trained on, its intended uses, and its known limitations. If a vendor can’t provide one, that’s a major red flag.
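If you want to operationalize that red flag, here is a hypothetical minimal model card expressed as a Python dict, with a check that the transparency fields you care about are actually filled in. The field names are illustrative, loosely inspired by Mitchell et al. (2019), not a formal schema.

```python
# A hypothetical minimal model card; every value below is an invented example.
model_card = {
    "model_name": "example-skin-lesion-classifier",
    "intended_use": "Decision support for dermatologists; not for self-diagnosis.",
    "training_data": "Public dermoscopy images collected 2015-2020; no synthetic data.",
    "known_limitations": [
        "Underrepresents darker skin tones in the training set.",
        "Not validated on pediatric patients.",
    ],
    "evaluation": "Held-out clinical test set; metrics reported in the full card.",
}

# Procurement-style gate: refuse models whose cards leave these fields blank.
for field in ("intended_use", "training_data", "known_limitations"):
    assert model_card.get(field), f"missing required transparency field: {field}"
```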

The Reality Check: This Stuff is Hard

Now, before we get too carried away, let’s take a breath. This journey toward transparency isn’t all sunshine and rainbows.

True transparency at the scale of today’s foundation models is a monumental engineering challenge.

  • The Scalability Challenge: Many of the coolest mechanistic interpretability techniques work on smaller models. Dissecting a model with a few million parameters is one thing; doing it for a foundation model with trillions is like trying to map the wiring of a human brain one synapse at a time. It’s a monumental challenge.
  • The “Force of Nature” Debate: Can we ever fully explain these systems? Some researchers argue that at a certain complexity, AI might become more like the weather. We can predict it, manage its risks, and build shelters, but we can’t explain the movement of every single water molecule in a hurricane (Nakao, 2025). This suggests we need to focus as much on robust risk management as we do on perfect explanation.
  • The Quest for a Unified Framework: Right now, XAI is a bit like a collection of specialized tools. The tool for explaining an image generator is different from the one for a language model. The field is still searching for the “Swiss Army knife” — universal principles of explainability that apply everywhere.

The Path Forward: Your Marching Orders

So, what does this all mean for you?

  • For Policymakers & Regulators: Your definition of “transparency” must evolve. Stop focusing only on the algorithm. Demand accountability for the entire data lifecycle. Mandate data provenance reports and the right to audit synthetic data ecosystems, just as the EU AI Act is beginning to do.
  • For Executives & Strategists: Change the questions you ask your AI vendors. Don’t just ask, “Is your AI explainable?” That’s a meaningless yes/no question. Ask, “Is it justifiable? Can it cite its sources? Can you produce an audit trail for the training data that will stand up to scrutiny in our industry?” Demand the recipe, not just a free sample.
  • For Researchers & Developers: The future is in building glass kitchens, not designing better keyholes for black boxes. Prioritize research in intrinsically interpretable architectures, human-centric evaluation benchmarks, and robust data auditing tools. A post-hoc fix for an opaque system will increasingly be seen as a temporary patch, not a long-term solution.

The Post-Credits Scene

The conversation around AI transparency has finally grown up. We are moving past the simplistic desire to “open the black box” and toward a sophisticated, multi-layered strategy for building trust.

The three shifts — from post-hoc fixes to intrinsic design, from technical explanations to human-centric justification, and from a narrow model-centric view to a broad ecosystem-wide audit — are the pillars of this new era. They are our best hope for building a future where generative AI isn’t just a powerful and mysterious oracle, but a reliable, safe, and accountable partner in solving humanity’s biggest challenges.

Now, who wants more tea? The next pot is brewing.

References

Shift 1: Intrinsic Interpretability & Mechanistic Understanding

  • Bau, D., Zhu, J.-Y., Strobelt, H., Zhou, B., Tenenbaum, J. B., Freeman, W. T., & Torralba, A. (2019). GAN Dissection: Visualizing and Understanding Generative Adversarial Networks. In International Conference on Learning Representations (ICLR). https://arxiv.org/abs/1811.10597
  • Jahanian, A., Chai, L., & Isola, P. (2020). On the “Steerability” of Generative Adversarial Networks. In International Conference on Learning Representations (ICLR). https://arxiv.org/abs/1907.07171
  • Yu, Z., et al. (2025). Interpretable Generative Models through Post-hoc Concept Bottlenecks. arXiv preprint. https://arxiv.org/abs/2503.19377
  • Zhang, R., Eslami, S. M. A., & D’Souza, F. R. (2022). Diffusion Visual Counterfactual Explanations. In Advances in Neural Information Processing Systems (NeurIPS). https://arxiv.org/abs/2210.11841

Shift 2: Human-Centricity, Justification, & Legal Frameworks

  • Bansal, G., Wu, T., Zhou, J., Fok, R., Nushi, B., Kamar, E., Ribeiro, M. T., & Weld, D. S. (2020). Does the Whole Exceed its Parts? The Effect of AI Explanations on Complementary Team Performance. arXiv preprint arXiv:2006.14779. http://arxiv.org/pdf/2006.14779v3
  • Gyevnar, B., Ferguson, N., & Schafer, B. (2023). Bridging the Transparency Gap: What Can Explainable AI Learn From the AI Act? arXiv preprint arXiv:2302.10766. http://arxiv.org/pdf/2302.10766v5
  • Kadavath, S., Conerly, T., Askell, A., Henighan, T., Drain, D., Perez, E., Schurr, N., DasSarma, N., McMain, E., Kaplan, J., Amodei, D., & McCandlish, S. (2022). Language Models (Mostly) Know What They Know. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP). https://arxiv.org/abs/2207.05221
  • Wehnert, S. (2023). Justifiable Artificial Intelligence: Engineering Large Language Models for Legal Applications. arXiv preprint arXiv:2311.15716. http://arxiv.org/pdf/2311.15716v1

Shift 3: Ecosystem Audits & Data Provenance

  • Hase, F., et al. (2024). Multi-Level Explanations for Generative Language Models. arXiv preprint arXiv:2403.14459. https://arxiv.org/abs/2403.14459
  • Wu, Y., Yang, Z., Shen, Y., Backes, M., & Zhang, Y. (2025). Synthetic Artifact Auditing: Tracing LLM-Generated Synthetic Data Usage in Downstream Applications. arXiv preprint. http://arxiv.org/pdf/2502.00808v1

Foundational Surveys & Benchmarks

  • Ahmed, S. Q., Ganesh, B. V., P, J. B., Selvaraj, K., Devi, R. N. P., & Kappala, S. (2025). BELL: Benchmarking the Explainability of Large Language Models. arXiv preprint. http://arxiv.org/pdf/2504.18572v1
  • Nakao, Y. (2025). Accountability of Generative AI: Exploring a Precautionary Approach for “Artificially Created Nature”. arXiv preprint. http://arxiv.org/pdf/2505.07178v1
  • Zhao, H., Chen, H., Yang, F., Liu, N., Deng, H., Cai, H., … & Du, M. (2023). Explainability for Large Language Models: A Survey. arXiv preprint arXiv:2309.01029. http://arxiv.org/pdf/2309.01029v3

Disclaimer: The views and opinions expressed in this article are my own and do not necessarily reflect the official policy or position of any past or present employer. This article was drafted with the assistance of generative AI, which was used for research, summarization, and brainstorming. The images in this article were generated using AI. This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License (CC BY-ND 4.0).


Published via Towards AI


Note: Content contains the views of the contributing authors and not Towards AI.