Explainability in Generative AI: A Comprehensive Guide

Anisha Shende
9 min read · Mar 5, 2025

Source: unsplash.com

“To understand oneself is the beginning of wisdom.”

— Socrates

GPT-4 is estimated to have approximately 1.8 trillion parameters (source: Wikipedia). Achieving explainability in such a complex, trillion-parameter black-box model is very difficult. Making sense of what's happening inside it is like trying to explain how a car engine works just by listening to the sound it makes, or like trying to decode a secret language where even the dictionary updates itself every second.

Why is it so difficult to comprehend how these models work and make decisions? 1.8 trillion is 1,800 billion. Imagine a single person processing these parameters at a rate of one per second (essentially asking them to perform a small piece of a matrix multiplication every second). It would take them roughly 57,000 years (assuming they never took a break!). Even if all 8 billion people on Earth worked together, each handling one parameter per second, a single pass would still take them almost four minutes.
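A quick back-of-the-envelope check of those numbers (a trivial Python sketch, using the reported 1.8-trillion-parameter estimate):

```python
# Back-of-the-envelope arithmetic for the figures above.
params = 1.8e12                      # reported estimate: ~1.8 trillion parameters
seconds_per_year = 60 * 60 * 24 * 365

one_person_years = params / seconds_per_year
print(f"One person, one parameter per second: ~{one_person_years:,.0f} years")  # ~57,000 years

everyone_seconds = params / 8e9      # 8 billion people working in parallel
print(f"All of humanity in parallel: ~{everyone_seconds:.0f} seconds (~{everyone_seconds/60:.1f} minutes)")
```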

Yet, transformer models do this in mere milliseconds.

So, how do they actually do it?

At its core, a model uses input to predict the next token that is most statistically likely. But what’s happening under the hood?
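To make "predict the most statistically likely next token" concrete, here is a minimal sketch using the small, open GPT-2 model from the Hugging Face transformers library as a stand-in (GPT-4's weights are not public):

```python
# Minimal next-token prediction sketch with GPT-2 (a small, open stand-in for GPT-4).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits              # shape: (1, sequence_length, vocab_size)

next_token_probs = torch.softmax(logits[0, -1], dim=-1)   # distribution over the next token
top = torch.topk(next_token_probs, k=5)

for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id.item())!r}: {prob.item():.3f}")
```

The model's entire "decision" at each step is just this probability distribution over its vocabulary; generation repeats the step, appending the chosen token each time.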

Alright, so at a high level, this is how large language models work:

Before the model can generate text, it first needs to convert words into a format it understands — numbers. This is where embeddings come in. A text embedding model translates raw text into vectors (collections of numbers) that capture the contextual meaning of a sentence, so the model doesn't confuse a financial bank with a riverbank. And yes, these vectors have many dimensions. In fact, GPT-4's embeddings are believed to have 3,072 dimensions! It's like trying to understand an object by analyzing it from 3,072 different viewpoints. That's essentially what the model does to grasp meaning.
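Here is a rough sketch of embeddings in action, using the open sentence-transformers library (an assumption for illustration only; it is not the embedding model behind GPT-4, and its vectors have 384 dimensions rather than 3,072):

```python
# Sentence embeddings with an open model; the "money" bank sentences should cluster together.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "I deposited my paycheck at the bank.",      # financial bank
    "The bank approved my loan application.",    # financial bank
    "We had a picnic on the grassy river bank.", # riverbank
]
embeddings = model.encode(sentences)             # one vector per sentence
print(embeddings.shape)                          # (3, 384)

print(util.cos_sim(embeddings[0], embeddings[1]).item())  # expected: relatively high similarity
print(util.cos_sim(embeddings[0], embeddings[2]).item())  # expected: noticeably lower
```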

Once the data (and your query) is embedded, the model picks out the most relevant chunks — sets of tokens — and yes, those too are numbers. Then finally, based on the retrieved chunks (the context), it returns a well-crafted response.

But here’s where it gets even more interesting:

GPT-4 is also reported to use a Mixture of Experts (MoE) architecture to optimize processing. For instance, picture an IT help desk in a big company. When an employee submits a support ticket, it doesn't go to every IT expert in the company. Instead, a ticketing system directs it to the most relevant specialists — hardware issues go to the hardware team, software bugs to the software team, and network problems to the network engineers. This way, only the necessary experts are engaged, keeping things efficient, just like MoE layers do during AI processing.
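Here is a toy sketch of that routing idea in PyTorch. It is purely illustrative; real MoE layers (including whatever GPT-4 reportedly uses) are far more sophisticated:

```python
# Toy Mixture-of-Experts layer: a gate (the "ticketing system") sends each input
# to only its top-k experts instead of running every expert.
import torch
import torch.nn as nn

class ToyMoE(nn.Module):
    def __init__(self, dim=16, num_experts=4, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_experts)])
        self.gate = nn.Linear(dim, num_experts)          # scores each expert for this input
        self.top_k = top_k

    def forward(self, x):
        scores = torch.softmax(self.gate(x), dim=-1)
        weights, indices = torch.topk(scores, self.top_k)
        out = torch.zeros_like(x)
        for w, i in zip(weights, indices):               # only the chosen experts run
            out += w * self.experts[int(i)](x)
        return out

x = torch.randn(16)
print(ToyMoE()(x).shape)   # torch.Size([16])
```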

Okay, now you have a high-level understanding of how these models work — though trust me, there’s so much more happening under the hood than this simple explanation. So while we know that these models process information by embedding text, retrieving relevant data, and using trillions of parameters to generate responses, the real challenge is figuring out

why they make the decisions they do.

If we can’t look inside and pinpoint exactly which parameters influenced an answer, how do we trust that the response is accurate, fair, or unbiased? This is where Explainable AI (XAI) comes in.

XAI is all about peeling back the layers of these black-box models to make their decision-making process more transparent and interpretable. But getting a trillion-parameter model to 'explain itself' is easier said than done. It's not as easy as prompting, "Hey GPT, please explain your answer, assuming I'm a 5-year-old."

By now, you've probably been convinced that decoding this black-box model is a massive challenge. Trying to interpret it is like reading a book where every letter is influenced by a million others — except the book is written in a language no one understands, and it rewrites itself as you read. Next to impossible, right?

Deep learning architectures are incredibly complex. Knowledge is encoded in the form of vast numbers of small numerical values — the model parameters. Weights and activations are adjusted and combined across many layers, making it nearly impossible to trace the exact factors that led to a specific output. Understanding why a model picked one response over another remains one of AI's biggest mysteries. But don't worry, we can tackle it using XAI!

Explainable AI

“Explainable Artificial Intelligence (XAI) is a field of research that helps people understand how AI algorithms make decisions and provides humans the ability of intellectual oversight over AI algorithms. It’s also known as interpretable AI or explainable machine learning.”

-Wikipedia

XAI algorithms follow the three principles of Transparency, Interpretability, and Explainability. Okay, first let's understand what it means for a model to be transparent. A model is considered transparent if its inner workings — how it learns from training data and makes predictions on new data — can be clearly explained by the person who designed it. In other words, if a model's decision-making process can be traced and understood, rather than being a 'black box,' it is transparent.

Now, moving on to interpretability and explainability: there's a nuanced difference between these two terms. Interpretability is the degree to which we understand how the underlying (AI) technology works, while explainability is the degree to which we understand how the AI-based system arrived at a given result.

The Need for Explainable AI

AI systems don’t always make decisions the way we expect them to — and sometimes they make bad ones. This often happens due to biases ingrained in the data they were trained on. But it’s not just about bias; AI can also learn unexpected shortcuts that optimize for its given task without truly understanding the problem.

For instance, in 2017, an AI system trained for image recognition accidentally learned to “cheat.” Instead of identifying horses based on their features, it relied on copyright tags in images that were frequently associated with horse pictures. This kind of misalignment between AI’s learning process and human expectations highlights why Explainable AI (XAI) is crucial.
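A toy sketch of this kind of shortcut learning: a synthetic classifier that latches onto a watermark-like "copyright tag" feature instead of the actual visual content (entirely made-up data, not the 2017 system itself):

```python
# Shortcut learning on synthetic data: the spurious "copyright tag" feature
# ends up dominating the learned weights.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
is_horse = rng.integers(0, 2, n)

# Feature 0: a copyright-tag flag that co-occurs with horse photos 95% of the time.
tag = np.where(rng.random(n) < 0.95, is_horse, 1 - is_horse)
# Features 1-4: weak, noisy "visual" features.
visual = is_horse[:, None] * 0.3 + rng.normal(0, 1.0, (n, 4))

X = np.column_stack([tag, visual])
model = LogisticRegression().fit(X, is_horse)
print(model.coef_.round(2))   # the first coefficient (the tag) dwarfs the rest
```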

Beyond image recognition, XAI plays a vital role in high-stakes domains where transparency matters:

  • Legal Decision-Making: AI is increasingly used in legal case analysis, but how can we trust its reasoning or understand why it reached a particular decision? XAI helps legal professionals understand how an AI arrived at a conclusion, ensuring fairer and more accountable outcomes.
  • Recruitment & Hiring: AI-driven hiring tools can unintentionally introduce bias, selecting or rejecting candidates unfairly. XAI ensures that hiring decisions are explainable and justifiable, improving fairness in the recruitment process.
  • Robotics: Understanding why a robot makes certain decisions can be critical, especially in autonomous systems like self-driving cars or industrial automation.

By making AI’s decision-making more transparent, XAI helps build trust, prevent unintended biases, and ensure that AI systems align with human values.

All in all, the main aim of XAI is to help end users understand the AI's reasoning well enough to judge whether its decisions are good ones and can be trusted.

Explainable AI for GenAI (GenXAI)

The need to understand AI is more crucial now than it was in the pre-GenAI era. One key reason is the need to verify generated content and address one of the major problems: hallucinations. Much like humans sometimes experience false perceptions, these models can generate false or misleading information.

Modern AI systems have grown increasingly complex, particularly in multi-modal applications. These systems often combine Large Language Models (LLMs) with other generative models like diffusion models, external data sources, and applications, creating intricate patterns of interaction. For instance, when ChatGPT (running GPT-4) is prompted to "search the internet," it first performs a web search and then processes the retrieved content through its generative model to produce a response.
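In pseudocode, that "search, then generate" orchestration looks roughly like the sketch below. The helpers web_search and llm_generate are hypothetical placeholders, not a real API:

```python
# Rough sketch of a retrieve-then-generate orchestration loop.
# web_search and llm_generate are hypothetical stand-ins for real services.

def web_search(query: str) -> list[str]:
    # Placeholder: pretend we called a search backend and got text snippets back.
    return [f"(stub search result for: {query})"]

def llm_generate(prompt: str) -> str:
    # Placeholder: in a real system this would call an LLM.
    return f"(stub answer generated from a {len(prompt)}-character prompt)"

def answer_with_search(question: str) -> str:
    snippets = web_search(question)                       # step 1: fetch external content
    context = "\n".join(snippets)
    prompt = f"Using these search results:\n{context}\n\nAnswer the question: {question}"
    return llm_generate(prompt)                           # step 2: generate from that context

print(answer_with_search("What is Explainable AI?"))
```

Every extra hop like this (search, retrieval, tool calls) is one more component whose contribution to the final answer has to be explained.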

To grasp how modern AI systems make decisions, we need to examine their foundational architecture,

which is the Transformer!

Transformers are now the de facto standard architecture for Large Language Models (LLMs).

Let's talk about transformers and how they contribute to making explainability in generative AI applications more complex. One of the key strengths of transformers is their flexibility: the architecture makes very few assumptions about the structure of the input data (it has weak built-in priors). That flexibility comes at a price, because such data assumptions (priors) are what usually reduce the amount of data needed to train a model. Instead, a transformer learns almost everything by finding patterns in enormous amounts of data. If those patterns are unevenly distributed (e.g., more short sentences than long ones, or more formal than casual text), the model naturally picks up those biases.

On top of this, transformers come in many variations (aka X-formers), mostly involving different implementations of individual elements, such as different positional encoding strategies, attention mechanisms, and architectural modifications. While these developments have made the transformer architecture more powerful, they have also made it harder to understand how these systems make decisions, affecting the overall explainability of the model.

And while text-based GenAI models rely heavily on transformers, multimodal applications often use other architectures, such as diffusion models, variational autoencoders (VAEs), and generative adversarial networks (GANs), for image generation. This architectural diversity and complex orchestration creates unique challenges: understanding how these different components interact and contribute to the final output becomes crucial for explainability.
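At the heart of all those transformer variants sits the same scaled dot-product attention operation. Here is a minimal NumPy sketch; the attention weights it produces are one of the few internal signals researchers routinely inspect when trying to explain these models:

```python
# Scaled dot-product attention: the core operation shared by transformer variants.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                          # how strongly each token attends to every other token
    scores = scores - scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over the key positions
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 tokens, 8-dimensional query vectors
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))

output, attention = scaled_dot_product_attention(Q, K, V)
print(attention.round(2))     # each row is a probability distribution (sums to 1)
```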

Explainability in LLM Training process

Large Language Models (LLMs) undergo a sophisticated training process that shapes their behavior and capabilities. Understanding this process is crucial for explaining how the model arrives at a generated response.

The first stage involves pre-training the model on massive datasets.

It's like sending the LLM to the world's biggest library. Llama-2, for example, is trained on code, Wikipedia articles, and large-scale web data from the Common Crawl Foundation. Whether explanations are present or absent in this training data also impacts XAI. Techniques like self-supervised learning are then used: GPT-2 uses next-word prediction, while BERT uses masked language modeling (MLM) and next-sentence prediction (basically fill-in-the-blank with randomly masked words). The goal of training is to allow the model to generalize across different types of text.
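The two self-supervised objectives mentioned above are easy to poke at with the openly available models in the Hugging Face transformers library (a small illustration, not the actual training setup of any proprietary model):

```python
# Two classic self-supervised objectives, demonstrated with small open models.
from transformers import pipeline

# BERT-style masked language modeling: fill in the blank.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for prediction in fill_mask("The capital of France is [MASK].")[:3]:
    print(prediction["token_str"], round(prediction["score"], 3))

# GPT-style next-word prediction: continue the text.
generator = pipeline("text-generation", model="gpt2")
print(generator("The capital of France is", max_new_tokens=5)[0]["generated_text"])
```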

The LLM is further tuned using instruction tuning: the pre-trained model is adapted through supervised fine-tuning to optimize its ability to follow structured prompts. Here, the training data can include task instructions, inputs (task instances), and desired outputs. This stage is crucial for explainability because it helps us trace why a model responds in certain ways to specific prompts.
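For a concrete feel, a single instruction-tuning record might look something like this (the field names and template below are illustrative, loosely modeled on the popular Alpaca-style format rather than any specific proprietary dataset):

```python
# One illustrative instruction-tuning example and how it might be flattened
# into a prompt/response pair for supervised fine-tuning.
example = {
    "instruction": "Summarize the following paragraph in one sentence.",
    "input": "Explainable AI (XAI) is a field of research that helps people "
             "understand how AI algorithms make decisions.",
    "output": "XAI studies ways to make AI decision-making understandable to humans.",
}

prompt = (
    f"### Instruction:\n{example['instruction']}\n\n"
    f"### Input:\n{example['input']}\n\n"
    f"### Response:\n"
)
target = example["output"]          # the model is trained to produce this continuation
print(prompt + target)
```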

Finally, LLMs undergo alignment tuning, which is more like teaching good manners and ethics to the model. Alignment criteria can be diverse, covering helpfulness, honesty, and harmlessness. Training data typically comes from humans rating LLM-generated answers or writing their own; occasionally, more powerful and already-aligned LLMs produce the training data. Human feedback can also be used to train a reward model, which then adjusts the LLM via Reinforcement Learning (RLHF), basically giving it a thumbs up or down until it learns what "good behavior" looks like.
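A toy sketch of the pairwise preference objective commonly used to train such reward models (the scores below are made-up numbers for illustration):

```python
# Pairwise preference loss for a reward model: the human-preferred ("chosen")
# answer should score higher than the rejected one. Scores here are invented.
import torch
import torch.nn.functional as F

reward_chosen = torch.tensor([1.2, 0.3, 2.0])     # reward scores for preferred answers
reward_rejected = torch.tensor([0.4, 0.9, -0.5])  # reward scores for rejected answers

# Bradley-Terry style objective: -log(sigmoid(r_chosen - r_rejected))
loss = -F.logsigmoid(reward_chosen - reward_rejected).mean()
print(loss.item())   # lower loss means the reward model agrees with the human rankings
```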

So this whole training shapes how models make decisions about what content to generate and what to avoid. Understanding this training journey is essential for explainable AI because each stage leaves its mark on how the model thinks and responds. When we ask, “Why did the AI say that?”

the answer often lies in these training phases.

Conclusion

As AI continues to evolve, so must our ability to understand and interpret it. The journey toward explainable and trustworthy AI is ongoing, but each step brings us closer to ensuring these powerful models are not just intelligent but also transparent and aligned with our values.

Right now, though, AI models can sometimes feel like that one friend who gives brilliant advice but, when asked why, just shrugs and says, “Trust me, bro!” — which isn’t exactly reassuring when making critical decisions.

At the end of the day, an AI model is only as good as our ability to trust it.

With Explainable AI (XAI), we’re moving toward a future where AI is less of a black box and a lot more accountable. Because the goal isn’t just to build smarter AI but AI that we can actually rely on.

Here’s to a future where AI doesn’t just give us answers but also explains itself — maybe then we’d finally understand it… unlike the ending of Inception.

Stay tuned for Part 2, where I’ll dive into a range of techniques that bring explainability in generative AI to life!

If you liked this, consider supporting cry.org!
