
Building Trustworthy AI: Interpretability in Vision and Linguistic Models

Last Updated on October 31, 2024 by Editorial Team

Author(s): Rohan Vij

Originally published on Towards AI.


Photo by Arteum.ro on Unsplash | What thoughts lie behind that eye?

The rise of large artificial intelligence (AI) models trained with self-supervised deep learning presents a dangerous situation known as the AI "black box" problem: it is impossible to understand how a neural network learns what it does or what it has actually learned. The exponential growth of computational power, the availability of massive datasets, and advances in deep learning algorithms have enabled the development of AI models of enormous scale and capability. The problem is not new to the cognitive sciences; the human brain is also considered a black box, since we cannot explain at a fundamental level how it learns. Deploying uninterpretable models for crucial tasks in business or other high-impact applications is potentially dangerous: there is no way to tell that a model's decision-making or content-generating capabilities are compromised until it eventually generates false content or makes a bad decision. Model users should be able to understand how their data is used to produce a result. This paper explores solutions to this problem that attempt to create "interpretable machine learning" in the fields of computer vision and large language models, and assesses how effective these approaches are at improving the transparency and accountability of AI systems in real-world applications.

Interpretability in Computer Vision (CV) Models

"If AI enables computers to think, computer vision enables them to see, observe and understand" (IBM, n.d.). Computer vision uses deep learning to examine image data, find patterns, and distinguish one image from another. Computer vision models are based on convolutional neural networks (CNNs), which consist of layers that detect various features of an input image. CNNs slide matrix windows (kernels) across the pixels of an image to capture spatial information; this is known as a convolution operation. Each layer in a CNN is intended to detect a certain kind of feature, and as each successive layer receives information from the previous one, the model builds a feature map that combines the important features of the image. Layers at earlier stages of the CNN identify low-level features such as edges or colors, while deeper layers use the results of the prior ones to detect more complex patterns (Craig, 2024). The increasing complexity and widening application of CNNs raise concerns about how interpretable they are: as more layers are added, the ability to understand which patterns the CNN is actually identifying when it makes a decision is lost. Kevin Armstrong (2023), columnist at "Not a Tesla App," noted that Tesla's Full Self-Driving v12:

is eliminating over 300,000 lines of code previously governing FSD functions that controlled the vehicle, replaced by further reliance on neural networks. This transition means the system reduces its dependency on hard-coded programming. Instead, FSD v12 is using neural networks to control steering, acceleration, and braking for the first time. Up until now, neural networks have been limited to detecting objects and determining their attributes, but v12 will be the first time Tesla starts using neural networks for vehicle control.

Tesla's dramatic shift away from hard-coded rules toward a self-driving algorithm that relies almost entirely on neural networks is concerning with regard to the interpretability and accountability of the system. If an accident were to occur with FSD v12, it would be harder for Tesla to determine which part of the system was responsible for the erroneous decision. Without being able to understand how these models reason to arrive at their final decisions, they are harder to trust, especially in high-stakes environments such as driving a heavy electric vehicle.
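To ground the description of convolutions and feature maps above, here is a minimal sketch of a small CNN, assuming PyTorch (the article itself names no framework); the layer sizes and class count are illustrative only:

```python
# A minimal sketch (not from the article) of the convolution-and-feature-map idea.
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        # Early layers tend to respond to low-level features (edges, colors)...
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)   # sliding 3x3 window
        # ...deeper layers combine them into more complex patterns.
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2)
        self.head = nn.Linear(32 * 56 * 56, num_classes)          # assumes 224x224 input

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.pool(torch.relu(self.conv1(x)))   # feature map: (N, 16, 112, 112)
        x = self.pool(torch.relu(self.conv2(x)))   # feature map: (N, 32, 56, 56)
        return self.head(x.flatten(1))             # class scores

scores = TinyCNN()(torch.randn(1, 3, 224, 224))
print(scores.shape)  # torch.Size([1, 2])
```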

LIME

LIME, short for Local Interpretable Model-agnostic Explanations, is a general technique for understanding the reasoning behind any classifier. LIME is best described as a probe: it introduces slight variations into the original input and observes the relationship between those changes and the model's output. Because users can change specific features of the input, they can decide which features are the most important or most likely to cause overfitting, and test their impact on the model. LIME outputs a list of explanations representing each input feature's contribution to the classifier's final output (Ribeiro et al., 2016).

"Explaining individual predictions. A model predicts that a patient has the flu, and LIME highlights the symptoms in the patient's history that led to the prediction. Sneeze and headache are portrayed as contributing to the 'flu' prediction, while 'no fatigue' is evidence against it. With these, a doctor can make an informed decision about whether to trust the model's prediction" (Ribeiro et al., 2016).

A good example of using LIME in CV is to understand the reasoning behind a model's prediction:

"Raw data and explanation of a bad model's prediction in the 'Husky vs Wolf' task" (Ribeiro et al., 2016).

The creators of LIME ran an experiment with 27 graduate students who had taken an ML course at some point in their academic careers. In the first trial, they showed each student 10 predictions from a wolf-vs-husky classification model. Eight of the images were classified correctly, while two were misclassified: a husky was classified as a wolf because of snow in the background, and a wolf with no snow in the background was classified as a husky. 10 out of 27 students said they trusted the model, and 12 out of 27 identified the presence of snow as a potential feature used by the model. In a second trial with the same 27 participants, an explanation (as in Figure 2) was provided for each prediction. After seeing the explanations, only 3 students trusted the model, and 25 cited the presence of snow as a potential feature (Ribeiro et al., 2016).
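As a rough illustration of this workflow (not taken from the paper), the open-source lime package can produce this kind of image explanation. In the sketch below the classifier function and image are placeholders, not the paper's husky-vs-wolf model:

```python
# A minimal sketch, assuming the open-source `lime` package and any image
# classifier that returns class probabilities. Names here are illustrative.
import numpy as np
from lime import lime_image

def classifier_fn(images: np.ndarray) -> np.ndarray:
    """Placeholder: return class probabilities of shape (n_images, n_classes)."""
    # In practice this would wrap a real model's prediction call.
    rng = np.random.default_rng(0)
    probs = rng.random((len(images), 2))
    return probs / probs.sum(axis=1, keepdims=True)

image = np.random.rand(224, 224, 3)  # stand-in for a husky/wolf photo

explainer = lime_image.LimeImageExplainer()
explanation = explainer.explain_instance(
    image,
    classifier_fn,      # probed with many slightly perturbed copies of the image
    top_labels=2,
    num_samples=1000,   # number of perturbed samples LIME generates
)

# Superpixels that pushed the prediction toward the top class (e.g. the snow)
_, mask = explanation.get_image_and_mask(
    explanation.top_labels[0], positive_only=True, num_features=5
)
print(mask.shape)
```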

Grad-CAM

Grad-CAM, or Gradient-weighted Class Activation Mapping, analyzes the last convolutional layer of a CNN to determine which pixels contributed the most weight to the model's final result. This is done through a five-step process (Ahmed, 2022); a rough code sketch follows the list:

  1. The model is trained as usual on a set of images to obtain its predictions and the activations of the last convolutional layer.
  2. For the model's best classification guess ("dog," "cat," etc., whichever class receives the highest probability from the network), Grad-CAM computes the gradient of that score with respect to the activations of the last convolutional layer. For instance, if the model predicts that the image contains a dog, Grad-CAM computes how small changes in the model's activations (features ranging from simple edges and textures to the patterns that make up a dog's nose) would affect that classification. Like LIME, this allows Grad-CAM to identify which features in the image were most important in leading the model to its prediction. Unlike LIME, however, Grad-CAM probes the model by looking at the last convolutional layer and understanding how changes there affect the final result, while LIME perturbs the input image to understand how macro-level changes affect the final result.
  3. Using the calculations from the previous step, Grad-CAM identifies which parts of the last convolutional layer were important in deciding the model's classification.
  4. Each neuron's gradient in the final convolutional layer (i.e., what was calculated in step 2: if we increase this activation slightly, how much does the classification score change? The larger the change, the larger the gradient, and the more important that neuron is to the final classification) is multiplied with every pixel in that neuron's channel. As a result, pixels that contribute most to the final classification are highlighted most strongly, whereas pixels that contribute negatively are suppressed. This creates a heatmap that lets human users see which parts of the image were most critical to the model's classification decision.
  5. The "importance value" of each pixel is normalized to be between 0 and 1, allowing for better visualization when the heatmap is overlaid on top of the original image.
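The article itself contains no code, but a minimal sketch of these five steps, assuming PyTorch and torchvision's pretrained ResNet50 (the same architecture as the example below), might look like this; the random input tensor stands in for a preprocessed photo:

```python
# A minimal Grad-CAM sketch, assuming PyTorch and torchvision's pretrained ResNet50.
import torch
import torch.nn.functional as F
from torchvision.models import resnet50, ResNet50_Weights

model = resnet50(weights=ResNet50_Weights.DEFAULT).eval()
feats = {}

def forward_hook(module, inputs, output):
    feats["activations"] = output                                     # step 1: last conv block's feature maps
    output.register_hook(lambda grad: feats.update(gradients=grad))   # step 2: their gradients

model.layer4.register_forward_hook(forward_hook)   # layer4 = last conv block in ResNet50

image = torch.randn(1, 3, 224, 224)            # stand-in for a preprocessed 224x224 photo
scores = model(image)
class_idx = scores.argmax(dim=1).item()        # the model's best guess, e.g. 'sports_car'
scores[0, class_idx].backward()                # populate the gradients for that class score

# Steps 3-4: weight each feature-map channel by its average gradient, then sum
weights = feats["gradients"].mean(dim=(2, 3), keepdim=True)   # (1, 2048, 1, 1)
cam = F.relu((weights * feats["activations"]).sum(dim=1))     # (1, 7, 7)

# Step 5: normalize to [0, 1] and upsample so the heatmap can be overlaid on the image
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
heatmap = F.interpolate(cam.unsqueeze(1), size=(224, 224), mode="bilinear", align_corners=False)
print(heatmap.shape)  # torch.Size([1, 1, 224, 224])
```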

Using the following image:

The image being applied to the ResNet50 model (a CNN with 50 layers) with Grad-CAM to understand the reasoning behind its classification (Ahmed, 2022).

The ResNet50 model classifies the image into two categories: 'sports_car' and 'racer.'

Visualizing the activations of the last layer in relation to the 'sports_car' classification:

The Grad-CAM heatmap for the 'sports_car' class (Ahmed, 2022).

The neurons of the last layer are clearly activated by the front parts of the two cars. For further exploration, feeding in an image of a non-sporty car (e.g., a Honda Civic) could be useful for examining how the model differentiates between typical vehicles and high-performance vehicles.

Visualizing the activations of the last layer in relation to the 'racer' classification:

The Grad-CAM heatmap for the 'racer' class (Ahmed, 2022).

The same pixels around the cars are highlighted for the 'racer' classification, even though an individual can be classified as a racer without being near cars. While it is possible (and even desirable) that the model uses the context around an object to determine its classification, the fact that it does not strongly highlight any pixels on the person standing between the cars creates distrust in some of its classifications. If the person in the middle of the cars were not present, would the model still place the image in the 'racer' class? If the cars were not present, would it? In a nutshell, Grad-CAM provides a window into the decision-making process of CV models by letting human users see which pixels in an image influence its decisions.

Conclusion & Interpretability with Large Language Models (LLMs)

A common argument against explainable AI (techniques like LIME, Grad-CAM, and SHAP) is that it explains which inputs affect the output and by how much (by input perturbation, as in LIME, or by analyzing the last convolutional layer, as in Grad-CAM), but not the underlying reasoning (the why) behind a classification. According to Tim Kellogg (2023), ML Engineering Director at Tegria, when a model's explanation "doesn't match your mental model, the human urge is to force the model to think 'more like you.'" The purpose of this paper is to explore AI interpretability for its role in helping humans trust AI; humans may distrust AI even more if they see it making decisions through a process that they themselves do not follow:

Jaspars and Hilton both argue that such results demonstrate that, as well as being true or likely, a good explanation must be relevant to both the question and to the mental model of the explainee. Byrne offers a similar argument in her computational model of explanation selection, noting that humans are model-based, not proof-based, so explanations must be relevant to a model (Miller, 2019).

People are far more likely to trust explanations that match their current way of thinking than ones that introduce a new thought process, even if it is still correct. Kellogg (2023) remarks:

I had seen this phenomenon a lot in the medical world. Experienced nurses would quickly lose trust in an ML prediction about their patient if the explanation didn't match their hard-earned experience. Even if it made the same prediction. Even if the model was shown to have high performance. The realization that the model didn't think like them was often enough to trigger strong distrust.

Understanding trust in AI through the lens of sociology shows that humans want to trust AI the way they trust another human: they want to be able to probe it to find out more and understand how it reasons. Large language models (LLMs) like ChatGPT or Claude act more human than any other type of model thus far. They can be probed to explain their thought process, asked for more information, and prompted to fact-check themselves.

A common argument against LLMs is that they cannot always be trusted, which becomes a non-issue if society treats its interactions with LLMs as similar to an individual's interactions with real people. It would be naive to believe whatever someone says without doing any internal fact- or logic-checking, and the same constant questioning we apply to information from the media or from other people can and should be applied to information received from LLMs. In the quest to make AI as trustable as possible by making it as human as possible, users must acknowledge that this also makes AI susceptible to the same "hallucinatory" or made-up information that humans can propagate.

To increase society's trust in AI, it must be designed to act more human: not a human that spreads rumors or makes up facts, but one that is consistent in its thoughts, viewpoints, and presentation of information, and one that is able to cite its sources.

  1. Consistency in AI has already been addressed with the temperature variable, which controls the "randomness" of an LLM's response. LLMs are fixed algorithms with fixed weights that mathematically produce the same output distribution for a given input. However, commonly used models like GPT often run with a temperature setting above 0, which lets the model sample words that are not the most probable, introducing randomness and "creativity" into the LLM's writing (Prompt Engineering Guide, 2024). If LLMs were made more deterministic (providing the same response for every input), it would be far easier for humans to trust them because they would be far more reliable to use; a small sketch of how temperature reshapes the sampling distribution follows this list.
  2. It is possible to use Retrieval-Augmented Generation (RAG), which expands the knowledge base of an LLM for specific responses. Microsoft's Copilot, for example, can actively search Bing during a response and cite the websites it retrieves information from (Microsoft, n.d.). While the approach is still in its infancy, LLMs can use RAG to reliably cite all information they draw from external sources. LLMs are, at bottom, language algorithms that can be fed additional information and weave it together; they do not need to fall back on their training data for facts if those facts can be supplied at response time.
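To make the temperature point concrete, here is a minimal sketch (the vocabulary and logit values are invented) of how temperature rescales next-token probabilities before sampling: at temperature 0 the most probable token is always chosen, while higher values flatten the distribution and admit less likely words.

```python
# A minimal sketch of temperature sampling over next-token logits.
import numpy as np

def sample_next_token(logits: np.ndarray, temperature: float, rng) -> int:
    if temperature == 0.0:
        return int(np.argmax(logits))             # deterministic: always the top token
    scaled = logits / temperature                 # higher T flattens the distribution
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs))  # stochastic: "creative" choices possible

vocab = ["reliable", "consistent", "random", "creative"]  # invented for illustration
logits = np.array([3.0, 2.5, 0.5, 0.2])
rng = np.random.default_rng(0)

for t in (0.0, 0.7, 1.5):
    picks = [vocab[sample_next_token(logits, t, rng)] for _ in range(5)]
    print(f"temperature={t}: {picks}")
```

At temperature 0 this sketch always prints the same word, which is the determinism the first item above argues would make LLMs easier to trust.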

Interpretability might not be what society is ultimately looking for in AI; characteristics of humanity might matter far more than raw explainability for society to truly adopt AI and trust it with important decisions.

Thank you for reading!

References

Ahmed, I. (2022, April 5). Interpreting Computer Vision Models. Paperspace Blog. https://blog.paperspace.com/interpreting-computer-vision-models/

Armstrong, K. (2023, November 24). Tesla FSD v12 Rolls Out to Employees With Update 2023.38.10 (Update: Elon Confirms). Not a Tesla App. https://www.notateslaapp.com/news/1713/tesla-fsd-v12-rolls-out-to-employees-with-update-2023-38-10

Awati, R. (2022, September). What is convolutional neural network? SearchEnterpriseAI. https://www.techtarget.com/searchenterpriseai/definition/convolutional-neural-network

Computer Vision. (2019). IBM. https://www.ibm.com/topics/computer-vision

Kellogg, T. (2023, October 1). LLMs are Interpretable. Timkellogg.me. https://timkellogg.me/blog/2023/10/01/interpretability

Miller, T. (2019). Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence, 267, 1–38. https://doi.org/10.1016/j.artint.2018.07.007

Ribeiro, M. T., Singh, S., & Guestrin, C. (2016, February 16). "Why Should I Trust You?": Explaining the Predictions of Any Classifier. arXiv. https://arxiv.org/abs/1602.04938

Saravia, E. (2024). LLM Settings. promptingguide.ai. https://www.promptingguide.ai/introduction/settings

Your AI-Powered Copilot for the Web. (n.d.). microsoft.com. https://www.microsoft.com/en-us/bing


Published via Towards AI
