

Uncovering the Dark Side of AI: Why Our Intelligent Systems Are More Vulnerable Than We Think

Last Updated on May 24, 2023 by Editorial Team

Author(s): Christian Kruschel

Originally published on Towards AI.

Artificial Intelligence (AI) has advanced rapidly in recent years and found extensive applications in many fields. These intelligent systems have remarkable capabilities to solve complex problems, improve efficiency, and achieve superior performance. However, they are not immune to vulnerabilities. Specifically, deep neural networks (DNNs), a core building block of modern AI and machine learning (ML) systems, have some essential weaknesses that need to be identified for effective risk management. By uncovering these vulnerabilities and addressing the “Achilles heel” of AI, this article aims to provide valuable insights and recommendations for developing more resilient and trustworthy AI systems.

Data Incompleteness

In general, when executing a DNN, the input data is projected into a feature space. In this latent space, the decision boundaries between the labels of the dataset are determined during training, at least in classification tasks. This determination is only an approximation of the real decision boundaries and depends on the quality and coverage of the data. Complete coverage of the data space is often impossible because of the high dimensionality of the input data, and it often cannot even be quantified, because physical constraints limit which combinations of individual input variables can occur. Additionally, as has been shown for DNNs with ReLU activations, distances between data points shrink when they are projected into the latent space, so data points end up lying close to the decision boundaries. A natural disturbance that is not represented in the data, such as unfavorable lighting conditions on a stop sign, can therefore already lead to misclassification.

Source: Wang et al., Def-IDS: An Ensemble Defense Mechanism Against Adversarial Attacks for Deep Learning-based Network Intrusion Detection

The graphic from the work by Wang et al. shows the difference between the real and the modeled decision boundary. The adversarial samples in red can be generated via an attack, but the same effect can also arise from random perturbations or from systematic failures due to missing data coverage.
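
To make the idea of "lying close to the modeled boundary" concrete, the following sketch (assuming a trained PyTorch classifier; all names are illustrative) uses the logit margin, i.e., the gap between the two highest class scores, as a rough proxy for that proximity. A small margin suggests a sample sits near the modeled decision boundary, where small natural disturbances can already flip the prediction.

```python
# Minimal sketch (PyTorch assumed): the logit margin as a rough proxy for how
# close a sample lies to the modeled decision boundary.
import torch

def logit_margin(model: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Return top-1 minus top-2 logit for each sample in the batch x."""
    with torch.no_grad():
        logits = model(x)                    # shape: (batch, num_classes)
    top2 = logits.topk(2, dim=1).values      # two largest scores per sample
    return top2[:, 0] - top2[:, 1]           # small margin = near the boundary
```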

A common challenge is also that the training data differs significantly from the inference data the DNN is confronted with in real operation. Such a data shift can occur, for example, when training on pure simulation data or when sensors wear out or are replaced. In these cases, too, the data space is incomplete.

Attackers can exploit this structure of the latent space: in evasion attacks, adversarial perturbations are constructed that deliberately disrupt input data with structured noise, resulting in misclassification. The noise is so small that it is imperceptible to the human eye; the real decision boundary simply differs from the approximated one. The success of such attacks depends on how much information the attacker has about the data and the model. If the attacker has no information, this is referred to as a black-box attack. In an adaptive variant with access to an API of the DNN, information about the model can be extracted and the behavior of the decision boundaries can be approximated with a surrogate model. In these exploratory attacks, transferring adversarial perturbations generated on the surrogate model is often sufficient to fool the actual DNN. Alternatively, in a query-based attack, an adversarial perturbation can be optimized directly through the API. Such attacks also make it possible to extract information about the training data, even though that data is not stored in the model. This problem often arises in federated learning, where multiple parties jointly train a DNN but want to keep their data private. In general, adaptive attacks pose a major challenge because of their adaptability to mitigation strategies.
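
As a minimal illustration of an evasion attack, the sketch below (PyTorch assumed) performs a white-box FGSM-style step, one of the simplest ways to construct such perturbations: the input is nudged along the sign of the loss gradient within a small budget. In the black-box setting described above, the same perturbation would typically be computed on a surrogate model and then transferred to the target DNN.

```python
# Minimal FGSM-style sketch (PyTorch assumed): craft a small, eps-bounded
# perturbation that follows the gradient of the loss w.r.t. the input.
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, eps=8 / 255):
    """Return x plus an eps-bounded adversarial perturbation (untargeted)."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)   # loss the attacker wants to increase
    loss.backward()
    # Step in the direction that increases the loss, clipped to a valid pixel range.
    return (x_adv + eps * x_adv.grad.sign()).clamp(0.0, 1.0).detach()
```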

Another type of uncertainty arises from the iterative training process of the DNN. If a data sample or class is underrepresented, it is neglected during training and thus in the determination of the decision boundaries, and it disappears in the statistics of the training error. Especially in safety-critical systems, the challenge is often that precisely the safety-critical data samples are underrepresented.
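
One common counter-measure, sketched below (PyTorch assumed; the class counts are hypothetical), is to weight the loss by inverse class frequency so that underrepresented, often safety-critical, classes are not drowned out in the training statistics. This is only one of several options; resampling or targeted data collection are alternatives.

```python
# Minimal sketch: inverse-frequency class weights in the loss so that rare
# classes contribute proportionally more to the training error.
import torch
import torch.nn as nn

class_counts = torch.tensor([9500., 400., 100.])               # hypothetical per-class counts
weights = class_counts.sum() / (len(class_counts) * class_counts)
criterion = nn.CrossEntropyLoss(weight=weights)                # rare classes weigh more
```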

DNN as a Black-Box System

Algorithmically, a DNN is usually a repeated sequence of simple operations. In its simplest form, the feedforward network, an operation block consists of an affine transformation followed by a non-linear function. Although each block is very simple, the high number of repetitions of these blocks results in a highly complex model whose behavior is usually impossible to understand intuitively. In traditional software development, the standard is that the requirements for the system are reflected one-to-one in the software code. With DNNs, this mapping is generally no longer possible; requirements must be implemented through the training data, the model structure, or the training algorithm. Verifying the requirements presents a particular challenge: many requirements cannot be expressed in a formal specification and therefore cannot be tested with classical V&V methods. For example, the question of how to verify that stop signs are always identified as such under arbitrary lighting conditions remains open. Individual samples may be interpretable, but the decision boundaries as a whole usually are not, because the latent feature space is not easily interpretable.

Methods from Explainable AI (xAI) can be used to identify which components of a training sample, for example individual pixels in an image, are relevant to the decision. This knowledge can also be exploited for attacks on the system. In data poisoning / backdoor attacks, the attacker has access to the training data and extends it with patterns or triggers so that a false association with a label is established when the decision boundaries are determined (data modification). Similarly, the training data can be extended with false data (data injection). In both cases, the model may become faulty due to incorrect decision boundaries. Triggers range from visually identifiable watermarks to natural-looking shadows. A related case occurs at inference time: patch attacks use targeted, structured stickers that, when placed next to an object, attract attention during feature extraction and thereby change the classification. This is very similar to adversarial attacks, but studies show that such stickers provoke a misclassification only when placed near the components relevant to the decision, while remaining effective under natural perturbations.
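As a minimal illustration of the data-modification case, the following sketch (PyTorch assumed; all names and the trigger shape are illustrative) stamps a small trigger patch onto a fraction of the training images and flips their labels to an attacker-chosen target class, creating exactly the kind of false association described above.

```python
# Minimal backdoor-style data-modification sketch: stamp a small trigger patch
# onto a fraction of training images and relabel them to a target class.
# Images are assumed to be float tensors in (N, C, H, W) layout with values in [0, 1].
import torch

def poison(images, labels, target_class=0, rate=0.05, patch=4):
    images, labels = images.clone(), labels.clone()
    n_poison = int(rate * len(images))
    idx = torch.randperm(len(images))[:n_poison]
    images[idx, :, -patch:, -patch:] = 1.0   # white square in the corner as trigger
    labels[idx] = target_class               # false association: trigger -> target class
    return images, labels
```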

The model structure of a DNN offers another attack vector. It has been shown that malware can be hidden within a DNN's parameters without affecting the quality of the model: up to 3 bytes per parameter can be hidden in a 32-bit model. The attacker then only needs a small additional routine in the executing system to extract and run the malware.
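
The sketch below (NumPy; purely illustrative) shows the basic steganographic idea behind such attacks: the three low-order bytes of each 32-bit float weight are overwritten with payload bytes, leaving the sign and most of the exponent bits untouched. A real attack, as described above, would additionally need an extraction and execution routine on the target system.

```python
# Minimal sketch of hiding a payload in the low-order bytes of float32 weights.
import numpy as np

def embed_payload(weights: np.ndarray, payload: bytes) -> np.ndarray:
    """Hide payload bytes in the 3 low-order bytes of each float32 weight."""
    flat = weights.astype(np.float32).reshape(-1)   # work on a copy of the weights
    raw = flat.view(np.uint8).reshape(-1, 4)        # 4 bytes per weight (little-endian)
    data = np.frombuffer(payload, dtype=np.uint8)
    if len(data) > 3 * len(raw):
        raise ValueError("payload does not fit into this weight array")
    low_bytes = raw[:, :3]               # mostly mantissa; byte 3 (sign + high exponent bits) stays intact
    low_bytes.flat[: len(data)] = data   # overwrite low-order bytes with the payload
    return flat.reshape(weights.shape)
```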

Documentation and Reproduction of Individual Development Stages

The development of a DNN poses another challenge for documentation because of its cyclical work process. Comprehensive documentation of the development process must also take into account insights from model analysis, for example from xAI methods. In addition, every change made to the model and the data must be recorded. In a dynamic work process, as required by the development of DNN systems, this is only feasible with suitable automation and databases. Extracting this information into a suitable document form completes the list of challenges. At present, there is no standard for this topic, and companies at best rely on their own solutions.
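
As a minimal illustration of the kind of automation meant here, the sketch below (Python; all file names and record fields are illustrative) logs a content hash of the training data and the model weights together with the training configuration for each run, so that every development stage can later be traced and reproduced. In practice, an experiment-tracking tool would typically take over this role.

```python
# Minimal run-logging sketch: tie each training run to content hashes of the
# data and weights plus the configuration used, appended to a JSONL log.
import hashlib, json, time
from pathlib import Path

def sha256_of(path: str) -> str:
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def log_run(data_path: str, weights_path: str, config: dict, out="runs.jsonl"):
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "data_sha256": sha256_of(data_path),        # detects silent data changes
        "weights_sha256": sha256_of(weights_path),  # ties results to the exact model
        "config": config,                           # hyperparameters, code version, ...
    }
    with open(out, "a") as f:
        f.write(json.dumps(record) + "\n")
```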

Documenting the development process also requires a complete safety argumentation, preferably formulated in Goal Structuring Notation (GSN), that describes use-case-specific and model-specific vulnerabilities and their corresponding mitigations. Requirements must be formulated from the known vulnerabilities. These requirements typically need to be very specific and precise in order to demonstrate that all security risks have been addressed according to the current state of the art. Some algorithmic details may nevertheless be omitted: it is important, for example, to formulate a strategy against adversarial perturbations, while how they are generated may be negligible in this case, since new types of perturbations are constantly being constructed.

To sum up, this article has discussed the vulnerabilities of DNNs used in AI and ML systems. These intelligent systems have weaknesses that must be addressed for effective risk management. The vulnerabilities arise from incomplete data, data shift, and the iterative training process of the DNN. In addition, the complexity of DNNs makes them difficult to understand and verify, which opens the door to attacks on the system. Explainable AI can help identify which components of the training data are relevant to decision-making and thus support defending against attacks. The article concludes by emphasizing the need for more resilient and trustworthy AI systems.


Published via Towards AI
