Adversarial Examples: Hidden threats to neural networks

by Fabio Gnassi

Although the functioning of machine learning algorithms may seem abstract and difficult to decipher, they rely on a series of structured processes that make them vulnerable to specific threats. Understanding the risks these tools are exposed to is essential for developing greater awareness in their use.

The following is an interview with Pau Labarta Bajo, a mathematician who became a machine learning engineer and later an educator in the field. Driven by a passion for problem-solving, Pau competed in the International Mathematical Olympiad, an early sign of his enthusiasm for tackling complex challenges, a key aspect of machine learning. He began his career over a decade ago as a quantitative analyst at Erste Bank and subsequently worked as a data scientist at Nordeus, a mobile gaming company now part of Take-Two Interactive. It was at Nordeus that he encountered his first real-world machine learning project, a pivotal experience that cemented his interest in the field.

Since then, Pau has freelanced at Toptal, contributing to diverse projects ranging from self-driving cars and financial services to delivery apps, time-series predictions for online retail, and solutions for health insurance providers. Two years ago, Pau started sharing his knowledge with the community on platforms like Twitter/X and LinkedIn. His first course, The Real-World Machine Learning Tutorial, has garnered over 500 satisfied students, and he continues to share valuable free content through his social media channels.


One of the risks faced by machine learning systems is known as “Adversarial Machine Learning.” Could you explain what this involves?

The term AML (Adversarial Machine Learning) refers to a set of techniques and procedures designed to compromise the functioning of a machine learning model. Machine learning models can be applied in various fields, such as developing applications for computer vision or generating text and images. Each of these categories of models can be attacked using processes known as “Adversarial Techniques,” which alter the behaviour of a model by introducing manipulated data, referred to as “Adversarial Examples.”

The way a model is compromised depends on how these tools operate. Neural networks are differentiable with respect to their input: an attacker can compute exactly how a small change to the input data shifts the model's output, and use that gradient to craft minimal variations that significantly degrade the quality of the generated outputs. Inputs are the raw data provided to the model, while outputs are the results it produces. The model analyses and processes inputs to identify patterns, which are then associated with a specific output. A model can be attacked at two distinct stages: during the training phase or during the inference phase.
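To make the "differentiable with respect to the input" point concrete, here is a minimal PyTorch sketch; the toy model and tensor shapes are illustrative assumptions, not something discussed in the interview. It shows that the gradient of the loss with respect to every input pixel can be computed directly:

```python
# Because the network is differentiable with respect to its input,
# we can compute how the loss changes when each input pixel changes.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # toy classifier
image = torch.rand(1, 1, 28, 28, requires_grad=True)         # input, tracked
label = torch.tensor([3])                                    # true class

loss = nn.functional.cross_entropy(model(image), label)
loss.backward()

# image.grad now holds d(loss)/d(pixel) for every pixel: the direction in
# which a tiny change to the input most increases the model's error.
print(image.grad.shape)  # torch.Size([1, 1, 28, 28])
```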

Interestingly, humans can be deceived in similar ways. Like a computer vision system, the human brain can fall victim to optical illusions, as when observing Escher's paradoxical drawings.

How do these attacks work?

Firstly, it is important to distinguish between attacks occurring in the digital world and those in the physical world. Digital attacks target both the training and inference phases. The inference phase occurs when a pre-trained model is used to make predictions or decisions (outputs) based on new data (inputs). Corrupting a model during this phase involves providing it with manipulated inputs to alter its functioning, leading to incorrect outputs.

For example, in the case of a model trained for object recognition, such an attack might involve using a modified image. The modification is carried out using algorithms that generate a perturbation applied to the original image. This perturbation subtly alters some pixels, remaining imperceptible to the human eye yet significant to the neural network, which consequently produces an incorrect classification.
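One widely cited algorithm of this kind is the Fast Gradient Sign Method (FGSM) of Goodfellow et al.; the interview does not name a specific method, so the following sketch, which reuses `model`, `image` and `label` from the previous snippet, should be read as one illustrative possibility:

```python
# FGSM: step each pixel by +/- epsilon in the direction that increases
# the loss, using the gradient computed in the previous snippet.
import torch

epsilon = 0.03  # perturbation budget: small enough to be near-invisible

perturbation = epsilon * image.grad.sign()
adversarial = (image + perturbation).clamp(0.0, 1.0).detach()

# The two images look identical to a person, but the model's prediction
# can flip from the correct class to a wrong one.
print(model(image).argmax(1), model(adversarial).argmax(1))
```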

Another category of digital attacks involves the contamination of training datasets, known as Poisoning Attacks. This type of attack manipulates the dataset by introducing specific biases or censoring relevant information, negatively affecting the model’s learning process.
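As a hypothetical illustration of how a Poisoning Attack might look in code, the sketch below flips a small fraction of training labels; the function and parameter names are assumptions for the example, not a real attack toolkit:

```python
# Illustrative label-flipping poisoning: an attacker who can touch the
# training set relabels a small fraction of one class so the model
# learns a skewed decision boundary.
import random

def poison_labels(dataset, source=0, target=1, fraction=0.05, seed=42):
    """Flip `fraction` of the labels equal to `source` to `target`.

    `dataset` is a list of (features, label) pairs; all names here are
    hypothetical, chosen only to illustrate the idea.
    """
    rng = random.Random(seed)
    poisoned = []
    for features, label in dataset:
        if label == source and rng.random() < fraction:
            label = target  # corrupt the supervision signal
        poisoned.append((features, label))
    return poisoned
```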

What about in the physical world?

Attacks in the physical world exclusively target the inference phase. An interesting example comes from research demonstrating that it is possible to compromise the computer vision systems of autonomous vehicles by applying stickers to road signs, causing the system to misinterpret their meaning. Here too, Adversarial Examples are generated using algorithms that overlay a layer of noise onto the original image. For the human eye, which relies on a semantic perception of objects, the meaning of the sign remains unchanged. However, for a computer vision system analysing the individual pixels of an image, these small modifications can be significant, resulting in classification errors.

Another example of Adversarial Attacks in the real world pertains to Large Language Models (LLMs). Due to the vast amount of data used during their training phase, these models can generate text containing sensitive or inappropriate information. To mitigate this, researchers have implemented post-processing techniques that block the release of certain outputs. However, some prompts can bypass these safeguards to produce the desired outputs.

In these cases, Adversarial Examples take on different meanings depending on whether they are interpreted by a machine or a human. The prompts, generated using symbols and characters that form nonsensical texts for a human reader, can deceive the model and compromise its functioning.


How can models be protected from these attacks?

When discussing the security of machine learning models, it is crucial to consider the inherent asymmetry between the efforts required to defend a model and those needed to attack it. Models developed over months by teams of expert researchers, using GPU clusters and costing millions of dollars, can be compromised within hours. It is, therefore, much easier to attack a model than to defend it.

Researchers designing a model are aware of the risks posed by Adversarial Examples, which is why they have developed several techniques to counter such attacks. One of the most common is Adversarial Training, which involves creating synthetic data using Adversarial Techniques and incorporating them into the training process. For example, if training a computer vision model to recognise images of cats, the dataset would include both authentic and manipulated images of cats. This teaches the model that, despite the corruption, the images still represent the same subject.
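A condensed sketch of such an Adversarial Training loop might look as follows, again in PyTorch; `loader` stands in for any source of labelled image batches, and the FGSM step mirrors the earlier snippet. This is an illustrative assumption about one common setup, not the specific procedure described in the interview:

```python
# Adversarial training: generate manipulated copies of each batch and
# train on authentic and adversarial images together, with the same labels.
import torch
import torch.nn as nn

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

def fgsm(images, labels, epsilon=0.03):
    images = images.clone().requires_grad_(True)
    loss_fn(model(images), labels).backward()
    return (images + epsilon * images.grad.sign()).clamp(0, 1).detach()

for images, labels in loader:  # `loader`: any DataLoader of labelled images
    adv_images = fgsm(images, labels)        # manipulated copies
    batch = torch.cat([images, adv_images])  # authentic + adversarial
    targets = torch.cat([labels, labels])    # same subject, same labels
    optimizer.zero_grad()                    # clear grads from the attack step
    loss_fn(model(batch), targets).backward()
    optimizer.step()
```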

Regarding Poisoning Attacks, particularly in the case of Large Language Models (LLMs), it is important to highlight how the presence of fake news or malicious information is closely linked to how these models are trained. For years, training approaches relied on constructing enormous text datasets, often without paying close attention to the content of such data. In other words, vulnerabilities introduced by an Adversarial Attack could directly stem from the dataset creation phase.

In recent years, however, there has been a shift in approach: researchers now aim to develop smaller, more efficient, and secure models built using better-curated datasets. This increased focus on data quality represents a significant step towards reducing the likelihood of compromising a model’s effectiveness.

