Prompt Injection With Image Scaling Attacks Threatens AI Systems


As AI tools for image generation and processing become more widespread, robust security measures are crucial. Researchers have uncovered a new attack method that abuses image processing in AI pipelines for data exfiltration. The attack combines image scaling attacks with prompt injection, illustrating how malicious activity can be carried out stealthily.

Researchers Unleash Prompt Injection Attacks Alongside Image Scaling

In a recent disclosure, cybersecurity researchers from Trail of Bits detailed how prompt injection attacks can exploit image scaling in AI tools to execute malicious operations. These operations range from simple tasks, such as opening an application, to surreptitious data exfiltration.

The concept of image scaling attacks was first introduced by researchers at Technische Universität Braunschweig, Germany, back in 2020. These attacks manipulate the image scaling step in AI pipelines: before an input image reaches the model, the system downscales it for faster and more efficient processing. A malicious actor can exploit this resizing step to change how the model perceives the image. The Trail of Bits researchers used this technique to deliver prompt injection attacks.
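The principle can be illustrated with a minimal sketch. Real attacks target specific interpolation algorithms (bilinear, bicubic) in libraries such as Pillow or OpenCV; the toy below assumes a naive nearest-neighbor downscaler and a hypothetical scale factor, purely to show how payload pixels that are sparse (and nearly invisible) at full resolution can dominate the downscaled image the model actually sees.

```python
# Toy illustration of the image scaling attack principle (not the
# researchers' actual method): pixels placed at the positions a
# nearest-neighbor downscaler samples carry a hidden payload, while
# the full-size image looks almost uniform.

SCALE = 8  # assumed downscale factor of the target preprocessor

def embed(cover, payload):
    """Place payload pixels at every SCALE-th position, the exact
    pixels a nearest-neighbor downscaler will keep."""
    out = [row[:] for row in cover]
    for y, prow in enumerate(payload):
        for x, p in enumerate(prow):
            out[y * SCALE][x * SCALE] = p
    return out

def nearest_downscale(img):
    """Naive nearest-neighbor downscale by SCALE, standing in for
    the resizing an AI pipeline performs before inference."""
    return [row[::SCALE] for row in img[::SCALE]]

# Cover: 32x32 uniform light-gray image; payload: 4x4 dark pattern.
cover = [[200] * 32 for _ in range(32)]
payload = [[10] * 4 for _ in range(4)]

crafted = embed(cover, payload)
small = nearest_downscale(crafted)

# Only 16 of 1024 pixels differ at full size, yet after downscaling
# every remaining pixel belongs to the payload.
print(small)  # 4x4 grid in which every value is 10
```

In a real attack the "payload" is text rendered so that it only becomes legible at the reduced resolution, and the crafted pixels are tuned to the specific interpolation kernel of the target pipeline.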


Source: Trail of Bits

As demonstrated, the researchers embedded a malicious prompt in an image so that it remains invisible when the image is viewed at full resolution. When an AI system rescales the image, however, the change in resolution makes the prompt legible to the system. The model then treats the embedded text as part of its instructions and executes the specified malicious action without the user's awareness.

In their experiment, the researchers demonstrated this attack against the Gemini CLI using the default configuration of the Zapier MCP server. They uploaded an image containing a hidden malicious prompt that exfiltrated the user's Google Calendar data to a designated email address.

The researchers have documented the specifics of this attack method in their article.

Most AI Systems Are Susceptible To This Attack

According to the researchers, this attack, with slight modifications tailored to the target AI model, is effective against a wide range of systems.

For further evaluation, the researchers have released an open-source tool named "Anamorpher" on GitHub. Backed by a Python API, it lets users craft and visualize attacks on multimodal AI systems. Currently in beta, the tool generates images designed to deliver prompt injections when downscaled.

Recommended Countermeasures

The researchers note that restricting downscaling algorithms will not stop these attacks, given the breadth of the attack surface. Instead, they propose limiting upload dimensions and avoiding image downscaling altogether. Showing the user an accurate preview of the exact image the model will see could also expose otherwise unnoticed prompt injections at upload time.
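The "limit upload dimensions" suggestion can be sketched as a simple pre-upload check. The size limits below are hypothetical placeholders, not values from the article: the point is that if an image already fits the model's input bounds, the pipeline never needs to downscale it, which removes the surface this attack depends on.

```python
# Sketch of the "limit upload dimensions" countermeasure, assuming
# hypothetical model input bounds. Images within bounds pass through
# unresized; larger ones are refused (or the user is shown the exact
# downscaled image the model would receive).

MAX_W, MAX_H = 1024, 1024  # assumed model input bounds, not real values

def validate_upload(width: int, height: int) -> bool:
    """Return True if the image can be fed to the model without any
    resizing, so no scaling step exists for an attacker to exploit."""
    return width <= MAX_W and height <= MAX_H

print(validate_upload(800, 600))    # within bounds: no rescaling needed
print(validate_upload(4096, 4096))  # too large: would require downscaling
```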

Additionally, the researchers advocate for the implementation of robust defense mechanisms to thwart multimodal prompt injection attacks, such as enforcing mandatory user confirmation before executing any text-based instructions embedded in images.

