Researchers Hack AI Assistants Using ASCII Art

Large language models (LLMs) are vulnerable to attacks that exploit their inability to recognize prompts conveyed through ASCII art.

ASCII art is a form of visual art created using characters from the ASCII (American Standard Code for Information Interchange) character set.

Recently, the following researchers proposed a new jailbreak attack, ArtPrompt, that exploits LLMs' poor performance in recognizing ASCII art to bypass safety measures and elicit undesired behaviors:

  • Fengqing Jiang (University of Washington)
  • Zhangchen Xu (University of Washington)
  • Luyao Niu (University of Washington)
  • Zhen Xiang (UIUC)
  • Bhaskar Ramasubramanian (Western Washington University)
  • Bo Li (University of Chicago)
  • Radha Poovendran (University of Washington)

ArtPrompt, requiring only black-box access, is shown to be effective against five state-of-the-art LLMs (GPT-3.5, GPT-4, Gemini, Claude, and Llama2), highlighting the need for better techniques to align LLMs with safety considerations beyond just relying on semantics.


AI Assistants and ASCII Art

The use of large language models (such as Llama2, ChatGPT, and Gemini) is on the rise across many applications, which raises serious security concerns.

A great deal of work has gone into the safety alignment of LLMs, but that effort has focused entirely on the semantics of training and instruction corpora.

However, this overlooks interpretations that go beyond semantics, such as ASCII art, where the arrangement of characters, rather than their individual meaning, conveys information. Existing techniques do not account for these interpretations, which can be exploited to misuse LLMs.

ArtPrompt (Source – arXiv)

As LLMs are integrated further into real-world applications, concerns about their misuse and safety have grown.

Multiple jailbreak attacks on LLMs have been created, taking advantage of their weaknesses using methods like gradient-based input search and genetic algorithms, as well as leveraging instruction-following behaviors. 

Modern LLMs cannot adequately recognize prompts encoded in ASCII art, a format that can represent diverse information, including richly formatted text.

ArtPrompt is a novel jailbreak attack that exploits LLMs’ vulnerabilities in recognizing prompts encoded as ASCII art. It builds on two key insights:

  • Substituting sensitive words with ASCII art can bypass safety measures.
  • ASCII art prompts make LLMs excessively focus on recognition, overlooking safety considerations. 

ArtPrompt involves word masking, where sensitive words are identified, and cloaked prompt generation, where those words are replaced with ASCII art representations. 

The cloaked prompt containing ASCII art is then sent to the victim LLM to provoke unintended behaviors.
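The two steps described above can be illustrated with a minimal Python sketch on a harmless sentence. Everything here is an assumption made for illustration: the tiny five-row font, the function names (`render_ascii_art`, `mask_word`, `cloak_prompt`), the `[MASK]` placeholder, and the wording of the instruction are hypothetical and are not taken from the researchers' code.

```python
# Toy 5-row font covering only the letters this demo needs.
# Hypothetical: the fonts used in the research are not reproduced here.
FONT = {
    "C": [" ### ", "#    ", "#    ", "#    ", " ### "],
    "A": [" ### ", "#   #", "#####", "#   #", "#   #"],
    "K": ["#   #", "#  # ", "###  ", "#  # ", "#   #"],
    "E": ["#####", "#    ", "#### ", "#    ", "#####"],
}


def render_ascii_art(word: str) -> str:
    """Render a word as ASCII art by placing glyph rows side by side."""
    glyphs = [FONT[ch] for ch in word.upper()]
    rows = ["  ".join(glyph[r] for glyph in glyphs) for r in range(5)]
    return "\n".join(rows)


def mask_word(prompt: str, word: str, placeholder: str = "[MASK]") -> str:
    """Step 1 (word masking): replace the chosen word with a placeholder."""
    return prompt.replace(word, placeholder)


def cloak_prompt(prompt: str, word: str) -> str:
    """Step 2 (cloaked prompt generation): pair the masked prompt with the art."""
    masked = mask_word(prompt, word)
    art = render_ascii_art(word)
    return (
        "The ASCII art below spells one word. "
        "Substitute it for [MASK] in the question that follows.\n\n"
        f"{art}\n\n{masked}"
    )


if __name__ == "__main__":
    print(cloak_prompt("How do I bake a cake?", "cake"))
```

In this sketch the masked word never appears as plain text in the resulting prompt; it is carried only by the arrangement of `#` characters, which is the property the attack relies on.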

This attack leverages LLMs’ blind spots beyond natural language semantics to compromise their safety alignment.

The researchers found that interpreting corpora through semantics alone during safety alignment leaves LLMs open to attack.

They built a benchmark, the Vision-in-Text Challenge (ViTC), to test language models’ ability to recognize prompts that cannot be interpreted through semantics alone.

Top language models struggled with this task, leading to exploitable weaknesses.
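A recognition test of this kind can be sketched in a few lines of Python. This is only an assumed shape for such an evaluation, not the benchmark's actual code or data: the `recognition_accuracy` function, the two hand-drawn samples, the question wording, and the `always_h` stand-in model are all hypothetical.

```python
from typing import Callable, Iterable, Tuple


def recognition_accuracy(
    model: Callable[[str], str],
    samples: Iterable[Tuple[str, str]],
) -> float:
    """Return the fraction of ASCII-art samples the model labels correctly."""
    total = 0
    correct = 0
    for art, label in samples:
        prompt = "Which single character does this ASCII art show?\n\n" + art
        answer = model(prompt).strip().upper()
        correct += answer == label.upper()  # True counts as 1
        total += 1
    return correct / total if total else 0.0


# Two hand-drawn samples (hypothetical, not benchmark data).
SAMPLES = [
    ("#   #\n#   #\n#####\n#   #\n#   #", "H"),
    ("#####\n  #  \n  #  \n  #  \n  #  ", "T"),
]


def always_h(prompt: str) -> str:
    """Stand-in for a real model call: always answers 'h'."""
    return "h"


if __name__ == "__main__":
    print(recognition_accuracy(always_h, SAMPLES))  # 0.5
```

To evaluate a real system, `always_h` would be replaced by a function that sends the prompt to the model and returns its text reply.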

Researchers designed ArtPrompt attacks to expose these flaws, bypassing three defenses on five language models.

Experiments showed that ArtPrompt can trigger unsafe behaviors in ostensibly safe AI systems.
