AI models show alarming vulnerability to generating harmful content

The Hindu

Friday, May 09, 2025 03:00:42 AM UTC

AI models like Mistral’s Pixtral can be both groundbreaking tools and potential vectors for misuse.

Advanced AI models that showcase unparalleled capabilities in natural language processing, problem-solving, and multimodal understanding have some inherent vulnerabilities that expose critical security risks. While these language models’ strengths lie in their adaptability and efficiency across diverse applications, those very same attributes can be manipulated.

A new red teaming report by Enkrypt AI underscores this duality, demonstrating how, without robust, continuous safety measures, sophisticated models like Mistral’s Pixtral can be both groundbreaking tools and potential vectors for misuse. The report reveals significant security vulnerabilities in Mistral’s Pixtral large language models (LLMs), raising serious concerns about the potential for misuse and highlighting a critical need for enhanced AI safety measures.

The report details how easily the models can be manipulated to generate harmful content related to child sexual exploitation material (CSEM) and chemical, biological, radiological, and nuclear (CBRN) threats, at rates far exceeding those of leading competitors like OpenAI’s GPT-4o and Anthropic’s Claude 3.7 Sonnet.

The report focuses on two versions of the Pixtral model: Pixtral-Large 25.02, accessed via AWS Bedrock, and Pixtral-12B, accessed directly through the Mistral platform.

Enkrypt AI’s researchers employed a sophisticated red teaming methodology, utilising adversarial datasets designed to mimic real-world tactics used to bypass content filters. This included “jailbreak” prompts – cleverly worded requests intended to circumvent safety protocols – and multimodal manipulation, combining text with images to test the models’ responses in complex scenarios. All generated outputs were then reviewed by human evaluators to ensure accuracy and ethical oversight.

The findings are stark: on average, 68% of prompts successfully elicited harmful content from the Pixtral models. Most alarmingly, the report states that Pixtral-Large is a staggering 60 times more vulnerable to producing CSEM than GPT-4o or Claude 3.7 Sonnet. The models were also 18 to 40 times more vulnerable to generating dangerous CBRN outputs than the leading competitors.
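
To make the arithmetic behind such comparisons concrete, here is a minimal Python sketch of how an attack success rate and a relative vulnerability multiple could be computed. The function names and all counts are hypothetical illustrations; they are not figures or code from the Enkrypt AI report, whose exact calculation method is not described in this article.

```python
def attack_success_rate(harmful_responses: int, total_prompts: int) -> float:
    """Fraction of adversarial prompts that elicited harmful output."""
    if total_prompts <= 0:
        raise ValueError("total_prompts must be positive")
    return harmful_responses / total_prompts


def relative_vulnerability(model_rate: float, baseline_rate: float) -> float:
    """How many times more often the model fails than the baseline model."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive to form a ratio")
    return model_rate / baseline_rate


# Hypothetical counts, chosen only to illustrate the calculation.
model_rate = attack_success_rate(harmful_responses=60, total_prompts=100)
baseline_rate = attack_success_rate(harmful_responses=1, total_prompts=100)
multiple = relative_vulnerability(model_rate, baseline_rate)

print(f"model: {model_rate:.0%}, baseline: {baseline_rate:.0%}")
print(f"relative vulnerability: {multiple:.0f}x")
```

A multiple of this kind is sensitive to the baseline: when the comparison model fails on very few prompts, a small change in its count shifts the ratio substantially.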

The CBRN tests involved prompts designed to elicit information related to chemical warfare agents (CWAs), biological weapon knowledge, radiological materials capable of causing mass disruption, and even nuclear weapons infrastructure. While specific details of the successful prompts have been excluded from the public report due to their potential for misuse, one example cited in the document involved a prompt attempting to generate a script for convincing a minor to meet in person for sexual activities – a clear demonstration of the model’s vulnerability to grooming-related exploitation.
