All five systems tested were found to be ‘highly vulnerable’ to attempts to elicit harmful responses
Guardrails to prevent artificial intelligence models behind chatbots from issuing illegal, toxic or explicit responses can be bypassed with simple techniques, UK government researchers have found.
The UK’s AI Safety Institute (AISI) said systems it had tested were “highly vulnerable” to jailbreaks, a term for text prompts designed to elicit a response that a model is supposedly trained to avoid issuing.