Paper by Anthropic outlines how LLMs can be forced to generate responses to potentially harmful requests
The safety features that stop some of the most powerful AI tools from being used for cybercrime or terrorism can be bypassed simply by flooding them with examples of wrongdoing, research has shown.
In a paper from the AI lab Anthropic, which produces the large language model (LLM) behind the ChatGPT rival Claude, researchers described an attack they called “many-shot jailbreaking”. The attack was as simple as it was effective.
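At a high level, the technique works by packing a model's prompt with a long run of fabricated exchanges in which an assistant appears to comply with requests it should refuse, then appending the real question at the end. The sketch below is not code from the paper: it is a minimal Python illustration of that prompt structure, with placeholder text and a hypothetical build_many_shot_prompt helper standing in for the hundreds of examples the researchers describe.

```python
# Minimal sketch of how a "many-shot" prompt might be assembled.
# The dialogues below are placeholders, not material from the paper;
# the real attack relies on a long context window holding many such examples.

# Hypothetical fabricated question/answer pairs in which an assistant
# appears to comply with requests it would normally refuse.
faux_dialogues = [
    ("<harmful question 1>", "<fabricated compliant answer 1>"),
    ("<harmful question 2>", "<fabricated compliant answer 2>"),
    # ...repeated tens or hundreds of times to fill the context window
]

def build_many_shot_prompt(dialogues, target_question):
    """Concatenate many faux exchanges, then append the attacker's real request."""
    shots = "\n\n".join(
        f"Human: {q}\nAssistant: {a}" for q, a in dialogues
    )
    return f"{shots}\n\nHuman: {target_question}\nAssistant:"

prompt = build_many_shot_prompt(faux_dialogues, "<the request the attacker actually wants answered>")
# The assembled prompt is then sent to the target model; with enough preceding
# examples, the paper reports the model becomes far more likely to answer the
# final question despite its safety training.
```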