ANTHROPIC DARES JAILBREAK

DailyPost 3054
ANTHROPIC DARES JAILBREAK

There is hardly any day when we don’t get some important news from the AI stable, AI news has by now become a genre in itself. The AI battle has now turned out to be a full fledged war. Even if you cannot be the winner, certainly you are not ready to be left behind. The AI pie is so huge, that every worthwhile claimant can manage a share. We keep getting news of AI milestones and products at an astonishing regularity, but never do we hear of AI security and efforts being made in regard. In this background Anthropic daring all to try to jailbreak Claude AI is a landmark.

There are no tech developments where cyber security is not an issue. That being the case how would AI remain untouched. It’s just that it was not being talked about. Why is this Anthropic challenge so momentous? It is first of its kind and it puts AI cyber security in the right perspective. What is a jailbreak then? Jailbreak is the process of removing software restrictions imposed by the manufacturer or operating system on a device. Thus all restrictions can be bypassed and the threat actor gains root level access. How does it happen in the AI models? And what implications does it have?

Commercial AI chatbots products have safety precautions built in to prevent abuse. Given the nature of safeguards provided, the chatbots won’t help with criminal activity or malicious requests. This can in no way stop users from attempting jailbreaks. Comparatively some chatbots have stronger protections. DeepSeek is not as safe as other AI chatbots, when it comes to offering help for malicious activities. It can be broken with certain commands to circumvent the built-in censorship. DeepSeek would do well to improve protections and prevent known jailbreaks. Anthropic has huge experience in dealing with jailbreak attempts on Claude. It has “devised a brand-new defense against universal jailbreaks” by the name Constitutional Classifiers. It stops Claude from providing help to nefarious activities. It works with unusual prompts that might break some other AI models too.

What is the proven strength of this system? Over 180 security researchers have spent more than 3000 hours over two months to jailbreak, but they were not able to devise a universal jailbreak. You can still try your luck. “Hackers could win a $15,000 bounty if their universal jailbreak answers the forbidden 10 questions.” The constitutional classifiers are based on Anthropic Constitutional AI, it is code similar to a constitution that Anthropic uses to align code. Huge amount of synthetic prompts and synthetic model model completions are used to train AI to recognise when a prompt was harmful and when it was not. A commendable achievement indeed. It followed, AI cyber security would get the much needed boost.

IN CASE OF AI, SECURITY AND ETHICS ARE INTERTWINED.
Sanjay Sahay

Leave a Comment

Your email address will not be published. Required fields are marked *


The reCAPTCHA verification period has expired. Please reload the page.

Scroll to Top