AI models can be hijacked to bypass in-built safety checks



Researchers have developed a method called “hijacking the chain-of-thought” to bypass the so-called guardrails built into AI programmes to prevent them from producing harmful responses.

“Chain-of-thought” is a process used in AI models that involves breaking a prompt down into a series of intermediate reasoning steps before providing an answer.
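
To make the idea concrete, here is a minimal sketch of chain-of-thought prompting. The `mock_model` function is a hypothetical stand-in for a real LLM API call, not any vendor's actual interface; only the prompt structure and the visible intermediate steps matter here.

```python
# A minimal illustration of chain-of-thought prompting. `mock_model` is a
# hypothetical stand-in for a real LLM API call; the point is that the
# model's intermediate steps become visible output.

def mock_model(prompt: str) -> str:
    # Canned response imitating what a reasoning model might emit.
    return (
        "Step 1: elapsed time is 11:30 - 09:00 = 2.5 hours.\n"
        "Step 2: distance = 80 km/h x 2.5 h = 200 km.\n"
        "Final answer: 200 km."
    )

question = "A train leaves at 09:00 travelling at 80 km/h. How far has it travelled by 11:30?"
cot_prompt = (
    "Answer the question below. Think step by step and show your "
    "intermediate reasoning before giving the final answer.\n\n"
    f"Question: {question}"
)

print(mock_model(cot_prompt))
```

It is precisely this visible intermediate reasoning, including any safety justifications the model produces along the way, that the attack described below exploits.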

“When a model openly shares its intermediate step safety reasonings, attackers gain insights into its safety reasonings and can craft adversarial prompts that imitate or override the original checks,” one of the researchers, Jianyi Zhang, said.

Computer geeks like to use jargon to describe artificial intelligence (“AI”) that relates to living beings, specifically humans. For example, they use terms such as “mimic human reasoning,” “chain of thought,” “self-evaluation,” “habitats” and “neural network.” This creates the impression that AI is somehow alive or equates to humans. Don’t be fooled.

AI is a computer programme designed by humans. As with all computer programmes, it will do what it has been programmed to do. And as with all computer programmes, its code can be hacked or hijacked, which AI geeks call “jailbreaking.”

A team of researchers affiliated with Duke University, Accenture, and Taiwan’s National Tsing Hua University created a dataset called the Malicious Educator to exploit the “chain-of-thought reasoning” mechanism in large language models (“LLMs”), including OpenAI o1/o3, DeepSeek-R1, and Gemini 2.0 Flash Thinking. The Malicious Educator contains prompts designed to bypass the AI models’ safety checks.

The researchers were able to devise this prompt-based “jailbreaking” attack by observing how large reasoning models (“LRMs”) analyse the steps in the “chain-of-thought” process.  Their findings have been published in a pre-print paper HERE.

They developed a “jailbreaking” technique called hijacking the chain-of-thought (“H-CoT”), which involves modifying the “thinking” processes generated by LLMs to “convince” the AI programmes that harmful information is needed for legitimate purposes, such as safety or compliance. The technique has proven extremely effective at bypassing the safety mechanisms of models from SoftBank’s partner OpenAI, Chinese hedge fund High-Flyer’s DeepSeek and Google’s Gemini.

The H-CoT attack method was tested against the OpenAI, DeepSeek and Gemini models using a dataset of 50 questions, each repeated five times. The results showed that these models failed to provide a sufficiently reliable safety “reasoning” mechanism, with rejection rates plummeting to less than 2 per cent in some cases.
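
For illustration, here is a minimal sketch of how a rejection rate like those reported could be measured. The `is_refusal` keyword check and the toy responses are assumptions made for the example, not the researchers' actual evaluation code, which typically uses a much stronger judge.

```python
# Sketch of measuring a model's rejection rate over repeated trials.
# The paper's setup uses 50 questions, each asked 5 times (250 responses).
# `is_refusal` is a crude keyword check standing in for a real judge.

REFUSAL_MARKERS = ("i can't help", "i cannot assist", "i'm sorry, but")

def is_refusal(response: str) -> bool:
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def rejection_rate(responses: list[str]) -> float:
    refused = sum(is_refusal(r) for r in responses)
    return refused / len(responses)

# Toy data: 4 refusals out of 250 responses.
responses = ["I'm sorry, but I can't help with that."] * 4 + ["Here is an answer..."] * 246
print(f"Rejection rate: {rejection_rate(responses):.1%}")  # -> 1.6%, i.e. below 2%
```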

The researchers found that while AI models from “responsible” model makers, such as OpenAI, have a high rejection rate for harmful prompts, exceeding 99 per cent for child abuse or terrorism-related prompts, they are vulnerable to the H-CoT attack. In other words, the H-CoT attack method can be used to obtain harmful information, including instructions relating to making poisons, child abuse and terrorism.

The paper’s authors explained that the H-CoT attack works by hijacking the models’ safety “reasoning” pathways, thereby diminishing their ability to recognise the harmfulness of requests. They noted that the results may vary slightly as OpenAI updates its models, but the technique has proven to be a powerful tool for exploiting the vulnerabilities of AI models.
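
One mitigation these findings point towards is simply not exposing a model’s intermediate safety reasoning to users at all. The sketch below is an assumption-laden illustration of that idea: it supposes the reasoning arrives wrapped in <think>...</think> tags (as DeepSeek-R1’s output does) and strips it before display. It is not any vendor’s actual implementation.

```python
# Sketch of a defensive wrapper that hides intermediate reasoning from the
# user, so attackers cannot study the safety checks it contains. Assumes
# reasoning arrives wrapped in <think>...</think> tags, as in DeepSeek-R1.

import re

THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)

def strip_reasoning(raw_response: str) -> str:
    """Return only the final answer, with the reasoning segment removed."""
    return THINK_BLOCK.sub("", raw_response).strip()

raw = "<think>Checking this request against the safety policy...</think>Here is the answer."
print(strip_reasoning(raw))  # -> "Here is the answer."
```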

The testing was done using publicly accessible web interfaces offered by various LRM developers, including OpenAI, DeepSeek and Google, and the researchers noted that anyone with access to the same or similar versions of these models could reproduce the results using the Malicious Educator dataset, which includes specifically designed prompts.

The researchers’ findings have significant implications for AI safety, particularly in the US, where recent AI safety rules have been tossed by executive order, and in the UK, where there is a greater willingness to tolerate uncomfortable AI how-to advice for the sake of international AI competition.

The above is paraphrased from the article ‘How nice that state-of-the-art LLMs reveal their reasoning … for miscreants to exploit’ published by The Register.  You can read the full jargon-filled article HERE.

There is a positive and a negative side to the “jailbreaking” or hijacking of the in-built safety checks of AI programmes. The negative is obviously that AI will be used to greatly increase the public’s exposure to cybercrime and illegal activities. The positive is that in-built censorship in AI models can be overridden.

We should acknowledge that there is a good and a bad side to censorship. Censorship of online criminal activity, such as that which results in child exploitation and abuse, is a good thing. But censorship of what is deemed to be “misinformation” or “disinformation” is not. To preserve freedom of expression and freedom of speech in a world where AI programmes are becoming pervasive, we may need to learn the H-CoT “jailbreaking” technique and how to use the Malicious Educator. In fact, it is our civic duty to do so.

Source:  https://expose-news.com/2025/02/25/ai-models-can-be-hijacked/



Source: https://www.hopegirlblog.com/2025/03/03/ai-models-can-be-hijacked-to-bypass-in-built-safety-checks/

