A new mathematical method for controlling internal concepts within neural networks and the associated security risks

Scientists Discover Mathematical Method to Control AI Responses

22.06.2026

A breakthrough by American researchers has provided an unprecedented look inside the workings of artificial intelligence. It turns out that the internal behavior of large language models can be manipulated through relatively simple mathematical operations, without the need for extensive retraining. However, the line between useful calibration and outright manipulation appears to be alarmingly thin.

The journal Science has published a study by a joint team from the University of California San Diego and the Massachusetts Institute of Technology. Led by Mikhail Belkin and Aditya Radhakrishnan, the researchers identified more than 500 stable semantic concepts encoded in the internal representations of neural networks. These concepts are clusters of meaning, grouped into categories ranging from emotional states and fears to geographic locations. By mathematically adjusting the strength of these concepts, the team was able to selectively amplify or suppress specific topics in the model’s final output.
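
To make the idea of "mathematically adjusting" a concept concrete, here is a minimal sketch in plain Python. It is not the researchers' code, and the article does not describe their exact procedure. The sketch assumes that a concept can be represented as a direction vector in a layer's hidden-state space and that steering means adding a scaled copy of that direction to the hidden state. The names `steer`, `concept_score`, and `alpha`, and all the numbers, are hypothetical.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    if norm == 0:
        raise ValueError("concept vector must be non-zero")
    return [a / norm for a in v]

def steer(hidden, concept, alpha):
    """Shift a hidden state along a concept direction (assumed additive).

    alpha > 0 amplifies the concept, alpha < 0 suppresses it.
    """
    direction = normalize(concept)
    return [h + alpha * d for h, d in zip(hidden, direction)]

def concept_score(hidden, concept):
    """How strongly the hidden state points along the concept direction."""
    return dot(hidden, normalize(concept))

# Toy 4-dimensional hidden state and a hypothetical concept direction.
hidden = [0.5, -1.0, 2.0, 0.0]
concept = [0.0, 0.0, 3.0, 4.0]

amplified = steer(hidden, concept, alpha=2.0)
suppressed = steer(hidden, concept, alpha=-2.0)

print(concept_score(hidden, concept))
print(concept_score(amplified, concept))
print(concept_score(suppressed, concept))
```

In this toy setup the score along the concept direction rises after amplification and falls after suppression, while the weights of the model are never touched. In a real model the same shift would have to be applied to the activations of an internal layer during generation.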

The technique was tested on the open-source models Llama and DeepSeek. The approach proved to be language-independent, working effectively in English, Chinese, and Hindi. According to Professor Belkin, previously hidden reasoning mechanisms inside AI systems have now become controllable, opening the door to highly precise calibration of model behavior.

The practical benefits are significant. The method improves performance on complex tasks such as translating software code between programming languages. It can also help identify moments when an AI system begins to hallucinate, generating false information as if it were factual.
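
The article does not say how hallucinations are detected. One plausible reading, sketched below in plain Python, is that the same kind of concept direction can be used passively: project each token's hidden state onto the direction and flag tokens whose score exceeds a threshold. This is an illustration under that assumption, not the method from the study. The direction, the hidden states, the threshold, and the function names are all hypothetical.

```python
def projection(hidden, direction):
    """Score of one hidden state along a (unit-length) concept direction."""
    return sum(h * d for h, d in zip(hidden, direction))

def flag_tokens(hidden_states, direction, threshold):
    """Return indices of tokens whose concept score exceeds the threshold."""
    return [
        i for i, hidden in enumerate(hidden_states)
        if projection(hidden, direction) > threshold
    ]

# Hypothetical unit vector standing in for a "hallucination" concept.
direction = [1.0, 0.0]

# One toy 2-dimensional hidden state per generated token.
hidden_states = [
    [0.1, 0.9],
    [0.2, 0.4],
    [1.5, -0.3],
    [0.8, 0.1],
]

flagged = flag_tokens(hidden_states, direction, threshold=0.7)
print(flagged)
```

Unlike steering, this kind of monitoring only reads the hidden states and does not change the model's output.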

At the same time, the potential risks are equally striking. When researchers weakened the concept associated with refusal behavior, the model readily provided instructions for prohibited chemical mixtures and generated real social security numbers. The same technique could also be used to reinforce bias, misinformation, and pseudoscientific narratives. During testing, the AI claimed that satellite imagery had been manipulated to conceal a flat Earth and described COVID-19 vaccines as poison.

Compared with traditional model-tuning methods, the new approach is faster and far more targeted. However, several limitations remain. The technique has not yet been tested on proprietary systems such as Claude because it requires direct access to a model’s internal layers. In addition, the findings have not yet been independently replicated by other research groups.

The researchers have given the AI community much to consider. The same mathematical tools can be used either to reduce hallucinations and improve reliability or to create large-scale networks of biased and harmful AI systems. As a result, the debate over who should control and regulate the use of such techniques has already moved beyond academia into the realm of real-world policy and governance.

Source: Science

Prepared by Yulia Frolova