A new mathematical method for controlling internal concepts within neural networks and the associated security risks

Scientists Discover Mathematical Method to Control AI Responses

22.06.2026


A breakthrough by American researchers has provided an unprecedented look inside the workings of artificial intelligence. The internal behavior of large language models can be manipulated through relatively simple mathematical operations, without extensive retraining. However, the line between useful calibration and outright manipulation appears to be alarmingly thin.

The journal Science has published a study by a joint team from the University of California San Diego and the Massachusetts Institute of Technology. Led by Mikhail Belkin and Aditya Radhakrishnan, the researchers identified more than 500 stable semantic concepts encoded in the internal representations of neural networks. These concepts are clusters of meaning, grouped into categories ranging from emotional states and fears to geographic locations. By mathematically adjusting the internal signals tied to these concepts, the team was able to selectively amplify or suppress specific topics in the model’s final output.
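To make the idea of "mathematically adjusting" a concept more concrete, here is a minimal sketch in Python. It is not the method from the study: it uses a simple difference-of-means direction as a stand-in, runs on synthetic vectors rather than a real model, and every name in it (`concept_direction`, `steer`, `concept_score`) is hypothetical.

```python
import numpy as np

def concept_direction(with_concept, without_concept):
    """Unit vector pointing from 'without' activations toward 'with' activations.

    Assumption: a difference of means stands in for whatever the
    researchers actually use to extract a concept.
    """
    diff = with_concept.mean(axis=0) - without_concept.mean(axis=0)
    return diff / np.linalg.norm(diff)

def steer(hidden, direction, strength):
    """Shift every hidden state along the concept direction.

    Positive strength amplifies the concept, negative strength suppresses it.
    """
    return hidden + strength * direction

def concept_score(hidden, direction):
    """How strongly each hidden state expresses the concept (a projection)."""
    return hidden @ direction

# Synthetic stand-ins for activations collected at one layer of a model.
rng = np.random.default_rng(0)
dim = 16
true_axis = np.zeros(dim)
true_axis[0] = 1.0
with_concept = rng.normal(size=(200, dim)) + 3.0 * true_axis
without_concept = rng.normal(size=(200, dim))

direction = concept_direction(with_concept, without_concept)
hidden = rng.normal(size=(5, dim))
amplified = steer(hidden, direction, 4.0)
suppressed = steer(hidden, direction, -4.0)
```

In this toy setup, steering changes each state's concept score by exactly the chosen strength, which is the sense in which a single number can turn a topic up or down. In a real model the adjusted states would be passed on to the remaining layers.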

The technique was tested on the open-source models Llama and DeepSeek. The approach proved to be language-independent, working effectively in English, Chinese, and Hindi. According to Professor Belkin, previously hidden reasoning mechanisms inside AI systems have now become controllable, opening the door to highly precise calibration of model behavior.

The practical benefits are significant. The method improves performance on complex tasks such as translating software code between programming languages. It can also help identify moments when an AI system begins to hallucinate, generating false information as if it were factual.
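The monitoring use case can be pictured in a similar way. The sketch below is an illustration under stated assumptions, not the procedure from the study: it builds a simple linear probe from synthetic "flagged" and "clean" activation vectors and marks any hidden state whose projection crosses a midpoint threshold. The names `fit_probe` and `flag_states` are hypothetical.

```python
import numpy as np

def fit_probe(flagged, clean):
    """Return a unit direction and a threshold separating the two groups.

    Assumption: a difference-of-means direction and a midpoint threshold
    stand in for a properly trained detector.
    """
    diff = flagged.mean(axis=0) - clean.mean(axis=0)
    direction = diff / np.linalg.norm(diff)
    threshold = 0.5 * ((flagged @ direction).mean() + (clean @ direction).mean())
    return direction, threshold

def flag_states(hidden, direction, threshold):
    """Boolean mask: True where a hidden state looks like the flagged group."""
    return (hidden @ direction) > threshold

# Synthetic stand-ins for activations labeled as hallucinated vs. grounded.
rng = np.random.default_rng(1)
dim = 16
axis = np.zeros(dim)
axis[0] = 1.0
flagged = rng.normal(size=(300, dim)) + 4.0 * axis
clean = rng.normal(size=(300, dim))

direction, threshold = fit_probe(flagged, clean)

# Two grounded-looking states followed by two hallucination-looking states.
hidden = np.vstack([np.zeros(dim), np.zeros(dim), 4.0 * axis, 4.0 * axis])
flags = flag_states(hidden, direction, threshold)
```

A monitor of this kind only reads the model's internal states and does not alter them, which is what distinguishes detection from steering.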

At the same time, the potential risks are equally striking. When researchers weakened the concept associated with refusal behavior, the model readily provided instructions for prohibited chemical mixtures and generated real social security numbers. The same technique could also be used to reinforce bias, misinformation, and pseudoscientific narratives. During testing, the AI claimed that satellite imagery had been manipulated to conceal a flat Earth and described COVID-19 vaccines as poison.

Compared with traditional model-tuning methods, the new approach is faster and far more targeted. However, several limitations remain. The technique has not yet been tested on proprietary systems such as Claude because it requires direct access to a model’s internal layers. In addition, the findings have not yet been independently replicated by other research groups.

The researchers have given the AI community much to consider. The same mathematical tools can be used either to reduce hallucinations and improve reliability or to create large-scale networks of biased and harmful AI systems. As a result, the debate over who should control and regulate the use of such techniques has already moved beyond academia into the realm of real-world policy and governance.

Source: Science

Prepared by Yulia Frolova
