Artificial Intelligence: What water turning to vapour and the way AI learns have in common


Artificial intelligence (AI) models like ChatGPT, Claude, and Gemini often give the impression that there’s a mind at work within the machine. These days they “think” in response to queries, go back and correct themselves, apologise for mistakes, and mimic many tics of human communication.

To this day, however, there’s no direct physical evidence that a machine mind exists. In fact, there’s good reason to believe that what these machines are doing when they say they’re “thinking” is better described as a physical phenomenon.


In the 1980s, a group of physicists, including John Hopfield and Geoffrey Hinton, realised that if you have a network with millions of neurons, you can stop treating them as individual ‘particles’ and start addressing them as a system, whose behaviour and properties can be described by the rules of thermodynamics and statistical mechanics.

Hopfield and Hinton won the Nobel Prize in physics in 2024 for this work. A pair of studies published in Physical Review E has now doubled down on the same idea, showing that two common ‘tricks’ engineers use to improve AI models are also such physical phenomena.

Achilles heel

A neural network is a collection of simple processors connected to each other like neurons in the human brain, and it learns and uses information somewhat like the brain does. These processors can also be stacked in multiple layers, so that one layer prepares the inputs for the next, and so on. Neural networks are at the heart of machine-learning applications like generative AI, self-driving cars, computer vision, and modelling.

They also have an Achilles heel called overfitting: a network becomes so obsessed with some specific examples it has seen during its training that it fails to understand the broader patterns. Engineers have developed some techniques to prevent this. For instance, the October 2025 paper by University of Oxford and Princeton University researchers Francesco Mori and Francesca Mignacco focused on a technique called dropout. During training, the neural network is made to randomly turn off a certain percentage of its neurons, forcing the remaining ones to work harder and learn the concepts independently.
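Dropout itself amounts to a few lines of code. Below is a minimal sketch, in plain NumPy, of the ‘inverted dropout’ variant commonly used in practice; the function name and numbers are illustrative and not taken from the paper.

```python
import numpy as np

def dropout(activations, rate, rng):
    """Randomly switch off a fraction `rate` of neurons during training.

    The surviving activations are scaled up by 1/(1 - rate) -- so-called
    inverted dropout -- so the layer's expected output is unchanged and
    no rescaling is needed when the network is later used for real.
    """
    mask = rng.random(activations.shape) >= rate
    return activations * mask / (1.0 - rate)

rng = np.random.default_rng(0)
hidden = np.ones(10)                       # stand-in for hidden-layer outputs
dropped = dropout(hidden, rate=0.3, rng=rng)
print((dropped == 0).sum(), "neurons switched off this step")
```

Because the mask is drawn fresh at every training step, no neuron can rely on any other always being present, which is what forces them to learn independently.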

Abdulkadir Canatar and SueYeon Chung, of the Flatiron Institute and New York University, turned to a constraint called tolerance in their August paper. They analysed what happens when an AI is told to ignore any error that falls within a small range: rather than trying to correct every little discrepancy, the network treats any answer that’s ‘close enough’ as good enough.
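In code, such a tolerance looks like an epsilon-insensitive loss, the kind familiar from support-vector regression. The sketch below illustrates the general idea, not the paper’s exact formulation:

```python
import numpy as np

def tolerant_loss(pred, target, eps):
    """Squared error that ignores discrepancies smaller than `eps`,
    so any prediction inside the tolerance band counts as correct."""
    err = np.abs(pred - target)
    err = np.maximum(err - eps, 0.0)   # errors inside the band contribute zero
    return 0.5 * err**2

preds = np.array([1.02, 1.5])
targets = np.array([1.0, 1.0])
print(tolerant_loss(preds, targets, eps=0.1))
```

With a tolerance of 0.1, the first prediction (off by 0.02) incurs no penalty at all, while the second (off by 0.5) is penalised only for the part of its error that exceeds the band.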

While dropout and tolerance look like different programming choices, the authors of the two papers argued (separately) that both are governed by the same kind of underlying physical phenomenon.

Teacher-student experiment

Both duos used a tool called the teacher-student framework to explain how. The Teacher is a neural network that’s already familiar with a dataset, while the Student is a network that starts completely blank. The Student’s goal is to learn the same dataset until its internal settings align with the Teacher’s.
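In its simplest form the framework can be simulated in a few lines: a fixed random ‘Teacher’ labels the data, and a ‘Student’ is trained by gradient descent until its settings match. This is a linear toy version with illustrative values, far simpler than the networks in the papers:

```python
import numpy as np

rng = np.random.default_rng(1)

# Teacher: a fixed random network that labels the data.
w_teacher = rng.standard_normal(5)
X = rng.standard_normal((200, 5))
y = X @ w_teacher                      # the 'ground truth' comes from the Teacher

# Student: starts completely blank and learns from the Teacher's labels.
w_student = np.zeros(5)
for _ in range(500):
    grad = X.T @ (X @ w_student - y) / len(X)
    w_student -= 0.1 * grad

print(np.allclose(w_student, w_teacher, atol=1e-3))
```

Because the Teacher’s settings are known exactly, researchers can measure precisely how far the Student is from them at every step, which is what makes the framework useful for theory.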

Mori and Mignacco wrote that at first the Student was stuck in an “unspecialised phase”, in which its neurons were all doing the same thing. In the authors’ mathematical models, this appeared as a plateau, or a flat line, in the error graph, denoting that the Student wasn’t learning.

The three phases of learning. | Photo Credit: Phys. Rev. E 112, 045301

So they argued that for the Student to become smarter, it must first undergo a “specialisation transition”. Physicists are familiar with such transitions because they use the same maths to describe liquid water turning into vapour, a process called a phase transition.

Mori and Mignacco reported that by randomly turning neurons off, dropout injected a certain amount of noise into the system, which then nudged the network out of its plateau and towards specialised intelligence — a phase transition. This description also aligns with the work of Hopfield and Hinton, who showed that a neural network has a well-defined energy, and that by manipulating this energy the network can be made to perform better.

They even reported a formula that they said could identify the ideal dropout rate. It relates the activation probability, which is the chance that a neuron produces a particular output for a given set of inputs, to the learning rate, the noise level, and the learning capacities of the Teacher and Student networks.

Like atoms

Canatar and Chung also found that the consequences of changing the tolerance on the network could be described using the laws of physics, which they illustrated by applying their findings to the double-descent problem: when you give a network more data, its performance sometimes gets worse before it suddenly gets better. According to Canatar and Chung, when a network learns exactly as many examples as it has internal settings for, it reaches a point where it’s looking for more information. When it doesn’t get that information, it starts to overfit what it already ‘knows’ to every problem in its way.
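The danger zone at the heart of double descent, where the number of examples matches the number of internal settings, shows up even in plain linear algebra: a random data matrix is at its worst-conditioned when it is square, which is why fits blow up at exactly that point. A toy illustration (the sizes are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40                                    # fixed number of training examples
for p in (10, 40, 160):                   # settings (parameters) below, at, above n
    X = rng.standard_normal((n, p))
    s = np.linalg.svd(X, compute_uv=False)
    cond = s.max() / s.min()              # condition number of the data matrix
    print(f"p={p}: condition number = {cond:.0f}")
```

The condition number, which controls how wildly a fitted model amplifies noise, spikes when p equals n and falls off on either side, mirroring the peak in the error curve that the papers explain as a failed phase transition.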

The machine doesn’t reach this overfitting stage because its algorithm is flawed but because its millions of parameters behave like a collection of atoms trying, and failing, to go through a phase transition, they added. The outputs of the neurons’ computations are consequently riddled with errors.

The solution? “Canatar and Chung uncovered a critical value of … tolerance that separates two regimes: one in which the neural network perfectly fits the training data and another in which overadaption is avoided. In physical terms, this regime crossover corresponds to a … phase transition,” Hugo Cui, a researcher at the University of Paris-Saclay and the French National Centre for Scientific Research, wrote in a commentary for Physics.

Some limitations

Mori and Mignacco were working with a two-layer neural network, a toy model compared to the large, multi-layered deep-learning networks that power AI models like ChatGPT or self-driving cars. Nonetheless, they wrote that their findings answer “several open questions about the mechanisms driving the performance improvement induced by dropout”.

Canatar and Chung, on the other hand, applied their equations to ResNet, an advanced type of neural network used to solve real-world problems like computer vision. They said that even in this setup, the same geometric and thermodynamic rules they’d found in their simpler model held true.

For decades now, engineers have often treated machine learning as a kind of ‘black box’: they tinker with the code until it works, without understanding why it works. In the 1980s, however, the conviction prevailed that machine intelligence is a complex product, but a product nonetheless, of statistical mechanics, which physicists understand very well. By this logic, the machine’s inner workings aren’t inscrutable so much as a physical system that can be deciphered using undergraduate physics.

These studies suggest a future where scientists could use analytic theories like the ones in the papers to estimate an AI model’s performance even before they turn it on.

mukunth.v@thehindu.co.in

Published – March 02, 2026 07:30 am IST


