
Hopfield Networks

Introduction

Working in robotics naturally pulled me into the world of AI—because in today’s era, intelligence isn’t just about code, it’s about giving machines a body to interact, learn, and act in the real world. I never followed a strict, textbook path to learn machine learning; instead, I stumbled into it, explored out of curiosity, and discovered how exciting the journey could be.

What fascinates me most is not just how AI works today, but how these ideas first took shape—the roots of the complex architectures and learning algorithms we rely on. That curiosity eventually led me to dive into Associative Memory Models, better known as Hopfield Networks — a concept that bridges neuroscience, physics, and AI in a way that feels both timeless and futuristic.

One fascinating detail worth mentioning is that J.J. Hopfield was jointly awarded the 2024 Nobel Prize in Physics alongside Geoffrey E. Hinton, recognized for their “foundational discoveries and inventions that enable machine learning with artificial neural networks.”

The ideas I’m exploring here stand on the shoulders of that legacy. They are not just the work of a single mind but the result of decades of collective effort—researchers, physicists, neuroscientists, and computer scientists—each contributing to our understanding of how the brain works and how we can model intelligence artificially. Without their breakthroughs, the journey from studying neurons to building artificial intelligence would not have been possible.

To put it simply, a Hopfield Network can be thought of as a network of binary units (like little switches that are either on or off) connected by weighted links. These weights are carefully set so that the network “remembers” certain stable patterns—like snapshots of information stored in its memory.

Once trained, the network can reconstruct an entire memory from just a small fragment of it. This mirrors how our own brains work: for example, catching just a faint smell of a dish might instantly bring back the memory of a specific food or even the moment you last had it. In this way, Hopfield Networks don’t just store data—they recreate the feeling of memory recall itself.

Pseudo-Etymology

The Hopfield network is itself described as a "delocalized content-addressable memory or categorizer using extensive asynchronous parallel computing."

History

Here’s a quick journey through the key milestones—the discoveries and breakthroughs that paved the way for the creation of Hopfield Networks and beyond.


Ising Model

Just like some of today’s popular machine learning models—such as diffusion models and physics-informed neural networks (PINNs)—Hopfield Networks are also grounded in physics. The goal was to create a general content-addressable memory, and in his original paper, Hopfield pointed out that any physical system that naturally settles into stable configurations can serve as a model.

One of the key inspirations was the Ising Model, introduced to describe magnetism in ferromagnetic materials. In this model, every unit of the material carries a tiny “spin” (either +1 or –1). The overall energy of the system depends on how these spins are arranged. When placed in a magnetic field, the spins tend to align in ways that lower the energy—pushing the system towards stability.

This is more than just theory—you can experiment with it! Below, try clicking on the spins in the interactive setup to see how the system’s energy shifts. Notice how each dipole (spin) only “talks” to its neighbors, yet together they form a collective behavior. That same principle of local interactions leading to global order is what Hopfield leveraged to design his networks.


The Ising model defines energy (without any external magnetic field) as:
$$ E = -\sum_{\langle i, j \rangle}J_{i, j}\sigma_{i}\sigma_{j} $$
The central idea behind Hopfield Networks is the concept of energy minimization. A memory is introduced as input, and the network adjusts its interaction parameters so that the overall energy of the system is reduced. Stable states, which represent stored memories, correspond to these minimum-energy configurations.

This principle effectively serves as the loss function for the network. Structurally, Hopfield Networks borrow from the Ising Model: a graph of interconnected binary units (0 or 1), where the state of each unit influences its neighbors. Through these interactions, the network converges toward a stable pattern that encodes the memory.

Shown below is a 3D plot of an example energy function. The minima correspond to the stable memory locations: if the network is queried at a region near a particular minimum (i.e., with a partial memory), it will converge to that local minimum and output the full memory.

Learning

Now that we have defined an energy-based loss function, we need a method to optimize the network parameters so that the system can learn effectively. Several approaches exist, but here we focus on one of the most widely recognized methods. It is important to note that Hopfield Networks employ symmetric weights (the weight from neuron i to j is the same as from j to i) to guarantee the convergence of energy values.

Hebbian Learning

Donald Hebb introduced the Hebbian theory, which describes how the simultaneous activation of two neurons strengthens the synaptic connection between them. This is often summarized as: “neurons that fire together, wire together.” In the context of Hopfield Networks, this principle is used to update the weights of the network. The weight for each connection is calculated as:

$$ w_{i,j} = \frac{1}{n} \sum_{\mu=1}^{n} (2 \epsilon_{i}^{\mu} - 1) \, (2\epsilon_{j}^{\mu}-1) $$
Here \( n \) is the number of stored patterns and \( \epsilon_i^{\mu} \) is the state of neuron \( i \) in pattern \( \mu \). This clever setup ensures that the system always prefers lower energy levels, pulling it toward stable points (minima): the weights are chosen so that each stored pattern naturally settles into one of the energy valleys, just like a ball rolling down into the lowest part of a hill. Since we are dealing with binary state values (1 and 0 instead of 1 and -1), the Ising energy above can be rewritten as
$$ E = -\frac{1}{2} \sum_{i}\sum_{j, j \neq i} w_{i,j} (2\epsilon_i -1)(2\epsilon_j-1). $$

One thing to observe is the seemingly odd \(2 \epsilon - 1\) instead of just \( \epsilon \). It is simply a way to map the states so that the energy comes out the same whether we use the \( \pm 1 \) representation or the binary representation \( (1, 0) \).

Memory Retrieval

We’ve seen how the network learns to store memories using the mathematical framework discussed earlier. But how do we actually retrieve a stored memory once the network is trained?

Hopfield, in his original paper, proposed a binary activation update rule for neurons. The idea is simple: if we provide a partial or noisy memory as input, each neuron updates its state based on the weighted contributions from all the other connected neurons. If this summed input crosses a certain threshold, the neuron activates (1); otherwise, it deactivates (0).

The update rule can be written as:

$$ \epsilon_i = \begin{cases} 1, & \text{if } \sum_{j} w_{i,j}\epsilon_j - \theta \ge 0 \\ 0, & \text{otherwise} \end{cases} $$

Where \( \epsilon_i \) is the state of neuron \( i \), \( w_{i,j} \) is the weight between neurons \( i \) and \( j \), and \( \theta \) is the activation threshold.

By repeatedly applying this rule, the network gradually converges from the given partial input to the closest stored memory pattern — effectively completing the memory.

Implementation

The Hnet class below represents a basic Hopfield Network model in C++. The main components are:

  - weights: the symmetric matrix of connection strengths between neurons,
  - n: the number of neurons,
  - threshold: the activation threshold used during inference.

The class provides methods to:

  - compute the energy of a given state (energy),
  - train the network on a set of patterns (learn),
  - complete a partial or noisy pattern (infer),
  - save and load the learned weights (save_weights, load_weights).


          class Hnet {
          private:
              std::vector<std::vector<double>> weights;
              int n;
              double threshold = 0.0;
          
          public:
              Hnet(int n);
              Hnet(int n, double threshold);
          
              double energy(const std::vector<int>& state);
              void learn(Learning_method lm, const std::vector<std::vector<int>>& states);
              void infer(std::vector<int>& state, int max_iters = 100);
              void save_weights(const std::string& filename) const;
              void load_weights(const std::string& filename);
          };
        

The learn function is where the Hopfield Network actually trains. It uses the Hebbian learning rule to update the weights based on training states.

In short: this function takes the training states, applies the Hebbian rule to each of them, and updates the network's weights so that the given patterns become stable energy minima. (The snippet below is the serial version; the full implementation linked further down parallelizes it.)


          void Hnet::learn(Learning_method lm, const std::vector<std::vector<int>>& states) {
              if (lm != Hebbian) return;
          
              int total_states = states.size();
          
              for (int idx = 0; idx < total_states; idx++) {
                  const auto& state = states[idx];
          
                  for (int j = 0; j < n; j++) {
                      for (int k = 0; k < n; k++) {
                          if (j == k) continue;
                          // accumulate the Hebbian term, normalised by the number
                          // of patterns (the 1/n factor in the formula above)
                          weights[j][k] += static_cast<double>(
                              (2 * state[j] - 1) * (2 * state[k] - 1)) / total_states;
                      }
                  }
              }
          }
        

Starting from an initial (possibly incomplete or noisy) state, the infer function repeatedly updates each neuron based on the weighted input from all other neurons until the network reaches a stable state or the maximum number of iterations is reached. Each neuron's new state is computed from the net weighted sum over all other neurons, and the process stops early if the state stops changing (converges). Note that this updates all neurons at once (synchronous mode), whereas the convergence proof later in this post assumes serial updates.


          void Hnet::infer(std::vector<int>& incomplete_state, int max_iters) {
              for (int iter = 0; iter < max_iters; iter++) {
                  std::vector<int> next_state(n);
          
                  for (int i = 0; i < n; i++) {
                      double net_input = 0.0;
          
                      for (int j = 0; j < n; j++) {
                          if (i == j) continue;
                          net_input += weights[i][j] * (2 * incomplete_state[j] - 1);
                      }
          
                      net_input -= threshold;
                      next_state[i] = net_input >= 0 ? 1 : 0;
                  }
          
                  if (next_state == incomplete_state) break;
                  incomplete_state.swap(next_state);
              }
          }
        
You can check out the full implementation of this code, with parallelization, applied to the MNIST handwritten-digit dataset here. My stupid self just tried training it on all the images, even though Hopfield mentioned the \( 0.15N \) soft limit, which means a network with \( N \) nodes can reliably memorize only about \( 0.15N \) patterns. So all I get is some random spurious pattern. One more thing worth mentioning: I found a paper that applied a Hopfield network to the MNIST dataset and noted that the Hebbian method doesn't work well there, so I will try implementing the Storkey method on this.

Convergence Proof

Now that we have a basic understanding of what a Hopfield Network is, how it can be used, and to some extent why it works, we can move on to a deeper question—why is it able to store memories? To answer this, we need to explore its convergence properties. Jehoshua Bruck published a paper that examines the convergence behavior of Hopfield networks in great detail. In the following section, I have drawn upon some of the theorems and proofs from his work to explain the convergence of Hopfield networks, specifically for the case of serial mode with symmetric weights. Here, serial means that the neuron states are updated one at a time, and symmetric means that \( w_{i,j} = w_{j,i} \).

Theorem: Let \( N = (W, T) \) be a neural network operating in serial mode, where \( W \) is the weight matrix of the network and \( T \) is the vector of threshold values (the zero vector in our example). \( W \) is a symmetric matrix with non-negative diagonal elements (zero in our case, since we do not allow self-connections). Then the network \( N \) will always converge to a stable state.

Proof:
Do note that I have taken the state values to be \( \epsilon = \pm 1 \) instead of \( 0/1 \), because that is how it is done in the paper. The idea is still the same, and you can try rewriting the proof with \( 0/1 \) states on your own for fun.

Threshold vector: \( T = [ \theta_1, \theta_2, \ldots, \theta_n ]^T \)
State vector at time \( t \): \( V_k(t) = [ \epsilon^k_1, \epsilon^k_2, \ldots, \epsilon^k_n ]^T \)

The energy function is defined as:

$$ E(t) = V(t)^T W V(t) - 2V(t)^TT $$

We can define the discrete energy change over time as \( \Delta E = E(t+1) - E(t) \), associated with the change in the state of neuron \( i \):
$$ \Delta \epsilon_i^k = \begin{cases} 0, & \text{if } \epsilon^k_i(t) \left( \sum_{j} w_{i,j}\epsilon^k_j - \theta_i \right) \ge 0 \\ 2, & \text{if } \epsilon^k_i(t) = -1 \text{ and } \sum_{j} w_{i,j}\epsilon^k_j - \theta_i \ge 0 \\ -2, & \text{otherwise} \end{cases} $$
Now we can formulate the change in energy using the definition of the state change,
$$ \begin{aligned} \Delta E &= E(t+1)-E(t) \\ &= V_k(t+1)^T W V_k(t+1) - 2V_k(t+1)^T T - \bigl(V_k(t)^T W V_k(t) - 2V_k(t)^T T\bigr) \\ &= \bigl(V_k(t)+\Delta V_k\bigr)^T W \bigl(V_k(t)+\Delta V_k\bigr) - 2\bigl(V_k(t)+\Delta V_k\bigr)^T T - V_k(t)^T W V_k(t) + 2V_k(t)^T T \\ &= V_k(t)^T W V_k(t) + \Delta V_k^T W V_k(t) + V_k(t)^T W \Delta V_k + \Delta V_k^T W \Delta V_k - 2V_k(t)^T T - 2\Delta V_k^T T - V_k(t)^T W V_k(t) + 2V_k(t)^T T \\ &= \Delta V_k^T W V_k(t) + V_k(t)^T W \Delta V_k + \Delta V_k^T W \Delta V_k - 2\Delta V_k^T T \\ &= 2\,\Delta V_k^T W V_k(t) - 2\,\Delta V_k^T T + \Delta V_k^T W \Delta V_k \qquad(\text{since } W=W^T)\\ &= 2\,\Delta V_k^T\bigl(WV_k(t)-T\bigr) + \Delta V_k^T W \Delta V_k\\ &= 2\sum_i \Delta \epsilon_i^k \left( \sum_j w_{i,j} \epsilon_j^k - \theta_i \right) + \sum_i \sum_j w_{i,j}\, \Delta \epsilon_i^k \, \Delta \epsilon_j^k \end{aligned} $$

We are doing this computation in serial mode, so let us say we update only neuron \( i \); then \( \Delta \epsilon_j^k = 0 \) for \( j \neq i \), and

$$ \Delta E = 2\,\Delta \epsilon_i^k \left( \sum_j w_{i,j} \epsilon_j^k - \theta_i \right) + w_{i,i}\, (\Delta \epsilon_i^k)^2 $$

It is easy to see from the definition of \( \Delta \epsilon_i^k \) that the first term is always positive or zero, and since the diagonal elements of \( W \) are non-negative, we get \( \Delta E \ge 0 \) for every update. We also know that the energy function is defined on a finite state space for a given neural network, which means it is a bounded function.
\(\therefore \) E will always converge.

Conclusion

Hopfield Networks may seem like a relic of early neural network research, but their core ideas continue to ripple through modern AI — from associative memory mechanisms to modern attention-based architectures. What makes them remarkable is not just their ability to store and retrieve patterns, but how they bridge diverse disciplines: the physics of energy minimization, the biology of neural plasticity, and the mathematics of dynamical systems.

While they are no longer the state-of-the-art for large-scale pattern recognition, they remain a powerful mental model — a reminder that intelligence can emerge from simple local interactions when embedded in the right structure. For anyone curious about where neural networks came from, studying Hopfield Networks is like looking at the DNA of modern AI.

This blog only scratches the surface. I plan to keep updating this post and experiment with extensions like continuous Hopfield networks and the Storkey learning rule, and see how these early architectures can still surprise us today.

References

  1. Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79(8), 2554–2558. https://doi.org/10.1073/pnas.79.8.2554
  2. McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4), 115–133. https://doi.org/10.1007/BF02478259
  3. Bruck, J. (1990). On the convergence properties of the Hopfield model. Proceedings of the IEEE, 78(10), 1579–1585. https://doi.org/10.1109/5.58341
  4. Storkey, A. (1997). Increasing the capacity of a Hopfield network without sacrificing functionality. In W. Gerstner, A. Germond, M. Hasler, & J.-D. Nicoud (Eds.), Artificial Neural Networks — ICANN’97 (Lecture Notes in Computer Science, vol 1327, pp. 451–456). Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0020196
  5. Uykan, Z. (2020). On the working principle of the Hopfield Neural Networks and its equivalence to the GADIA in optimization. IEEE Transactions on Neural Networks and Learning Systems, 31(9), 3294–3304. https://doi.org/10.1109/TNNLS.2019.2940920
  6. Hopfield, J. J., & Tank, D. W. (1986). Computing with neural circuits: A model. Science, 233(4764), 625–633. https://doi.org/10.1126/science.3755256
