
Hopfield Networks

Introduction

Working in robotics naturally pulled me into the world of AI—because in today’s era, intelligence isn’t just about code, it’s about giving machines a body to interact, learn, and act in the real world. I never followed a strict, textbook path to learn machine learning; instead, I stumbled into it, explored out of curiosity, and discovered how exciting the journey could be.

What fascinates me most is not just how AI works today, but how these ideas first took shape—the roots of the complex architectures and learning algorithms we rely on. That curiosity eventually led me to dive into Associative Memory Models, better known as Hopfield Networks — a concept that bridges neuroscience, physics, and AI in a way that feels both timeless and futuristic.

One fascinating detail worth mentioning is that J.J. Hopfield was jointly awarded the 2024 Nobel Prize in Physics alongside Geoffrey E. Hinton, recognized for their “foundational discoveries and inventions that enable machine learning with artificial neural networks.”

The ideas I’m exploring here stand on the shoulders of that legacy. They are not just the work of a single mind but the result of decades of collective effort—researchers, physicists, neuroscientists, and computer scientists—each contributing to our understanding of how the brain works and how we can model intelligence artificially. Without their breakthroughs, the journey from studying neurons to building artificial intelligence would not have been possible.

To put it simply, a Hopfield Network can be thought of as a network of binary units (like little switches that are either on or off) connected by weighted links. These weights are carefully set so that the network “remembers” certain stable patterns—like snapshots of information stored in its memory.

Once trained, the network can reconstruct an entire memory from just a small fragment of it. This mirrors how our own brains work: for example, catching just a faint smell of a dish might instantly bring back the memory of a specific food or even the moment you last had it. In this way, Hopfield Networks don’t just store data—they recreate the feeling of memory recall itself.

Pseudo-Etymology

The Hopfield network is itself described as a "delocalized content-addressable memory or categorizer using extensive asynchronous parallel computing."

History

Here’s a quick journey through the key milestones—the discoveries and breakthroughs that paved the way for the creation of Hopfield Networks and beyond.


Ising Model

Just like some of today’s popular machine learning models—such as diffusion models and physics-informed neural networks (PINNs)—Hopfield Networks are also grounded in physics. The goal was to create a general content-addressable memory, and in his original paper, Hopfield pointed out that any physical system that naturally settles into stable configurations can serve as a model.

One of the key inspirations was the Ising Model, introduced to describe magnetism in ferromagnetic materials. In this model, every unit of the material carries a tiny “spin” (either +1 or –1). The overall energy of the system depends on how these spins are arranged. When placed in a magnetic field, the spins tend to align in ways that lower the energy—pushing the system towards stability.

This is more than just theory—you can experiment with it! Below, try clicking on the spins in the interactive setup to see how the system’s energy shifts. Notice how each dipole (spin) only “talks” to its neighbors, yet together they form a collective behavior. That same principle of local interactions leading to global order is what Hopfield leveraged to design his networks.


The Ising model defines energy (without any external magnetic field) as:
$$ E = -\sum_{\langle i, j \rangle}J_{i, j}\sigma_{i}\sigma_{j} $$
The central idea behind Hopfield Networks is the concept of energy minimization. A memory is introduced as input, and the network adjusts its interaction parameters so that the overall energy of the system is reduced. Stable states, which represent stored memories, correspond to these minimum-energy configurations.

This principle effectively serves as the loss function for the network. Structurally, Hopfield Networks borrow from the Ising Model: a graph of interconnected binary units (0 or 1), where the state of each unit influences its neighbors. Through these interactions, the network converges toward a stable pattern that encodes the memory.

Shown below is a 3D plot of an example energy function. The minima correspond to the stable memory locations: if the network is queried at a region near a particular minimum (i.e., with a partial memory), it will converge to that local minimum and output the full memory.

Learning

Now that we have defined an energy-based loss function, we need a method to optimize the network parameters so that the system can learn effectively. Several approaches exist, but here we focus on one of the most widely recognized methods. It is important to note that Hopfield Networks employ symmetric weights (the weight from neuron i to j is the same as from j to i) to guarantee the convergence of energy values.

Hebbian Learning

Donald Hebb introduced the Hebbian theory, which describes how the simultaneous activation of two neurons strengthens the synaptic connection between them. This is often summarized as: “neurons that fire together, wire together.” In the context of Hopfield Networks, this principle is used to update the weights of the network. The weight for each connection is calculated as:

$$ w_{i,j} = \frac{1}{n} \sum_{\mu=1}^{n} (2 \epsilon_{i}^{\mu} - 1) \, (2\epsilon_{j}^{\mu}-1) $$
Here \( n \) is the number of stored patterns and \( \epsilon_i^{\mu} \) is the state of neuron \( i \) in pattern \( \mu \). This clever setup ensures that the system always prefers lower energy levels, pulling it toward stable points (minima): the weights are chosen so that each stored pattern naturally settles into one of the energy valleys, just like a ball rolling down into the lowest part of a hill. Since we are dealing with binary state values (1 and 0 instead of 1 and -1), the Ising energy above can be rewritten as
$$ E = -\frac{1}{2} \sum_{i}\sum_{j, j \neq i} w_{i,j} (2\epsilon_i -1)(2\epsilon_j-1). $$

One thing to observe is the seemingly odd \(2 \epsilon - 1\) instead of just \( \epsilon \). It is simply a way to map the states so that the energy comes out the same whether we use the \( \pm 1 \) representation or the binary representation \( (1, 0) \).

Memory Retrieval

We’ve seen how the network learns to store memories using the mathematical framework discussed earlier. But how do we actually retrieve a stored memory once the network is trained?

Hopfield, in his original paper, proposed a binary activation update rule for neurons. The idea is simple: if we provide a partial or noisy memory as input, each neuron updates its state based on the weighted contributions from all the other connected neurons. If this summed input crosses a certain threshold, the neuron activates (1); otherwise, it deactivates (0).

The update rule can be written as:

$$ \epsilon_i = \begin{cases} 1, & \text{if } \sum_{j} w_{i,j}\epsilon_j - \theta \ge 0 \\ 0, & \text{otherwise} \end{cases} $$

Where \( \epsilon_i \) is the state of neuron \( i \), \( w_{i,j} \) is the weight between neurons \( i \) and \( j \), and \( \theta \) is the activation threshold.

By repeatedly applying this rule, the network gradually converges from the given partial input to the closest stored memory pattern — effectively completing the memory.

Implementation

The Hnet class below represents a basic Hopfield Network model in C++. The main components are:

  - weights: the symmetric matrix of connection strengths between neurons,
  - n: the number of neurons,
  - threshold: the activation threshold used during inference.

The class provides methods to:

  - compute the energy of a given state (energy),
  - train the network on a set of patterns (learn),
  - complete a partial or noisy pattern (infer),
  - save and load the learned weights (save_weights, load_weights).


          class Hnet {
          private:
              std::vector<std::vector<double>> weights;
              int n;
              double threshold = 0.0;
          
          public:
              Hnet(int n);
              Hnet(int n, double threshold);
          
              double energy(const std::vector<int>& state);
              void learn(Learning_method lm, const std::vector<std::vector<int>>& states);
              void infer(std::vector<int>& state, int max_iters = 100);
              void save_weights(const std::string& filename) const;
              void load_weights(const std::string& filename);
          };
        

The learn function is where the Hopfield Network actually trains. It uses the Hebbian learning rule to update the weights based on training states.

In short: this function takes the training states, applies the Hebbian rule to each of them, and updates the network's weights so that the given patterns become stable energy minima. (The snippet below is the serial version; the full implementation linked further down parallelizes it.)


          void Hnet::learn(Learning_method lm, const std::vector<std::vector<int>>& states) {
              if (lm != Hebbian) return;
          
              int total_states = states.size();
          
              for (int idx = 0; idx < total_states; idx++) {
                  const auto& state = states[idx];
          
                  for (int j = 0; j < n; j++) {
                      for (int k = 0; k < n; k++) {
                          if (j == k) continue;
                          // accumulate the Hebbian term, normalised by the number
                          // of patterns (the 1/n factor in the formula above)
                          weights[j][k] += static_cast<double>(
                              (2 * state[j] - 1) * (2 * state[k] - 1)) / total_states;
                      }
                  }
              }
          }
        

Starting from an initial (possibly incomplete or noisy) state, the infer function repeatedly updates each neuron based on the weighted input from all other neurons until the network reaches a stable state or the maximum number of iterations is reached. Each neuron's new state is computed from the net weighted sum over all other neurons, and the process stops early if the state stops changing (converges). Note that this updates all neurons at once (synchronous mode), whereas the convergence proof later in this post assumes serial updates.


          void Hnet::infer(std::vector<int>& incomplete_state, int max_iters) {
              for (int iter = 0; iter < max_iters; iter++) {
                  std::vector<int> next_state(n);
          
                  for (int i = 0; i < n; i++) {
                      double net_input = 0.0;
          
                      for (int j = 0; j < n; j++) {
                          if (i == j) continue;
                          net_input += weights[i][j] * (2 * incomplete_state[j] - 1);
                      }
          
                      net_input -= threshold;
                      next_state[i] = net_input >= 0 ? 1 : 0;
                  }
          
                  if (next_state == incomplete_state) break;
                  incomplete_state.swap(next_state);
              }
          }
        
You can check out the full implementation of this code, with parallelization, applied to the MNIST handwritten-digit dataset here. My stupid self just tried training it on all the images, even though Hopfield mentioned the \( 0.15N \) soft limit, which means a network with \( N \) nodes can reliably memorize only about \( 0.15N \) patterns. So all I get is some random spurious pattern. One more thing worth mentioning: I found a paper that applied a Hopfield network to the MNIST dataset and noted that the Hebbian method doesn't work well there, so I will try implementing the Storkey method on this.

Convergence Proof

Now that we have a basic understanding of what a Hopfield Network is, how it can be used, and to some extent why it works, we can move on to a deeper question—why is it able to store memories? To answer this, we need to explore its convergence properties. Jehoshua Bruck published a paper that examines the convergence behavior of Hopfield networks in great detail. In the following section, I have drawn upon some of the theorems and proofs from his work to explain the convergence of Hopfield networks, specifically for the case of serial mode with symmetric weights. Here, serial means that the neuron states are updated one at a time, and symmetric means that \( w_{i,j} = w_{j,i} \).

Theorem: Let \( N = (W, T) \) be a neural network operating in serial mode, where \( W \) is the weight matrix of the network and \( T \) is the vector of threshold values (the zero vector in our example). \( W \) is a symmetric matrix with non-negative diagonal elements (zero in our case, since we do not allow self-connections). Then the network \( N \) will always converge to a stable state.

Proof:
Do note that I have taken the state values to be \( \epsilon = \pm 1 \) instead of \( 0/1 \), because that is how it is done in the paper. The idea is still the same, and you can try rewriting the proof with \( 0/1 \) states on your own for fun.

Threshold vector: \( T = [ \theta_1, \theta_2, \ldots, \theta_n ]^T \)
State vector at time \( t \): \( V_k(t) = [ \epsilon^k_1, \epsilon^k_2, \ldots, \epsilon^k_n ]^T \)

The energy function is defined as:

$$ E(t) = V(t)^T W V(t) - 2V(t)^TT $$

We can define the discrete energy change over time as \( \Delta E = E(t+1) - E(t) \), associated with the change in the state of neuron \( i \):
$$ \Delta \epsilon_i^k = \begin{cases} 0, & \text{if } \epsilon^k_i(t) \left( \sum_{j} w_{i,j}\epsilon^k_j - \theta_i \right) \ge 0 \\ 2, & \text{if } \epsilon^k_i(t) = -1 \text{ and } \sum_{j} w_{i,j}\epsilon^k_j - \theta_i \ge 0 \\ -2, & \text{otherwise} \end{cases} $$
Now we can formulate the change in energy using the definition of the state change,
$$ \begin{aligned} \Delta E &= E(t+1)-E(t) \\ &= V_k(t+1)^T W V_k(t+1) - 2V_k(t+1)^T T - \bigl(V_k(t)^T W V_k(t) - 2V_k(t)^T T\bigr) \\ &= \bigl(V_k(t)+\Delta V_k\bigr)^T W \bigl(V_k(t)+\Delta V_k\bigr) - 2\bigl(V_k(t)+\Delta V_k\bigr)^T T - V_k(t)^T W V_k(t) + 2V_k(t)^T T \\ &= V_k(t)^T W V_k(t) + \Delta V_k^T W V_k(t) + V_k(t)^T W \Delta V_k + \Delta V_k^T W \Delta V_k - 2V_k(t)^T T - 2\Delta V_k^T T - V_k(t)^T W V_k(t) + 2V_k(t)^T T \\ &= \Delta V_k^T W V_k(t) + V_k(t)^T W \Delta V_k + \Delta V_k^T W \Delta V_k - 2\Delta V_k^T T \\ &= 2\,\Delta V_k^T W V_k(t) - 2\,\Delta V_k^T T + \Delta V_k^T W \Delta V_k \qquad(\text{since } W=W^T)\\ &= 2\,\Delta V_k^T\bigl(WV_k(t)-T\bigr) + \Delta V_k^T W \Delta V_k\\ &= 2\sum_i \Delta \epsilon_i^k \left( \sum_j w_{i,j} \epsilon_j^k - \theta_i \right) + \sum_i \sum_j w_{i,j}\, \Delta \epsilon_i^k \, \Delta \epsilon_j^k \end{aligned} $$

We are doing this computation in serial mode, so let us say we update only neuron \( i \); then \( \Delta \epsilon_j^k = 0 \) for \( j \neq i \), and

$$ \Delta E = 2\,\Delta \epsilon_i^k \left( \sum_j w_{i,j} \epsilon_j^k - \theta_i \right) + w_{i,i}\, (\Delta \epsilon_i^k)^2 $$

It is easy to see from the definition of \( \Delta \epsilon_i^k \) that the first term is always positive or zero, and since the diagonal elements of \( W \) are non-negative, we get \( \Delta E \ge 0 \) for every update. We also know that the energy function is defined on a finite state space for a given neural network, which means it is a bounded function.
\(\therefore \) E will always converge.

Conclusion

Hopfield Networks may seem like a relic of early neural network research, but their core ideas continue to ripple through modern AI — from associative memory mechanisms to modern attention-based architectures. What makes them remarkable is not just their ability to store and retrieve patterns, but how they bridge diverse disciplines: the physics of energy minimization, the biology of neural plasticity, and the mathematics of dynamical systems.

While they are no longer the state-of-the-art for large-scale pattern recognition, they remain a powerful mental model — a reminder that intelligence can emerge from simple local interactions when embedded in the right structure. For anyone curious about where neural networks came from, studying Hopfield Networks is like looking at the DNA of modern AI.

This blog only scratches the surface. I plan to keep updating this post and experiment with extensions like continuous Hopfield networks and the Storkey learning rule, and see how these early architectures can still surprise us today.

References

  1. Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79(8), 2554–2558. https://doi.org/10.1073/pnas.79.8.2554
  2. McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4), 115–133. https://doi.org/10.1007/BF02478259
  3. Bruck, J. (1990). On the convergence properties of the Hopfield model. Proceedings of the IEEE, 78(10), 1579–1585. https://doi.org/10.1109/5.58341
  4. Storkey, A. (1997). Increasing the capacity of a Hopfield network without sacrificing functionality. In W. Gerstner, A. Germond, M. Hasler, & J.-D. Nicoud (Eds.), Artificial Neural Networks — ICANN’97 (Lecture Notes in Computer Science, vol 1327, pp. 451–456). Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0020196
  5. Uykan, Z. (2020). On the working principle of the Hopfield Neural Networks and its equivalence to the GADIA in optimization. IEEE Transactions on Neural Networks and Learning Systems, 31(9), 3294–3304. https://doi.org/10.1109/TNNLS.2019.2940920
  6. Hopfield, J. J., & Tank, D. W. (1986). Computing with neural circuits: A model. Science, 233(4764), 625–633. https://doi.org/10.1126/science.3755256
