Artificial neural networks have been a hot topic in the tech industry as generative AI and Machine Learning applications have shown great potential and intrigued businesses in all industries.
However, while neural networks exhibit great promise, catastrophic forgetting, also known as catastrophic interference, can degrade performance and cause a neural network to forget previously learned tasks.
Obviously, if a neural network forgets previously trained tasks, the ramifications for your business could be grave. This post will explain what catastrophic forgetting is in neural networks, how it occurs, and whether your business should be concerned about it.
What Is Catastrophic Forgetting?
Catastrophic forgetting occurs when a neural network suddenly or drastically forgets a previously learned task or information upon learning a new task or new information.
Artificial neural networks are often responsible for multiple tasks, so when learning new tasks causes a neural network to forget previous tasks, this interference represents a significant problem. Hence the descriptor "catastrophic."
Deep neural networks are modeled after the way scientists believe the human brain works and learns new information. However, the human brain doesn’t tend to replace old information with new information.
We can learn complex tasks, develop additional skills and knowledge as we grow, and differentiate between information and tasks. Scientists refer to this ability of the brain as plasticity.
For example, when you learn the difference between the colors red and blue, you can also learn about the colors green, orange, yellow, white, etc. While this is easy for the human brain, there is a continual learning problem for neural networks.
Understanding How Catastrophic Forgetting Occurs
Neural networks create pathways between nodes. This is modeled after the way scientists believe that the human brain learns. During training sessions, neural networks create pathways based on the data being fed to them.
When new information is fed to the neural network, it creates new pathways, and sometimes, as new pathways are formed, old neural pathways are eliminated, causing the network to forget previously learned information.
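The effect described above can be made concrete with a minimal sketch. This toy example (illustrative random data and a simple linear model, not any particular framework) trains on one task, then on a second task with no access to the first, and measures how much performance on the first task degrades:

```python
# Toy demonstration of catastrophic forgetting with a linear model
# trained by plain gradient descent. All data here is made up.
import numpy as np

rng = np.random.default_rng(0)

def train(w, X, Y, lr=0.1, steps=200):
    """Gradient-descent training on mean-squared error."""
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - Y) / len(X)
        w = w - lr * grad
    return w

def mse(w, X, Y):
    return float(np.mean((X @ w - Y) ** 2))

# Two tasks with different input/target statistics.
X_a, Y_a = rng.normal(size=(50, 4)), rng.normal(size=(50, 1))
X_b, Y_b = rng.normal(size=(50, 4)), rng.normal(size=(50, 1))

w = np.zeros((4, 1))
w = train(w, X_a, Y_a)          # learn task A
err_a_before = mse(w, X_a, Y_a)

w = train(w, X_b, Y_b)          # learn task B, no rehearsal of task A
err_a_after = mse(w, X_a, Y_a)  # task-A error rises
```

After training on task B, the weights that served task A have been overwritten, so the task-A error climbs above its trained level.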
There are degrees of forgetting. Sometimes, the amount of information or task-related data that is forgotten is minimal. Catastrophic forgetting, by contrast, is a sudden and near-total loss of previously learned information.
As more data scientists enable continual learning in neural network training, catastrophic interference needs to be planned for to ensure that task performance does not degrade to the point where it causes issues for the organization or end users.
Can Catastrophic Forgetting Be Eliminated?
The threat of catastrophic interference can never be entirely eliminated from a standard backpropagation neural network. However, some strategies can be effectively utilized to reduce the occurrence of catastrophic forgetting in neural networks.
Since the primary cause of catastrophic forgetting appears to be an overlap in the representations in a neural network's hidden layers, reducing it requires training techniques that limit that overlap or protect the network weights that encode previously learned tasks.
If you are interested in learning how researchers and developers have addressed the issue of catastrophic interference in artificial neural networks, take a closer look at the solutions that have demonstrated real success in reducing the threat of catastrophic forgetting, including:
- Orthogonal vectors
- Novelty rules
- Pre-training
- Node sharpening
- Rehearsal mechanisms
- Elastic weight consolidation
- Latent learning
Orthogonal Vectors

One of the earliest solutions to combat catastrophic interference caused by representational overlap is the utilization of orthogonal vectors. Researchers discovered that interference between sequentially learned patterns was reduced when the input vectors were orthogonal to one another.
When are input vectors considered to be orthogonal? Data scientists consider input vectors to be orthogonal when the sum of the products of their corresponding elements, their dot product, equals zero. One of the most effective ways to create orthogonal representations in the hidden layers of a neural network is to use bipolar coding.

Instead of following traditional binary code using ones and zeros, bipolar coding uses negative ones and ones, creating patterns whose pairwise products sum to zero and thus minimizing the chances of catastrophic forgetting.
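A quick sketch, using hypothetical four-element patterns, shows the difference: the elementwise products of two well-chosen bipolar patterns sum to zero, while the equivalent binary (0/1) patterns do not:

```python
# Bipolar patterns over {-1, +1} can be orthogonal (dot product zero),
# whereas the corresponding 0/1 binary patterns are not.
import numpy as np

# Two hypothetical bipolar input patterns (made up for illustration).
p1 = np.array([1, -1, 1, -1])
p2 = np.array([1, 1, -1, -1])

dot = int(np.dot(p1, p2))  # elementwise products: 1 - 1 - 1 + 1 = 0

# The 0/1 versions of the "same" patterns overlap instead:
b1, b2 = (p1 > 0).astype(int), (p2 > 0).astype(int)
binary_dot = int(np.dot(b1, b2))  # nonzero, so the patterns interfere
```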
While orthogonal patterns interfere with one another far less, not every learning problem can be represented with orthogonal vectors. In addition, for some learning problems, orthogonal vectors can still cause interference issues.
Novelty Rules

Novelty rules are another approach to reducing catastrophic forgetting that addresses the learning rules during the training phase of neural networks. Essentially, the novelty rule instructs the neural network to only learn the elements of the new input data that differ from the old input data it has received.
The idea is that a novelty rule will reduce representational overlap and ensure that new input data does not replace the old input data. While a novelty rule implemented during the learning phase seems like the perfect solution to catastrophic interference, there is one key issue.
Novelty rules can only be applied to auto-associative or auto-encoder neural networks, where the target output matches the input patterns. This limits the effectiveness of novelty rules when applied to other artificial neural network architectures.
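As a rough sketch (a toy linear auto-associator with made-up values, not a production learning rule), a novelty rule drives the weight update using only the part of the input the network cannot already reproduce:

```python
# Sketch of a novelty rule for a toy linear auto-associator:
# the update is driven only by the "novel" component of the input.
import numpy as np

def novelty_update(W, x, lr=0.5):
    """Hebbian-style update using only the novel part of x."""
    recon = W @ x           # what the network already reproduces
    novelty = x - recon     # the part of x that is still new
    return W + lr * np.outer(novelty, x)

W = np.zeros((3, 3))
x = np.array([1.0, 0.0, 1.0])
for _ in range(20):
    W = novelty_update(W, x)

# Once x is learned, its novel component vanishes, so further
# presentations of x no longer change the weights.
residual = float(np.linalg.norm(x - W @ x))
```

Because the update vanishes once the pattern is reproduced, repeated familiar inputs stop overwriting the weights, which is exactly the overlap-reducing behavior the novelty rule aims for.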
Pre-Training

Pre-training is another innovative approach to Machine Learning that looks to the human experience for inspiration. Humans do not take on learning new tasks with a random set of weights.
Instead, humans bring a wealth of experience and prior knowledge into learning a new task, which not only helps inform the new task they are learning but also solves the problem of interference.
Pre-training is an approach to neural network training that feeds the network a random sample of data before beginning a sequential training task. This collection of sample data will constrain how the network incorporates the data it is given during training.
This method of reducing catastrophic forgetting is effective when training data is highly structured. For example, the English language has a high degree of internal structure. Pre-training a language model enables it to incorporate new data with little to no interference with existing data.
When the new input data follows the same patterns as the data the network was previously trained on, the network should not experience a wildly different node activation path or drastically alter the weights associated with previous information.
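The effect can be sketched with a toy linear model (all data here is made up): when the pre-training data shares structure with the task data, fine-tuning moves the weights far less than training from scratch does:

```python
# Pre-training sketch: a model pre-trained on structured samples
# barely moves its weights when fine-tuned on a related task.
import numpy as np

rng = np.random.default_rng(1)

def train(w, X, Y, lr=0.1, steps=300):
    for _ in range(steps):
        w = w - lr * 2 * X.T @ (X @ w - Y) / len(X)
    return w

# Shared underlying structure across pre-training and task data.
true_w = np.array([[1.0], [-2.0], [0.5], [1.5]])

def make_task(n=60):
    X = rng.normal(size=(n, 4))
    return X, X @ true_w + 0.05 * rng.normal(size=(n, 1))

X_pre, Y_pre = make_task()
X_task, Y_task = make_task()

w0 = np.zeros((4, 1))
w_scratch = train(w0, X_task, Y_task)          # from random/zero weights
move_scratch = float(np.linalg.norm(w_scratch - w0))

w_pre = train(w0, X_pre, Y_pre)                # pre-train first
w_fine = train(w_pre, X_task, Y_task)          # then fine-tune
move_fine = float(np.linalg.norm(w_fine - w_pre))
```

The smaller weight movement during fine-tuning is what limits interference: the new task reuses, rather than overwrites, what pre-training already put in place.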
Node Sharpening

Catastrophic forgetting occurs when node activations overlap. So, how can you reduce the chances of node activation overlap and thus reduce the chances of catastrophic interference?
It has been proposed that neural networks that utilize localized representations exhibit catastrophic forgetting at a much lower rate because there is little to no representational overlap in the hidden layer.
The goal of node sharpening is to reduce the value of activation overlap at the hidden layer level to effectively reduce the instances of catastrophic forgetting in distributed neural networks.
This is achieved by slightly increasing the activation of the network's most active hidden layer nodes and slightly reducing the activation of all other nodes.
In addition, the neural network’s input-to-hidden layer weights must be adjusted to match the adjusted levels of node activity caused by sharpening.
The node sharpening approach to addressing catastrophic forgetting resembles the error backpropagation technique.
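A minimal sketch of the sharpening step itself, using made-up activation values and an arbitrary sharpening rate:

```python
# Node sharpening sketch: boost the k most active hidden nodes and
# dampen the rest, so fewer nodes participate in each representation.
import numpy as np

def sharpen(activations, k=2, rate=0.1):
    """Slightly boost the k most active nodes; slightly dampen the rest."""
    a = np.asarray(activations, dtype=float)
    top = np.argsort(a)[-k:]        # indices of the most active nodes
    out = a * (1 - rate)            # slightly reduce every node...
    out[top] = a[top] * (1 + rate)  # ...then slightly boost the top k
    return out

# Hypothetical hidden-layer activations for one input.
h = np.array([0.9, 0.2, 0.7, 0.1, 0.4])
sharp = sharpen(h)
```

Repeated over training, this pushes each input toward a more localized set of active nodes, shrinking the activation overlap between different inputs.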
Rehearsal Mechanisms

One of the most effective approaches to reduce catastrophic forgetting is rehearsal. Rehearsal involves re-training the neural network on some of the previously learned information when new data is introduced.
While previously learned data might not always be available for re-training purposes, a representation of this data can be used for rehearsal. There are many approaches to the rehearsal mechanism that have been used effectively.
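At its simplest, rehearsal just mixes a few stored old examples into every new training batch. A small sketch (hypothetical helper and made-up data):

```python
# Rehearsal sketch: each new batch is padded with a few examples
# sampled from a buffer of previously seen data.
import numpy as np

rng = np.random.default_rng(2)

def rehearsal_batch(new_X, new_Y, buffer_X, buffer_Y, n_replay=4):
    """Mix n_replay stored old examples into the new training batch."""
    idx = rng.choice(len(buffer_X), size=n_replay, replace=False)
    X = np.concatenate([new_X, buffer_X[idx]])
    Y = np.concatenate([new_Y, buffer_Y[idx]])
    return X, Y

# Made-up "old" buffer and "new" batch.
old_X, old_Y = rng.normal(size=(20, 3)), rng.normal(size=(20, 1))
new_X, new_Y = rng.normal(size=(8, 3)), rng.normal(size=(8, 1))

bx, by = rehearsal_batch(new_X, new_Y, old_X, old_Y)
# The mixed batch carries 8 new examples plus 4 replayed ones.
```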
Some of the most popular rehearsal methods include:

- Dual-network models
- Generative replay

In the dual-network approach, the neural network is separated into two distinct sub-networks. One sub-network is used for short-term memory or learning new training data, while the other is used for long-term memory storage.
The long-term sub-network can then send information back to the short-term sub-network, creating a recurrent network. This type of information sharing between sub-networks is known as interleaved training, and it is an effective method for reducing catastrophic forgetting.
The success of generative Artificial Intelligence has been harnessed to develop a new rehearsal method called generative replay, in which generative AI models produce the rehearsal data. Hence the name.
Generative replay is an effective rehearsal method when the replay is performed in the hidden layer instead of the input layer. This is one of the newest approaches to addressing catastrophic forgetting, and it has already proven effective.
Elastic Weight Consolidation
Elastic weight consolidation is a training technique that can be used to sequentially train an artificial neural network on multiple tasks. The technique treats some weights for previously learned tasks as more important than others.

During training, changes to the network's weights are discouraged in proportion to their importance, which significantly reduces the likelihood of catastrophic forgetting. The elastic weight consolidation method uses probabilistic mechanisms to estimate the importance of each weight.
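The core idea can be sketched as a quadratic penalty in which each weight's deviation from its post-task-A value is scaled by an importance estimate (all values below are made up for illustration):

```python
# Elastic-weight-consolidation-style penalty sketch: deviations from
# the old weights are charged in proportion to estimated importance.
import numpy as np

def ewc_penalty(theta, theta_star, importance, lam=1.0):
    """Quadratic penalty weighted by each parameter's importance."""
    return 0.5 * lam * float(np.sum(importance * (theta - theta_star) ** 2))

theta_star = np.array([1.0, -0.5, 2.0])   # weights after task A
importance = np.array([10.0, 0.1, 5.0])   # hypothetical importance scores
theta = np.array([1.2, 0.5, 2.0])         # weights partway through task B

penalty = ewc_penalty(theta, theta_star, importance)

# Moving an important weight by 0.5 costs far more than moving an
# unimportant one by the same amount:
p_hi = ewc_penalty(theta_star + np.array([0.5, 0.0, 0.0]), theta_star, importance)
p_lo = ewc_penalty(theta_star + np.array([0.0, 0.5, 0.0]), theta_star, importance)
```

Adding this penalty to the new task's loss is what makes important weights "stiff" and unimportant weights "elastic," so the network learns task B mostly by adjusting weights task A barely used.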
Latent Learning

Latent learning is a technique to reduce the likelihood of catastrophic forgetting by using transfer learning. The latent learning approach looks for optimal ways to code new information so that it will not interfere with existing information or responses catastrophically.
Latent learning ensures that a neural network’s response to newly learned information is consistent with the information it has already learned. The result of this approach is far fewer instances of catastrophic forgetting.
Catastrophic interference should be a big concern for any organization with a neural network. If you are developing and training your own neural network, you must ensure that catastrophic forgetting is limited as much as possible.
When built efficiently and trained with the right techniques, neural networks can keep catastrophic forgetting events to a minimum. If you want to learn more about neural networks or catastrophic forgetting, contact a skilled app development partner like Koombea.