✅ Let your Neural Network pick the best activation function

The Problem

Multi-layer perceptrons (MLPs) have fixed activation functions on their nodes; as a result, they lack interpretability and need many neurons to perform well. Kolmogorov-Arnold Networks (KANs) offer an alternative built on learnable activation functions.

The Solution

KANs replace the fixed activation functions with learnable activation functions represented as splines: piecewise polynomial functions used to approximate complex functions, defined on specific intervals and smoothly joined at their boundaries (the knots).
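The construction is grounded in the Kolmogorov-Arnold representation theorem, which states that any continuous multivariate function on a bounded domain can be written using only additions and one-variable functions:

```latex
f(x_1, \dots, x_n) = \sum_{q=1}^{2n+1} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)
```

A KAN learns the one-variable functions \phi_{q,p} and \Phi_q as splines; the paper generalizes this two-layer form to networks of arbitrary width and depth.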

In line with this theorem, KANs break complex functions down into simpler, one-variable functions. The splines serve two primary purposes (a minimal code sketch follows this list):

  1. Flexibility: Unlike fixed activation functions such as ReLU or sigmoid, splines can adapt their shape to fit the data better

  2. Smoothness: The smooth transitions at the knots keep the function differentiable, which is essential for gradient-based optimization (i.e., training the network)
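To make this concrete, here is a minimal sketch of such a learnable spline activation in PyTorch. This is our illustration, not the paper's code: the grid size, spline degree, and all names (SplineActivation, b_splines) are our own choices, though the "SiLU base term plus B-spline" parametrization follows the paper's description.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SplineActivation(nn.Module):
    """One learnable activation: a B-spline with trainable coefficients
    plus a SiLU base term (a sketch of the parametrization in the paper)."""

    def __init__(self, grid_min=-2.0, grid_max=2.0, num_intervals=8, degree=3):
        super().__init__()
        self.degree = degree
        # Uniform knot grid, extended by `degree` knots on each side so
        # every input in [grid_min, grid_max] is covered by full bases.
        h = (grid_max - grid_min) / num_intervals
        knots = torch.arange(-degree, num_intervals + degree + 1).float() * h + grid_min
        self.register_buffer("knots", knots)
        num_basis = num_intervals + degree
        self.coeffs = nn.Parameter(0.1 * torch.randn(num_basis))  # learnable spline shape
        self.base_weight = nn.Parameter(torch.ones(()))           # learnable SiLU mix

    def b_splines(self, x):
        # Cox-de Boor recursion: start from degree-0 indicator bases and
        # raise the degree one step at a time.
        t = self.knots
        x = x.unsqueeze(-1)
        bases = ((x >= t[:-1]) & (x < t[1:])).float()
        for k in range(1, self.degree + 1):
            left = (x - t[:-(k + 1)]) / (t[k:-1] - t[:-(k + 1)]) * bases[..., :-1]
            right = (t[k + 1:] - x) / (t[k + 1:] - t[1:-k]) * bases[..., 1:]
            bases = left + right
        return bases  # shape (..., num_basis)

    def forward(self, x):
        spline = (self.b_splines(x) * self.coeffs).sum(-1)
        return self.base_weight * F.silu(x) + spline
```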

Learning new things without forgetting the old ones

Traditional neural networks are prone to catastrophic forgetting: when a network is trained on a new task, it often forgets how to perform the tasks it was previously trained on.

KANs mitigate this by leveraging the locality of splines: each spline coefficient influences the function only over a small interval of the input, so updates driven by new data leave distant regions, and the knowledge stored there, untouched.
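A quick numerical check of that claim, reusing the hypothetical SplineActivation sketch above: nudging a single spline coefficient changes the function only over that basis function's local support, not everywhere.

```python
import torch

act = SplineActivation()                    # the sketch from the previous section
x = torch.linspace(-2.0, 2.0, 401)
with torch.no_grad():
    before = act(x).clone()
    act.coeffs[4] += 1.0                    # "learn" something via one coefficient
    after = act(x)
changed = (after - before).abs() > 1e-6
print(f"fraction of inputs affected: {changed.float().mean():.2f}")
# Prints roughly 0.5 here: only the interval spanned by basis 4 moved,
# while an MLP weight update would typically change the output everywhere.
```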

In a toy example with a sequence of Gaussian peaks, KANs were able to retain knowledge of previously learned peaks while learning new ones, whereas MLPs suffered from catastrophic forgetting.
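Here is a hedged reconstruction of that toy setup. The peak locations, widths, and training schedule are illustrative guesses rather than the paper's exact configuration, and a single learnable spline stands in for the small one-input KAN:

```python
import torch

def target(x, centers, sigma=0.1):
    # Ground truth: a sum of Gaussian peaks along a 1-D input.
    return sum(torch.exp(-((x - c) ** 2) / (2 * sigma ** 2)) for c in centers)

centers = [-0.8, -0.4, 0.0, 0.4, 0.8]
model = SplineActivation(grid_min=-1.0, grid_max=1.0, num_intervals=20)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for c in centers:                              # one "task" per peak
    x = torch.rand(256) * 0.4 + (c - 0.2)      # samples only near the current peak
    for _ in range(200):
        loss = ((model(x) - target(x, centers)) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
# Because each phase only touches coefficients whose bases overlap the
# sampled region, peaks fitted in earlier phases are largely preserved.
```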

Conclusions

The paper demonstrates the effectiveness of KANs through various experiments:

  1. Function Fitting: On function-fitting tasks, KANs outperform traditional MLPs.

  2. Scaling Laws: KANs can achieve high accuracy without the need for excessively large models.

  3. Interpretability: By visualizing the learned splines, KANs provide better interpretability compared to MLPs.
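On the interpretability point, the thing being visualized is concrete: a KAN layer is a grid of one-variable splines, one per (input, output) edge, each of which can be plotted directly. A hedged sketch of that composition, again reusing the hypothetical SplineActivation (the paper's actual layers add refinements such as grid extension):

```python
import torch
import torch.nn as nn

class KANLayer(nn.Module):
    """A sketch of one KAN layer: every edge carries its own learnable
    1-D spline, and each output sums its incoming edge activations.
    There is no weight matrix and no fixed nonlinearity."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.edges = nn.ModuleList(
            nn.ModuleList(SplineActivation() for _ in range(in_dim))
            for _ in range(out_dim)
        )

    def forward(self, x):  # x: (batch, in_dim)
        outs = [
            sum(phi(x[:, i]) for i, phi in enumerate(row))
            for row in self.edges
        ]
        return torch.stack(outs, dim=-1)  # (batch, out_dim)
```

To inspect a trained layer, one can evaluate edges[j][i] on a grid of inputs and plot the curve: the shape of that spline is the learned relationship between input i and output j.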
