Activation Functions: With Real-life analogy and Python Code

Updated on Oct 5, 2023 10:02 IST

In this article, you will learn about activation functions through a real-life analogy. You will also learn why they are needed and what types exist.


Activation functions are an important component of neural networks. They help to determine the output of a neural network by applying mathematical transformations to the input signals received from other layers in a network. Activation functions allow for complex non-linear relationships between input and output data points. 

The choice of function depends largely on the problem you are trying to solve and on any constraints imposed by hardware or by time and memory limits. This article offers guidance on which activation function to use and when, and illustrates the idea with a real-life analogy.

What are Activation Functions?

A neural network activation function is a function that introduces nonlinearity into the model.

A neural network has multiple nodes in each layer, and in a fully connected network, every node in one layer is connected to every node in the next layer. Consider computing the value of the first neuron in the second layer. The output of each neuron in the first layer is multiplied by a weight (learned during training), the products are summed, and a bias (also learned) is added to the sum. The activation function is then applied to this result to produce the neuron's output.
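The computation described above can be sketched in a few lines of NumPy. The input values, weights, and bias here are made-up numbers chosen only for illustration; in a real network the weights and bias come from training.

```python
import numpy as np

# Hypothetical values for illustration: three neurons in the first layer.
inputs = np.array([0.5, -1.0, 2.0])    # outputs of the first-layer neurons
weights = np.array([0.4, 0.3, -0.2])   # one weight per incoming connection
bias = 0.1                             # bias of the second-layer neuron

# Multiply each input by its weight, sum the products, then add the bias.
z = np.dot(inputs, weights) + bias

# z is the value that the activation function receives.
print(z)
```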

Real-life analogy for Activation Functions

Imagine that a neural network is a hose:

It takes in water (receives some input), carries it somewhere (transforms the input), and pushes the water out (produces some output).

Without an activation function, your hose acts more like a steel pipe: fixed and inflexible. Sometimes that is good enough; there is nothing wrong with using a pipe to deliver your water.

But when the water has to go around corners, a rigid steel pipe won’t fit, no matter how you rotate it. An activation function is handy here because it allows your function to be more flexible.

In this case, a neural net with an activation function acts like a plastic garden hose. You can bend it to your specific needs and carry your water to many places that are impossible to reach with a steel pipe.

So, the purpose of an activation function is to add flexibility to your hose (nonlinearity to your neural net).

Why use activation functions?

1. The main objective of activation functions is to add nonlinearity to the network so that it can model more intricate and varied relationships between inputs and outputs. Without activation functions, the network could only perform linear transformations: any stack of linear layers collapses into a single linear layer, which cannot adequately represent the complexity and nuances of real-world data. With nonlinear activation functions, a sufficiently large network can approximate a very wide class of functions.
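To see why linear layers alone are not enough, the sketch below stacks two layers without an activation and compares them with one merged linear layer. The shapes and random weights are arbitrary assumptions for the demonstration, not values from any real model.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))                      # 5 samples, 3 features
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 2)), rng.normal(size=2)

# Two layers with no activation function in between.
two_layers = (x @ W1 + b1) @ W2 + b2

# The same mapping written as a single linear layer.
W, b = W1 @ W2, b1 @ W2 + b2
one_layer = x @ W + b
collapses = np.allclose(two_layers, one_layer)

# Putting ReLU between the layers breaks this equivalence.
with_relu = np.maximum(0, x @ W1 + b1) @ W2 + b2
differs = not np.allclose(with_relu, one_layer)

print(collapses, differs)
```

The first flag shows that the extra linear layer added no expressive power; the second shows that the activation changes the function the network computes.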

2. Some activation functions also bound each neuron’s output. Depending on its inputs and the weights associated with them, a neuron’s raw output can range from extremely high to extremely low. Bounded activation functions such as sigmoid and tanh ensure that each neuron’s output falls inside a defined range, which can make the network simpler to optimise during training. Unbounded functions such as ReLU do not provide this property.

Types of Activation Functions

Sigmoid activation function

The sigmoid activation function maps any input to a value between 0 and 1, which can be interpreted as a probability and thresholded into a true/false decision. For example, when an image recognition system needs to decide whether an object in the image is a cat, an output above 0.5 from the sigmoid is classified as “cat” and anything else as “not a cat”. Because the function is smooth, small variations in the input produce only small changes in the output.

Note: This function suffers from vanishing gradient problems.
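A small sketch of why the gradient vanishes: the derivative of sigmoid is s(x) * (1 - s(x)), which never exceeds 0.25 and is close to zero for large inputs. The ten-layer figure below considers only the activation factors and ignores the weights, so treat it as an illustration rather than an exact model of backpropagation.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of sigmoid: s(x) * (1 - s(x))
    s = sigmoid(x)
    return s * (1 - s)

max_grad = sigmoid_grad(0.0)          # the largest possible value, 0.25
saturated_grad = sigmoid_grad(10.0)   # nearly zero in the flat region

# Upper bound on the factor contributed by the activations alone
# when a gradient passes back through 10 sigmoid layers.
bound_10_layers = max_grad ** 10

print(max_grad, saturated_grad, bound_10_layers)
```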

Tanh activation function

The tanh (hyperbolic tangent) activation function is similar to sigmoid but maps inputs onto values between -1 and 1 instead of between 0 and 1, so its output is centred on zero. Tanh also has better gradient properties than sigmoid: its derivative peaks at 1 rather than 0.25, so gradients shrink less as they are propagated back through the layers, which often makes training faster. It still flattens out for large positive or negative inputs.

Note: This function suffers from vanishing gradient problems.
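The relationship between tanh and sigmoid can be checked numerically. The sketch below verifies the identity tanh(x) = 2 * sigmoid(2x) - 1 and compares the mean output of both functions on inputs that are symmetric around zero (the input grid is an arbitrary choice).

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.linspace(-3, 3, 7)   # symmetric inputs: -3, -2, ..., 3

# tanh is a rescaled and shifted sigmoid.
identity_holds = np.allclose(np.tanh(x), 2 * sigmoid(2 * x) - 1)

# On symmetric inputs, tanh averages to 0 while sigmoid averages to 0.5.
mean_tanh = np.tanh(x).mean()
mean_sigmoid = sigmoid(x).mean()

print(identity_holds, mean_tanh, mean_sigmoid)
```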

Softmax activation function

Softmax can be seen as a generalisation of sigmoid to more than two classes. Sigmoid returns a value between 0 and 1, which can be treated as the probability of a data point belonging to a particular class, so sigmoid is often used for binary classification problems.

The softmax function can be used for multiclass classification problems. It returns the probability of a data point belonging to each class, and these probabilities sum to 1. Here is the formula:

f(x_i) = e^(x_i) / sum(e^(x_j)), for all j
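The link between the two functions can be made concrete: with only two classes and the second score fixed at 0, softmax gives the same probability as sigmoid. The scores used below are hypothetical values picked for the example.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def softmax(x):
    # Subtracting the maximum does not change the result
    # but avoids overflow for large scores.
    e = np.exp(x - np.max(x))
    return e / e.sum()

z = 1.7   # hypothetical score for the positive class

# Two-class softmax with scores [z, 0] reduces to sigmoid(z).
two_class = softmax(np.array([z, 0.0]))
print(two_class[0], sigmoid(z))

# Three-class example: one probability per class, summing to 1.
probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs, probs.sum())
```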

ReLU activation function

For hidden layers, ReLU is usually the most effective option, and it is computationally very cheap. It has a weakness of its own, though: if the input is less than 0, the output is a constant 0 and the gradient is 0 as well, so a neuron that only receives negative inputs stops learning (the dying ReLU problem).

Note: If you are unsure about your choice of activation function, especially for hidden layers, go for the ReLU function.
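The behaviour for negative inputs is easy to see from ReLU and its gradient side by side. This is a minimal sketch; the gradient at exactly 0 is undefined, and setting it to 0 there is a common convention assumed here.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def relu_grad(x):
    # 1 for positive inputs, 0 otherwise (0 at x == 0 by convention)
    return (x > 0).astype(float)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])

print(relu(z))        # negative inputs are clipped to 0
print(relu_grad(z))   # no gradient flows back for negative inputs
```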

Leaky ReLU activation function

Leaky ReLU is a variant of ReLU and a popular, effective way to address the dying ReLU problem. It adds a small slope in the negative direction so that the gradient does not vanish for negative inputs. Instead of being 0 for z < 0, leaky ReLU has a small constant non-zero gradient α (typically α = 0.01).
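As a rough sketch, leaky ReLU and its gradient can be written with np.where. The default alpha = 0.01 follows the typical value mentioned above; the gradient at exactly 0 is set to alpha here, which is an assumed convention.

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # Identity for positive inputs, small slope alpha otherwise
    return np.where(x > 0, x, alpha * x)

def leaky_relu_grad(x, alpha=0.01):
    # The gradient is never exactly zero, so the neuron can keep learning.
    return np.where(x > 0, 1.0, alpha)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])

print(leaky_relu(z))
print(leaky_relu_grad(z))
```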

Exponential linear units (ELU)

The Exponential Linear Unit (ELU) is an activation function that is also used to speed up the training of neural networks (similar to the ReLU function). Its main advantage is that it uses the identity for positive values, which avoids the vanishing gradient problem there and improves the learning properties of the model.

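ELU is defined as f(x) = x if x > 0, and f(x) = α * (e^(x) - 1) otherwise, where α is the ELU hyperparameter, which is normally set to 1.0 and controls the saturation point for negative inputs. The ELU function does have drawbacks, though: its output is not exactly centred on zero, and the exponential makes it more expensive to compute than ReLU.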

ELU takes negative values, which brings the average unit activation closer to zero. This reduces the bias shift during training and improves learning speed, with an effect similar to batch normalization but at lower computational cost. ELU is a good alternative to ReLU.
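A minimal NumPy sketch of ELU is shown below, assuming the standard definition (identity for positive inputs, alpha * (e^x - 1) otherwise). The function name and the test inputs are illustrative choices.

```python
import numpy as np

def elu(x, alpha=1.0):
    x = np.asarray(x, dtype=float)
    # np.minimum keeps the exponent non-positive, so the branch that
    # np.where discards for positive inputs cannot overflow.
    negative_part = alpha * np.expm1(np.minimum(x, 0))
    return np.where(x > 0, x, negative_part)

z = np.array([-1000.0, -1.0, 0.0, 2.0])

# Very negative inputs saturate at -alpha; positive inputs pass through.
print(elu(z))
```

Unlike ReLU, the output for negative inputs is non-zero, which is what pulls the average activation towards zero.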

How to choose activation functions?

Consideration | Activation functions
Non-linearity | Sigmoid, Tanh, ReLU, Leaky ReLU, ELU, SELU
Differentiability (almost everywhere) | Sigmoid, Tanh, ReLU, Leaky ReLU, ELU, SELU
Output in the range 0 to 1 | Sigmoid, Softmax
Computational efficiency | ReLU, Leaky ReLU, ELU, SELU
No saturation for positive inputs | ReLU, Leaky ReLU, ELU, SELU

Other points to remember

  • If the network is being used for binary classification, a sigmoid function with an output range between 0 and 1 would be suitable.
  • For multiclass classification, use the softmax activation function.
  • For other tasks such as anomaly detection, recommendation systems, or reinforcement learning, other activation functions such as the ReLU or the tanh functions may be used, depending on the specifics of the problem.
  • Some activation functions, such as sigmoid and tanh, may saturate at extreme values, leading to slower learning. In such cases, it may be better to use a function that does not saturate, such as ReLU.
  • For hidden layers, ReLU is usually the best first choice.

Note: Other activation functions are available besides those listed here, and the choice of the optimal activation function may depend on the specific problem and neural network architecture.  
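As an illustration of these guidelines, here is a sketch of a forward pass through a tiny multiclass classifier with ReLU in the hidden layer and softmax at the output. The layer sizes and random weights are assumptions made for the example; nothing is trained.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def softmax(x):
    # Row-wise softmax; subtracting the row maximum avoids overflow.
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(42)
x = rng.normal(size=(4, 3))                  # 4 samples, 3 features
W1, b1 = rng.normal(size=(3, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 5)), np.zeros(5)

hidden = relu(x @ W1 + b1)                   # hidden layer: ReLU
probs = softmax(hidden @ W2 + b2)            # output layer: softmax, 5 classes

print(probs.sum(axis=1))                     # each row sums to 1
```

For binary classification, the output layer would instead have a single unit followed by sigmoid.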

Activation function Python code

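import numpy as np

def sigmoid(x):
    # Squashes inputs into the range (0, 1)
    return 1 / (1 + np.exp(-x))

def tanh(x):
    # Squashes inputs into the range (-1, 1)
    return np.tanh(x)

def relu(x):
    # Keeps positive inputs, replaces negative inputs with 0
    return np.maximum(0, x)

def leaky_relu(x, alpha=0.01):
    # Like ReLU, but negative inputs are scaled by alpha instead of zeroed
    return np.maximum(alpha * x, x)

def softmax(x):
    # Subtract the maximum before exponentiating to avoid overflow
    exp_x = np.exp(x - np.max(x))
    return exp_x / np.sum(exp_x)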

These functions use the NumPy library to perform element-wise operations on arrays. Here are some examples of how to use these functions:

x = np.array([-1, 0, 1])
# Sigmoid
print(sigmoid(x)) # [0.26894142 0.5 0.73105858]
# Tanh
print(tanh(x)) # [-0.76159416 0. 0.76159416]
# ReLU
print(relu(x)) # [0 0 1]
# Leaky ReLU
print(leaky_relu(x)) # [-0.01 0. 1. ]
# Softmax
print(softmax(x)) # [0.09003057 0.24472847 0.66524096]


Output:

[0.26894142 0.5 0.73105858]
[-0.76159416 0. 0.76159416]
[0 0 1]
[-0.01 0. 1. ]
[0.09003057 0.24472847 0.66524096]

Code explanation

  1. Sigmoid function: The sigmoid function is a widely used tool for binary classification problems, where it maps any input value to a value between 0 and 1. This allows us to interpret the output as representing the probability of a positive class. 
  2. tanh function: It is similar in shape but produces values in a larger range, mapping inputs to values between -1 and 1.
  3. ReLU (Rectified Linear Unit) function: It returns the input if it is positive and zero otherwise. A variant called leaky ReLU prevents neurons from always outputting zero by returning the input multiplied by a small positive constant when the input is negative.
  4. Softmax function: It returns a probability distribution over all possible classes given its inputs. This is useful for multiclass classification tasks such as image recognition, where each image belongs to exactly one of several classes.

Summary table

Activation Function | Equation | Derivative
Sigmoid | f(x) = 1 / (1 + e^(-x)) | f'(x) = f(x) * (1 - f(x))
Softmax | f(x_i) = e^(x_i) / sum(e^(x_j)), for all j | df(x_i)/dx_j = f(x_i) * (1 - f(x_i)) if i = j; -f(x_i) * f(x_j) otherwise
Tanh | f(x) = (e^(x) - e^(-x)) / (e^(x) + e^(-x)) | f'(x) = 1 - f(x)^2
ReLU | f(x) = max(0, x) | f'(x) = 1 if x > 0; 0 otherwise
Leaky ReLU | f(x) = max(0.01x, x) | f'(x) = 1 if x > 0; 0.01 otherwise

Note: In the softmax function, “i” is the index of the class whose probability is being computed, and “j” runs over all classes.
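The element-wise derivatives in the table can be sanity-checked against a central finite difference. This is only a numerical spot check at a few arbitrary points (0 is left out because ReLU and leaky ReLU have a kink there), and numeric_grad is a helper written for this example.

```python
import numpy as np

def numeric_grad(f, x, h=1e-5):
    # Central finite-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def relu(x):
    return np.maximum(0, x)

def leaky_relu(x):
    return np.maximum(0.01 * x, x)

x = np.array([-2.0, -0.5, 0.5, 2.0])

# Compare each numerical gradient with the formula from the table.
checks = {
    "sigmoid": np.allclose(numeric_grad(sigmoid, x), sigmoid(x) * (1 - sigmoid(x))),
    "tanh": np.allclose(numeric_grad(np.tanh, x), 1 - np.tanh(x) ** 2),
    "relu": np.allclose(numeric_grad(relu, x), (x > 0).astype(float)),
    "leaky_relu": np.allclose(numeric_grad(leaky_relu, x), np.where(x > 0, 1.0, 0.01)),
}
print(checks)
```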


Activation functions are essential for solving complicated problems: they are what allows a model to perform well on non-linear data. This article described the most important activation functions using mathematical formulas and Python code. If it helped you and you want to learn more about such concepts, please motivate us by liking and sharing with your friends.



About the Author

This is a collection of insightful articles from domain experts in the fields of Cloud Computing, DevOps, AWS, Data Science, Machine Learning, AI, and Natural Language Processing. The range of topics caters to upski...