RELU and SIGMOID Activation Functions in a Neural Network

Updated on Apr 12, 2023 11:19 IST

Activation functions are one of the most important features of an artificial neural network. In this article, we briefly introduce activation functions and then discuss two important types, the ReLU and Sigmoid activation functions, in detail.

This blog takes a hands-on approach to the Sigmoid and ReLU activation functions, using examples from my real-time Kaggle notebook with the underlying Python source code for positive and negative slopes and for gradient descent along those slopes. It also drills down to the biological picture: neurons send spikes to other neurons, and those impulses determine whether the receiving neurons fire or not (on or off), which is where the term "activation function" was coined. In this article we will explore the ReLU and Sigmoid functions.

So, how do you define an Activation function?

Let’s explore.

What is an Activation Function?

An activation function is a key component of an artificial neural network: it determines whether a neuron will be activated or not. In an artificial neural network, the activation function defines the output of a node given a set of inputs or a single input.

One reason neural networks evoke such interest is the hope of understanding our own mind, which emerges from neural processing in the brain. Another is the advances machine learning has achieved in recent years by combining massive datasets with deep learning techniques.

Types of Activation Functions

• Linear Activation Function
• Non-Linear Activation Function
• Sigmoid Activation Function
• Tanh Activation Function
• ReLU Activation Function
• Leaky ReLU
• Parametric ReLU

Weights and Inputs

As in linear and logistic regression, weights are the set of adaptive parameters in an artificial neuron model. Each weight acts as a multiplier on one of the neuron's inputs, and the linear combination of the inputs is the sum of each weight times its input.

Let’s take the shopping bill analogy to understand the above concept:

A shopping bill is a linear combination of the number of units purchased and the price of each unit, i.e., the total amount is each item's quantity multiplied by its price, summed over all items.

If there's a neuron with six inputs (analogous to the amounts of the six shopping items: potatoes, carrots, beetroots, etc.), input1, input2, input3, input4, input5, and input6, we thereby need six weights. The weights are similar to the prices of the items: weight1, weight2, weight3, weight4, weight5, and weight6. We would also want to include an intercept term, as we did in linear regression. In the shopping cart analogy, this can be thought of as a fixed additional charge, for example a fee for processing a credit card payment.

We could then calculate the linear combination as follows: linear combination = intercept + weight1 × input1 + … + weight6 × input6, where the "…" means that the sum includes all the terms from 1 to 6.
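The shopping-bill calculation above can be sketched in a few lines of NumPy (the prices and quantities here are made up for illustration):

```python
import numpy as np

# Hypothetical prices (weights) of six shopping items and the
# quantities bought (inputs); values are invented for illustration.
weights = np.array([2.0, 1.5, 3.0, 0.5, 4.0, 2.5])  # price per unit
inputs = np.array([3, 1, 2, 4, 1, 2])               # units bought
intercept = 1.0                                      # fixed card-processing fee

# linear combination = intercept + sum(weight_i * input_i)
linear_combination = intercept + np.dot(weights, inputs)
print(linear_combination)  # 25.5
```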

Similar to linear and logistic regression, weights in neural networks learn from the data.

Activations used in Activation Function and Outputs

After computing the linear combination (of weights, inputs, and bias), the neuron passes it through a so-called activation function. Typical examples of activation functions include:

Identity Function

Output the linear combination of weights and biases.

Step Function:

If the input is greater than zero, send a pulse to activate (ON); otherwise, stay silent (OFF).

Sigmoid Function:

A soft, smooth version of the step function.

Neural Activation:

Biological neurons communicate by sending out sharp electrical pulses called "spikes", so that at any given time the outgoing signal is either on or off (1 or 0). The step function mimics this kind of behavior. However, artificial neural networks tend to use activation functions that output a continuous numerical activation level, such as the sigmoid function.

The output of the neuron, determined by the linear combination and the activation function, can be used to extract a prediction or a decision. For instance, if the network is designed to identify a stop sign in front of a self-driving car, the input could be the image pixels of a photo captured by a camera mounted on the front of the car, and the output could trigger a stopping procedure that halts the car before the sign.

Neural network adaptation occurs when the neurons' weights are adjusted to make the network produce the correct outputs, just as in linear or logistic regression. Many neural networks are huge; the largest contain hundreds of billions of weights, and optimizing them is a Herculean task that requires massive amounts of computing power.

Most ML models use rectified linear units (ReLU) as the non-linearity instead of the sigmoid function in deep neural networks. The question is, why? That's what we're here to find out.

What is an activation function?

A nonlinear function that decides whether a neuron's output signal is sent on or off is known as an activation function.

Sigmoid Function:

A mathematical function that transforms values into the range between 0 and 1 is known as the sigmoid function. It is an S-shaped curve, used as a non-linear activation function, and is also known as the sigmoidal curve.

It is a smooth, continuously differentiable function, shaped almost like an S. The typical reason to use the sigmoid function is that its activation value lies between 0 and 1, which makes it a natural choice for models that must predict a probability as output, especially in statistical prediction. Since probabilities lie in the range 0 to 1, sigmoid is a good fit, and it remains one of the most widely used non-linear activation functions.

Here is the mathematical expression for sigmoid-

f(x) = 1/(1+e^-x)

As we know, the sigmoid function squashes output values between 0 and 1: a significantly negative number passed through the sigmoid function becomes close to 0, and a large positive number becomes close to 1.
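A quick numerical check of this saturation behaviour, using f(x) = 1/(1+e^-x) from above:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Large negative inputs squash towards 0, large positive towards 1.
print(sigmoid(-20))  # ~2e-09, effectively 0
print(sigmoid(20))   # ~0.999999998, effectively 1
print(sigmoid(0))    # 0.5, the midpoint of the S-curve
```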

Now, we will analyze the source code and the graph of the sigmoid function for high positive and negative values, along with the relevant slopes for each.

```python
import numpy as np
from matplotlib import pyplot as plt

def sigmoid(w, b, x):
    return 1.0 / (1.0 + np.exp(-(w * x + b)))

x = np.arange(-15.0, 15.0, 0.2)
y = sigmoid(-0.3, 0, x)
plt.plot(x, y)
plt.show()
```

```python
y = sigmoid(-2, 0, x)
plt.plot(x, y)
```

```python
y = sigmoid(-5, 0, x)
plt.plot(x, y)
```

```python
y = sigmoid(.5, 0, x)
plt.plot(x, y)
```

```python
y = sigmoid(4, 0, x)
plt.plot(x, y)
```

```python
y = sigmoid(1, 4, x)
plt.plot(x, y)
```

```python
y = sigmoid(1, -1, x)
plt.plot(x, y)
```

```python
y = sigmoid(1, -5, x)
plt.plot(x, y)
```

```python
y = sigmoid(.5, 1.2, 1)
print(y)
# 0.845534734916465
```

Code for sigmoid function in Python

```python
def sigmoid(x, Derivative=False):
    if not Derivative:
        return 1 / (1 + np.exp(-x))
    else:
        out = sigmoid(x)
        return out * (1 - out)  # the branch that produces the derivative
```

or

```python
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
```

While writing code for sigmoid, we could use this code for both forward propagations and compute derivatives.
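For example, the derivative branch of the two-branch sigmoid above can supply the gradient when backpropagating through a sigmoid unit (a minimal sketch; the variable names are illustrative):

```python
import numpy as np

def sigmoid(x, Derivative=False):
    if not Derivative:
        return 1 / (1 + np.exp(-x))
    out = sigmoid(x)
    return out * (1 - out)

x = np.array([-1.0, 0.0, 1.0])
forward = sigmoid(x)                # activations for forward propagation
grad = sigmoid(x, Derivative=True)  # s(x) * (1 - s(x)) for backprop

# The derivative peaks at x = 0 (value 0.25) and shrinks as |x| grows --
# exactly the saturation behind the vanishing gradient problem.
print(grad)
```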

ReLU

ReLU stands for Rectified Linear Unit. It is the most frequently used activation unit in deep learning. R(x) = max(0, x); thereby, if x < 0, R(x) = 0, and if x ≥ 0, R(x) = x. It also speeds up the convergence of gradient descent compared to sigmoid or tanh activation functions.
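As a sketch, R(x) = max(0, x) and its gradient can be written as:

```python
import numpy as np

def relu(x):
    # R(x) = max(0, x): negative inputs become 0, positive pass through.
    return np.maximum(0, x)

def relu_derivative(x):
    # Gradient is 0 for x < 0 and 1 for x > 0 (undefined at exactly 0;
    # by convention we use 0 there).
    return (x > 0).astype(float)

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(relu(x))             # [0.  0.  0.  0.5 3. ]
print(relu_derivative(x))  # [0. 0. 0. 1. 1.]
```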

Advantage of ReLU Function:

• It doesn't activate all of the neurons at the same time: if an input is negative, ReLU converts it to zero and the corresponding neuron is not activated. Only a few neurons fire at a time, which makes the network cheap to compute.
• It mitigates the vanishing gradient problem, since the gradient does not shrink for positive inputs.

Is ReLU Linear or Nonlinear

As ReLU is defined by max(0, x), i.e.

R(x) = 0, if x < 0, and

R(x) = x, if x >= 0

hence the graph of ReLU is flat at zero for negative inputs and a line of slope 1 for positive inputs.

There is no way to draw this graph as a single straight line; therefore, ReLU is a nonlinear function.
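A quick way to convince yourself: a linear function f must satisfy f(a + b) = f(a) + f(b), and ReLU does not (a small hypothetical check):

```python
def relu(x):
    return max(0.0, x)

a, b = 2.0, -3.0
print(relu(a + b))        # relu(-1.0) = 0.0
print(relu(a) + relu(b))  # 2.0 + 0.0 = 2.0
# 0.0 != 2.0, so ReLU cannot be a linear function.
```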

Sigmoid vs ReLU Activation Function:

Training deep neural nets used to be very difficult with sigmoid-like activation functions, whose gradients saturate.

ReLU (amidst other things) made the training of deeper nets feasible, and since then we have been using ReLU as the default activation function for hidden layers. So what exactly makes ReLU a better choice than Sigmoid?

Conclusion

ReLU makes training complex models/neural networks more practical than sigmoid: its gradient stays constant (1) for all positive inputs, while the sigmoid's gradient saturates towards zero.

ReLU exhibits better efficiency:

Another advantage of ReLU, besides avoiding the vanishing gradients problem, is its much lower computational cost: max(0, x) is a single comparison per element, so it runs much faster than any sigmoid function, which requires computing an exponential.
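This speed difference can be checked with a rough timing sketch (absolute numbers depend entirely on your hardware; only the relative ordering is the point):

```python
import timeit

import numpy as np

x = np.random.randn(1_000_000)

relu_time = timeit.timeit(lambda: np.maximum(0, x), number=20)
sigmoid_time = timeit.timeit(lambda: 1 / (1 + np.exp(-x)), number=20)

# ReLU is a single comparison per element; sigmoid needs an exponential,
# so ReLU is typically noticeably faster.
print(f"ReLU:    {relu_time:.3f}s")
print(f"Sigmoid: {sigmoid_time:.3f}s")
```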

Author: Aswini S.

FAQs

Why is Relu better than SIGMOID?

It is simple and faster to compute, and it does not activate all the neurons simultaneously. Even the derivative of ReLU is easy to calculate, which makes a significant difference in training and inference time.

What makes the ReLU function more appropriate in some cases over a Sigmoid activation function?

ReLU prevents gradients from saturating in deep networks and thus mitigates the risk of vanishing gradients. You can also mix ReLU and sigmoidal activation functions across layers, since sigmoidal activations are better at capturing certain nonlinearities; in that case, each layer uses one of the two activations, with the same activation for every hidden unit in that layer.