Difference Between Covariance and Correlation

Q: What is a Covariance?

Covariance is a statistical measure that indicates the degree to which two variables tend to vary together.

Q: What is a Correlation?

Correlation is a statistical measure that indicates the strength and direction of the linear relationship between two variables.

Q: What are the different types of Correlation used in Data Science?

There are mainly three different types of Correlation: Pearson Correlation, Spearman Correlation, and Kendall Rank or Kendall Tau Correlation.

Q: What are the different types of Covariance?

There are three types of covariance, distinguished by sign. Positive Covariance: the two variables tend to move in the same direction. Zero Covariance: there is no linear relationship between the two random variables. Negative Covariance: the two variables tend to move in opposite directions.

Vikram Singh

Updated on Oct 3, 2023 11:46 IST

Looking to understand the difference between covariance and correlation? This article breaks down the key differences between the two statistical measures, including their definitions, range of values, units, sensitivity to scale, interpretation, and formulas. Gain a better understanding of how these measures are used and their impact on data analysis.

Covariance and correlation are two of the most important concepts in probability and statistics. Covariance indicates the direction of the linear relationship between two variables, whereas correlation measures both the direction and the strength of that relationship. Using these measures, you can quantify the relationship between variables and then decide which variables to select, add, or remove.
This article discusses what covariance is, what correlation is, and the difference between them.

So, without further delay let’s start.

What is the difference between Covariance and Correlation?

	Covariance	Correlation
Definition	Measures how two variables vary together.	Measures the strength and direction of the linear relationship between two variables.
Range of Values	Any value between -inf and +inf.	Between -1 and 1 (inclusive).
Sensitivity to Scale	Sensitive to changes in the scale of the variables.	Not sensitive to changes in the scale of the variables.
Units	Product of the units of the two variables	Unitless
Formula	cov(X, Y) = E[(X - E[X])(Y - E[Y])]	corr(X, Y) = cov(X, Y) / (std(X) * std(Y))
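
The scale sensitivity listed in the table can be checked with a small sketch. The height and weight data below are hypothetical and generated only for illustration, and the seed is arbitrary. Converting heights from metres to centimetres should multiply the covariance by 100 while leaving the correlation unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, used only for reproducibility

# Hypothetical data: heights in metres and weights in kilograms
height_m = rng.normal(1.70, 0.10, size=200)
weight_kg = 60 + 40 * (height_m - 1.70) + rng.normal(0, 5, size=200)

# The same heights expressed in centimetres
height_cm = height_m * 100

cov_m = np.cov(height_m, weight_kg)[0, 1]
cov_cm = np.cov(height_cm, weight_kg)[0, 1]
corr_m = np.corrcoef(height_m, weight_kg)[0, 1]
corr_cm = np.corrcoef(height_cm, weight_kg)[0, 1]

print("covariance (m, kg):  ", cov_m)
print("covariance (cm, kg): ", cov_cm)   # 100 times the value above
print("correlation (m, kg): ", corr_m)
print("correlation (cm, kg):", corr_cm)  # same as the value above
```

This is why covariance values from differently scaled datasets cannot be compared directly, whereas correlation values can.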

What is a Covariance?

Covariance signifies the direction of the linear relationship between two variables.

Here, direction means whether the two random variables tend to move in the same direction (when one increases, the other tends to increase) or in opposite directions (when one increases, the other tends to decrease).

In layman's terms, covariance is a measure of how two variables vary together. It can take any value from -infinity to +infinity.

Mathematical Formula:

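Covariance between two variables X and Y is calculated as:

cov(X, Y) = E[(X - E[X])(Y - E[Y])]

For a sample of n paired observations (x_i, y_i) with sample means x̄ and ȳ, the sample covariance is:

cov(X, Y) = Σ (x_i - x̄)(y_i - ȳ) / (n - 1)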

Let’s calculate the covariance using Python:

#import library
import numpy as np

#generating a random dataset
X = np.random.rand(15)
Y = np.random.rand(15)

#calculating the covariance
print(np.cov(X, Y))

#np.cov(a, b) returns a 2 x 2 matrix with elements cov(a,a), cov(a,b), cov(b,a), and cov(b,b).
#note: cov(a,b) = cov(b,a); by default np.cov computes the sample covariance (divides by n - 1)
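
To make the calculation behind np.cov less of a black box, here is a minimal sketch that computes the sample covariance by hand and compares it with NumPy. The helper name sample_covariance and the four data points are illustrative assumptions, not part of any library.

```python
import numpy as np

def sample_covariance(x, y):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)

# Small made-up dataset so the arithmetic can be followed by hand
x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 3.0, 2.0, 6.0]

manual = sample_covariance(x, y)
library = np.cov(x, y)[0, 1]  # off-diagonal entry of the 2 x 2 matrix

print("manual:", manual)
print("numpy: ", library)
```

Here the deviations from the means are (-3, -1, 1, 3) and (-2, 0, -1, 3), their products sum to 14, and dividing by n - 1 = 3 gives about 4.667 for both results.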

Covariance is mainly classified into 3 cases, according to its sign:

Positive Covariance:

It indicates that the two variables tend to move in the same direction, i.e. when one increases, the other tends to increase.
- COV(X, Y) > 0

Zero Covariance:

It indicates that there is no linear relationship between the two random variables. The variables may still be related in a non-linear way.
- COV(X, Y) = 0

Negative Covariance:

It indicates that the two variables tend to move in opposite directions, i.e. when one increases, the other tends to decrease.
- COV(X, Y) < 0
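
The three sign cases can be illustrated with a short sketch on made-up data. Note the third case: y_square depends entirely on x, yet its covariance with x is zero, because zero covariance only rules out a linear relationship.

```python
import numpy as np

# Made-up values, symmetric around zero
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

y_up = 3 * x + 1      # rises as x rises
y_down = -3 * x + 1   # falls as x rises
y_square = x ** 2     # fully determined by x, but not linearly

cov_up = np.cov(x, y_up)[0, 1]
cov_down = np.cov(x, y_down)[0, 1]
cov_square = np.cov(x, y_square)[0, 1]

print("positive covariance:", cov_up)
print("negative covariance:", cov_down)
print("zero covariance:    ", cov_square)
```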

However, the magnitude of covariance depends on the units and scale of the variables, so it does not indicate the strength of the relationship between them.

Correlation addresses this problem.

Correlation:

Like covariance, correlation measures the direction of the linear relationship between two variables, and it also measures the strength of that relationship.

It can take any value from -1 to 1.

The correlation coefficient is usually denoted by r.

Mathematical Formula:

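Correlation of two random variables X and Y is given by:

corr(X, Y) = cov(X, Y) / (std(X) * std(Y))

where std(X) and std(Y) are the standard deviations of X and Y.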

Let’s calculate the correlation using Python:

#import library
import numpy as np

#generating a random dataset
X = np.random.rand(15)
Y = np.random.rand(15)

#calculating the correlation
print(np.corrcoef(X, Y))

#np.corrcoef(a, b) returns a 2 x 2 array of correlation coefficients;
#the off-diagonal entries are corr(a, b) and the diagonal entries are 1
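
As a sanity check on the relationship corr(X, Y) = cov(X, Y) / (std(X) * std(Y)), the sketch below divides the sample covariance by the two sample standard deviations and compares the result with np.corrcoef. The seed is arbitrary; ddof=1 is used so that the standard deviations match the n - 1 divisor that np.cov applies by default.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, for reproducibility only
x = rng.random(15)
y = rng.random(15)

cov_xy = np.cov(x, y)[0, 1]   # sample covariance (divides by n - 1)
std_x = np.std(x, ddof=1)     # sample standard deviation of x
std_y = np.std(y, ddof=1)     # sample standard deviation of y

r_manual = cov_xy / (std_x * std_y)
r_numpy = np.corrcoef(x, y)[0, 1]

print("manual:", r_manual)
print("numpy: ", r_numpy)
```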

Note: The closer the value is to 1 or -1, the stronger the linear relationship between the two variables.

Correlation is mainly classified into 5 cases:

Perfectly Positive Correlation
- r = 1

Positive Correlation
- 0 < r < 1

No Correlation (no linear relationship)
- r = 0

Negative Correlation
- -1 < r < 0

Perfectly Negative Correlation
- r = -1
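
The five cases can be sketched as a small lookup. The helper describe_correlation and its tolerance are hypothetical, introduced only for illustration; a tolerance is needed because floating-point results are rarely exactly 1, 0, or -1.

```python
import numpy as np

def describe_correlation(r, tol=1e-9):
    """Map a correlation coefficient to one of the five cases (illustrative helper)."""
    if np.isclose(r, 1.0, rtol=0, atol=tol):
        return "perfectly positive"
    if np.isclose(r, -1.0, rtol=0, atol=tol):
        return "perfectly negative"
    if np.isclose(r, 0.0, rtol=0, atol=tol):
        return "no linear correlation"
    return "positive" if r > 0 else "negative"

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
examples = {
    "y = 2x + 1": 2 * x + 1,                               # exact increasing line
    "y = -x": -x,                                          # exact decreasing line
    "noisy increase": np.array([1.0, 3.0, 2.0, 5.0, 4.0]),
    "noisy decrease": np.array([5.0, 3.0, 4.0, 1.0, 2.0]),
}

results = {}
for label, y in examples.items():
    r = np.corrcoef(x, y)[0, 1]
    results[label] = describe_correlation(r)
    print(f"{label}: r = {r:.3f} ({results[label]})")
```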

Types of Correlation:

Pearson Correlation
Spearman Rank Correlation
Kendall Rank (Kendall Tau) Correlation

What is Pearson Correlation?

A normalized measure of covariance
Significance tests for it commonly assume that both variables are approximately normally distributed
Measures the linear relationship between two variables and fails to capture non-linear relationships
Used for continuous (interval or ratio) variables
Usually not used with ordinal variables

Mathematical Formula for Pearson Correlation:

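For any two random variables X and Y, with n paired observations (x_i, y_i) and sample means x̄ and ȳ, the Pearson correlation coefficient is calculated by:

r = Σ (x_i - x̄)(y_i - ȳ) / sqrt( Σ (x_i - x̄)^2 * Σ (y_i - ȳ)^2 )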

Let's calculate the Pearson correlation coefficient using Python:

#import libraries
import numpy as np
from scipy.stats import pearsonr #pearsonr: Pearson correlation coefficient (r)

#generating random datasets that are both normally distributed
X = np.random.normal(size=15)
Y = np.random.normal(size=15)

#calculating the Pearson correlation coefficient
#pearsonr returns the coefficient and the p-value of the test for no correlation
print(pearsonr(X, Y))
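
For readers who want to see the Pearson coefficient without the library call, here is a minimal sketch that computes it from sums of deviations and checks it against scipy.stats.pearsonr. The function name pearson_manual, the seed, and the way y is generated are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import pearsonr

def pearson_manual(x, y):
    """Pearson r: sum of deviation products over the root of the product of squared-deviation sums."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

rng = np.random.default_rng(7)  # arbitrary seed, for reproducibility only
x = rng.normal(size=15)
y = 0.5 * x + rng.normal(size=15)  # y is built to be loosely related to x

r_manual = pearson_manual(x, y)
r_scipy, p_value = pearsonr(x, y)

print("manual:", r_manual)
print("scipy: ", r_scipy)
```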

What is Spearman Rank Correlation?

It is a non-parametric measure based on the ranks of the values
Captures monotonic relationships, whether linear or non-linear
Used for ordinal or continuous variables

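Mathematical Formula for Spearman Rank Correlation:

For any two random variables X and Y with n paired observations, the Spearman rank correlation coefficient is calculated by:

ρ = 1 - 6 Σ d_i^2 / (n (n^2 - 1))

where d_i is the difference between the rank of x_i and the rank of y_i. This form holds when there are no tied ranks.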

Let's calculate the Spearman rank correlation coefficient using Python:

#import libraries
import numpy as np
from scipy.stats import spearmanr #spearmanr: Spearman rank correlation coefficient

#generating a random dataset
X = np.random.rand(15)
Y = np.random.rand(15)

#calculating the Spearman rank correlation coefficient
#spearmanr returns the coefficient and the p-value of the test for no association
print(spearmanr(X, Y))
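
The difference between Pearson and Spearman shows up clearly on a relationship that is monotonic but not linear. The sketch below uses y = exp(x) as an assumed example: Spearman should report a perfect association, while Pearson should fall short of 1. It also checks that the Spearman coefficient equals the Pearson coefficient computed on the ranks.

```python
import numpy as np
from scipy.stats import pearsonr, rankdata, spearmanr

x = np.linspace(0.0, 5.0, 20)
y = np.exp(x)  # strictly increasing in x, but far from a straight line

r_pearson, _ = pearsonr(x, y)
rho_spearman, _ = spearmanr(x, y)

# Spearman's coefficient is Pearson's coefficient applied to the ranks
r_on_ranks, _ = pearsonr(rankdata(x), rankdata(y))

print("Pearson r:       ", r_pearson)
print("Spearman rho:    ", rho_spearman)
print("Pearson on ranks:", r_on_ranks)
```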

What is Kendall Rank/Kendall Tau Correlation?

Non-parametric measure for calculating the rank correlation coefficient
Used for ordinal variables
Captures monotonic relationships, whether linear or non-linear

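Mathematical Formula for Kendall Tau Correlation:

For any two random variables X and Y with n paired observations and no ties, the Kendall rank correlation coefficient is calculated by:

τ = (number of concordant pairs - number of discordant pairs) / (n (n - 1) / 2)

where n (n - 1) / 2 is the total number of pairs of observations.

Concordant Pair: Two observations form a concordant pair if the one that ranks higher on one variable also ranks higher on the other variable.

Discordant Pair: Two observations form a discordant pair if the one that ranks higher on one variable ranks lower on the other variable.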

Let's calculate the Kendall rank correlation coefficient using Python:

#import libraries
import numpy as np
from scipy.stats import kendalltau #kendalltau: Kendall rank correlation coefficient (tau)

#generating a random dataset
X = np.random.rand(15)
Y = np.random.rand(15)

#calculating the Kendall rank correlation coefficient
#kendalltau returns the coefficient and the p-value of the test for no association
print(kendalltau(X, Y))
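
Counting concordant and discordant pairs directly makes the definition concrete. The sketch below assumes there are no ties, in which case the simple count-based coefficient agrees with the value returned by scipy.stats.kendalltau. The function name kendall_tau_manual and the five data points are illustrative.

```python
from itertools import combinations

import numpy as np
from scipy.stats import kendalltau

def kendall_tau_manual(x, y):
    """(concordant - discordant) / total pairs; assumes no tied values."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables order the pair the same way
        elif sign < 0:
            discordant += 1   # the variables order the pair in opposite ways
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)

# Made-up data: two adjacent swaps relative to perfect agreement
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

tau_manual = kendall_tau_manual(x, y)
tau_scipy, p_value = kendalltau(x, y)

print("manual:", tau_manual)
print("scipy: ", tau_scipy)
```

Of the 10 possible pairs, 8 are concordant and 2 are discordant, so both results come out to (8 - 2) / 10 = 0.6.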

Conclusion

In this article, we have briefly discussed what correlation is, what covariance is, and the key differences between them. The article also covers the different types of covariance and correlation, with corresponding examples.

We hope this article helps you in your data science and machine learning journey.

Happy Learning!!
