Difference Between Covariance and Correlation

Vikram Singh
Assistant Manager - Content
Updated on Oct 3, 2023 11:46 IST

Looking to understand the difference between covariance and correlation? This article breaks down the key differences between the two statistical measures, including their definitions, range of values, units, sensitivity to scale, interpretation, and formulas. Gain a better understanding of how these measures are used and their impact on data analysis.

Covariance and correlation are two of the most important concepts in probability and statistics. Covariance indicates the direction of the linear relationship between two variables, whereas correlation measures both the direction and the strength of that relationship. Using these measures, you can quantify how variables are related and use that information to decide whether to select, add, or remove a variable.
This article discusses what covariance is, what correlation is, and how they differ.

So, without further delay, let’s start.

What is the difference between Covariance and Correlation?

| | Covariance | Correlation |
|---|---|---|
| Definition | Measures how two variables vary together, i.e. the direction of their linear relationship. | Measures the strength and direction of the linear relationship between two variables. |
| Range of values | -∞ to +∞ | -1 to 1 |
| Sensitivity to scale | Sensitive to changes in the scale (units) of the variables. | Not sensitive to changes in the scale (units) of the variables. |
| Units | Depends on the units of the variables (the product of their units). | Unitless |
| Formula | cov(X, Y) = E[(X - E[X])(Y - E[Y])] | corr(X, Y) = cov(X, Y) / (std(X) * std(Y)) |

What is Covariance?

Covariance signifies the direction of the linear relationship between two variables.

Here, direction means whether the two random variables tend to move in the same direction (when one increases, the other tends to increase) or in opposite directions (when one increases, the other tends to decrease).

In layman's terms, covariance extends the idea of variance to a pair of variables: it measures how much the two variables vary together (the covariance of a variable with itself is simply its variance). It can take any value from -infinity to +infinity.

Mathematical Formula:

Covariance between two variables X and Y is calculated as:

$$\operatorname{cov}(X, Y) = E\big[(X - E[X])(Y - E[Y])\big]$$

Let’s calculate the covariance using Python:

 
# import library
import numpy as np
# generate a random dataset of 15 values for each variable
X = np.random.rand(15)
Y = np.random.rand(15)
# calculate the covariance matrix
print(np.cov(X, Y))
# np.cov(a, b) returns a 2 x 2 matrix with elements cov(a,a), cov(a,b), cov(b,a), and cov(b,b);
# the diagonal holds the variances of a and b.
# note: cov(a,b) = cov(b,a), so the two off-diagonal entries are equal.
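To make the formula concrete, the sketch below computes the sample covariance by hand, $\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$, and compares it with the off-diagonal entry returned by np.cov. It assumes the unbiased estimator that divides by n - 1, which is what np.cov uses by default; the helper sample_cov and the seeded generator are illustrative choices, not part of the original example.

```python
import numpy as np

def sample_cov(x, y):
    """Hypothetical helper: unbiased sample covariance of two 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.ndim != 1 or x.size < 2:
        raise ValueError("x and y must be 1-D arrays of the same length (>= 2)")
    # deviations of each observation from its own mean
    dx = x - x.mean()
    dy = y - y.mean()
    # sum of products of deviations, divided by n - 1 (unbiased estimator)
    return float(np.sum(dx * dy) / (x.size - 1))

rng = np.random.default_rng(42)
X = rng.random(15)
Y = rng.random(15)

manual = sample_cov(X, Y)
builtin = np.cov(X, Y)[0, 1]  # off-diagonal entry of the 2 x 2 matrix
print(manual, builtin)        # the two values should agree
```

Passing bias=True to np.cov divides by n instead, giving the population version; for small samples the two estimates differ noticeably.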

Based on its sign, covariance is mainly classified into three types:

Positive Covariance:

  • It indicates that the two variables tend to move in the same direction: when one increases, the other also tends to increase.
    • COV(X, Y) > 0

Zero Covariance:

  • It indicates that there is no linear relationship between the two random variables (a non-linear relationship may still exist).
    • COV(X, Y) = 0

Negative Covariance:

  • It indicates that the two variables tend to move in opposite directions: when one increases, the other tends to decrease.
    • COV(X, Y) < 0
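The three cases can be checked on a tiny hand-made dataset (the arrays below are illustrative, not from the article). The third case is a useful warning: y = x² depends completely on x, yet its covariance with x is zero, because covariance only picks up the linear part of a relationship.

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

pos = np.cov(x, 2 * x)[0, 1]    # y rises with x: positive covariance
neg = np.cov(x, -x)[0, 1]       # y falls as x rises: negative covariance
zero = np.cov(x, x ** 2)[0, 1]  # y = x^2 is fully determined by x, but not linearly: zero covariance

print(pos, neg, zero)
```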

However, covariance does not indicate the strength of the relationship between the random variables, because its magnitude depends on the units and scale of the variables.

Correlation was introduced to overcome this problem.
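The scale problem is easy to demonstrate: express the same heights in metres and then in centimetres, and the covariance with weight grows by a factor of 100, while the correlation does not change. The data below are synthetic and the variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic data: heights in metres and weights loosely tied to height
height_m = rng.normal(1.70, 0.10, size=50)
weight_kg = 60 * height_m + rng.normal(0, 5, size=50)

height_cm = height_m * 100  # the same heights, expressed in a different unit

cov_m = np.cov(height_m, weight_kg)[0, 1]
cov_cm = np.cov(height_cm, weight_kg)[0, 1]
corr_m = np.corrcoef(height_m, weight_kg)[0, 1]
corr_cm = np.corrcoef(height_cm, weight_kg)[0, 1]

print(f"covariance:  {cov_m:.3f} (m)  vs  {cov_cm:.3f} (cm)")   # scales by 100
print(f"correlation: {corr_m:.3f} (m)  vs  {corr_cm:.3f} (cm)")  # unchanged
```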

What is Correlation?

Like covariance, correlation measures the direction of the linear relationship between two variables, but it also measures the strength of that relationship.

It can take any value from -1 to 1.

Correlation is usually denoted by r.

Mathematical Formula:

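The correlation of two random variables X and Y is given by:

$$\operatorname{corr}(X, Y) = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y}$$

where $\sigma_X$ and $\sigma_Y$ are the standard deviations of X and Y.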

Let’s calculate the correlation using Python:

 
# import library
import numpy as np
# generate a random dataset of 15 values for each variable
X = np.random.rand(15)
Y = np.random.rand(15)
# calculate the correlation matrix
print(np.corrcoef(X, Y))
# np.corrcoef(a, b) returns a 2 x 2 array of Pearson correlation coefficients;
# the diagonal entries are 1 and the off-diagonal entries are corr(a, b).

Note: the closer the value is to 1 or -1, the stronger the linear relationship between the two variables.

Based on the value of r, correlation is mainly classified into five types:

  • Perfectly Positive Correlation
    • r = 1
  • Positive Correlation
    • 0 < r < 1
  • No (Linear) Correlation
    • r = 0
  • Negative Correlation
    • -1 < r < 0
  • Perfectly Negative Correlation
    • r = -1
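The five cases can be reproduced with synthetic data. This is a minimal sketch: the noise levels and the seed are arbitrary assumptions, so the two noisy coefficients simply land somewhere strictly between 0 and ±1. The no-correlation case uses a symmetric parabola, which gives r = 0 even though y is completely determined by x, because r only measures linear association.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.arange(20, dtype=float)

def r(a, b):
    """Pearson correlation coefficient, read from the 2 x 2 np.corrcoef matrix."""
    return np.corrcoef(a, b)[0, 1]

r_perfect_pos = r(x, 3 * x + 2)                        # exact increasing line: r is 1 (up to rounding)
r_positive = r(x, x + rng.normal(0, 4, size=x.size))   # upward trend plus noise: 0 < r < 1
r_none = r(x, (x - x.mean()) ** 2)                     # symmetric parabola: r = 0 (no linear link)
r_negative = r(x, -x + rng.normal(0, 4, size=x.size))  # downward trend plus noise: -1 < r < 0
r_perfect_neg = r(x, -0.5 * x + 4)                     # exact decreasing line: r is -1 (up to rounding)

for name, value in [("perfectly positive", r_perfect_pos), ("positive", r_positive),
                    ("none", r_none), ("negative", r_negative),
                    ("perfectly negative", r_perfect_neg)]:
    print(f"{name:>18}: {value:+.3f}")
```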

Types of Correlation:

What is Pearson Correlation?

  • A normalized measure of covariance: the covariance divided by the product of the two standard deviations
  • Significance tests based on it commonly assume both variables are (approximately) normally distributed
  • Measures the linear relationship between two variables and fails to capture non-linear relationships
  • Used for continuous (interval or ratio) variables
  • Usually not used with ordinal variables

Mathematical Formula for Pearson Correlation:

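For n paired observations $(x_i, y_i)$ of two variables X and Y, the Pearson correlation coefficient is calculated as:

$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}$$

where $\bar{x}$ and $\bar{y}$ are the sample means of X and Y.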

Let’s calculate the Pearson correlation coefficient using Python:

 
# import libraries
import numpy as np
from scipy.stats import pearsonr  # pearsonr: Pearson correlation coefficient (r)
# generate a random dataset in which both variables are normally distributed
X = np.random.normal(size=15)
Y = np.random.normal(size=15)
# calculate the Pearson correlation coefficient
r, p_value = pearsonr(X, Y)
print(r, p_value)  # pearsonr returns the coefficient and a two-sided p-value
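As a cross-check, Pearson's r can be computed straight from its deviation-sum form, $r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}}$. The n - 1 factors of the covariance and of the two standard deviations cancel, so none appears. This is a minimal NumPy-only sketch; pearson_r is a hypothetical helper, and its result should agree with np.corrcoef and with the first value returned by pearsonr.

```python
import numpy as np

def pearson_r(x, y):
    """Hypothetical helper: Pearson r from the deviation-sum formula."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    # numerator: sum of cross-products of deviations
    # denominator: square root of the product of the sums of squared deviations
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))

rng = np.random.default_rng(7)
X = rng.normal(size=15)
Y = rng.normal(size=15)

print(pearson_r(X, Y), np.corrcoef(X, Y)[0, 1])  # the two values should agree
```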

What is Spearman Rank Correlation?

  • It is a non-parametric measure
  • Captures monotonic relationships, whether linear or non-linear
  • Used for ordinal or continuous variables

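Mathematical Formula for Spearman Rank Correlation:

For n paired observations of two variables X and Y with no tied ranks, the Spearman rank correlation coefficient is calculated as:

$$\rho = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$

where $d_i$ is the difference between the ranks of $x_i$ and $y_i$.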

Let’s calculate the Spearman rank correlation coefficient using Python:

 
# import libraries
import numpy as np
from scipy.stats import spearmanr  # spearmanr: Spearman rank correlation coefficient (rho)
# generate a random dataset
X = np.random.rand(15)
Y = np.random.rand(15)
# calculate the Spearman rank correlation coefficient
rho, p_value = spearmanr(X, Y)
print(rho, p_value)
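Spearman's rho is Pearson's r applied to ranks. The sketch below ranks the data with NumPy, applies the no-ties shortcut $\rho = 1 - \frac{6\sum_i d_i^2}{n(n^2 - 1)}$ (where $d_i$ is the rank difference for observation i), and checks it against Pearson's r computed on the ranks. It assumes there are no ties, which is effectively guaranteed for continuous random data; ranks and spearman_rho are hypothetical helpers, whereas spearmanr handles ties for you. The second part uses a monotonic but non-linear relationship (y = x³), for which Spearman's rho is exactly 1 while Pearson's r stays below 1.

```python
import numpy as np

def ranks(a):
    """Hypothetical helper: rank values 1..n (assumes no ties)."""
    a = np.asarray(a)
    r = np.empty(a.size, dtype=float)
    r[np.argsort(a)] = np.arange(1, a.size + 1)  # smallest value gets rank 1
    return r

def spearman_rho(x, y):
    """Hypothetical helper: 1 - 6 * sum(d^2) / (n(n^2 - 1)), valid only without ties."""
    d = ranks(x) - ranks(y)
    n = d.size
    return 1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1))

rng = np.random.default_rng(3)
X = rng.random(15)
Y = rng.random(15)

# without ties, the shortcut formula equals Pearson's r computed on the ranks
print(spearman_rho(X, Y), np.corrcoef(ranks(X), ranks(Y))[0, 1])

# monotonic but non-linear relationship: Spearman is 1, Pearson is below 1
x = np.arange(1, 11, dtype=float)
y = x ** 3
print(spearman_rho(x, y), np.corrcoef(x, y)[0, 1])
```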

What is Kendall Rank/Kendall Tau Correlation?

  • A non-parametric measure of rank correlation
  • Used for ordinal variables (and also applicable to continuous variables)
  • Captures monotonic relationships, whether linear or non-linear

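Mathematical Formula for Kendall Tau Correlation:

For n paired observations of two variables X and Y, the Kendall rank correlation coefficient (in its basic form, tau-a) is calculated as:

$$\tau = \frac{n_c - n_d}{n(n-1)/2}$$

where $n_c$ is the number of concordant pairs, $n_d$ is the number of discordant pairs, and n(n-1)/2 is the total number of pairs.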

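Concordant Pair: A pair of observations $(x_i, y_i)$ and $(x_j, y_j)$ is concordant if the observation ranked higher on one variable is also ranked higher on the other variable, i.e. $(x_i - x_j)(y_i - y_j) > 0$.

Discordant Pair: A pair is discordant if the observation ranked higher on one variable is ranked lower on the other variable, i.e. $(x_i - x_j)(y_i - y_j) < 0$.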

Let’s calculate the Kendall rank correlation coefficient using Python:

 
# import libraries
import numpy as np
from scipy.stats import kendalltau  # kendalltau: Kendall rank correlation coefficient (tau)
# generate a random dataset
X = np.random.rand(15)
Y = np.random.rand(15)
# calculate the Kendall rank correlation coefficient (tau-b by default)
tau, p_value = kendalltau(X, Y)
print(tau, p_value)
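Counting concordant and discordant pairs directly shows how Kendall's tau works. This minimal sketch computes the basic form $\tau = \frac{n_c - n_d}{n(n-1)/2}$ (tau-a) in plain Python; kendall_tau_a is a hypothetical helper and the small dataset is made up for illustration. It loops over all n(n-1)/2 pairs, so it is O(n²) and meant for understanding rather than large datasets. Without ties, its result equals the tau-b value that kendalltau reports by default.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Hypothetical helper: Kendall's tau-a by counting concordant/discordant pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1  # both variables order the pair the same way
        elif s < 0:
            discordant += 1  # the variables order the pair in opposite ways
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)

x = [1, 2, 3, 4, 5]
y = [3, 1, 2, 5, 4]
print(kendall_tau_a(x, y))  # 7 concordant, 3 discordant pairs -> (7 - 3) / 10 = 0.4
```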

Conclusion

In this article, we have briefly discussed what covariance and correlation are and the key differences between them. The article also covers the different types of covariance and correlation, with corresponding examples.

We hope this article helps you in your data science and machine learning journey.

Happy Learning!!


FAQs

What is Covariance?

Covariance is a statistical measure that indicates the degree to which two variables tend to vary together.

What is Correlation?

Correlation is a statistical measure that indicates the strength and direction of the linear relationship between two variables.

What are the different types of Correlation used in Data Science?

There are three commonly used types of correlation: Pearson correlation, Spearman rank correlation, and Kendall rank (Kendall tau) correlation.

What are the different types of Covariance?

There are three types of covariance. Positive Covariance: It indicates that the two variables tend to move in the same direction. Zero Covariance: It indicates that there is no linear relationship between the two random variables. Negative Covariance: It indicates that the two variables tend to move in opposite directions.

About the Author
Vikram Singh
Assistant Manager - Content

Vikram has a Postgraduate degree in Applied Mathematics, with a keen interest in Data Science and Machine Learning. He has experience of 2+ years in content creation in Mathematics, Statistics, Data Science, and Mac...