# Difference Between Covariance and Correlation

Vikram Singh
Assistant Manager - Content
Updated on Oct 3, 2023 11:46 IST

Looking to understand the difference between covariance and correlation? This article breaks down the key differences between the two statistical measures, including their definitions, range of values, units, sensitivity to scale, interpretation, and formulas. Gain a better understanding of how these measures are used and their impact on data analysis.

Covariance and correlation are two of the most important concepts in probability and statistics. Covariance indicates the direction of the linear relationship between two variables, whereas correlation measures both the direction and the strength of that relationship. Using these measures, you can quantify the relationship between variables and then decide which variables to select, add, or remove.
This article discusses what covariance is, what correlation is, and the difference between them.

So, without further delay let’s start.

## What is the difference between Covariance and Correlation?


## What is Covariance?

Covariance signifies the direction of the linear relationship between two variables.

Here, direction means whether the two random variables tend to move in the same direction (one increases as the other increases) or in opposite directions (one increases as the other decreases).

In layman's terms, covariance measures how two variables vary together. It can take any value from -infinity to +infinity, and its magnitude depends on the units of the variables.

## Mathematical Formula:

Covariance between two variables X and Y is calculated as:
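
For a sample of n paired observations, one standard form is the following. It assumes the sample (n − 1) convention, which is also what `np.cov` uses by default; the symbols are introduced here for illustration:

$$
\mathrm{Cov}(X, Y) = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
$$

Here $\bar{x}$ and $\bar{y}$ are the sample means of X and Y. The population version divides by n instead of n − 1.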

Let’s calculate the covariance using Python:

```python
# import library
import numpy as np

# generating random dataset
X = np.random.rand(15)
Y = np.random.rand(15)

# calculating the covariance
np.cov(X, Y)
# np.cov(a, b) returns a 2 x 2 matrix with the elements
# cov(a, a), cov(a, b), cov(b, a), and cov(b, b).
# note: cov(a, b) = cov(b, a)
```

Covariance is mainly classified into three types:

### Positive Covariance:

• It indicates that the two variables tend to move in the same direction.
• COV(X,Y) > 0

### Zero Covariance:

• It indicates that there is no linear relationship between the two random variables (a non-linear relationship may still exist).
• COV(X, Y) = 0

### Negative Covariance:

• It indicates that the two variables tend to move in opposite directions.
• COV(X, Y) < 0
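
The zero case deserves a closer look, because zero covariance does not mean the variables are unrelated. In the small sketch below (the data is made up for illustration), Y is fully determined by X, yet the covariance comes out as zero because the relationship is symmetric rather than linear:

```python
import numpy as np

# Illustrative data: Y depends on X exactly (Y = X squared),
# but the relationship is symmetric around X = 0, not linear.
X = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
Y = X ** 2

# np.cov returns the 2 x 2 covariance matrix; element [0, 1] is cov(X, Y).
cov_xy = np.cov(X, Y)[0, 1]
print(cov_xy)
```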

However, covariance does not signify the strength of the relationship between the random variables, because its magnitude depends on the units in which they are measured.

Correlation overcomes this problem.
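
To make the scale problem concrete, here is a minimal sketch. The height and weight data are hypothetical and generated only for illustration. Changing the unit of one variable changes the covariance, while the correlation stays the same:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Hypothetical data: heights in metres and loosely related weights in kg.
height_m = rng.normal(1.7, 0.1, size=50)
weight_kg = 60 + 40 * (height_m - 1.7) + rng.normal(0, 5, size=50)

# The same heights expressed in centimetres.
height_cm = height_m * 100

cov_m = np.cov(height_m, weight_kg)[0, 1]
cov_cm = np.cov(height_cm, weight_kg)[0, 1]
corr_m = np.corrcoef(height_m, weight_kg)[0, 1]
corr_cm = np.corrcoef(height_cm, weight_kg)[0, 1]

print(cov_m, cov_cm)    # covariance is 100 times larger in centimetres
print(corr_m, corr_cm)  # correlation is the same in both units
```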


## Correlation:

Like covariance, correlation measures the direction of the relationship between two variables, and it also measures the strength of that relationship.

It can take any value from -1 to 1.

The correlation coefficient is usually denoted by r.

### Mathematical Formula:

Correlation of two random variables X and Y is given by:
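
A standard way to write this is as covariance divided by the product of the two standard deviations (the symbols $\sigma_X$ and $\sigma_Y$ are introduced here for illustration):

$$
r = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \, \sigma_Y}
$$

Dividing by the standard deviations cancels the units, which is why r always lies between -1 and 1.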

Let’s calculate the correlation using Python:

```python
# import library
import numpy as np

# generating random dataset
X = np.random.rand(15)
Y = np.random.rand(15)

# calculating the correlation
np.corrcoef(X, Y)
# np.corrcoef(a, b) returns a two-dimensional array with the correlation coefficients
```

Note: The closer the value is to 1 or -1, the more strongly the two variables are linearly related.

Correlation is mainly classified into five types:

• Perfectly Positive Correlation: r = 1
• Positive Correlation: 0 < r < 1
• No Correlation: r = 0
• Negative Correlation: -1 < r < 0
• Perfectly Negative Correlation: r = -1
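
The five cases above can be sketched as a small helper. The function name `describe_correlation` and the tolerance `tol` are hypothetical, chosen only for this illustration; the tolerance is there because floating-point results are rarely exactly 1, 0, or -1:

```python
import numpy as np

def describe_correlation(r, tol=1e-12):
    """Map a correlation coefficient r to one of the five categories above."""
    if abs(r - 1) <= tol:
        return "Perfectly positive correlation"
    if abs(r + 1) <= tol:
        return "Perfectly negative correlation"
    if abs(r) <= tol:
        return "No correlation"
    return "Positive correlation" if r > 0 else "Negative correlation"

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Y = 2X + 1 is an exact increasing linear function of X.
r_up = np.corrcoef(X, 2 * X + 1)[0, 1]
# Y = -3X + 4 is an exact decreasing linear function of X.
r_down = np.corrcoef(X, -3 * X + 4)[0, 1]

print(describe_correlation(r_up))
print(describe_correlation(r_down))
```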

## Types of Correlation:

### What is Pearson Correlation?

• Normalized measurement of covariance
• Assumes both variables are normally distributed
• Measures the linear relationship between two variables and fails to capture non-linear relationships
• Used for continuous (interval or ratio) variables
• Usually not used with ordinal variables

#### Mathematical Formula for Pearson Correlation:

For any two random variables X and Y, the Pearson correlation coefficient is calculated by:
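
For n paired observations, the usual sample form is the following (notation introduced here for illustration, with $\bar{x}$ and $\bar{y}$ as the sample means):

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

This is the covariance of X and Y divided by the product of their standard deviations; the 1/(n − 1) factors cancel between numerator and denominator.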

Let's calculate the Pearson correlation coefficient using Python:

```python
# import libraries
import numpy as np
from scipy.stats import pearsonr  # pearsonr: Pearson correlation coefficient (r)

# generating random dataset such that both are normally distributed
X = np.random.normal(size=15)
Y = np.random.normal(size=15)

# calculating the Pearson correlation coefficient
pearsonr(X, Y)
```

### What is Spearman Rank Correlation?

• It is a non-parametric measure
• Captures monotonic relationships, whether linear or non-linear
• Used for ordinal or continuous variables
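
As a rough illustration of the second point, the sketch below uses made-up data in which Y always increases with X, but exponentially rather than linearly. Pearson correlation falls short of 1, while Spearman rank correlation does not, because it only looks at the ordering of the values:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Illustrative data: Y grows with X, but exponentially rather than linearly.
X = np.arange(1, 11, dtype=float)
Y = np.exp(X)

# Both functions return the coefficient together with a p-value.
pearson_r, _ = pearsonr(X, Y)
spearman_rho, _ = spearmanr(X, Y)

print(pearson_r)     # below 1, because the relationship is not linear
print(spearman_rho)  # 1 (up to rounding), because the ranks agree exactly
```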

#### Mathematical Formula for Spearman Rank Correlation:

For any two random variables X and Y, the Spearman rank correlation coefficient is calculated by:
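
A commonly used form, valid under the assumption that there are no tied ranks, is the following (the symbols $\rho$ and $d_i$ are introduced here for illustration):

$$
\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n (n^2 - 1)}
$$

Here $d_i$ is the difference between the rank of $x_i$ and the rank of $y_i$, and n is the number of observations. When ties are present, the coefficient is instead computed as the Pearson correlation of the ranks.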

Let's calculate the Spearman rank correlation coefficient using Python:

```python
# import libraries
import numpy as np
from scipy.stats import spearmanr  # spearmanr: Spearman rank correlation coefficient

# generating random dataset
X = np.random.rand(15)
Y = np.random.rand(15)

# calculating the Spearman rank correlation coefficient
spearmanr(X, Y)
```

### What is Kendall Rank/Kendall Tau Correlation?

• Non-parametric measure for calculating the rank correlation coefficient
• Used for ordinal variables
• Captures monotonic relationships, whether linear or non-linear

#### Mathematical Formula for Kendall Tau Correlation:

For any two random variables X and Y, the Kendall rank correlation coefficient is calculated by:
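
One common form is the tau-a variant, which assumes there are no ties (the symbols C, D, and n are introduced here for illustration):

$$
\tau = \frac{C - D}{\tfrac{1}{2} n (n - 1)}
$$

Here C is the number of concordant pairs, D is the number of discordant pairs (both defined below), and $\tfrac{1}{2} n (n - 1)$ is the total number of pairs among n observations. SciPy's `kendalltau` computes the tau-b variant by default, which adjusts for ties and gives the same value when there are none.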

Concordant Pair: A pair of observations is concordant if the observation that ranks higher on one variable also ranks higher on the other variable.

Discordant Pair: A pair of observations is discordant if the observation that ranks higher on one variable ranks lower on the other variable.
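
The two definitions above can be turned into a direct pair count. The sketch below is only illustrative: the helper name `kendall_tau_a` is hypothetical, it assumes there are no ties, and it checks every pair, so it is slow for large datasets. The result is compared with SciPy's `kendalltau`:

```python
from itertools import combinations
from scipy.stats import kendalltau

def kendall_tau_a(x, y):
    """Kendall tau from concordant/discordant pair counts (assumes no ties)."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1   # both variables order the pair the same way
        elif product < 0:
            discordant += 1   # the variables order the pair in opposite ways
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)

# Illustrative ranks with no ties.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

tau_manual = kendall_tau_a(x, y)
tau_scipy, _ = kendalltau(x, y)
print(tau_manual, tau_scipy)
```

With these ranks, 8 of the 10 pairs are concordant and 2 are discordant, so the manual count gives (8 − 2) / 10 = 0.6.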

Let’s calculate the Kendall rank correlation coefficient using Python:

```python
# import libraries
import numpy as np
from scipy.stats import kendalltau

# generating random dataset
X = np.random.rand(15)
Y = np.random.rand(15)

# calculating the Kendall rank correlation coefficient
kendalltau(X, Y)
```

## Conclusion

In this article, we have briefly discussed what correlation is, what covariance is, and the key differences between them. The article also covers the different types of covariance and correlation, with corresponding examples.

Happy Learning!!


## FAQs

What is Covariance?

Covariance is a statistical measure that indicates the degree to which two variables tend to vary together.

What is Correlation?

Correlation is a statistical measure that indicates the strength and direction of the linear relationship between two variables.

What are the different types of Correlation used in Data Science?

There are mainly three different types of Correlation: Pearson Correlation, Spearman Correlation, and Kendall Rank or Kendall Tau Correlation.

What are the different types of Covariance?

There are three different types of covariance:

• Positive Covariance: It indicates that the two variables tend to move in the same direction.
• Zero Covariance: It indicates that there is no linear relationship between the two random variables.
• Negative Covariance: It indicates that the two variables tend to move in opposite directions.