Central Limit Theorem

2 mins read1.6K Views Comment

Call 8585951111Got Doubts?

Updated on Sep 9, 2022 09:30 IST

Central limit theorem one of the most important and used theorem in statistics and data science, it is in the heart of Hypothesis testing. 

Introduction

Central limit theorem one of the most important and used theorem in statistics and data science, it is in the heart of Hypothesis testing.

Being in the core of the data science and machine learning, it is quiet confusing.

So, in this we will briefly discuss Central limit Theorem, it’s , assumption and implementation in python.

Table of Content:

Central Limit Theorem
Mean and Standard Deviation of Sample Mean
Assumption for Central Limit Theorem
Implementation in Python

Central Limit Theorem

Then the distribution of the sample mean will be approximately normally distributed regardless of whether the population is normal or skewed.

Provided that sample size is sufficiently large (n > 30).

To know more about mean and variance read the article Measure of Central Tendency and Measures of Dispersion.

Confused! So let’s understand it through an example:

Stay updated with the latest blogs on online courses and skills

Enter Mobile Number

Mean and Standard Deviation of Sample Mean

Mean of the sample means:

Standard deviation of the sample means:

Assumptions

Data must be randomly sampled
Samples should be independent
Sample size should be sufficiently large
- When population is skewed, sample size should be large
- When population is symmetric, sample size of 30 is sufficient
Sample size should not be more than 10% of the population (when sampling is done without replacement)

Implementation in Python

Before directly going for python implementation, let’s understand the step by step process:

Let X be any random variable having finite mean and variance

Randomly pick m samples each of size n
Calculate the mean of each sample
Plot the distribution of m samples

Let’s plot the normal distribution curve using CLT in Python

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# defining the sample size and number of samples we want to have
sample_size = 50
sample_number = 1000
sample_means = []
for i in range(0, sample_number):
    # randomly picking sample from the population distribution
    # In this case the population distribution is an Uniform distribution
    sample = np.random.uniform(1, 20, sample_size)
    sample_mean = sample.mean()
    sample_means.append(sample_mean)
 
plt.figure(figsize = (10, 10))
sns.distplot(sample_means, bins = 15);

Conclusion

In this article, we briefly explained CLT one of the most important and used theorem in statistics and data science.

Hope this article will help you in your data science and machine learning journey.

FAQs

What is Central Limit Theorem?

Central Limit theorem states that, if you have a population mean and standard deviation and takes large random samples from the population with replacement then he distribution of the sample mean will be approximately normally distributed regardless of whether the population is normal or skewed.

What are the assumptions behind the Central Limit Theorem?

1. Data must follow randomization condition. 2. Samples should be independent of each other. 3. 3. Sample size should be large but not be more than 10% of the population

When do we use Central Limit Theorem?

Central Limit Theorem is useful while analyzing the large dataset. It is advised not to use CLT if you have to find the probability of single or individual value.

About the Author

Vikram Singh

Central Limit Theorem

Introduction

Table of Content:

Central Limit Theorem

Mean and Standard Deviation of Sample Mean

Assumptions

Implementation in Python

Conclusion

FAQs

Comments

Top Picks & New Arrivals