Introduction to Sampling and Resampling

Vikram Singh
Assistant Manager - Content
Updated on Jun 17, 2022 10:50 IST

Introduction

This article introduces sampling and resampling.

Before starting, let’s define population and sample.

A population is the set of all data points of interest, while a sample is a subset of the population.

[Figure: a sample as a subset of the population]

Sampling:

Sampling is the process of selecting a group of observations from the population in order to study the characteristics of the data and draw conclusions about the population.

Example: Covaxin (a COVID-19 vaccine) is tested on thousands of men and women before it is given to everyone in the country.

Types of Sampling:

Depending on whether or not the observations are selected at random, sampling is classified into two major groups:

  1. Probability Sampling
  2. Non-Probability Sampling

Probability Sampling (Random Sampling):

In this type, observations are selected at random, so that every observation in the population has a known, non-zero chance of being selected. In simple random sampling, every observation has an equal chance.

There are four main types of probability sampling:

  • Simple Random Sampling
  • Cluster Sampling
  • Stratified Sampling
  • Systematic Sampling
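
The four types listed above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a reference implementation: the function names, the population of 100 numbered people, and the "A"/"B" grouping are all hypothetical and chosen only for the example.

```python
import random

def simple_random_sample(population, n, seed=0):
    """Pick n items so that every item has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, step, seed=0):
    """Pick a random starting point, then take every step-th item."""
    rng = random.Random(seed)
    start = rng.randrange(step)
    return population[start::step]

def stratified_sample(population, key, fraction, seed=0):
    """Group items by key (the stratum), then sample the same fraction from each group."""
    rng = random.Random(seed)
    strata = {}
    for item in population:
        strata.setdefault(key(item), []).append(item)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

def cluster_sample(clusters, n_clusters, seed=0):
    """Pick whole clusters at random and keep every item in the chosen clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(clusters, n_clusters)
    return [item for cluster in chosen for item in cluster]

# Hypothetical population: 100 people labelled 0..99.
# Even labels belong to group "A", odd labels to group "B" (assumed for illustration).
population = list(range(100))

srs = simple_random_sample(population, 10)
systematic = systematic_sample(population, step=10)
stratified = stratified_sample(
    population, key=lambda x: "A" if x % 2 == 0 else "B", fraction=0.1
)
clusters = [population[i:i + 20] for i in range(0, 100, 20)]  # 5 clusters of 20
clustered = cluster_sample(clusters, n_clusters=2)

print("simple random:", sorted(srs))
print("systematic:   ", systematic)
print("stratified:   ", sorted(stratified))
print("cluster size: ", len(clustered))
```

Note how stratified sampling guarantees that both groups are represented in proportion, while cluster sampling keeps every member of the few clusters it picks.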

Non-Probability Sampling:

In this type, data is not selected at random. Selection depends mainly on the judgment or convenience of the person collecting the data.

The sample may or may not be representative of the population, so the results can be biased.

Unlike in probability sampling, not every observation in the population has a known chance of being selected.

There are four common types of non-probability sampling:

  • Convenience Sampling
  • Judgmental/Purposive Sampling
  • Snowball/Referral Sampling
  • Quota Sampling

Sampling Error:

Sampling error is the error that arises because a sample, rather than the whole population, is observed.

In other words, it is the difference between the observed value of a sample statistic and the actual value of the corresponding population parameter.

Mathematical Formula for Sampling Error:

[Figure: formula for sampling error]
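
In symbols, the definition given earlier can be sketched as follows. The notation is assumed here for illustration: \hat{\theta} stands for a sample statistic, \theta for the corresponding population parameter, \bar{x} and \mu for the sample and population means, \sigma for the population standard deviation, and n for the sample size. The second line is the standard error of the sample mean, a standard textbook measure of how large the sampling error of a mean typically is.

```latex
\[
  \text{sampling error} = \hat{\theta} - \theta ,
  \qquad \text{for example } \bar{x} - \mu \text{ for a mean}
\]
\[
  \mathrm{SE}(\bar{x}) = \frac{\sigma}{\sqrt{n}}
\]
```

Since n appears under a square root in the denominator, quadrupling the sample size roughly halves the typical error.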

Sampling error can be reduced by:

  • Increasing the sample size
  • Dividing the population into groups (strata) and sampling from each group
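
The effect of sample size can be checked with a small simulation. This is a rough sketch under assumed conditions: the population of 10,000 normally distributed values, the seed, and the helper name `mean_abs_sampling_error` are all hypothetical choices made for the example.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: 10,000 values from a normal distribution (mean 50, sd 10).
population = [rng.gauss(50, 10) for _ in range(10_000)]
mu = statistics.fmean(population)

def mean_abs_sampling_error(n, repeats=200):
    """Average |sample mean - population mean| over many random samples of size n."""
    errors = []
    for _ in range(repeats):
        sample = rng.sample(population, n)
        errors.append(abs(statistics.fmean(sample) - mu))
    return statistics.fmean(errors)

small = mean_abs_sampling_error(10)
large = mean_abs_sampling_error(1000)

print(f"average error with n=10:   {small:.3f}")
print(f"average error with n=1000: {large:.3f}")
```

On a typical run, the average error for the larger sample is many times smaller than for the smaller one.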

Advantages of Sampling:

  • Reduces cost and time
  • Can improve data accuracy, since fewer observations have to be collected and checked
  • Inferences can be applied to a larger population
  • Fewer resources are needed

Resampling:

Resampling is a method that consists of repeatedly drawing samples from the original data sample.

It often involves selecting cases at random, with replacement, from that sample.

Note: In machine learning, resampling is mainly used to estimate how well a model will perform on unseen data, which in turn helps in choosing and tuning a better model.

Types of Resampling:

Two common methods of resampling are:

  • K-fold Cross-validation
  • Bootstrapping

K-fold cross-validation:

In this method, the data set is divided into k sets (folds) of roughly equal size. One fold is used as the test set for an experiment, while all the other folds are used to train the model.

In the first experiment, the first fold is the test set and all the others form the training set.

The process is repeated k times, with a different fold chosen as the test set each time.

[Figure: k-fold cross-validation]
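
The splitting procedure described above can be sketched in plain Python. This is a minimal illustration rather than a production implementation: the function name `k_fold_splits` is hypothetical, the data is represented only by its row indices, and the indices are shuffled first, which is a common choice but an assumption here.

```python
import random

def k_fold_splits(n_items, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices out round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# 10 data points split into 5 folds: each experiment trains on 8 and tests on 2.
splits = list(k_fold_splits(n_items=10, k=5))
for train, test in splits:
    print("train:", sorted(train), "test:", sorted(test))
```

Every data point lands in the test set exactly once, so the k test scores together cover the whole data set.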

Bootstrapping:

In bootstrapping, samples are drawn with replacement (i.e. the same observation can appear more than once in a sample, and in more than one sample), and the remaining data, which were not drawn into the sample, are used to test the model.

[Figure: bootstrapping]
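
A bootstrap draw, together with the leftover data used for testing (often called the out-of-bag data), can be sketched as follows. This is only an illustration: the function name `bootstrap_sample`, the eight measurements, and the choice of 1,000 repetitions are assumptions made for the example.

```python
import random
import statistics

def bootstrap_sample(data, rng):
    """Draw len(data) indices with replacement; return (in_bag, out_of_bag) indices."""
    n = len(data)
    in_bag = [rng.randrange(n) for _ in range(n)]
    out_of_bag = sorted(set(range(n)) - set(in_bag))
    return in_bag, out_of_bag

rng = random.Random(0)
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9]  # hypothetical measurements

# One draw: some indices repeat, and the ones never drawn form the test set.
in_bag, out_of_bag = bootstrap_sample(data, rng)
print("in-bag indices:    ", in_bag)
print("out-of-bag indices:", out_of_bag)

# Repeating the draw many times shows how much the sample mean varies.
boot_means = []
for _ in range(1000):
    bag, _unused = bootstrap_sample(data, rng)
    boot_means.append(statistics.fmean([data[i] for i in bag]))

print("sample mean:             ", round(statistics.fmean(data), 3))
print("bootstrap standard error:", round(statistics.stdev(boot_means), 3))
```

The spread of the bootstrap means gives an estimate of the sampling error of the mean without collecting any new data.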

Conclusion:

This article introduced sampling and resampling.

Both play an important role in predictive modelling problems.

We hope this article helps you in learning data science.

Frequently Asked Questions (FAQ)

Ques 1. What is Sampling?

Ans 1. Sampling is the process of selecting a group of observations from the population in order to study the characteristics of the data and draw conclusions about the population.

Ques 2. What is Resampling?

Ans 2. Resampling is a method that consists of repeatedly drawing samples from the original data sample. It often involves selecting cases at random, with replacement, from that sample.

About the Author
Vikram Singh
Assistant Manager - Content

Vikram has a Postgraduate degree in Applied Mathematics, with a keen interest in Data Science and Machine Learning. He has experience of 2+ years in content creation in Mathematics, Statistics, Data Science, and Mac...