# z-test: Definition and Example

Vikram Singh
Assistant Manager - Content
Updated on Aug 22, 2023 10:38 IST

The z-test is a statistical method for comparing a sample mean with the mean of a normally distributed population, or for comparing the means of two independent samples. In this article, we will briefly discuss the z-test and its types, with examples.

The z-test is a statistical test used in hypothesis testing. There are 3 steps in hypothesis testing:

• State the null and alternative hypotheses
• Perform the statistical test
• Reject or fail to reject (accept) the null hypothesis

In this article, we will discuss the z-test, its mathematical formula, and how to calculate it with the help of examples.


## What is a z-test?

A statistical method for comparing a sample mean with the mean of a normally distributed population, or for comparing the means of two independent samples

Or

A statistical test to validate a hypothesis (reject or fail to reject it) when the data are normally distributed.

A z-test is used when:

• The population variance is known
• The sample size is large (typically greater than 30)

## Types of z-test:

Z-test is mainly classified into 2 types:

• One Sample
• Two Sample

## One-Sample

• The one-sample test is used when we have to compare a sample mean with the population mean.
• In the one-tailed form discussed here, the region of rejection lies in either the extreme left or the extreme right tail of the distribution.

For example, if the null hypothesis is: the population mean is 2

Then the one-tailed alternative hypothesis is either: the population mean is less than 2, or: the population mean is greater than 2

If the alternative hypothesis is "less than 2" (a left-tailed test), the rejection region will be on the left side of the distribution.

Note: For a left-tailed test, the null hypothesis claims that the population mean is greater than or equal to the hypothesized value (H₀: μ ≥ μ₀, H₁: μ < μ₀).

or

If the alternative hypothesis is "greater than 2" (a right-tailed test), the rejection region will be on the right side of the distribution.

Note: For a right-tailed test, the null hypothesis claims that the population mean is less than or equal to the hypothesized value (H₀: μ ≤ μ₀, H₁: μ > μ₀).
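The left-tailed, right-tailed, and two-tailed rejection regions can also be located numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist` (Python 3.8+). The helper names `z_p_value` and `z_critical_value` are hypothetical and do not come from the article, and the significance level α = 0.05 is an illustrative assumption.

```python
from statistics import NormalDist

# Standard normal distribution N(0, 1), which the z statistic follows under H0.
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def z_p_value(z: float, tail: str) -> float:
    """Return the p-value for an observed z statistic (hypothetical helper).

    tail = "left":  H1 is mu < mu0, rejection region in the left tail
    tail = "right": H1 is mu > mu0, rejection region in the right tail
    tail = "two":   H1 is mu != mu0, rejection region split across both tails
    """
    if tail == "left":
        return STANDARD_NORMAL.cdf(z)
    if tail == "right":
        return 1.0 - STANDARD_NORMAL.cdf(z)
    if tail == "two":
        return 2.0 * (1.0 - STANDARD_NORMAL.cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right' or 'two'")


def z_critical_value(alpha: float, tail: str) -> float:
    """Return the boundary of the rejection region at significance level alpha.

    Reject H0 when z <= value ("left"), z >= value ("right"), or |z| >= value ("two").
    """
    if tail == "left":
        return STANDARD_NORMAL.inv_cdf(alpha)
    if tail == "right":
        return STANDARD_NORMAL.inv_cdf(1.0 - alpha)
    if tail == "two":
        return STANDARD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    raise ValueError("tail must be 'left', 'right' or 'two'")


if __name__ == "__main__":
    # alpha = 0.05 is an illustrative choice, not a value given in the article.
    for tail in ("left", "right", "two"):
        print(tail, round(z_critical_value(0.05, tail), 3))
```

At α = 0.05, the critical values come out at about −1.645 (left), 1.645 (right), and ±1.96 (two-tailed). Splitting α across both tails is why the two-tailed boundary lies farther from zero.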

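Mathematical Formula:

$$ z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} $$

where x̄ is the sample mean, μ is the population mean under the null hypothesis, σ is the population standard deviation, and n is the sample size.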


Let’s understand the one-sample z-test with an example:

### z-test Example

A gym trainer claimed that the new boys in the gym are, on average, heavier than the population average weight.

A random sample of thirty boys has a mean weight of 112.5 kg; the population mean weight is 100 kg and the population standard deviation is 15 kg.

Is there sufficient evidence to support the gym trainer’s claim?
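The example can be worked through with the one-sample formula z = (x̄ − μ)/(σ/√n). This minimal sketch treats the stated 15 kg as the known population standard deviation and tests the trainer's claim as a right-tailed test (H₀: μ ≤ 100, H₁: μ > 100). The significance level α = 0.05 is an assumption because the article does not give one. `one_sample_z` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(sample_mean: float, pop_mean: float, pop_sd: float, n: int) -> float:
    """Hypothetical helper: z = (x_bar - mu) / (sigma / sqrt(n))."""
    standard_error = pop_sd / sqrt(n)
    return (sample_mean - pop_mean) / standard_error


# Figures from the gym example; alpha = 0.05 is an assumed significance level.
z = one_sample_z(sample_mean=112.5, pop_mean=100.0, pop_sd=15.0, n=30)
alpha = 0.05
critical = NormalDist().inv_cdf(1 - alpha)  # right-tailed test: H1 is mu > 100
p_value = 1 - NormalDist().cdf(z)           # area to the right of the observed z

print(f"z = {z:.3f}, critical value = {critical:.3f}, p-value = {p_value:.2e}")
print("Reject H0" if z > critical else "Fail to reject H0")
```

With these figures, z ≈ 4.56, which is well above the right-tailed critical value of about 1.645. Under these assumptions, the sample supports the trainer's claim at the 5% level.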


## Two-Sample:

• A two-sample test is used when we have to compare the means of two independent samples.
• In the two-tailed form discussed here, the region of rejection lies in both extremes (left and right) of the distribution.

For example, if the null hypothesis is: the two population means are equal (μ₁ = μ₂)

Then its alternative hypothesis is: the two population means are not equal (μ₁ ≠ μ₂)

Note: For a two-tailed test, the null hypothesis claims equality, i.e. the hypothesized difference between the population means is zero (μ₁ − μ₂ = 0).

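Mathematical Formula:

$$ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} $$

where x̄₁ and x̄₂ are the sample means, σ₁ and σ₂ are the population standard deviations, n₁ and n₂ are the sample sizes, and μ₁ − μ₂ = 0 under the null hypothesis.

Let’s understand the two-sample z-test with an example: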

Problem Statement:

Random samples of 75 male and 50 female donors yield mean concentrations of trace elements in the blood of 28 and 33 ppm, respectively. The standard deviations of the concentration are 14.1 ppm for males and 9.5 ppm for females. How likely is a difference this large if the population mean concentrations are the same for men and women (i.e., what is the p-value)?
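The problem can be worked with the two-sample formula z = (x̄₁ − x̄₂)/√(σ₁²/n₁ + σ₂²/n₂), taking μ₁ − μ₂ = 0 under H₀. This minimal sketch treats 14.1 and 9.5 ppm as known population standard deviations and runs a two-tailed test (H₁: μ₁ ≠ μ₂). The significance level α = 0.05 is an assumption, and `two_sample_z` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1: float, sd1: float, n1: int,
                 mean2: float, sd2: float, n2: int) -> float:
    """Hypothetical helper: z for two independent samples, with mu1 - mu2 = 0 under H0."""
    standard_error = sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean1 - mean2) / standard_error


# Male donors (sample 1) vs. female donors (sample 2), figures from the problem statement.
z = two_sample_z(mean1=28, sd1=14.1, n1=75, mean2=33, sd2=9.5, n2=50)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed: H1 is mu1 != mu2
alpha = 0.05                                  # assumed significance level

print(f"z = {z:.3f}, two-tailed p-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

Here z ≈ −2.37 and the two-tailed p-value is about 0.018. A difference this large would be unlikely if the population means were equal, so at the assumed 5% level the sketch rejects H₀.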

## Conclusion:

The z-test is a statistical test for hypothesis testing (null and alternative hypotheses) that applies when the population variance is known and the sample size is large. Hope you liked the article.

Keep Learning!!

Keep Sharing!!


## FAQs

What is a z-test? Explain with an example.

A z-test is a statistical test used to validate a hypothesis (reject or fail to reject it) when the data are normally distributed. A z-test is used when: 1. The population variance is known. 2. The sample size is greater than 30. Example: Random samples of 75 male and 50 female donors yield mean concentrations of trace elements in the blood of 28 and 33 ppm, respectively. The standard deviations are 14.1 ppm for males and 9.5 ppm for females. Test whether the population mean concentrations are the same for men and women.

What is the difference between a z-test and a t-test?

A z-test is a hypothesis test that determines whether two means differ when the population standard deviation (and hence the variance) is known. A t-test is a parametric test used for the same purpose when the population standard deviation is unknown and must be estimated from the sample.

What are the different types of z-tests?

There are two types of z-tests: 1. One Sample z-test 2. Two Sample z-test

What does a z-score mean?

A z-score measures how many standard deviations a raw score lies above or below the population mean: z = (x − μ)/σ. It shows how far a data point is from the mean and can be placed on the normal distribution curve. A z-score is not bounded, but for normally distributed data about 99.7% of values have z-scores between −3 and +3.
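As a sketch, the code below computes a z-score for a raw value and the share of a normal distribution lying within ±1, ±2 and ±3 standard deviations. This shows why ±3 is a rule of thumb rather than a hard limit. The raw score of 130 with mean 100 and SD 15 is a hypothetical illustration. `z_score` is a hypothetical helper. `NormalDist` is Python's standard-library normal distribution.

```python
from statistics import NormalDist


def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# Hypothetical values: a raw score of 130 in a population with mean 100 and SD 15.
print(z_score(130, mean=100, sd=15))  # 2.0

# Share of a normal distribution lying within k standard deviations of the mean.
std_normal = NormalDist()
for k in (1, 2, 3):
    share = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within ±{k} SD: {share:.1%}")
```

The shares come out at roughly 68%, 95% and 99.7%, so values beyond ±3 are rare but possible.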

What is a good z-score?

Whether a z-score is 'good' or 'bad' is subjective. It depends on the context, for example, whether a good z-score should represent the 70th, 80th, 90th, or 95th percentile. For normally distributed data, almost all z-scores fall between −3 standard deviations (far left of the normal distribution) and +3 standard deviations (far right of the normal distribution).
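To connect the percentiles mentioned above to z-scores, this minimal sketch inverts the standard normal CDF with `statistics.NormalDist.inv_cdf`. It assumes the data are normally distributed. `percentile_to_z` is a hypothetical helper name.

```python
from statistics import NormalDist


def percentile_to_z(percentile: float) -> float:
    """z-score below which `percentile` percent of a standard normal distribution lies."""
    return NormalDist().inv_cdf(percentile / 100)


for percentile in (70, 80, 90, 95):
    print(f"{percentile}th percentile -> z = {percentile_to_z(percentile):.3f}")
```

This gives roughly 0.524, 0.842, 1.282 and 1.645. Any of these could serve as the cut-off for a "good" z-score, depending on the chosen percentile.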