**Introduction**

The chi-square test is a statistical test used in hypothesis testing.

There are 3 steps in Hypothesis Testing:

- State the Null and Alternate Hypotheses
- Perform a Statistical Test
- Reject or fail to reject the Null Hypothesis

In this article, we will discuss the Chi-square test.


**What is the Chi-Square test?**

The chi-square test is a statistical method that compares the observed frequencies of **categorical variables** in a dataset with the frequencies expected under the null hypothesis.

Example: A food delivery company wants to find the relationship between gender, location, and food choices of people in India.

It is used to determine whether the difference between the observed and expected frequencies (or the apparent association between 2 categorical variables) is:

- Due to chance, or
- Due to a genuine relationship

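**Mathematical Formula:**

$$\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}$$

where $O_i$ is the observed frequency and $E_i$ is the expected frequency of category $i$.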
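The chi-square statistic adds up, over all categories, the squared gap between the observed and expected frequency divided by the expected frequency. A minimal sketch of that calculation follows; the function name `chi_square_statistic` and the sample counts are assumptions made for illustration and do not come from the article.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected frequencies must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for three categories (illustrative only).
observed = [18, 22, 20]
expected = [20, 20, 20]
statistic = chi_square_statistic(observed, expected)
print(round(statistic, 2))  # 0.4
```

Each category contributes independently, so a single category with a large gap relative to its expected count can dominate the statistic.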

**Types of Chi-square Test:**

- Goodness of fit test
- Test for independence

**Goodness of fit test:**

- Number of variables = 1
- Used to determine whether the sample of a variable is consistent with the distribution expected for the population
- Degrees of freedom = k - 1, where k is the number of categories

To learn more about samples, populations, and degrees of freedom, read the articles Basics of Statistics for Data Science and z-test.

Example:

Problem Statement:

The observed and expected frequencies of the numbers appearing on a die are given.

Using the chi-square test at the 5% significance level, determine whether the observed frequencies differ from the expected frequencies.

Solution:

**Step-1: State Null and Alternate Hypothesis:**

**Null Hypothesis:**

There is no difference between the observed and expected frequencies of the outcomes of rolling the die.

**Alternate Hypothesis:**

There is a difference between the observed and expected frequencies of the outcomes of rolling the die.

**Step-2: Significance level and Degree of Freedom:**

Significance level = 5%

Degrees of freedom = 6 - 1 = 5

Critical chi-square value (distribution table, df = 5, p = 0.95) = 11.07

**Step-3: Find the chi-square value:**

Calculated chi-square value = 0.1186

**Step-4: Comparing with the critical value:**

From step-2 and step-3, we have:

0.1186 < 11.07

Since the calculated value is smaller than the critical value, we fail to reject the Null Hypothesis.

There is no significant difference between the observed and expected frequencies of the outcomes of rolling the die.
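The four steps of the goodness of fit test can be sketched in a few lines. The observed counts below are hypothetical (60 imagined rolls of a die) and are not the data from this example; the helper name `goodness_of_fit` is also an assumption. Only the critical value 11.07 for 5 degrees of freedom at the 5% significance level is taken from the article.

```python
def goodness_of_fit(observed, expected, critical_value):
    """Return (statistic, degrees of freedom, reject_null) for a goodness of fit test."""
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # number of categories minus one
    reject_null = statistic > critical_value
    return statistic, df, reject_null


# Hypothetical data: 60 rolls, so a fair die is expected to show each face 10 times.
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6

statistic, df, reject_null = goodness_of_fit(observed, expected, critical_value=11.07)
print(statistic, df, reject_null)  # 1.0 5 False
```

With these made-up counts the statistic is well below 11.07, so the null hypothesis is not rejected. In practice, a library routine such as `scipy.stats.chisquare` performs the same calculation and also reports a p-value.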

**Test for independence**

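- Number of variables = 2
- Used to determine whether two categorical variables are related or independent
- Degrees of freedom = (r - 1) x (c - 1), where r and c are the numbers of rows and columns in the contingency table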

Example:

**Problem Statement:** The Election Commission decides to find the relationship between gender and casting a vote.

A sample of 10,000 voters was taken, and the results are summarized as:

Solution:

**Step-1: State Null and Alternate Hypothesis**

**Null Hypothesis:** Gender and voting are independent.

**Alternate Hypothesis:** Gender and voting are not independent.

**Step-2: Significance level and Degree of Freedom**

Significance level = 5%

Degrees of freedom = (2 - 1) x (2 - 1) = 1

Critical chi-square value (distribution table, df = 1, p = 0.95) = 3.84

**Step-3: Find the chi-square value**

Calculated chi-square value = 6.6

**Step-4: Comparing with the critical value**

From step-2 and step-3, we have:

6.6 > 3.84

Since the calculated value is larger than the critical value, we reject the Null Hypothesis,

i.e. gender and voting are not independent of each other.
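For a test of independence, the expected count of each cell is (row total x column total) / grand total, and the statistic is computed over all cells. The sketch below uses a hypothetical 2 x 2 table of 100 people, not the 10,000-voter sample from this example; the function name `independence_test` is an assumption, and 3.84 is the critical value for 1 degree of freedom quoted in the article.

```python
def independence_test(table, critical_value):
    """Return (expected counts, statistic, degrees of freedom, reject_null)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, statistic, df, statistic > critical_value


# Hypothetical counts. Rows: male, female. Columns: voted, did not vote.
table = [[30, 20],
         [20, 30]]

expected, statistic, df, reject_null = independence_test(table, critical_value=3.84)
print(expected)                    # [[25.0, 25.0], [25.0, 25.0]]
print(statistic, df, reject_null)  # 4.0 1 True
```

This sketch applies no continuity correction. Library routines may differ: `scipy.stats.chi2_contingency`, for instance, applies Yates' correction to 2 x 2 tables by default, so its statistic for the same table would be smaller.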

**Distribution Table:**

| df | p = 0.75 | p = 0.90 | p = 0.95 | p = 0.975 | p = 0.99 |
|---|---|---|---|---|---|
| 1 | 1.32 | 2.71 | 3.84 | 5.02 | 6.64 |
| 2 | 2.77 | 4.60 | 5.99 | 7.37 | 9.21 |
| 3 | 4.11 | 6.25 | 7.81 | 9.35 | 11.34 |
| 4 | 5.38 | 7.77 | 9.48 | 11.14 | 13.27 |
| 5 | 6.62 | 9.23 | 11.07 | 12.83 | 15.08 |
| 6 | 7.84 | 10.64 | 12.59 | 14.44 | 16.81 |
| 7 | 9.04 | 12.02 | 14.07 | 16.01 | 18.48 |
| 8 | 10.22 | 13.36 | 15.51 | 17.54 | 20.09 |
| 9 | 11.39 | 14.68 | 16.92 | 19.02 | 21.67 |
| 10 | 12.5 | 15.9 | 18.3 | 20.5 | 23.2 |
| 11 | 13.7 | 17.3 | 19.7 | 21.9 | 24.7 |
| 12 | 14.8 | 18.6 | 21.0 | 23.3 | 26.2 |
| 13 | 16.0 | 19.8 | 22.4 | 24.7 | 27.7 |
| 14 | 17.1 | 21.1 | 23.7 | 26.1 | 29.1 |
| 15 | 18.2 | 22.3 | 25.0 | 27.5 | 30.6 |
| 16 | 19.4 | 23.5 | 26.3 | 28.8 | 32.0 |
| 17 | 20.5 | 24.8 | 27.6 | 30.2 | 33.4 |
| 18 | 21.6 | 26.0 | 28.9 | 31.5 | 34.8 |
| 19 | 22.7 | 27.2 | 30.1 | 32.9 | 36.2 |
| 20 | 23.8 | 28.4 | 31.4 | 34.2 | 37.6 |

**Chi-square distribution table** (p is the cumulative probability, so the p = 0.95 column gives the critical values for a 5% significance level)
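Entries like those in the distribution table can be approximated numerically instead of being looked up. The sketch below uses only the Python standard library: it evaluates the chi-square cumulative distribution through a series for the regularized lower incomplete gamma function, then finds the critical value by bisection. The function names `chi2_cdf` and `chi2_critical_value` are assumptions made for illustration; in practice a tested routine such as `scipy.stats.chi2.ppf` would normally be used.

```python
import math


def chi2_cdf(x, df):
    """Approximate P(X <= x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 0.0
    a = df / 2.0
    z = x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, z):
    # P(a, z) = exp(-z) * z**a / Gamma(a) * sum_n z**n / (a * (a+1) * ... * (a+n))
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def chi2_critical_value(p, df):
    """Find x such that chi2_cdf(x, df) is approximately p, using bisection."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    low, high = 0.0, 1.0
    while chi2_cdf(high, df) < p:  # widen the bracket until it contains the answer
        high *= 2.0
    for _ in range(100):
        mid = (low + high) / 2.0
        if chi2_cdf(mid, df) < p:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


print(round(chi2_critical_value(0.95, 5), 2))  # 11.07
print(round(chi2_critical_value(0.95, 1), 2))  # 3.84
```

The two printed values match the critical values used in the worked examples for 5 and 1 degrees of freedom at the 5% significance level.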

**Conclusion**:

The chi-square test is a statistical test used for hypothesis testing (null and alternative hypotheses) when the variables are categorical.


**About the Author**

Vikram has a Postgraduate degree in Applied Mathematics, with a keen interest in Data Science and Machine Learning. He has experience of 2+ years in content creation in Mathematics, Statistics, Data Science, and Mac...