Understanding Set Theory – What, Where, Why, and How do we use it in Data Science

Vikram Singh
Assistant Manager - Content
Updated on May 7, 2024 16:52 IST

A set in mathematics is a well-defined collection of objects, meaning that its membership doesn’t vary from person to person. In this article, we will briefly discuss set theory, the representation of sets, subsets, cardinality, and the union and intersection of sets with the help of examples.

Set theory is the branch of mathematics that studies sets, i.e., well-defined collections of objects. The objects of a set are called its elements.

Every dataset that we use for a machine learning model is a collection of objects of a particular kind.

For example, meteorological data consists of temperature (minimum and maximum), wind speed, wind direction, visibility, sea level pressure, humidity, geographical location, precipitation, and many more.

Meteorologists use this data to forecast the weather of a particular region, but the process is more complex than it looks. They first pre-process the data, i.e., they

  • Classify the given dataset into categorical and numerical datasets
  • Join different variables (union & intersection) to find the correlation between the variables
  • Split the dataset into two different subsets for training and testing.
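The last step above can be illustrated with plain Python sets. This is only a minimal sketch: the record IDs (`record_ids`), the seed, and the 80/20 ratio are assumptions made for illustration, and real projects usually rely on a library helper instead of a hand-rolled split.

```python
import random

# Hypothetical record IDs standing in for the rows of a dataset.
record_ids = set(range(1, 11))

# Fix the seed so the split is reproducible.
rng = random.Random(42)

# Draw 80% of the IDs for training; the remainder forms the test set.
train = set(rng.sample(sorted(record_ids), k=8))
test = record_ids - train

# The two subsets are disjoint and together cover the whole dataset.
print(train.isdisjoint(test))      # True
print(train | test == record_ids)  # True
```

Whatever IDs end up in `train`, the two checks hold by construction, which is exactly the property a train/test split needs.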

This article will briefly discuss sets, types of sets, subsets, the cardinality of the set, the union and intersection of sets, and how they can be used in Data science.

So, let’s dive deep to learn more about set theory.

What is a Set

A set in mathematics is a well-defined collection of objects, i.e., one whose membership doesn’t vary from person to person.

Example: 

  1. First five Natural Numbers: {1, 2, 3, 4, 5}
  2. Vowels in English: {a, e, i, o, u}

Note: 

  1. The objects of the set are called elements.
  2. The 5 best cricketers in the world do not form a set, as the choice will vary from person to person (the collection is not well-defined).

Representation

A set can be represented in two forms:

Roster Form: In the roster form, all the elements of a set are listed. The set elements are separated by commas and enclosed in {}.

Example: Vowels in English: {a, e, i, o, u}

Set-Builder Form: In the set-builder form, the set is described by a single property that all of its elements share, and no object outside the set satisfies that property.

A = {x: x is a vowel in the English alphabet}
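Both forms have a close analogue in Python, which may help make the distinction concrete. The sketch below is illustrative only; the variable names and the choice of the lowercase alphabet as the universe are assumptions.

```python
# Roster form: list every element explicitly.
vowels_roster = {"a", "e", "i", "o", "u"}

# Set-builder form: keep every x from a universe that satisfies a property.
alphabet = "abcdefghijklmnopqrstuvwxyz"
vowels_builder = {x for x in alphabet if x in "aeiou"}

# Both descriptions define the same set.
print(vowels_roster == vowels_builder)  # True
```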

Subset

A set B is said to be a subset of a set A if every element of B is also contained in A. In that case, A is said to be a superset of B.

Notation: if B is a subset of A, then it is represented by B ⊆ A.

Example: If A = {1, 2, 3}, then the subsets of A are {}, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, and {1, 2, 3}.
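The subset relation can be checked directly with Python's built-in `set` type. A short sketch, using the set A from the example and a hypothetical set B chosen for illustration:

```python
A = {1, 2, 3}
B = {1, 2}  # hypothetical subset chosen for illustration

print(B.issubset(A))    # True: every element of B is in A
print(B <= A)           # True: operator form of issubset
print(A.issuperset(B))  # True: A is a superset of B
print(set() <= A)       # True: the empty set is a subset of every set
print({1, 4} <= A)      # False: 4 is not in A
```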

Now, let’s look at the different types of sets:

Types of Set

  • Empty Set: A set that has no elements is called an empty, null, or void set.
    • Example: B = {x: x is an integer between 0.2 and 0.5}
  • Singleton Set: A set that has only one element is known as a singleton set.
    • Example: C = {x: x is an integer and 0.5 < x < 1.5} = {1}
  • Finite and Infinite Set: A set containing a finite number of elements is known as a finite set, whereas a set with an infinite number of elements is known as an infinite set.
    • Example (finite): D = {x: x is a prime number between 1 and 50}
    • Example (infinite): E = {x: x is a natural number}
  • Equal Set: If A and B are two sets, then A = B, if and only if:
    • The number of elements in both sets is the same.
    • Elements in both sets are the same.
      • Example: A = {2, 3, 5, 7, 11, 13, 17, 19}, B = {11, 2, 13, 19, 7, 5, 3, 17}
        • Here, A = B since both sets contain the same 8 elements.
  • Power Set: The set of all the subsets of a set is known as the power set.
    • Example: Power set of {1, 2, 3} is {{}, {1}, {2},{3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}}
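The equal-set and power-set examples above can be reproduced in Python. This is a minimal sketch: `power_set` is a hypothetical helper written for this illustration (not a built-in), and it returns `frozenset` objects because ordinary Python sets cannot contain other mutable sets.

```python
from itertools import chain, combinations

def power_set(s):
    """Return the power set of s as a set of frozensets."""
    items = list(s)
    # Take all combinations of size 0, 1, ..., len(items).
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(c) for c in subsets}

P = power_set({1, 2, 3})
print(len(P))                     # 8, i.e. 2 ** 3
print(frozenset() in P)           # True: the empty set is a subset
print(frozenset({1, 2, 3}) in P)  # True: the set itself is a subset

# Equal sets: the order in which elements are written does not matter.
A = {2, 3, 5, 7, 11, 13, 17, 19}
B = {11, 2, 13, 19, 7, 5, 3, 17}
print(A == B)                     # True
```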

Cardinality of a Set

The number of unique elements in the set is known as the cardinality of the set. 

  • If any set A has k elements, then the cardinality of A is given by: n(A) = k.

Example: 

  1. F = {1, 2, 3, 4, 5}, then the cardinality of F is 5.
  2. G = {1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5}, then the cardinality of G is 5.
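In Python, the cardinality of a set corresponds to `len()` applied to a `set`. The sketch below mirrors the two examples; `G_values` is a hypothetical name for the raw list with repeats.

```python
F = {1, 2, 3, 4, 5}

# The raw values of G contain repeats; converting to a set removes them.
G_values = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5]
G = set(G_values)

print(len(F))         # 5
print(len(G_values))  # 15 items in the list, counting repeats
print(len(G))         # 5 unique elements, so n(G) = 5
```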

Union and Intersection of Sets

Union: The union of two sets A and B is the set of all elements that belong to A, to B, or to both. It is the smallest set that contains all the elements of both sets.

Representation:

A ∪ B = {x: x ∈ A or x ∈ B}

Example 1: A = {1, 4, 9, 16, 25}, B = {2, 3, 5, 7, 11}, then A ∪ B = {1, 2, 3, 4, 5, 7, 9, 11, 16, 25}.

Example 2: A = {1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5}, B = {1, 4, 9, 16, 25}, then A ∪ B = {1, 2, 3, 4, 5, 9, 16, 25}.

Properties of Union of Sets

  • A ∪ B = B ∪ A
  • A ∪ (B ∪ C) = (A ∪ B) ∪ C
  • {} ∪ A = A
  • A ∪ A = A
  • If B is a subset of A, then 
    • A ∪ B = A
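The union example and the properties listed above can be spot-checked with Python's `|` operator. This is a sketch rather than a proof: it verifies the identities only for the particular sets used, and the sets `C` and `S` are hypothetical values added for illustration.

```python
A = {1, 4, 9, 16, 25}
B = {2, 3, 5, 7, 11}
C = {1, 2, 3}  # hypothetical third set
S = {1, 4}     # hypothetical subset of A

print(sorted(A | B))  # [1, 2, 3, 4, 5, 7, 9, 11, 16, 25]

# Each assertion checks one property for these particular sets.
assert A | B == B | A              # commutative
assert A | (B | C) == (A | B) | C  # associative
assert set() | A == A              # union with the empty set
assert A | A == A                  # idempotent
assert S <= A and A | S == A       # union with a subset
```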

Intersection: The intersection of two sets is the set of all elements that are common to both sets.

Representation:

A ∩ B = {x: x ∈ A and x ∈ B}

Example 1: A = {1, 4, 9, 16, 25}, B = {2, 3, 5, 7, 11}, then A ∩ B = {}, since the two sets have no element in common.

Example 2: A = {1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5}, B = {1, 4, 9, 16, 25}, then A ∩ B = {1, 4}.

Properties of Intersection of Sets

  • A ∩ B = B ∩ A
  • A ∩ (B ∩ C) = (A ∩ B) ∩ C
  • {} ∩ A = {}
  • A ∩ A = A
  • If B is a subset of A, then 
    • A ∩ B = B
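The two intersection examples can be worked through with Python's `&` operator. A small sketch, with the second pair of sets renamed `A2` and `B2` (hypothetical names) to keep both examples in one script:

```python
# Example 1: the sets share no element, so the intersection is empty.
A = {1, 4, 9, 16, 25}
B = {2, 3, 5, 7, 11}
print(A & B)            # set()
print(A.isdisjoint(B))  # True

# Example 2: repeated values collapse when the set is built.
A2 = {1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5}
B2 = {1, 4, 9, 16, 25}
print(sorted(A2 & B2))  # [1, 4]

# Standard identities, checked here only for these particular sets.
assert A2 & B2 == B2 & A2  # commutative
assert A2 & A2 == A2       # idempotent
assert set() & A2 == set() # intersection with the empty set
```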

Relation between the Cardinality of Union and Intersection

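n(A ∪ B) = n(A) + n(B) - n(A ∩ B)

where,

n(A), n(B), n(A ∪ B), and n(A ∩ B) are the cardinalities of A, B, their union, and their intersection, respectively.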
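The standard identity n(A ∪ B) = n(A) + n(B) - n(A ∩ B) can be checked numerically. The sketch below reuses the sets from Example 2 above; the names `lhs` and `rhs` are just illustrative.

```python
A = {1, 2, 3, 4, 5}
B = {1, 4, 9, 16, 25}

# Left-hand side: cardinality of the union.
lhs = len(A | B)

# Right-hand side: add the cardinalities, then subtract the overlap
# so the shared elements (1 and 4) are not counted twice.
rhs = len(A) + len(B) - len(A & B)

print(lhs, rhs)  # 8 8
```

Subtracting n(A ∩ B) is what corrects for the elements counted once in n(A) and again in n(B).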

Now, let’s take a real-life example to get a better understanding of sets.

Problem Statement: Suppose we are testing a machine learning model that predicts pregnancy in females, and for that, we have taken blood samples from both males and females. The model’s output divides the samples into 4 different subsets:

  • Male: Pregnant
  • Male: Not Pregnant
  • Pregnant Female: Pregnant
  • Pregnant Female: Not-Pregnant

Now, represent the data as sets and find the true positives, false positives, true negatives, and false negatives.

Solution: 

X = {set of all people (male + female) who have taken the blood test}

A = {set of Males}

B = {set of Pregnant Females}

C = {set of output: Pregnant}

D = {set of output: Not-Pregnant}

Here, A, B, C, and D are the subsets of X.

True Negative: Males who test not-pregnant.

TN = A ∩ D

False Positive: Males who test pregnant.

FP = A ∩ C

False Negative: Pregnant females who test not-pregnant.

FN = B ∩ D

True Positive: Pregnant females who test pregnant.

TP = B ∩ C
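The four outcomes are intersections of a "who they are" set (A or B) with a "what the model said" set (C or D). The sketch below shows this with made-up data: the person IDs are hypothetical, and it assumes every tested person is either a male or a pregnant female, so that X = A ∪ B.

```python
# Hypothetical person IDs; none of these values come from the article.
A = {"m1", "m2", "m3", "m4"}        # males
B = {"f1", "f2", "f3", "f4", "f5"}  # pregnant females
X = A | B                           # assumption: everyone tested is in A or B

C = {"m1", "f1", "f2", "f3", "f4"}  # model output: Pregnant
D = X - C                           # model output: Not-Pregnant

true_negative = A & D   # males predicted not-pregnant
false_positive = A & C  # males predicted pregnant
false_negative = B & D  # pregnant females predicted not-pregnant
true_positive = B & C   # pregnant females predicted pregnant

print(sorted(true_negative))   # ['m2', 'm3', 'm4']
print(sorted(false_positive))  # ['m1']
print(sorted(false_negative))  # ['f5']
print(sorted(true_positive))   # ['f1', 'f2', 'f3', 'f4']

# The four outcome sets together cover X.
print(true_negative | false_positive | false_negative | true_positive == X)  # True
```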

Now, we will find the true positive, true negative, false positive, and false negative rates using the cardinality of sets.

True Positive Rate (TPR)

TPR = n(B ∩ C) / n(B)

False Positive Rate (FPR)

FPR = n(A ∩ C) / n(A)

True Negative Rate (TNR)

TNR = n(A ∩ D) / n(A)

False Negative Rate (FNR)

FNR = n(B ∩ D) / n(B)

The performance of the model will be good if 

  • True Positive Rate and True Negative Rate are closer to 1.
  • False Positive Rate and False Negative Rate are closer to 0.
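The four rates can be computed from set cardinalities with `len()`. This sketch assumes the usual definitions (each rate divides an outcome count by the size of the corresponding actual group, B for actual positives and A for actual negatives), and both the `rates` helper and the person IDs are hypothetical.

```python
def rates(A, B, C, D):
    """Return TPR, FPR, TNR, and FNR computed from set cardinalities.

    A: actual negatives (males), B: actual positives (pregnant females),
    C: predicted Pregnant, D: predicted Not-Pregnant.
    """
    return {
        "TPR": len(B & C) / len(B),
        "FPR": len(A & C) / len(A),
        "TNR": len(A & D) / len(A),
        "FNR": len(B & D) / len(B),
    }

# Hypothetical data, for illustration only.
A = {"m1", "m2", "m3", "m4"}
B = {"f1", "f2", "f3", "f4", "f5"}
C = {"m1", "f1", "f2", "f3", "f4"}
D = (A | B) - C

result = rates(A, B, C, D)
print(result)  # {'TPR': 0.8, 'FPR': 0.25, 'TNR': 0.75, 'FNR': 0.2}
```

For this toy data, TPR and FNR sum to 1, as do TNR and FPR, because C and D split each group into two disjoint parts.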

Conclusion

In this article, we have briefly discussed set theory, its representation, subset, cardinality, union and intersection of sets with the help of examples.

Hope you will like the article.

FAQs

What is a Set?

A set in mathematics is a well-defined collection of objects that doesn’t vary from person to person. Examples: the first five natural numbers, the vowels in English.

What is a Subset?

A set B is said to be a subset of A if every element of B is contained in A.

What are the different types of sets?

Empty Set, Singleton Set, Finite and Infinite Set, Equal Set, and Power Set are some common types of sets.

About the Author
Vikram Singh
Assistant Manager - Content

Vikram has a Postgraduate degree in Applied Mathematics, with a keen interest in Data Science and Machine Learning. He has experience of 2+ years in content creation in Mathematics, Statistics, Data Science, and Mac...