# Understanding Set Theory – What, Where, Why, and How do we use it in Data Science

Vikram Singh
Assistant Manager - Content
Updated on May 7, 2024 16:52 IST

A set, in mathematics, is a well-defined collection of objects, meaning that whether an object belongs to it does not vary from person to person. In this article, we will briefly discuss set theory, its representation, subsets, cardinality, and the union and intersection of sets with the help of examples.

Set theory is the branch of mathematics that studies sets, which are well-defined collections of objects. The objects in a set are called its elements.

Every dataset that we use for a machine learning model is a collection of objects of a particular kind. For example:

Meteorological data consists of temperature (minimum and maximum), wind speed, wind direction, visibility, sea level pressure, humidity, geographical location, precipitation, and many more variables.

Meteorologists use this data to forecast the weather of a particular region, but the task is more complex than it looks. They first preprocess the data, i.e., they:

• Classify the variables in the dataset as categorical or numerical
• Join different variables (union and intersection) to find the correlations between them
• Split the dataset into two subsets, one for training and one for testing
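As a rough illustration of the last step, a train/test split can be viewed as dividing the set of row indices into two disjoint subsets whose union is the whole dataset. The sketch below assumes a hypothetical dataset of 10 rows identified by index and an assumed 80/20 split, and uses only the Python standard library rather than any particular ML library.

```python
import random

# Hypothetical dataset: 10 rows, identified by their row indices 0..9
all_rows = set(range(10))

# Shuffle the indices reproducibly and keep 80% for training (an assumed ratio)
rng = random.Random(42)
shuffled = list(all_rows)
rng.shuffle(shuffled)
cut = int(0.8 * len(shuffled))

train_rows = set(shuffled[:cut])
test_rows = set(shuffled[cut:])

# The two subsets share no rows (their intersection is empty) ...
print(train_rows & test_rows)              # set()
# ... and together they cover the whole dataset (their union is the original set)
print(train_rows | test_rows == all_rows)  # True
print(len(train_rows), len(test_rows))     # 8 2
```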

This article will briefly discuss sets, types of sets, subsets, the cardinality of a set, the union and intersection of sets, and how they can be used in data science.

## What is a Set

A set, in mathematics, is a well-defined collection of objects: whether an object belongs to it does not vary from person to person.

Example:

1. The first five natural numbers: {1, 2, 3, 4, 5}
2. Vowels in English: {a, e, i, o, u}
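Python's built-in `set` type models this idea directly. In this minimal sketch, each example above becomes a set literal, and the `in` operator gives a definite yes/no answer for membership, which is exactly what "well-defined" requires.

```python
# The two example sets, written as Python set literals
natural_numbers = {1, 2, 3, 4, 5}
vowels = {"a", "e", "i", "o", "u"}

# Membership is unambiguous: every object either is or is not an element
print(3 in natural_numbers)  # True
print(6 in natural_numbers)  # False
print("e" in vowels)         # True
print("b" in vowels)         # False
```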

Note:

1. The objects of the set are called elements.
2. The collection of the 5 best cricketers in the world is not a set, because who counts as "best" varies from person to person.

### Representation

A set can be represented in two forms:

Roster Form: In the roster form, all the elements of a set are listed. The set elements are separated by commas and enclosed in {}.

Example: Vowels in English: {a, e, i, o, u}

Set-Builder Form: The set is described by a single property that all of its elements share and that no object outside the set satisfies.

A = {x: x is a vowel in the English alphabet}
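Python mirrors both notations: a set literal corresponds to roster form, and a set comprehension reads much like set-builder form. In this small sketch, the 26 lowercase letters from `string.ascii_lowercase` are an assumed universe that the comprehension filters.

```python
import string

# Roster form: list the elements explicitly
roster = {"a", "e", "i", "o", "u"}

# Set-builder form: {x : x is a vowel in the English alphabet}
# The comprehension scans the assumed universe (lowercase letters)
# and keeps only the letters that satisfy the property
builder = {x for x in string.ascii_lowercase if x in "aeiou"}

# Both forms describe the same set
print(roster == builder)  # True
```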

### Subset

A set B is said to be a subset of a set A if every element of B is also an element of A. In that case, A is said to be a superset of B.

Notation: If B is a subset of A, this is written $B \subseteq A$.

Example: If A = {1, 2, 3}, then the subsets of A are {}, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, and {1, 2, 3}. In general, a set with n elements has 2^n subsets, so A has 2³ = 8 of them.
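Python's `set` type provides subset and superset tests through the `<=` and `>=` operators and the equivalent `issubset()` and `issuperset()` methods. A minimal sketch using A = {1, 2, 3} from the example, plus a set B = {1, 2} chosen here for illustration:

```python
A = {1, 2, 3}
B = {1, 2}  # chosen for illustration

print(B <= A)           # True: every element of B is in A, so B is a subset of A
print(B.issubset(A))    # True: the same test as a method call
print(A.issuperset(B))  # True: A is a superset of B
print(set() <= A)       # True: the empty set is a subset of every set
print(A <= A)           # True: every set is a subset of itself
print({4} <= A)         # False: 4 is not an element of A
```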

Now, let's look at the different types of sets:

### Types of Set

• Empty Set: A set that has no elements is called an empty, null, or void set, written {} or ∅. Example: B = {x: x is an integer between 0.2 and 0.5} = {}.
• Singleton Set: A set that has exactly one element is called a singleton set. Example: C = {x: x is an integer and 0.5 < x < 1.5} = {1}.
• Finite and Infinite Set: A set containing a finite number of elements is called a finite set, whereas a set with infinitely many elements is called an infinite set. Example of a finite set: D = {x: x is a prime number between 1 and 50}, which has 15 elements. Example of an infinite set: E = {x: x is a natural number} = {1, 2, 3, ...}. (The stars in a galaxy, although very many, form a finite set.)
• Equal Set: Two sets A and B are equal (A = B) if and only if they contain exactly the same elements; the order in which the elements are listed does not matter, and equal sets necessarily have the same number of elements. Example: A = {2, 3, 5, 7, 11, 13, 17, 19}, B = {11, 2, 13, 19, 7, 5, 3, 17}. Here, A = B since both sets contain the same 8 elements.
• Power Set: The set of all the subsets of a set is called its power set. Example: The power set of {1, 2, 3} is {{}, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}}.
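These types can be checked with Python's built-in sets. In this sketch, the empty and singleton sets are built by filtering an assumed finite search range of integers (a program cannot enumerate all integers, so infinite sets are not shown), `is_prime` is a helper written for this example, and the power set is built with `itertools.combinations`, storing each subset as a `frozenset` because an ordinary set cannot contain mutable sets.

```python
from itertools import combinations

# Assumed finite search range standing in for "the integers"
search = range(-10, 11)

# Empty set: no integer lies strictly between 0.2 and 0.5
B = {x for x in search if 0.2 < x < 0.5}
print(B)  # set()

# Singleton set: the only integer with 0.5 < x < 1.5 is 1
C = {x for x in search if 0.5 < x < 1.5}
print(C)  # {1}

def is_prime(n):
    """Helper for this example: trial division up to sqrt(n)."""
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, int(n ** 0.5) + 1))

# Finite set: prime numbers between 1 and 50
D = {x for x in range(1, 51) if is_prime(x)}
print(len(D))  # 15

# Equal sets: same elements, listed in a different order
A_eq = {2, 3, 5, 7, 11, 13, 17, 19}
B_eq = {11, 2, 13, 19, 7, 5, 3, 17}
print(A_eq == B_eq)  # True

def power_set(s):
    """Return every subset of s (sizes 0..len(s)) as a set of frozensets."""
    items = list(s)
    return {frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)}

P = power_set({1, 2, 3})
print(len(P))  # 8, i.e. 2 ** 3
```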

## Cardinality of a Set

The number of unique elements in the set is known as the cardinality of the set.

• If any set A has k elements, then the cardinality of A is given by: n(A) = k.

Example:

1. If F = {1, 2, 3, 4, 5}, then the cardinality of F is n(F) = 5.
2. If G = {1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5}, then the cardinality of G is n(G) = 5, because repeated elements are counted only once: G = {1, 2, 3, 4, 5}.
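In data science, cardinality often refers to the number of distinct values in a column. In this minimal sketch, converting a Python list to a `set` drops the repeated values, and `len()` then gives the cardinality; `sorted()` is used only to print the set in a predictable order.

```python
F = [1, 2, 3, 4, 5]
G = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5]

# len() of the list counts every entry, repeats included
print(len(G))          # 15
# Converting to a set keeps only the distinct elements
print(sorted(set(G)))  # [1, 2, 3, 4, 5]
print(len(set(F)))     # 5
print(len(set(G)))     # 5
```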

## Union and Intersection of Sets

Union: The union of two sets A and B, written A ∪ B, is the set of all elements that belong to A, to B, or to both. It is the smallest set that contains all the elements of both sets.

Representation: $A \cup B = \{x : x \in A \text{ or } x \in B\}$

Example 1: If A = {1, 4, 9, 16, 25} and B = {2, 3, 5, 7, 11}, then A ∪ B = {1, 2, 3, 4, 5, 7, 9, 11, 16, 25}.

Example 2: If A = {1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5} and B = {1, 4, 9, 16, 25}, then A ∪ B = {1, 2, 3, 4, 5, 9, 16, 25}.
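Python computes a union with the `|` operator or the `union()` method. This sketch reproduces both examples; `sorted()` is used only to print each result in a predictable order, since sets are unordered.

```python
# Example 1
A1 = {1, 4, 9, 16, 25}
B1 = {2, 3, 5, 7, 11}
print(sorted(A1 | B1))  # [1, 2, 3, 4, 5, 7, 9, 11, 16, 25]

# Example 2: the repeated entries collapse when the set is built
A2 = {1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5}
B2 = {1, 4, 9, 16, 25}
print(sorted(A2.union(B2)))  # [1, 2, 3, 4, 5, 9, 16, 25]
```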

#### Properties of Union of Sets

• A ∪ B = B ∪ A (commutative)
• A ∪ (B ∪ C) = (A ∪ B) ∪ C (associative)
• {} ∪ A = A (union with the empty set)
• A ∪ A = A (idempotent)
• If B is a subset of A, then A ∪ B = A.
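These properties can be spot-checked (though not proven) on concrete sets. The sketch below uses sets chosen purely for illustration, with B deliberately a subset of A so that the last property applies.

```python
A = {1, 2, 3, 4}
B = {2, 3}        # chosen so that B is a subset of A
C = {3, 5, 7}
empty = set()

assert A | B == B | A                # commutative
assert A | (B | C) == (A | B) | C    # associative
assert empty | A == A                # union with the empty set leaves A unchanged
assert A | A == A                    # idempotent
assert B <= A and A | B == A         # if B is a subset of A, then A ∪ B = A
print("All union properties hold for these sets")
```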

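Intersection: The intersection of two sets A and B, written A ∩ B, is the set of all elements that are common to both sets.

Representation: $A \cap B = \{x : x \in A \text{ and } x \in B\}$

Example 1: If A = {1, 4, 9, 16, 25} and B = {2, 3, 5, 7, 11}, then A ∩ B = {}, since the two sets have no element in common.

Example 2: If A = {1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5} and B = {1, 4, 9, 16, 25}, then A ∩ B = {1, 4}.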
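Python computes an intersection with the `&` operator or the `intersection()` method. This sketch applies it to the two example pairs (the first set of Example 2 is written without its repeated entries, which a set discards anyway).

```python
# Example 1: no common elements, so the result is the empty set
A1 = {1, 4, 9, 16, 25}
B1 = {2, 3, 5, 7, 11}
print(A1 & B1)  # set()

# Example 2: the elements 1 and 4 appear in both sets
A2 = {1, 2, 3, 4, 5}
B2 = {1, 4, 9, 16, 25}
print(sorted(A2.intersection(B2)))  # [1, 4]
```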

#### Relation between the Cardinality of Union and Intersection

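$$n(A \cup B) = n(A) + n(B) - n(A \cap B)$$

where n(A), n(B), n(A ∪ B), and n(A ∩ B) are the cardinalities of A, B, their union, and their intersection. The elements of A ∩ B are counted once in n(A) and again in n(B), so n(A ∩ B) is subtracted once to count them only once.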
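The relation n(A ∪ B) = n(A) + n(B) - n(A ∩ B) can be checked numerically on the two union and intersection examples. In this short sketch, `n` is simply a named wrapper around `len`, and the check confirms the identity for these particular sets rather than proving it in general.

```python
def n(s):
    """Cardinality of a set: the number of distinct elements."""
    return len(s)

examples = [
    ({1, 4, 9, 16, 25}, {2, 3, 5, 7, 11}),  # Example 1: disjoint sets
    ({1, 2, 3, 4, 5}, {1, 4, 9, 16, 25}),   # Example 2: overlap {1, 4}
]

results = []
for A, B in examples:
    lhs = n(A | B)
    rhs = n(A) + n(B) - n(A & B)
    results.append((lhs, rhs))
    print(lhs, rhs)  # prints 10 10, then 8 8
    assert lhs == rhs
```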

Now, let’s take a real-life example to get a better understanding of sets.

Problem Statement: Suppose we are testing a machine learning model that predicts pregnancy in females, and for that we have taken blood samples from both males and females. The model's outputs fall into 4 different subsets (group: model output):

• Male: Pregnant
• Male: Not Pregnant
• Pregnant Female: Pregnant
• Pregnant Female: Not Pregnant

Now, represent the data as sets and find the true positives, false positives, true negatives, and false negatives.

Solution:

X = {set of all people (male and female) who took the blood test}

A = {set of Males}

B = {set of Pregnant Females}

C = {set of people for whom the model outputs Pregnant}

D = {set of people for whom the model outputs Not Pregnant}

Here, A, B, C, and D are subsets of X.

True Negative (TN): Males who are tested not pregnant, i.e., A ∩ D.

False Positive (FP): Males who are tested pregnant, i.e., A ∩ C.

False Negative (FN): Pregnant females who are tested not pregnant, i.e., B ∩ D.

True Positive (TP): Pregnant females who are tested pregnant, i.e., B ∩ C.

Now, we will find the true positive, false positive, true negative, and false negative rates using the cardinality of sets.

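True Positive Rate (TPR): $\text{TPR} = \dfrac{n(B \cap C)}{n(B)}$

False Positive Rate (FPR): $\text{FPR} = \dfrac{n(A \cap C)}{n(A)}$

True Negative Rate (TNR): $\text{TNR} = \dfrac{n(A \cap D)}{n(A)}$

False Negative Rate (FNR): $\text{FNR} = \dfrac{n(B \cap D)}{n(B)}$

Because every person receives exactly one output (C and D do not overlap and together cover X), n(B ∩ C) + n(B ∩ D) = n(B) and n(A ∩ C) + n(A ∩ D) = n(A). For example, TPR is the number of true positives divided by the number of true positives plus false negatives.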
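To make the set formulation concrete, the sketch below builds the four sets from a small, entirely hypothetical group of people identified by invented IDs (not real data) and computes TPR = n(B ∩ C) / n(B), FNR = n(B ∩ D) / n(B), FPR = n(A ∩ C) / n(A), and TNR = n(A ∩ D) / n(A) from set intersections and cardinalities. It assumes, for simplicity, that everyone tested is either a male or a pregnant female, so X = A ∪ B.

```python
# Hypothetical test group: invented person IDs, for illustration only
A = {"m1", "m2", "m3", "m4", "m5"}  # males
B = {"f1", "f2", "f3", "f4"}        # pregnant females
X = A | B                           # everyone who took the blood test (assumed A ∪ B)

# Hypothetical model outputs: C = predicted Pregnant, D = predicted Not Pregnant
C = {"m2", "f1", "f2", "f3"}
D = X - C                           # every person receives exactly one output

TP = B & C  # pregnant females predicted pregnant
FN = B & D  # pregnant females predicted not pregnant
FP = A & C  # males predicted pregnant
TN = A & D  # males predicted not pregnant

tpr = len(TP) / len(B)
fnr = len(FN) / len(B)
fpr = len(FP) / len(A)
tnr = len(TN) / len(A)

print(f"TPR={tpr:.2f} FNR={fnr:.2f} FPR={fpr:.2f} TNR={tnr:.2f}")
# TPR=0.75 FNR=0.25 FPR=0.20 TNR=0.80
```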

The model performs well if:

• the True Positive Rate and True Negative Rate are close to 1, and
• the False Positive Rate and False Negative Rate are close to 0.

## Conclusion

In this article, we have briefly discussed set theory, its representation, subset, cardinality, union and intersection of sets with the help of examples.

We hope you enjoyed the article.


## FAQs

What is a Set?

A set, in mathematics, is a well-defined collection of objects: whether an object belongs to it does not vary from person to person. Examples: the first five natural numbers, the vowels in English.

What is a Subset?

A set B is said to be a subset of a set A if every element of B is also an element of A.

What are the different types of sets?

Empty Set, Singleton Set, Finite and Infinite Set, Equal Set, and Power Set are some common types of sets.