Using Partition By SQL Clause

Updated on Aug 16, 2024 11:57 IST

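The Partition By SQL clause is an optional subclause of the OVER clause. The OVER clause itself is required in every invocation of a window function, including aggregates used as window functions such as MAX() and AVG() and ranking functions such as RANK().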

What is Partition By SQL Clause?

In SQL, the “PARTITION BY” clause is often used in the context of window functions. It is used to specify the columns by which the rows of a result set should be divided into partitions. Within each partition, the window function operates independently, treating the rows in each partition as if they were a separate group.

Let’s take an example. Consider a table that contains information about sales transactions, with one row per transaction. If we want to compute the running total of sales by store, we can use a window function with a “PARTITION BY” clause that specifies the “store” column, as shown below:

SELECT store, sales, SUM(sales) OVER (PARTITION BY store ORDER BY transaction_date) AS running_total
FROM sales_table
ORDER BY store, transaction_date;

The query above returns a result set with a “running_total” column that shows the cumulative total of sales for each store, up to and including the current row. The “PARTITION BY” clause restarts the running total for each store, so it only considers the sales for that particular store. If two rows of a store share the same transaction_date, the default window frame treats them as peers and both receive the same cumulative value.
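The running total can be checked on a small data set. The following is a minimal sketch, assuming SQLite through Python's built-in sqlite3 module (SQLite 3.25 or later is needed for window functions). The sample rows are made up for illustration; the table and column names follow the query above.

```python
import sqlite3

# In-memory database with hypothetical sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_table (store TEXT, transaction_date TEXT, sales INTEGER)"
)
conn.executemany(
    "INSERT INTO sales_table VALUES (?, ?, ?)",
    [
        ("A", "2024-01-01", 100),
        ("A", "2024-01-02", 50),
        ("A", "2024-01-03", 25),
        ("B", "2024-01-01", 10),
        ("B", "2024-01-02", 20),
    ],
)

# Same query as in the article: the sum restarts for every store.
running_totals = conn.execute(
    """
    SELECT store, sales,
           SUM(sales) OVER (PARTITION BY store ORDER BY transaction_date) AS running_total
    FROM sales_table
    ORDER BY store, transaction_date
    """
).fetchall()

for row in running_totals:
    print(row)
# ('A', 100, 100)
# ('A', 50, 150)
# ('A', 25, 175)
# ('B', 10, 10)
# ('B', 20, 30)
```

The total for store B starts again at 10 instead of continuing from 175, which is the effect of PARTITION BY store.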

Usage of “Partition By” SQL Clause

The “PARTITION BY” clause in SQL is written inside the OVER clause of a window function and specifies the columns by which the rows of a query should be partitioned. It cannot be appended to a query as a standalone clause after GROUP BY. Here are a few examples to illustrate its usage:

1. Partitioning by a single column:

SELECT department, salary,
AVG(salary) OVER (PARTITION BY department) AS avg_salary
FROM employees;

2. Partitioning by multiple columns:

SELECT department, location, salary,
AVG(salary) OVER (PARTITION BY department, location) AS avg_salary
FROM employees;

3. Partitioning by a calculated value:

SELECT floor(hire_date/7) as week, salary,
AVG(salary) OVER (PARTITION BY floor(hire_date/7)) AS avg_salary
FROM employees;

In the above examples, the rows are partitioned based on the columns or expressions listed in the PARTITION BY clause, and the aggregate function (e.g., AVG, SUM) is then applied to each partition. Unlike GROUP BY, which collapses each group into a single row, PARTITION BY keeps every input row and attaches the value computed for its partition. The third example assumes that hire_date is stored as a numeric day count; with a DATE column, use the date functions of your database instead.
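The difference between GROUP BY and PARTITION BY is easiest to see by running both on the same rows. This is a minimal sketch, assuming SQLite through Python's sqlite3 module (SQLite 3.25 or later) and a made-up employees table with only the two columns needed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("IT", 100), ("IT", 300), ("HR", 150)],  # hypothetical sample rows
)

# GROUP BY collapses each department into a single row.
grouped = conn.execute(
    """
    SELECT department, AVG(salary)
    FROM employees
    GROUP BY department
    ORDER BY department
    """
).fetchall()

# PARTITION BY keeps every row and adds the per-department average to it.
windowed = conn.execute(
    """
    SELECT department, salary,
           AVG(salary) OVER (PARTITION BY department) AS avg_salary
    FROM employees
    ORDER BY department, salary
    """
).fetchall()

print(grouped)   # [('HR', 150.0), ('IT', 200.0)]
print(windowed)  # [('HR', 150, 150.0), ('IT', 100, 200.0), ('IT', 300, 200.0)]
```

The grouped query returns two rows for three employees, while the windowed query returns all three rows with the department average repeated on each.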

Examples of “Partition By” SQL Clause

Below are a few detailed examples of the PARTITION BY clause in SQL:

1. Partitioning sales data by year and product category:

SELECT year, category, sales,
SUM(sales) OVER (PARTITION BY year, category) AS total_sales
FROM sales_data;

In this example, the “PARTITION BY” clause partitions the sales data by year and product category. The SUM function calculates the total sales for each partition and returns it on every row of that partition.

2. Partitioning employee data by hire date and department:

SELECT department, hire_date, salary,
AVG(salary) OVER (PARTITION BY hire_date, department) AS avg_salary
FROM employees;

In this example, the PARTITION BY clause partitions the employee data by hire date and department. The AVG function calculates the average salary for each partition and returns it on every row of that partition.

3. Partitioning product data by month and product name:

SELECT product_name, MONTH(order_date) as month, quantity,
SUM(quantity) OVER (PARTITION BY MONTH(order_date), product_name) AS total_quantity
FROM product_orders;

In this example, the “PARTITION BY” clause partitions the product data by the month of the order date and product name. The SUM function calculates the total quantity of each product ordered in each partition. MONTH() is available in databases such as MySQL and SQL Server; other databases provide equivalent date functions.

In each of these examples, the “PARTITION BY” clause defines the subgroups of rows over which the aggregate function is calculated. This is useful for organizing results in a specific way, for instance showing a group total next to each detail row. It is not a performance optimization by itself.
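Partitioning by a calculated value depends on the date functions of the database in use. As a hedged illustration, the sketch below reproduces the month-and-product example in SQLite, which has no MONTH() function, so strftime('%m', order_date) stands in for it. The sample rows are made up, and SQLite 3.25 or later is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_orders (product_name TEXT, order_date TEXT, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO product_orders VALUES (?, ?, ?)",
    [
        ("Pen", "2024-01-05", 10),
        ("Pen", "2024-01-20", 5),
        ("Pen", "2024-02-03", 7),
        ("Ink", "2024-01-09", 2),
    ],
)

# strftime('%m', ...) returns the month as a two-digit string in SQLite.
monthly_totals = conn.execute(
    """
    SELECT product_name,
           strftime('%m', order_date) AS month,
           quantity,
           SUM(quantity) OVER (
               PARTITION BY strftime('%m', order_date), product_name
           ) AS total_quantity
    FROM product_orders
    ORDER BY product_name, order_date
    """
).fetchall()

for row in monthly_totals:
    print(row)
# ('Ink', '01', 2, 2)
# ('Pen', '01', 10, 15)
# ('Pen', '01', 5, 15)
# ('Pen', '02', 7, 7)
```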

Applications of Partition By Clause

The “PARTITION BY” clause in SQL is used in various applications where you need to perform calculations based on subsets of data within a larger set. Here are a few common uses of the PARTITION BY clause:

  1. Data Aggregation: The “PARTITION BY” clause can be used to perform aggregation (such as sum, average, count, etc.) on subsets of data based on specific columns. For example, you can calculate the total sales of each product in each quarter of the year.
  2. Window Functions: The “PARTITION BY” clause is used with window functions (such as ROW_NUMBER(), RANK(), DENSE_RANK(), etc.) to perform calculations based on subsets of data. For example, you can calculate the running total of sales for each product in a given time period.
  3. Rank Calculation: The “PARTITION BY” clause can be used to calculate the rank of rows within a subset of data. For example, you can determine the rank of each employee within their department based on their salary.
  4. Pivot-Style Summaries: The “PARTITION BY” clause can supply the per-group totals used in pivot-style summaries. It does not turn rows into columns by itself; that still requires conditional aggregation or a dialect-specific PIVOT operator. For example, you can show the total sales of each product by region next to every sales row.
  5. Report Generation: The “PARTITION BY” clause can be used to generate reports that summarize data based on specific columns. For example, you can generate a report that shows the average salary of employees by department and year of hire.

In all of these applications, the “PARTITION BY” clause is used to divide a large set of data into smaller partitions, allowing you to perform more specific calculations on each partition and make more informed decisions based on the results.
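The rank calculation mentioned in the list can be sketched as follows. This is an illustrative example, assuming SQLite through Python's sqlite3 module (SQLite 3.25 or later) and a made-up employees table; the name column is a hypothetical addition used as a tie-breaker for ROW_NUMBER().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ann", "IT", 300),
        ("Bob", "IT", 300),
        ("Cy", "IT", 200),
        ("Dee", "HR", 150),
    ],
)

# Ranking restarts in every department because of PARTITION BY department.
ranked = conn.execute(
    """
    SELECT department, name, salary,
           RANK()       OVER (PARTITION BY department ORDER BY salary DESC)       AS rnk,
           DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC)       AS dense_rnk,
           ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC, name) AS row_num
    FROM employees
    ORDER BY department, salary DESC, name
    """
).fetchall()

for row in ranked:
    print(row)
# ('HR', 'Dee', 150, 1, 1, 1)
# ('IT', 'Ann', 300, 1, 1, 1)
# ('IT', 'Bob', 300, 1, 1, 2)
# ('IT', 'Cy', 200, 3, 2, 3)
```

Ann and Bob tie on salary, so RANK() gives both 1 and skips to 3 for Cy, DENSE_RANK() continues with 2, and ROW_NUMBER() numbers every row uniquely.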

Types Of Partition By SQL Clause

As mentioned above, the “PARTITION BY” SQL clause is used in the context of window functions and is used to divide the result set into partitions or groups. Each partition is processed independently and the window function is applied to each partition.

The different types of partitioning that can be done using the “PARTITION BY” clause are illustrated below:

1. Partition By a Single Column

In this type of partitioning, the result set is divided into partitions based on the values of a single column. For example, you can partition the result set by the values of the “Department” column.

A practical query example is as follows:

SELECT
EmployeeID,
Department,
Salary,
SUM(Salary) OVER (PARTITION BY Department ORDER BY Salary) AS RunningTotal
FROM
Employees;

In the above example, the result set is divided into partitions based on the values of the “Department” column. The SUM function calculates the running total of the salary for each department.
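One detail of running totals is worth checking when several employees in a department earn the same salary. With ORDER BY and no explicit frame, the default frame includes all rows that tie with the current row. The sketch below, assuming SQLite through Python's sqlite3 module (SQLite 3.25 or later) and made-up rows, compares that default with an explicit ROWS frame that uses EmployeeID as a tie-breaker.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (EmployeeID INTEGER, Department TEXT, Salary INTEGER)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [(1, "IT", 100), (2, "IT", 100), (3, "IT", 200)],  # two tied salaries
)

frame_rows = conn.execute(
    """
    SELECT EmployeeID, Salary,
           -- Default frame: tied rows are peers and share one cumulative value.
           SUM(Salary) OVER (PARTITION BY Department ORDER BY Salary) AS DefaultTotal,
           -- Explicit ROWS frame: the total grows one row at a time.
           SUM(Salary) OVER (
               PARTITION BY Department
               ORDER BY Salary, EmployeeID
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS RowByRowTotal
    FROM Employees
    ORDER BY Department, Salary, EmployeeID
    """
).fetchall()

for row in frame_rows:
    print(row)
# (1, 100, 200, 100)
# (2, 100, 200, 200)
# (3, 200, 400, 400)
```

Which behavior is wanted depends on the report; the explicit frame is the safer choice when each row should show a strictly growing total.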

2. Partition By Multiple Columns

In this type of partitioning, the result set is divided into partitions based on the values of multiple columns. For example, you can partition the result set by both the “Department” and “Designation” columns.

A practical query example is as follows:

SELECT
EmployeeID,
Department,
Designation,
Salary,
SUM(Salary) OVER (PARTITION BY Department, Designation ORDER BY Salary) AS RunningTotal
FROM
Employees;

In this example, the result set is divided into partitions based on the values of both the “Department” and “Designation” columns. The SUM function calculates the running total of the salary for each department and designation combination.

3. Partition By Expressions

In this type of partitioning, the window definition uses the result of a mathematical expression or function instead of a plain column value. For example, you can calculate a total salary for each employee with an expression and use that result inside the window function.

SELECT
EmployeeID,
Department,
Salary,
(Salary + (Salary * 0.1)) AS TotalSalary,
SUM(Salary + (Salary * 0.1)) OVER (PARTITION BY Department ORDER BY (Salary + (Salary * 0.1))) AS RunningTotal
FROM
Employees;

In this example, the expression that calculates the total salary for each employee is repeated inside the window function, because most databases do not allow a column alias such as TotalSalary to be referenced in the same SELECT list that defines it. The rows are partitioned by “Department”, and the SUM function calculates the running total of the total salary for each department. An expression can also be placed directly in the PARTITION BY list, as the CASE expressions in the next two types show.
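When a calculated value such as the total salary is needed in several places, one option is to compute it once in a common table expression and apply the window function to that result. The following is a minimal sketch of this approach, assuming SQLite through Python's sqlite3 module (SQLite 3.25 or later); the CTE name Totals and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (EmployeeID INTEGER, Department TEXT, Salary INTEGER)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [(1, "IT", 1000), (2, "IT", 2000), (3, "HR", 1500)],
)

# The CTE defines TotalSalary once; the outer query can then use the
# name in both the SUM and the ORDER BY of the window.
total_rows = conn.execute(
    """
    WITH Totals AS (
        SELECT EmployeeID, Department,
               Salary + (Salary * 0.1) AS TotalSalary
        FROM Employees
    )
    SELECT EmployeeID, Department, TotalSalary,
           SUM(TotalSalary) OVER (
               PARTITION BY Department ORDER BY TotalSalary
           ) AS RunningTotal
    FROM Totals
    ORDER BY Department, TotalSalary
    """
).fetchall()

for row in total_rows:
    print(row)
```

With these rows, the HR partition holds a single total of 1650, while the IT partition accumulates 1100 and then 3300.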

4. Partition By Range

In this type of partitioning, the result set is divided into partitions based on the values of a single column, where each partition represents a range of values. For example, you can partition the result set by the values of the “Age” column, where each partition represents a range of 5 years.

SELECT
EmployeeID,
Age,
Salary,
SUM(Salary) OVER (PARTITION BY
CASE
WHEN Age BETWEEN 18 AND 22 THEN '18-22'
WHEN Age BETWEEN 23 AND 27 THEN '23-27'
ELSE '28+'
END
ORDER BY Age) AS RunningTotal
FROM
Employees;

In this example, the result set is divided into partitions based on the range of values of the “Age” column. The SUM function calculates the running total of the salary for each age range.
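The age-range query can be tried on a few rows to see which partition each employee lands in. This is a minimal sketch, assuming SQLite through Python's sqlite3 module (SQLite 3.25 or later) and made-up sample rows. It also shows a side effect of the ELSE branch: an age below 18 is labeled '28+' because it matches neither of the two ranges.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INTEGER, Age INTEGER, Salary INTEGER)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [(1, 19, 100), (2, 21, 200), (3, 25, 300), (4, 40, 400), (5, 17, 500)],
)

# The CASE expression is kept in one string so that the label column and
# the PARTITION BY list are guaranteed to use the same bucketing rule.
age_range = """
    CASE
        WHEN Age BETWEEN 18 AND 22 THEN '18-22'
        WHEN Age BETWEEN 23 AND 27 THEN '23-27'
        ELSE '28+'
    END
"""

age_bucket_rows = conn.execute(
    f"""
    SELECT EmployeeID,
           {age_range} AS AgeRange,
           SUM(Salary) OVER (PARTITION BY {age_range} ORDER BY Age) AS RunningTotal
    FROM Employees
    ORDER BY EmployeeID
    """
).fetchall()

for row in age_bucket_rows:
    print(row)
# (1, '18-22', 100)
# (2, '18-22', 300)
# (3, '23-27', 300)
# (4, '28+', 900)
# (5, '28+', 500)
```

Employee 5 (age 17) shares a partition with employee 4 (age 40), so a stricter CASE expression would be needed if ages outside the listed ranges should be kept apart.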

5. Partition By List

In this type of partitioning, the result set is divided into partitions based on specific values of a single column. For example, you can partition the result set by the values of the “Department” column, where each partition represents a specific department.

SELECT
EmployeeID,
Department,
Salary,
SUM(Salary) OVER (PARTITION BY
CASE Department
WHEN 'IT' THEN 'IT'
WHEN 'HR' THEN 'HR'
ELSE 'OTHER'
END
ORDER BY Department) AS RunningTotal
FROM
Employees;

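In this example, the result set is divided into partitions based on specific values of the “Department” column: one partition for IT, one for HR, and one for all other departments. The SUM function calculates the running total of the salary within each of these partitions. Because the rows are ordered by Department, rows from the same department are peers under the default window frame and share the same cumulative value.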

Note that the “PARTITION BY” clause is optional, and if it is not specified, the entire result set is treated as a single partition.
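The effect of leaving out PARTITION BY can be seen by placing both forms side by side. The sketch below assumes SQLite through Python's sqlite3 module (SQLite 3.25 or later) and a made-up Employees table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (Department TEXT, Salary INTEGER)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?)",
    [("IT", 100), ("IT", 300), ("HR", 200)],
)

totals = conn.execute(
    """
    SELECT Department, Salary,
           SUM(Salary) OVER ()                        AS GrandTotal,  -- one partition: all rows
           SUM(Salary) OVER (PARTITION BY Department) AS DeptTotal    -- one partition per department
    FROM Employees
    ORDER BY Department, Salary
    """
).fetchall()

for row in totals:
    print(row)
# ('HR', 200, 600, 200)
# ('IT', 100, 600, 400)
# ('IT', 300, 600, 400)
```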

Advantages Of Partition By SQL Clause

The “PARTITION BY” clause in SQL has several advantages, including:

  1. Performance: A window function with “PARTITION BY” can often replace a self-join or correlated subquery that computes the same per-group value, which may reduce the work the database has to do. The actual gain depends on the database engine, the indexes and the data volume, so it should be measured rather than assumed.
  2. Better Organization: Partitioning the result set can help organize the data into meaningful groups, making it easier to understand and analyze.
  3. Increased Flexibility: Different window functions in the same query can each use their own “PARTITION BY” list, so one result set can combine calculations over several groupings.
  4. Improved Readability: The grouping for each calculation is stated right next to the function that uses it, which often makes the SQL easier to read than an equivalent query built from subqueries.
  5. Easier Maintenance: Because the grouping logic sits in one place, changing how a calculation is grouped usually means editing a single “PARTITION BY” list.
  6. Scalability: Each partition is processed independently, which allows some database engines to distribute or parallelize the work across partitions as the data grows.
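To illustrate how a window function can stand in for a correlated subquery, the sketch below computes a per-department average both ways and confirms that the results match. It assumes SQLite through Python's sqlite3 module (SQLite 3.25 or later) and made-up rows, and it makes no claim about which form is faster on a given database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (EmployeeID INTEGER, Department TEXT, Salary INTEGER)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [(1, "IT", 100), (2, "IT", 300), (3, "HR", 250)],
)

# Window-function form: the grouping is declared once, next to AVG.
window_rows = conn.execute(
    """
    SELECT EmployeeID, Department, Salary,
           AVG(Salary) OVER (PARTITION BY Department) AS DeptAvg
    FROM Employees
    ORDER BY EmployeeID
    """
).fetchall()

# Correlated-subquery form: the table is referenced a second time.
subquery_rows = conn.execute(
    """
    SELECT e1.EmployeeID, e1.Department, e1.Salary,
           (SELECT AVG(e2.Salary)
              FROM Employees e2
             WHERE e2.Department = e1.Department) AS DeptAvg
    FROM Employees e1
    ORDER BY e1.EmployeeID
    """
).fetchall()

print(window_rows == subquery_rows)  # True
print(window_rows)
# [(1, 'IT', 100, 200.0), (2, 'IT', 300, 200.0), (3, 'HR', 250, 250.0)]
```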

Conclusion

The “PARTITION BY” clause in SQL is used inside the OVER clause of a window function to divide a result set into partitions based on the values of one or more columns or expressions. The window function is then calculated separately for each partition, which makes it possible to produce running totals, ranks and per-group aggregates while keeping every row of the result set.

This makes the “PARTITION BY” clause an important tool for data analysis and reporting. It can keep SQL code readable and flexible, and it can sometimes improve performance by replacing self-joins or correlated subqueries, depending on the database engine.

Contributed by: Nimisha