Using Partition By SQL Clause

Updated on Mar 13, 2024 14:24 IST

The Partition By SQL clause is an optional subclause of the OVER clause. It is used with window functions such as MAX(), RANK() and AVG() to divide the rows of a result set into groups that the function processes independently.

What is Partition By SQL Clause?

In SQL, the “PARTITION BY” clause is used inside the OVER clause of a window function. It specifies the columns by which the rows of a result set are divided into partitions. Within each partition, the window function operates independently, treating the rows in each partition as a separate group. Unlike GROUP BY, it does not collapse rows: every input row is kept in the output.

Let’s take an example. Consider a table that contains sales transactions, with one row per transaction. To compute the running total of sales by store, we can use a window function with a “PARTITION BY” clause that specifies the “store” column, as shown below:

 
SELECT store, sales, SUM(sales) OVER (PARTITION BY store ORDER BY transaction_date) AS running_total
FROM sales_table
ORDER BY store, transaction_date;

The query above returns a result set with a “running_total” column, which shows the cumulative sales for each store up to and including the current row. The “PARTITION BY” clause ensures that the running total restarts for each store, so that it only considers the sales for that particular store.
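To try this query without setting up a database server, the sketch below runs it against an in-memory SQLite database through Python's built-in sqlite3 module. The rows in sales_table are made up for illustration, and the sketch assumes a SQLite build of version 3.25 or later, which is when window functions were added.

```python
import sqlite3

# In-memory database with a small, made-up sales_table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_table (store TEXT, transaction_date TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO sales_table VALUES (?, ?, ?)",
    [
        ("A", "2024-01-01", 100),
        ("A", "2024-01-02", 50),
        ("B", "2024-01-01", 70),
        ("A", "2024-01-03", 25),
        ("B", "2024-01-02", 30),
    ],
)

running_totals = conn.execute(
    """
    SELECT store, transaction_date, sales,
           SUM(sales) OVER (PARTITION BY store ORDER BY transaction_date) AS running_total
    FROM sales_table
    ORDER BY store, transaction_date
    """
).fetchall()

# Each store's total starts again from its own first transaction.
for row in running_totals:
    print(row)
```

Store A accumulates 100, 150 and 175, while store B starts again at 70 and reaches 100.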

Usage of “Partition By” SQL Clause

The “PARTITION BY” clause in SQL is used to specify the columns by which the data in a query should be partitioned or grouped. Here are a few examples to illustrate its usage:

1. Partitioning by a single column:

SELECT department, salary,
       AVG(salary) OVER (PARTITION BY department) AS avg_department_salary
FROM employees;

2. Partitioning by multiple columns:

SELECT department, location, salary,
       AVG(salary) OVER (PARTITION BY department, location) AS avg_salary
FROM employees;

3. Partitioning by a calculated value (here hire_date is assumed to be stored as a day number, so dividing by 7 gives a week number; with a DATE column, use your database's date functions instead):

SELECT FLOOR(hire_date / 7) AS week, salary,
       AVG(salary) OVER (PARTITION BY FLOOR(hire_date / 7)) AS avg_week_salary
FROM employees;

In the above examples, each row is assigned to a partition based on the columns or expressions in the PARTITION BY clause, and the aggregate function (e.g., AVG, SUM) is computed over each partition. Unlike GROUP BY, which returns one row per group, PARTITION BY keeps every input row and attaches the partition's result to it. PARTITION BY is only valid inside an OVER clause; it cannot be written as a standalone clause after GROUP BY.
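The difference between GROUP BY and PARTITION BY is easiest to see by running both forms on the same data. The following is a minimal sketch using Python's sqlite3 module with a made-up employees table (the names and salaries are illustrative, and SQLite 3.25 or later is assumed).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ana", "IT", 4000), ("Ben", "IT", 6000), ("Cy", "HR", 3000)],
)

# GROUP BY collapses each department into a single row.
grouped = conn.execute(
    "SELECT department, AVG(salary) FROM employees GROUP BY department ORDER BY department"
).fetchall()

# PARTITION BY keeps every employee row and adds the department average to it.
windowed = conn.execute(
    """
    SELECT name, department, salary,
           AVG(salary) OVER (PARTITION BY department) AS avg_department_salary
    FROM employees
    ORDER BY name
    """
).fetchall()

print(len(grouped), "rows from GROUP BY")
print(len(windowed), "rows from PARTITION BY")
```

The grouped query returns two rows (one per department), while the windowed query returns all three employees, each carrying its department's average.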

Examples of “Partition By” SQL Clause

Below are a few detailed examples of the PARTITION BY clause in SQL:

1. Partitioning sales data by year and product category:

SELECT year, category, sales,
       SUM(sales) OVER (PARTITION BY year, category) AS category_year_sales
FROM sales_data;

In this example, the “PARTITION BY” clause partitions the sales data by year and product category. The SUM function calculates the total sales for each partition and shows that total on every row of the partition.

2. Partitioning employee data by hire date and department:

SELECT department, hire_date, salary,
       AVG(salary) OVER (PARTITION BY hire_date, department) AS avg_salary
FROM employees;

In this example, the PARTITION BY clause partitions the employee data by hire date and department. The AVG function calculates the average salary for each partition.

3. Partitioning product data by month and product name:

SELECT product_name, MONTH(order_date) AS month, quantity,
       SUM(quantity) OVER (PARTITION BY MONTH(order_date), product_name) AS monthly_quantity
FROM product_orders;

In this example, the “PARTITION BY” clause partitions the product data by the month of the order date and product name. The SUM function calculates the total quantity of each product ordered in each partition. The MONTH() function is available in MySQL and SQL Server; other databases use a different function, such as EXTRACT(MONTH FROM order_date).

In each of these examples, the “PARTITION BY” clause calculates an aggregate over a subgroup of the data without collapsing the rows, so detail values and group-level values appear side by side. This is useful for comparing each row with its group and for organizing results in a specific way.

Applications of Partition By Clause

The “PARTITION BY” clause in SQL is used in various applications where you need to perform calculations based on subsets of data within a larger set. Here are a few common uses of the PARTITION BY clause:

  1. Data Aggregation- The “PARTITION BY” clause can be used to perform aggregation (such as sum, average, count, etc.) on subsets of data based on specific columns. For example, you can calculate the total sales of each product in each quarter of the year.
  2. Window Function- The “PARTITION BY” clause is used with window functions (such as ROW_NUMBER(), RANK(), DENSE_RANK(), etc.) to perform calculations based on subsets of data. For example, you can calculate the running total of sales for each product in a given time period.
  3. Rank Calculation- The “PARTITION BY” clause can be used to calculate the rank of rows within a subset of data. For example, you can determine the rank of each employee within their department based on their salary.
  4. Pivot Tables- The “PARTITION BY” clause can support pivot-style summaries by placing group totals next to the detail rows. For example, you can show the total sales of each product by region on every row. The pivoting itself (turning row values into columns) is done with conditional aggregation or a PIVOT operator, where the database provides one.
  5. Report Generation- The “PARTITION BY” clause can be used to generate reports that summarize data based on specific columns. For example, you can generate a report that shows the average salary of employees by department and year of hire.

In all of these applications, the “PARTITION BY” clause is used to divide a large set of data into smaller partitions, allowing you to perform more specific calculations on each partition and make more informed decisions based on the results.
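As an illustration of the rank calculation use case, the sketch below ranks employees by salary within their own department. It uses Python's sqlite3 module and a made-up employees table; the data and column names are assumptions for demonstration only, and SQLite 3.25 or later is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ana", "IT", 6000),
        ("Ben", "IT", 6000),
        ("Cy", "IT", 4000),
        ("Dee", "HR", 3500),
        ("Eli", "HR", 3000),
    ],
)

ranked = conn.execute(
    """
    SELECT name, department, salary,
           RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank
    FROM employees
    ORDER BY department, salary_rank, name
    """
).fetchall()

# Ranks restart at 1 in every department; tied salaries share a rank.
for row in ranked:
    print(row)
```

RANK() leaves a gap after ties, so the third IT employee gets rank 3. DENSE_RANK() would give rank 2 instead, and ROW_NUMBER() would number the rows 1, 2, 3 regardless of ties.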

Types Of Partition By SQL Clause

As mentioned above, the “PARTITION BY” SQL clause is used in the context of window functions and is used to divide the result set into partitions or groups. Each partition is processed independently and the window function is applied to each partition.

The different types of partitioning that can be done with the “PARTITION BY” clause are illustrated below:

1. Partition By a Single Column

In this type of partitioning, the result set is divided into partitions based on the values of a single column. For example, you can partition the result set by the values of the “Department” column.

A practical query example is as follows:

 
SELECT
EmployeeID,
Department,
Salary,
SUM(Salary) OVER (PARTITION BY Department ORDER BY Salary) AS RunningTotal
FROM
Employees;

In the above example, the result set is divided into partitions based on the values of the “Department” column. The SUM function calculates the running total of the salary for each department.
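One detail is worth checking when a running total is ordered by a column that can contain duplicates, such as Salary. When ORDER BY is present and no frame is given, the default window frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which treats rows with equal ordering values as peers and gives them the same total. The sketch below compares that default with an explicit ROWS frame. It uses Python's sqlite3 module with made-up rows (SQLite 3.25 or later assumed), and the EmployeeID tiebreaker is an illustrative choice that makes the row order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INTEGER, Department TEXT, Salary INTEGER)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [(1, "IT", 4000), (2, "IT", 5000), (3, "IT", 5000), (4, "IT", 7000)],
)

frames = conn.execute(
    """
    SELECT EmployeeID, Salary,
           -- Default frame: rows with equal Salary are peers and share one total.
           SUM(Salary) OVER (PARTITION BY Department ORDER BY Salary) AS default_total,
           -- Explicit ROWS frame with a tiebreaker: the total grows row by row.
           SUM(Salary) OVER (
               PARTITION BY Department
               ORDER BY Salary, EmployeeID
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS row_total
    FROM Employees
    ORDER BY Salary, EmployeeID
    """
).fetchall()

for row in frames:
    print(row)
```

The two employees earning 5000 both show 14000 under the default frame, but 9000 and then 14000 under the ROWS frame.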

2. Partition By Multiple Columns

In this type of partitioning, the result set is divided into partitions based on the values of multiple columns. For example, you can partition the result set by both the “Department” and “Designation” columns.

A practical query example is as follows:

 
SELECT
EmployeeID,
Department,
Designation,
Salary,
SUM(Salary) OVER (PARTITION BY Department, Designation ORDER BY Salary) AS RunningTotal
FROM
Employees;

In this example, the result set is divided into partitions based on the values of both the “Department” and “Designation” columns. The SUM function calculates the running total of the salary for each department and designation combination.

3. Partition By Expressions

In this type of partitioning, expressions are used in the window definition instead of plain column names. The expression can be a mathematical formula or a function call. The example below computes a total salary for each employee (salary plus 10%) and uses that expression as the value that is summed and ordered within each department.

SELECT
EmployeeID,
Department,
Salary,
(Salary + (Salary * 0.1)) AS TotalSalary,
SUM(Salary + (Salary * 0.1)) OVER (PARTITION BY Department ORDER BY Salary + (Salary * 0.1)) AS RunningTotal
FROM
Employees;

In this example, the rows are partitioned by Department, and the SUM function calculates the running total of the total-salary expression within each department. The expression is repeated inside the window function because most databases do not allow a column alias such as TotalSalary to be referenced elsewhere in the same SELECT list. An expression can also be placed directly in the PARTITION BY list, as the CASE expressions in the next two types show.
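A computed column such as TotalSalary can be given a name once and then reused inside the window function by moving the expression into a common table expression (CTE). The sketch below shows one way to do this with Python's sqlite3 module. The CTE name pay and the sample rows are assumptions made for illustration, and SQLite 3.25 or later is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INTEGER, Department TEXT, Salary REAL)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [(1, "IT", 1000), (2, "IT", 2000), (3, "HR", 3000)],
)

with_total = conn.execute(
    """
    WITH pay AS (
        -- Compute the expression once and give it a name.
        SELECT EmployeeID, Department, Salary,
               Salary + (Salary * 0.1) AS TotalSalary
        FROM Employees
    )
    SELECT EmployeeID, Department, TotalSalary,
           SUM(TotalSalary) OVER (PARTITION BY Department ORDER BY TotalSalary) AS RunningTotal
    FROM pay
    ORDER BY Department, TotalSalary
    """
).fetchall()

for row in with_total:
    print(row)
```

Because the alias is defined in the CTE, the outer query can refer to TotalSalary in both the SUM and the ORDER BY of the window without repeating the formula.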

4. Partition By Range

In this type of partitioning, the result set is divided into partitions based on the values of a single column, where each partition represents a range of values. For example, you can partition the result set by the values of the “Age” column, where each partition represents a range of 5 years.

 
SELECT
EmployeeID,
Age,
Salary,
SUM(Salary) OVER (PARTITION BY
CASE
WHEN Age BETWEEN 18 AND 22 THEN '18-22'
WHEN Age BETWEEN 23 AND 27 THEN '23-27'
ELSE '28+'
END
ORDER BY Age) AS RunningTotal
FROM
Employees;

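In this example, the result set is divided into partitions based on the range of values of the “Age” column. The SUM function calculates the running total of the salary for each age range. Note that the ELSE branch also catches any age below 18, as well as NULL ages, so add explicit WHEN branches if those rows need their own partition.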
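To see which band each row falls into, it can help to expose the CASE result as a named column and partition by that name. The following is a small sketch using Python's sqlite3 module; the AgeBand column name, the labelled CTE and the sample rows are illustrative assumptions, and SQLite 3.25 or later is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INTEGER, Age INTEGER, Salary INTEGER)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [(1, 19, 100), (2, 21, 200), (3, 25, 300), (4, 30, 400), (5, 41, 500)],
)

banded = conn.execute(
    """
    WITH labelled AS (
        -- Label each row with its age band once.
        SELECT EmployeeID, Age, Salary,
               CASE
                   WHEN Age BETWEEN 18 AND 22 THEN '18-22'
                   WHEN Age BETWEEN 23 AND 27 THEN '23-27'
                   ELSE '28+'
               END AS AgeBand
        FROM Employees
    )
    SELECT EmployeeID, AgeBand, Salary,
           SUM(Salary) OVER (PARTITION BY AgeBand ORDER BY Age) AS RunningTotal
    FROM labelled
    ORDER BY Age
    """
).fetchall()

for row in banded:
    print(row)
```

The running total restarts whenever the band label changes, which makes it easy to check that the CASE boundaries behave as intended.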

5. Partition By List

In this type of partitioning, the result set is divided into partitions based on specific values of a single column. For example, you can partition the result set by the values of the “Department” column, where each partition represents a specific department.

 
SELECT
EmployeeID,
Department,
Salary,
SUM(Salary) OVER (PARTITION BY
CASE Department
WHEN 'IT' THEN 'IT'
WHEN 'HR' THEN 'HR'
ELSE 'OTHER'
END
ORDER BY Department) AS RunningTotal
FROM
Employees;

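In this example, the result set is divided into three partitions based on specific values of the “Department” column: IT, HR, and all other departments combined. The SUM function calculates the running total of the salary within each of these partitions. Because the rows are ordered by Department, all rows of the same department are peers and receive the same cumulative value.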

Note that the “PARTITION BY” clause is optional, and if it is not specified, the entire result set is treated as a single partition.
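The effect of leaving out PARTITION BY can be checked by putting both forms side by side. In the sketch below, which uses Python's sqlite3 module with made-up rows (SQLite 3.25 or later assumed), an empty OVER () treats the whole result set as one partition, while OVER (PARTITION BY Department) computes a separate total per department. The output column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INTEGER, Department TEXT, Salary INTEGER)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [(1, "IT", 4000), (2, "IT", 6000), (3, "HR", 3000)],
)

totals = conn.execute(
    """
    SELECT EmployeeID, Department, Salary,
           SUM(Salary) OVER () AS company_total,
           SUM(Salary) OVER (PARTITION BY Department) AS department_total
    FROM Employees
    ORDER BY EmployeeID
    """
).fetchall()

for row in totals:
    print(row)
```

Every row shows the same company_total of 13000, while department_total is 10000 for the IT rows and 3000 for the HR row.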

Advantages Of Partition By SQL Clause

The “PARTITION BY” clause in SQL has several advantages, including:

  1. Improved Performance: A window function with PARTITION BY can replace a self-join or correlated subquery that computes the same per-group value, which is often faster. The actual gain depends on the database, the available indexes and the data volume.
  2. Better Organization: Partitioning the result set can help organize the data into meaningful groups, making it easier to understand and analyze.
  3. Increased Flexibility: Each window function in a query can have its own PARTITION BY clause, so a single query can combine calculations over different groupings.
  4. Improved Readability: Partitioning the result set can make the SQL code more readable, as the grouping logic sits next to the function that uses it.
  5. Easier Maintenance: Partitioning the result set can make the SQL code easier to maintain, as it reduces the complexity of the calculations and makes it easier to understand and modify the code.
  6. Better Scalability: Because each partition is computed independently, some database engines can process partitions in parallel, which can help the calculations keep up as the data grows.

Conclusion

The “PARTITION BY” clause in SQL is used to divide a result set into partitions based on the values of one or more columns or expressions. Window functions then perform calculations, such as running totals, averages or ranks, within each partition while keeping every row in the output. This can make the SQL code more readable and flexible, and it is often more efficient than the equivalent self-joins or subqueries.

The “PARTITION BY” clause is an important tool for data analysis and reporting, allowing more complex and meaningful calculations to be performed on large result sets.

Contributed by: Nimisha