Introduction to Window Functions in SQL

Introduction to Window Functions in SQL

8 mins read374 Views Comment
clickHere
Updated on Nov 22, 2022 19:37 IST

As a beginner, the majority of queries we write in SQL are focused on returning a single piece of data, or a single piece of data per group.

2022_05_Introduction-to-Window-Functions-in-SQL.jpg

But as we progress, it becomes evident that by doing so, some of our information is lost and that there is much more to view. Using window functions in SQL allows you to get more information. To get this data, we need to open a window, no pun intended! 

In this article, we will go through the following topics to get a close look at the Window Functions. 

  1. What are Window Functions? 
  2. What are different types of Window Functions? 
  3. Aggregate Window Functions 
  4. Ranking Window Functions 
  5. Value Window Functions 

What are Window Functions? 

  1. A window function is a calculation that is performed across a collection of rows in a table that is related to the current row in some way.  
  2. A window function is similar to aggregate functions with the GROUP_BY clauses, with one major difference:  
    • Unlike the aggregate function where the rows are merged into a single row, window functions help each row to retain its own identity. 

This image will explain to you what I’m talking about. 

2022_05_image-83.jpg

Interesting, right? 

Window functions are distinguished from other analytical and reporting functions by the OVER() clause (window definition). The following are the capabilities of the OVER() clause: 

  1. Creates groups of rows by defining window partitions (PARTITION BY clause). 
  2. Within a partition, it sorts the rows (ORDER BY clause). 

Following is a basic syntax for writing any query using Window functions- 

2022_05_image-86.jpg

Let’s look at some definitions since there are a lot of words here: 

  1. window_func = any aggregate/ranking/value function we want to use   
  2. col_name = columns that are to be selected 
  3. expression = this is the column that we want the window function operated on 
  4. OVER = to denote that this is a window function 
  5. PARTITION BY = divides the rows into partitions so we can define which rows to use to compute the window function 
  6. partition_list = the column(s) we want to partition by 
  7. ORDER BY = used to order the rows within each partition (this is optional and can be omitted) 
  8. order_list = name of the column(s) we want to order by 
  9. tbl_name= Name of the source table 

Now that you know the syntax, let us take look at the different types of window functions that can be substituted in place of the red font above. 

What are different types of Window Functions? 

Although the SQL window functions are not officially separated into categories, they are commonly divided into two or three types. The SQL window functions are divided into aggregate and built-in functions (rating and value). 

In the chart below, you can see some of the names of the functions that fall inside each group. 

2022_05_image-88.jpg

Here’s a quick rundown of the different types of window functions. 

  1. Aggregate functions: We can utilize these functions to calculate various aggregations within each window or partition, such as average, total # of rows, maximum or minimum values, or total sum. 
  2. Ranking functions: We can use these functions to simply assign numbers to the existing rows according to some requirements. 
  3. Value functions: These functions allow you to compare values from the partition’s preceding and subsequent rows, as well as the partition’s initial and last value. 

Before we go further, let me start by introducing you to the dataset we’ll be dealing with in this article. Assume that a corporation keeps track of the intern’s name, job, and compensation as follows: 

2022_05_image-97.jpg
2022_05_image-168.jpg

This sample dataset will be used to help us understand the topics in this article. Let’s get started then! 

Aggregate Window Function 

For example, if I were to display the total stipend of the interns along with every row value, it would look something like this: 

Code-

 
\n <span class="has-inline-color has-vivid-red-color">\n SELECT *, \n
\n <span class="has-inline-color has-vivid-red-color">\n SUM \n <span class="has-inline-color has-light-green-cyan-color">\n (stipend\n <span class="has-inline-color has-light-green-cyan-color">\n )\n <span class="has-inline-color has-vivid-red-color">\n OVER\n <span class="has-inline-color has-light-green-cyan-color">\n () \n <span class="has-inline-color has-vivid-red-color">\n AS total_stipend \n
\n <span class="has-inline-color has-vivid-red-color">\n FROM INTERN;\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-light-green-cyan-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-light-green-cyan-color">\n </span class="has-inline-color has-light-green-cyan-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-vivid-red-color">
Copy code

Output-

2022_05_image-169.jpg

The OVER clause denotes a set of rows that a window function is applied to. It can also be used with window functions that aren’t aggregate functions.  

The next step is to decide how we want to partition. To put it another way, how should the rows be grouped together to make our partitions? 

  • The OVER clause is to be used in conjunction with the PARTITION BY clause. This will divide the rows into various partitions. The window function then reacts to these partitions. 

To display the total stipend per job category for all rows, for example, we would need to change our initial SQL query to: 

Code-

 
\n <span class="has-inline-color has-vivid-red-color">\n SELECT *, \n
\n <span class="has-inline-color has-vivid-red-color">\n SUM \n <span class="has-inline-color has-light-green-cyan-color">\n (stipend\n <span class="has-inline-color has-light-green-cyan-color">\n )\n <span class="has-inline-color has-vivid-red-color">\n OVER\n <span class="has-inline-color has-light-green-cyan-color">\n () \n <span class="has-inline-color has-vivid-red-color">\n AS total_stipend \n
\n <span class="has-inline-color has-vivid-red-color">\n FROM INTERN;\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-light-green-cyan-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-light-green-cyan-color">\n </span class="has-inline-color has-light-green-cyan-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-vivid-red-color">
Copy code

Output-

2022_05_image-183.jpg

As you can see, the total job stipend column shows the total sales for that particular job category, not for the entire table. 

Now let’s rearrange the rows within each partition. 

We know that to arrange rows in a table within each partition, we have to modify the OVER clause with the ORDER BY clause. 

Code-

 
\n <span class="has-inline-color has-vivid-red-color">\n SELECT *, \n
\n <span class="has-inline-color has-vivid-red-color">\n SUM \n <span class="has-inline-color has-light-green-cyan-color">\n (stipend\n <span class="has-inline-color has-light-green-cyan-color">\n ) \n <span class="has-inline-color has-vivid-red-color">\n OVER\n <span class="has-inline-color has-light-green-cyan-color">\n (\n <span class="has-inline-color has-vivid-red-color">\n PARTITION BY job \n <span class="has-inline-color has-vivid-red-color">\n ORDER BY stipend \n <span class="has-inline-color has-vivid-red-color">\n DESC\n <span class="has-inline-color has-light-green-cyan-color">\n ) \n <span class="has-inline-color has-vivid-red-color">\n AS \n
ordered_job_stipend \n
\n <span class="has-inline-color has-vivid-red-color">\n FROM INTERN;\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-light-green-cyan-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-light-green-cyan-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-light-green-cyan-color">\n </span class="has-inline-color has-light-green-cyan-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-vivid-red-color">
Copy code

Output-

2022_05_image-179.jpg

Here, the job column has been used to split the rows according to their job category. As you look at the above table, you’ll note that the stipend column is in decreasing order, and the ordered_job_stipend column shows the job category’s running total (starting over after every partition). 

Next in the article, we can check the working of the ranking window functions. 

Ranking Window Functions 

In the example below, we’ll use three ranking window functions to generate results. 

  1. Row_number() 

It’s possible that your dataset doesn’t have a column that shows the rows in sequential order. In that scenario, the ROW_NUMBER() window function can be used. Each row of the table is given a unique sequential number. 

Code-

 
\n <span class="has-inline-color has-vivid-red-color">\n SELECT *, \n
\n <span class="has-inline-color has-vivid-red-color">\n ROW_NUMBER\n <span class="has-inline-color has-light-green-cyan-color">\n ()\n <span class="has-inline-color has-vivid-red-color">\n OVER\n <span class="has-inline-color has-light-green-cyan-color">\n () \n <span class="has-inline-color has-vivid-red-color">\n AS row_number \n
\n <span class="has-inline-color has-vivid-red-color">\n FROM INTERN;\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-light-green-cyan-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-light-green-cyan-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-vivid-red-color">
Copy code

Output-

2022_05_image-188.jpg

Here, the rows on the job column have been partitioned and ordered based on the intern’s stipend. Notice how each time a new partition begins, the numbering is reset. 

But suppose we want to rank the interns based on their pay? 

  1. Rank() 

The RANK () window function, as its name implies, ranks the rows within a partition based on a criterion. 

Code-

 
\n <span class="has-inline-color has-vivid-red-color">\n SELECT *, \n
\n <span class="has-inline-color has-vivid-red-color">\n ROW_NUMBER\n <span class="has-inline-color has-light-green-cyan-color">\n ()\n <span class="has-inline-color has-vivid-red-color">\n OVER\n <span class="has-inline-color has-light-green-cyan-color">\n (\n <span class="has-inline-color has-vivid-red-color">\n PARTITION BY job \n <span class="has-inline-color has-vivid-red-color">\n ORDER BY stipend\n <span class="has-inline-color has-light-green-cyan-color">\n ) \n <span class="has-inline-color has-vivid-red-color">\n AS row_number, \n
\n <span class="has-inline-color has-vivid-red-color">\n RANK\n <span class="has-inline-color has-light-green-cyan-color">\n ()\n <span class="has-inline-color has-vivid-red-color">\n OVER\n <span class="has-inline-color has-light-green-cyan-color">\n (\n <span class="has-inline-color has-vivid-red-color">\n PARTITION BY job \n <span class="has-inline-color has-vivid-red-color">\n ORDER BY stipend\n <span class="has-inline-color has-light-green-cyan-color">\n ) \n <span class="has-inline-color has-vivid-red-color">\n AS rank_row \n
\n <span class="has-inline-color has-vivid-red-color">\n FROM INTERN;\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-light-green-cyan-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-light-green-cyan-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-light-green-cyan-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-light-green-cyan-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-light-green-cyan-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-light-green-cyan-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-vivid-red-color">
Copy code

Output-

2022_05_image-190.jpg

Take note of the highlighted area. We have a sequential number in the case of ROW_NUMBER (). In the case of RANK (), however, we have the same rank for rows with the same value. 

But, there is a problem. Even though rows with the same value are assigned the same rank, the next rank skips the absent rank. If we had to deliver the “top N distinct” values from a table, this would not produce the necessary results. As a result, we have the dense_rank function to deal with this problem. 

  1. Dense_rank() 

The DENSE_RANK () function is similar to RANK () with one exception: when ranking rows, it does not skip any levels. 

Code-

 
\n <span class="has-inline-color has-vivid-red-color">\n SELECT *, \n
\n <span class="has-inline-color has-vivid-red-color">\n ROW_NUMBER\n <span class="has-inline-color has-light-green-cyan-color">\n ()\n <span class="has-inline-color has-vivid-red-color">\n OVER\n <span class="has-inline-color has-light-green-cyan-color">\n (\n <span class="has-inline-color has-vivid-red-color">\n PARTITION BY job \n <span class="has-inline-color has-vivid-red-color">\n ORDER BY stipend\n <span class="has-inline-color has-light-green-cyan-color">\n ) \n <span class="has-inline-color has-vivid-red-color">\n AS row_number, \n
\n <span class="has-inline-color has-vivid-red-color">\n RANK\n <span class="has-inline-color has-light-green-cyan-color">\n ()\n <span class="has-inline-color has-vivid-red-color">\n OVER\n <span class="has-inline-color has-light-green-cyan-color">\n (\n <span class="has-inline-color has-vivid-red-color">\n PARTITION BY job \n <span class="has-inline-color has-vivid-red-color">\n ORDER BY stipend\n <span class="has-inline-color has-light-green-cyan-color">\n ) \n <span class="has-inline-color has-vivid-red-color">\n AS rank_row,\n
\n <span class="has-inline-color has-vivid-red-color">\n DENSE_RANK\n <span class="has-inline-color has-light-green-cyan-color">\n ()\n <span class="has-inline-color has-vivid-red-color">\n OVER\n <span class="has-inline-color has-light-green-cyan-color">\n (\n <span class="has-inline-color has-vivid-red-color">\n PARTITION BY job \n <span class="has-inline-color has-vivid-red-color">\n ORDER BY stipend\n <span class="has-inline-color has-light-green-cyan-color">\n ) \n <span class="has-inline-color has-vivid-red-color">\n AS dense_rank_row\n
\n <span class="has-inline-color has-vivid-red-color">\n FROM INTERN;\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-light-green-cyan-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-light-green-cyan-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-light-green-cyan-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-light-green-cyan-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-light-green-cyan-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-light-green-cyan-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-light-green-cyan-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-light-green-cyan-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-light-green-cyan-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-vivid-red-color">
Copy code

Output-

2022_05_image-193.jpg

Within each partition, all of the ranks are distinctive and increase sequentially. DENSE_RANK()  has not skipped any ranks within a partition, unlike the RANK() function. 

Moving forward, we will learn about the value functions. 

Value Window Functions 

Value functions, in my opinion, are the most important reason why window functions are so useful. Sometimes, you’ll want to compare the current row’s value to that of the preceding or succeeding row. The window functions LEAD () and LAG () are specifically designed for this purpose. 

  1. LAG can return values from prior rows. 
  2. LEAD can only return data from subsequent rows. 

Code-

 
\n <span class="has-inline-color has-vivid-red-color">\n SELECT *, \n
\n <span class="has-inline-color has-vivid-red-color">\n LEAD\n <span class="has-inline-color has-light-green-cyan-color">\n (stipend,1\n <span class="has-inline-color has-light-green-cyan-color">\n ) \n <span class="has-inline-color has-vivid-red-color">\n OVER\n <span class="has-inline-color has-light-green-cyan-color">\n (\n <span class="has-inline-color has-vivid-red-color">\n PARTITION BY job \n <span class="has-inline-color has-vivid-red-color">\n ORDER BY stipend\n <span class="has-inline-color has-light-green-cyan-color">\n ) \n <span class="has-inline-color has-vivid-red-color">\n AS next \n
\n <span class="has-inline-color has-vivid-red-color">\n FROM INTERN;\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-light-green-cyan-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-light-green-cyan-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-light-green-cyan-color">\n </span class="has-inline-color has-light-green-cyan-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-vivid-red-color">
Copy code

Output-

2022_05_image-196.jpg

Here, we created a new column containing stipends from the next row within each partition ordered by stipend using the LEAD function. Notice that the last row from each partition has a null value because there is no successive row for it to retrieve data from. 

Now, let’s try it again with the LAG function. 

Code-

 
\n <span class="has-inline-color has-vivid-red-color">\n SELECT *, \n
\n <span class="has-inline-color has-vivid-red-color">\n LAG\n <span class="has-inline-color has-light-green-cyan-color">\n (stipend,1\n <span class="has-inline-color has-light-green-cyan-color">\n ) \n <span class="has-inline-color has-vivid-red-color">\n OVER\n <span class="has-inline-color has-light-green-cyan-color">\n (\n <span class="has-inline-color has-vivid-red-color">\n PARTITION BY job \n <span class="has-inline-color has-vivid-red-color">\n ORDER BY stipend\n <span class="has-inline-color has-light-green-cyan-color">\n ) \n <span class="has-inline-color has-vivid-red-color">\n AS previous, \n
stipend - \n <span class="has-inline-color has-vivid-red-color">\n LAG\n <span class="has-inline-color has-light-green-cyan-color">\n (stipend,1\n <span class="has-inline-color has-light-green-cyan-color">\n ) \n <span class="has-inline-color has-vivid-red-color">\n OVER\n <span class="has-inline-color has-light-green-cyan-color">\n (\n <span class="has-inline-color has-vivid-red-color">\n PARTITION BY job \n <span class="has-inline-color has-vivid-red-color">\n ORDER BY stipend\n <span class="has-inline-color has-light-green-cyan-color">\n ) \n <span class="has-inline-color has-vivid-red-color">\n AS difference \n
\n <span class="has-inline-color has-vivid-red-color">\n FROM INTERN;\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-light-green-cyan-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-light-green-cyan-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-light-green-cyan-color">\n </span class="has-inline-color has-light-green-cyan-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-light-green-cyan-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-light-green-cyan-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-light-green-cyan-color">\n </span class="has-inline-color has-light-green-cyan-color">\n </span class="has-inline-color has-vivid-red-color">\n </span class="has-inline-color has-vivid-red-color">
Copy code

Output-

2022_05_image-198.jpg

We added two additional columns here. Within each partition, the first column contains a stipend from the previous row, sorted by it. The difference between the stipend from the previous row and the current row is shown in the second column. As you can see, this is really useful for quickly analyzing pay differences within the same partition. 

Endnotes 

By now, we’ve seen quite a few window functions, and I hope you’ve realized the power of SQL window functions. The SQL window functions are quite powerful and efficient, and their syntax is fairly logical while being rich and slightly distinct from regular SQL. Here are some key points to remember from this article: 

  • Window functions allow you to aggregate data across any number of rows in the window frame. 
  • When you add the OVER() clause to an aggregate set function, it becomes an aggregate window function. 
  • Window functions can only be used in the SELECT list and ORDER BY clauses of a query. 
  • ORDER BY comes after window functions. 
Top Trending Tech Articles:
Career Opportunities after BTech | Online Python Compiler | What is Coding | Queue Data Structure | Top Programming Language | Trending DevOps Tools | Highest Paid IT Jobs | Most In Demand IT Skills | Networking Interview Questions | Features of Java | Basic Linux Commands | Amazon Interview Questions

Recently completed any professional course/certification from the market? Tell us what liked or disliked in the course for more curated content.

Click here to submit its review with Shiksha Online.

About the Author

This is a collection of insightful articles from domain experts in the fields of Cloud Computing, DevOps, AWS, Data Science, Machine Learning, AI, and Natural Language Processing. The range of topics caters to upski... Read Full Bio

Comments