SQL TUTORIALS

SQL Tutorials: GROUP BY Extensions

[Enter image here]

DISCLAIMER: Before starting this tutorial, it's highly recommended that you download an RDBMS to follow along. We recommend using either PostgreSQL, MS SQL Server, or MySQL.

Aggregating data is a very popular task in data analytics. Because of this, SQL provides various extensions to the GROUP BY Clause to give you more power with your aggregations. This tutorial will cover the following learning objectives:

  • ROLLUP Extension
  • CUBE Extension
  • GROUPING SETS Extension

How to Create Subtotals in SQL




Summary

  • The ROLLUP Clause is a subclause of the GROUP BY Clause and assumes there is a hierarchy among columns. It's primary purpose is to generate subtotals for reporting. To implement this extension use the following syntax:
    SELECT [column_name1], SUM([column_name2])
    FROM [table_name]
    GROUP BY ROLLUP ([column_name1]);
  • If you don't want to include a grand total in your result set, use the following syntax:
    SELECT [col1], [col2], [col3], SUM([col4])
    FROM [table_name]
    GROUP BY [col1], ROLLUP ([col2], [col3])
  • In the outputted Result Set, you'll notice that the subgroups will have NULL values in the other dimensional columns. When this occurs, that means it's a subtotal row for that particular dimension. It's assumed that all dimensions don't have any NULLs.

How to Create Multi-Dimensional Subtotals in SQL




Summary:

  • The CUBE operator allows you to find the sum of every combination of a list of attributes (or dimensions) to compare subtotals across groups. To implement this extension, use the following syntax:
    SELECT [column_name1], [column_name2], SUM([column_name3]) AS [alias1]
    FROM [table_name]
    GROUP BY CUBE([column_name1], [column_name2]);
  • Although not used very often, the CUBE extension is a common SQL Interview question as it shows that you know the different grouping methods available in SQL.
  • Similar to the Rollup Extension, the Cub Clause will show the subtotals for each dimension as nullified rows for each level in the hierarchy.

How to Create Customized Grouped Dimenions in SQL




Summary:

  • The GROUPING SETS extension is a more refined version of the CUBE extension in that rather than giving you every possible combination of groups, GROUPING SETS allows you to specify which combinations you want to compare. To implement this extension, use the following syntax:
    SELECT [column_name1], [column_name2], SUM([column_name3]) AS [alias]
    FROM [table_name]
    GROUP BY GROUPING SETS (([column_name1]), ([column_name2]), ([column_name1], [colUmn_name2]));
  • If you've ever working with Pivot Tables in Excel or a similar tool, this may look familiar. GROUPING SETS allows you to customize how you drill down your dimensions and focus only on the categories you want rather than every possible combination of categories.

Exercise

Congratulations! You just completed the GROUP BY Extensions Tutorial! To help test your knowledge, let's use the "Sales" database to practice using the GROUP BY extensions covered in this tutorial. This exercise is based on the exercise of the previous tutorial.
**It's highly recommended that you complete the exercise outlined in the previous tutorial before beginning this exercise.**

Instructions:

  1. Open your RDBMS Client (e.g., MySQL, PostgreSQL, MS SQL Server).
  2. If you haven't created the "lego" database yet, please view the instructions in the "Joins" tutorial.
  3. "lego" database ERD:

  4. The Database used in this exercise includes data on hundreds of Lego sets and their associated parts, colors, and themes.

  5. Using the "lego" database, solve the following problems:
    1. Write a query that generates subtotals across the set name and theme name attributes (join required) for the total number of each set or theme in inventory. Order the result set by the total quantity in descending order. Give your aggregate column a descriptive name.
    2. Write a query that generates subtotals across all dimensions across the part name and color name attributes (join required) for the total number of each part or color in inventory. Order the result set by the number of parts in descending order, the part name is ascending order, and the color name in ascending order. Give your aggregate column a descriptive name.
    3. Write the same query as above, but include the GROUPING SETS operator. Play around and create custom grouping sets that make sense for analysis.
  6. Exercise Completed! Click here to view the answers.
  7. Have any issues with the above exercise? Post your question on Discord!