Crosstab() Query in PostgreSQL

Crosstab() Query in PostgreSQL

Crosstab() allows you to create “pivot tables” within your PostgreSQL database. It can convert data from a vertical to a horizontal format, which helps in examining data. Crosstab () is not a built-in SQL function but comes from the tablefunc extension in PostgreSQL, which needs to be enabled before use. It is used for reporting and analyzing data. In this blog, you will explore crosstab() and its function in detail with examples.

Table of Contents:

What are Crosstabs()?

Crosstab() is a technique to convert data from a vertical to a horizontal format, which helps in examining data. It essentially pivots data, turning unique values from one column into column headers and displaying values from another column under these headers.

Syntax:

SELECT *
FROM crosstab(
  'source_query',
  'category_query'
) AS ct(row_name data_type, col1_name data_type, col2_name data_type, ...);

Let us create a table, students_grades, for understanding Crosstab()

Example:

--Create a table
CREATE TABLE student_grades( student_id INT 
,subject VARCHAR(50),
 grade INT );
--Insert data
INSERT INTO student_grades (student_id, subject, grade) VALUES
    (1, 'Math', 80),
    (1, 'Science', 80),
    (2, 'Math', 78),
    (2, 'Science', 92),
    (3, 'Math', 65),
    (3, 'English', 85),
    (1, 'English', 70);
------Query to display table
SELECT * from student_grades;

Output:

This will be output after creating the required table in PostgreSQL.

Master Crosstab in PostgreSQL – Boost Your Data Skills!
Enroll now and transform your future!
quiz-icon

We need to enable the Tablefunc Extension for crosstab()

---Enable Tablefunc Extension
CREATE EXTENSION tablefunc;
--Cross tab query
SELECT *
FROM crosstab(
    'SELECT student_id, subject, grade FROM student_grades ORDER BY 1,2',
    'SELECT DISTINCT subject FROM student_grades ORDER BY 1'
) AS ct(student_id INT, "English" INT, "Math" INT, "Science" INT);

Output:

Explanation: This SQL Crosstab() function transforms a table of student grades. It pivots the student_grades table using distinct subjects as column headers (“ENGLISH”,” MATH”, “SCIENCE”) and student IDs as row identifiers. 

How does Crosstab() operate in PostgreSQL?

The first query is the source query, which obtains the raw data and makes sure it is arranged correctly. The second query is a category that establishes distinct values that will be used as the headers for the columns. After that, the data is transformed by the crosstab() function, which produces a table with rows determined by the source query’s first column and columns determined by the category query.

Handling Null Values and Missing Data

When you use the crosstab() to transpose your data, it automatically fills the blanks in the rows and columns, but NULL values are not ideal for representing a value. To correct these issues, PostgreSQL supplies a COALESCE() function that replaces NULL values with a default value you specify.

Example:

SELECT student_id, COALESCE("English", 0) AS "English", COALESCE("Math", 0) AS "Math", COALESCE("Science", 0) AS "Science"
FROM crosstab(
    'SELECT student_id, subject, grade FROM student_grades ORDER BY 1,2',
    'SELECT DISTINCT subject FROM student_grades ORDER BY 1'
) AS ct(student_id INT, "English" INT, "Math" INT, "Science" INT);

Output:

Explanation: Here, in this SQL Query, the COALESCE function is used to replace any null grades with 0 and subsequently pivot the student_grade table with the student ID and the corresponding grades in subjects like English, Math, and Science.

Using Aggregate Functions in Crosstab()

Aggregate Function in Crosstab() combines multiple values of a table into a single value using SUM(), AVG(), COUNT(), MIN(), MAX(), and many other commonly used functions.

Using AVG() Aggregate Function in PostgreSQL

The AVG() function in PostgreSQL is used to calculate the mean or average value of a dataset.

For example:

SELECT 'Average Grade' as student_id, AVG("English") as "English", AVG("Math") as "Math", AVG("Science") as "Science"
FROM crosstab(
    'SELECT student_id, subject, grade FROM student_grades ORDER BY 1,2',
    'SELECT DISTINCT subject FROM student_grades ORDER BY 1'
) AS ct(student_id INT, "English" INT, "Math" INT, "Science" INT);

Output:

Explanation: Here, the AVG() aggregate function is used to calculate the average grade for each subject from the Student_grades table.

Using SUM() Aggregate Function in PostgreSQL

The SUM() function in PostgreSQL is used to calculate the total sum of numeric columns in the dataset.

Example:

SELECT student_id, SUM("English") as "English", SUM("Math") as "Math", SUM("Science") as "Science"
FROM crosstab(
    'SELECT student_id, subject, grade FROM student_grades ORDER BY 1,2',
    'SELECT DISTINCT subject FROM student_grades ORDER BY 1'
) AS ct(student_id INT, "English" INT, "Math" INT, "Science" INT)
GROUP BY student_id;

Output:

Explanation: Here, this query calculates the total grade for each subject. It combines the results by student_id, which provides the sum of marks for each student across all subjects.

Get 100% Hike!

Master Most in Demand Skills Now!

Using the MAX() Aggregate Function in PostgreSQL

The MAX() function in PostgreSQL is an aggregate function that returns the largest value from specified values.

Example: 

--  Find the highest grade for each subject
SELECT subject, MAX(grade) AS max_grade
FROM student_grades
GROUP BY subject;

Output:

Explanation: Here, this query groups the student_grades table by subject and uses MAX(grade) to find the highest marks achieved in each subject.

Using the MIN() Aggregate Function in PostgreSQL

The MIN() function in PostgreSQL is an aggregate function that returns the lowest value in a column.

Example:

SELECT student_id,
       MIN(grade) AS min_grade
FROM student_grades
GROUP BY student_id
ORDER BY student_id;

Output:

Explanation: Here, this query finds the lowest grade each student received and displays it with their respective student ID.

Example:

SELECT *
FROM crosstab(
    'SELECT student_id, subject, grade FROM student_grades ORDER BY 1,2',
    'SELECT DISTINCT subject FROM student_grades ORDER BY 1'
) AS ct(student_id INT, "English" INT, "Math" INT, "Science" INT)
WHERE "Math" > 80
ORDER BY "Science" DESC;

Output:

Explanation: In this case, this query pivots the student_grades table and takes the subjects (English, Math, Science) as the columns and the student IDs as the rows, filters for student ids with Math Marks greater than or equal to 80, and sorts by Science grade descending.

Performance Consideration

  • Source Query Efficiency: For improving the performance of the source query, use correct indexes.
  • Data Volume: Working on a large dataset may cause the consumption of resources
  • Indexes: Make sure the columns used in the ORDER_BY clauses of your sources and categories are indexed perfectly.
  • Aggregation Optimization: Check the performance of the aggregation functions used in the source query.
  • Connection Pooling: Connection pooling can help to increase performance when crosstab() is used with a lot of users.

Common Crosstab() Problem and Troubleshoot

Incorrect ordering

If you got undesirable output from crosstab(), the source query’s ORDER BY clause is where things may have gone wrong, so check the ORDER BY clause to ascertain that the row and column layout that you expect is presented.

If you want to fix the problem, just check out the source query output, as improper ordering is misaligning your data during the crosstab transformation.

Mismatches in Data Type

Incorrect crosstab() output can arise from mismatched data types declared, so to avoid this problem, verify the source column with d table_name, and you can also use explicit typecasting.

The tablefunc extension is missing

To troubleshoot “function does not exist” errors when using crosstab(), verify the tablefunc extension is enabled and execute CREATE EXTENSION tablefunc; if it is missing, confirm its installation with dx tablefunc.

Start Your SQL Adventure!
Start Your SQL Journey for Free Today
quiz-icon

Conclusion

Crosstabs are effective tools for associating and transforming data in PostgreSQL. If you understand the basics and practice examples, you will gain insights from your data. You know you can examine crosstabs for patterns, relationships, and changes in raw data. In addition to finding insights and solutions, crosstabs provide a meaningful report to make informed choices.

To learn more about SQL functions, check out this SQL course and also explore SQL Interview Questions prepared by industry experts.

Crosstab() Query in PostgreSQL – FAQs

Q1. Can I use crosstab() with dynamic columns?

Yes, you can use crosstab() with dynamic columns.

Q2. Are crosstabs performant with large datasets?

Yes, the Performance of crosstabs depends on the complexity and size of the data.  

Q3. Can I use multiple aggregation functions within one crosstab()?

Yes, you can use multiple aggregation functions within one crosstab, but more SQL queries and multiple crosstab() operations are require

Q4. What is the function of a crosstab() query?

A Crosstab() query uses a function to calculate data that is grouped by two types of information, one down the left side of the datasheet and another across the top.  

Q5. Can I use crosstab() with complex queries?

Yes, crosstab() with complex queries can be used. For example, you can apply JOIN and GROUP BY to your data.