Power BI RANKX vs SQL ROW_NUMBER

Power BI RANKX vs SQL ROW_NUMBER

When working with large datasets, assigning row numbers in specific categories is very important. In SQL, this is handled using ROW_NUMBER(). However, in Power BI, you can achieve similar results using DAX functions like RANKX(). However, the syntax and working of both differ from each other. Understanding these differences is crucial when switching from SQL to Power BI. In this blog, let us understand the methods of both SQL and Power BI and explore the differences between them.

Table of Contents:

What is Indexing in Power BI?

Indexing is a process in Power BI that refers to assigning unique numbers to rows within categories. This function is useful when you have to rank employees by salary within departments, identifying top-performing products across regions. In SQL, this is done using ROW_NUMBER() with PARTITION BY, while in Power BI, the RANKX() function is commonly used to achieve a similar result.

Advantages of Indexing in Power BI

  • Indexing helps to organise data into groups.
  • Indexing helps in making reports maintainable.
  • Indexing helps to detect patterns in data.
  • When you are working with large datasets, indexing helps improve performance.
  • Indexing is helpful when you need to find data in reports.

Why is RANKX used in Power BI?

RANKX is used in Power BI to assign a rank to data based on a specific measure or value. For example, it helps to rank employees by their salary or products by sales. This makes it easy to see who is performing the best or where trends are going. RANKX is helpful in organizing data and showing comparisons, making it easier to understand performance or ranking within different categories.

Power BI Approach Using DAX

In Power BI, you can achieve the same using RANKX()

Step 1: Load Data into Power BI

Load Data into Power BI

This is how the data looks after loading it in Power BI.

Step 2: Write DAX Formula

Go to “Data View,” click “New Column”, and run the following DAX formula.

RowNum = 

RANKX(

FILTER(EmployeeData, EmployeeData[Department] = EARLIER(EmployeeData[Department])),

    EmployeeData[Salary],

    ,

    ASC

)

Output:

Write DAX Formula

Explanation:

  • FILTER() is used to divide data by department.
  • EARLIER() refers to the current row’s department.
  • RANKX() is used to rank employees based on salary.
  • ASC is used to sort in ascending order.

SQL Approach Using ROW_NUMBER()

In SQL, ROW_NUMBER() is used to assign a unique row number to each row within a partition of a result set. You can use the PARTITION BY clause to divide the data into groups and ORDER BY to sort within those groups, similar to Power BI’s RANKX() function.

First, let us create the sample table in SQL called EmployeeData so that we can understand ROW_NUMBER().

—Creating table

CREATE TABLE EmployeeData (

    EmployeeID INT PRIMARY KEY,

    Name VARCHAR(50),

    Department VARCHAR(50),

    Salary INT

);

---Inserting Data into it

INSERT INTO EmployeeData (EmployeeID, Name, Department, Salary)

VALUES

(1, 'John Doe', 'Sales', 60000),

(2, 'Jane Smith', 'Sales', 70000),

(3, 'Alice Lee', 'HR', 50000),

(4, 'Bob Brown', 'HR', 55000),

(5, 'Charlie Z', 'Sales', 72000);

---Displaying result

SELECT * from EmployeeData;
output inserting data

Here is  how the table looks after inserting data

–Generate Index by Category using ROW_NUMBER()

SELECT 

    EmployeeID,

    Name,

    Department,

    Salary,

    ROW_NUMBER() OVER (PARTITION BY Department ORDER BY Salary ASC) AS RowNum

FROM EmployeeData;

Output:

Generate Index by Category using ROW_NUMBER

Explanation:

  • PARTITION BY DEPARTMENT is used to divide the data by department.
  • ORDER BY Salary ASC is used to sort salaries in ascending order.
  • ROW_NUMBER() is used to generate a unique number for each employee within the department.

Difference Between ROW_NUMBER in SQL and RANKX in Power BI

FeaturePower BI(RANKX)SQL (ROW_NUMBER)
PartitioningUses FILTER + EARLIER for partitioning.Uses PARTITION BY to group data.
Row Number GenerationRANKX by default returns unique ranks unless specified with DENSE, which gives tied ranks the same number.Uses ROW_NUMBER() to generate unique row numbers.
SortingSorting is defined within the RANKX() function.Sorting is defined using ORDER BY.
Performance on Large DataEfficient with optimised data models, but may slow down with large datasets without optimisation.Efficient with indexed tables and optimised queries.

Real-world Example

Sales Performance Analysis by Region: To rank sales representatives within each region using Power BI, we can use the RANKX() function.

Step 1: Load the dataset into Power BI

Write a DAX function

This is how data looks in Power BI.

Step 2: Write a DAX function

Sales Rank = 

RANKX(

    FILTER(Sales, Sales[Region] = EARLIER(Sales[Region])),

    Sales[TotalSales],

    ,

    DESC,

    DENSE

)

Output:

 Write a DAX function

Explanation:

  • Filter() makes sure that the ranking is applied within the same region.
  • Sales[TotalSales] is used to determine the rank.
  • DESC ranks the highest sales first.
  • DENSE is used to avoid gaps in numbers.

Troubleshooting Common Issues

When working with SQL and Power BI for data analysis, various issues can arise:

1. Incorrect Indexing

Incorrect indexing can slow down query performance, which causes inaccurate results when using ranking functions. To avoid this problem, make sure that indexes are applied on the columns used in ORDER BY and PARTITION BY statements. In Power BI, check the data model for unnecessary fields and reduce data load by applying filters at the data sources.

2. Performance Issues

When using large datasets, poor performance may occur. To avoid this problem, try to create indexes on frequently used columns in SQL, which improves readability.

3. Handling Ties

Ties in data are a condition in which records have the same value, which leads to poor ranking. But this can be resolved by implementing secondary sorting using additional fields. You can use the RANKX() function in Power BI to break ties in data

4. Data Mismatch

When you have a large table consisting of multiple columns, data mismatch can arise, which is caused by improper data types, leading to errors in results. To avoid this, you can check the data types of columns in which you are performing a join. 

Which Approach is Best?

  • When you are working on interactive reports and dashboards, it is preferred to use the RANKX() function in Power BI, which offers flexibility and is ideal for dynamic visualisations.
  • If your data is inside the database and requires complex transformation, then you can use ROW_NUMBER() in SQL.
  • For mixed use, you can perform complex calculations in SQL and import the results into Power BI for visualisations.

Best Practices

  • Make sure that your data is sorted to maintain good results.
  • Use indexing when you are working with large data.
  • Regularly try to update and maintain indexes to optimise performance.
  • Test your queries on sample data before using them on real data.

Conclusion

Power BI, as well as SQL, can provide distinct advantages based on the need. SQL is best for server-side manipulation of data, especially data that is large in databases. SQL provides functions, such as RANK(), DENSE_RANK(), and ROW_NUMBER(), that will provide good ranking results. Power BI provides the RANKX() functions, which are very flexible and provide many measures to create dynamic and interactive dashboards. Each tool has advantages based on the size of the data and the reporting method.

To learn more about Power BI and its functions, check out this Power BI Course and also explore Power BI Interview Questions prepared by industry experts.

Power BI DAX Function – RANK and ROW_NUMBER() – FAQs

Data Analytics for Business