When you are working with a large dataset, understanding pagination in REST APIs is important to avoid overloading the system with a huge load of data in a single request. Power Query in Power BI can be an excellent tool to query and process such paginated API responses, making REST API pagination in Power BI seamless and efficient. Pagination is essential when handling large datasets in Power BI to avoid performance issues and memory overload with the help of parameters like page, offset, or limit. In this blog, let us explore how paginated API responses with Power Query in Power BI are handled.
Table of Contents:
A paginated response is a technique used by APIs to split large datasets into smaller ones. The API does not return all data in a single response; it returns a limited set of results along with metadata of the next set of results so that you can fetch more data if needed.
The pagination metadata includes:
- Current page number
- Total pages
- Next page URL
- Performance Optimization: Large datasets in a table are maintained by pagination, as it prevents slow performance that occurs while handling large data in one request.
- Managing Server Load: By dividing large data into smaller chunks, the server can manage requests more easily, as it reduces the risk of the system being overloaded.
- Improved User Experience: Data pagination allows faster loading times and smoother user interaction, which improves user experience.
- Memory Efficiency: Pagination helps to reduce memory usage by loading smaller sets of data at a time.
- Reduced Network Traffic: Paginated responses minimize the amount of data transferred in a single request, which reduces network usage and improves response time
Querying RESTful APIs with Power Query
Power Query API integration allows you to fetch, transform, and clean the data from paginated APIs directly into Power BI. The most common use of Power Query is querying RESTful APIs. When you have a large dataset, these APIs return data in a paginated format. Power BI can handle such a paginated response very well by allowing you to process large amounts of data.
Custom Power Query Paging Function
Using Power Query, you can handle a paginated REST API Power Query response efficiently by creating a custom paging function.
To understand this function step-by-step, we will consider an API example from the U.S. Treasury Reporting Rates of Exchange.
Creating The GetPage Function
We will create a custom function named GetPage that takes a page number as an argument and fetches the data for that page.
1. Open Power Query Editor: Click on the “Home” tab and then click “Transform Data” to open Power Query Editor.
2. Create a Blank Query: Go to the “Home” tab and then click “New Source”> “Blank Query”.
3. Open Advanced Editor: Click on the “Advanced Editor” in the home tab to write custom code.
4. Write the code: Paste the following code on the tab that will appear after clicking on Advanced Editor and name it GetPage.
(page as text) =>
let
Source = try
Json.Document(
Web.Contents(
"https://api.fiscaldata.treasury.gov/services/api/fiscal_service/v1/accounting/od/rates_of_exchange?page[number]="
& page
& "&fields=country_currency_desc,exchange_rate,record_date"
)
)
otherwise
[data = null, meta = null, links = null],
#"Converted to Table" = Table.FromRecords({Source}),
#"Removed Columns" = Table.RemoveColumns(#"Converted to Table", {"meta", "links"}),
#"Expanded data List" = Table.ExpandListColumn(#"Removed Columns", "data"),
#"Expanded data Records" = Table.ExpandRecordColumn(
#"Expanded data List",
"data",
{"country_currency_desc", "exchange_rate", "record_date"},
{"data.country_currency_desc", "data.exchange_rate", "data.record_date"}
)
in
#"Expanded data Records"
This GetPage function helps you fetch paginated data, Power BI style, page by page, ensuring no records are missed.
Here are some API pagination examples that will help you fetch all pages from a REST API using Power Query.
Methods to Retrieve Pages
Now, we have a function called GetPage, and we can retrieve all pages by iterating through the pages.
Method 1: Page Count is Known
To loop through API pages, Power Query provides several methods depending on the metadata available from the API.
Steps to retrieve all pages:
- Open the Power Query Editor
- Create a Blank Query
- Paste the following code in the Advanced Editor
Code for Method 1:
let
Source = try
Json.Document(
Web.Contents(
"https://api.fiscaldata.treasury.gov/services/api/fiscal_service/v1/accounting/od/rates_of_exchange?fields=country_currency_desc,exchange_rate,record_date"
)
)
otherwise
[data = null, meta = null, links = null],
#"Page List" =
if Source = null or Source[meta] = null then
{null}
else
{1 .. Source[meta][#"total-pages"]},
#"Converted to Table" = Table.FromList(
#"Page List",
Splitter.SplitByNothing(),
null,
null,
ExtraValues.Error
),
#"Renamed Columns" = Table.RenameColumns(#"Converted to Table", {{"Column1", "PAGES"}}),
#"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns", {{"PAGES", type text}}),
#"Invoked Custom Function" = Table.AddColumn(
#"Changed Type",
"GetPage",
each if [PAGES] <> null then GetPage([PAGES]) else null
),
#"Expanded GetPage" = Table.ExpandTableColumn(
#"Invoked Custom Function",
"GetPage",
{"data.country_currency_desc", "data.exchange_rate", "data.record_date"},
{"GetPage.data.country_currency_desc", "GetPage.data.exchange_rate", "GetPage.data.record_date"}
)
in
#"Expanded GetPage"
After running the query, you will get a table containing all the data from both tables.
Output:
Method 2: Page Count is Unknown; Entry Count is Known
If the total number of entries and entries per page are given but the total number of pages is not provided, then we can calculate the number of pages.
let
Source = try
Json.Document(
Web.Contents(
"https://api.fiscaldata.treasury.gov/services/api/fiscal_service/v1/accounting/od/rates_of_exchange?fields=country_currency_desc,exchange_rate,record_date"
)
)
otherwise
[data = null, meta = null, links = null],
#"Page List" =
if Source = null or Source[meta] = null then
{null}
else
{1 .. Number.RoundUp(Source[meta][#"total-count"] / Source[meta][#"count"])},
#"Converted to Table" = Table.FromList(
#"Page List",
Splitter.SplitByNothing(),
null,
null,
ExtraValues.Error
),
#"Renamed Columns" = Table.RenameColumns(#"Converted to Table", {{"Column1", "PAGES"}}),
#"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns", {{"PAGES", type text}}),
#"Invoked Custom Function" = Table.AddColumn(
#"Changed Type",
"GetPage",
each if [PAGES] <> null then GetPage([PAGES]) else null
),
#"Expanded GetPage" = Table.ExpandTableColumn(
#"Invoked Custom Function",
"GetPage",
{"data.country_currency_desc", "data.exchange_rate", "data.record_date"},
{"GetPage.data.country_currency_desc", "GetPage.data.exchange_rate", "GetPage.data.record_date"}
)
in
#"Expanded GetPage"
Output:
Method 3: Both Page Count and Entry Count are Unknown
In this case, neither the total number of pages nor entries is provided, so we will use List. Generate to fetch pages until no data is returned.
let
Source = List.Generate(
() => [PAGE = 1, RESULTS = GetPage("1")],
each Table.RowCount(
Table.SelectRows(_[RESULTS], each not List.Contains(Record.ToList(_), null))
)
> 0,
each [PAGE = _[PAGE] + 1, RESULTS = GetPage(Number.ToText([PAGE] + 1))],
each _[[PAGE], [RESULTS]]
),
#"Converted to Table" = Table.FromList(
Source,
Splitter.SplitByNothing(),
null,
null,
ExtraValues.Error
),
#"Expanded Column" = Table.ExpandRecordColumn(
#"Converted to Table",
"Column1",
{"PAGE", "RESULTS"},
{"PAGES", "GetPage"}
),
#"Changed Type" = Table.TransformColumnTypes(#"Expanded Column", {{"PAGES", type text}}),
#"Expanded GetPage" = Table.ExpandTableColumn(
#"Changed Type",
"GetPage",
{"data.country_currency_desc", "data.exchange_rate", "data.record_date"},
{"GetPage.data.country_currency_desc", "GetPage.data.exchange_rate", "GetPage.data.record_date"}
)
in
#"Expanded GetPage"
Output:
|
Method |
Performance |
Efficiency |
Best for |
| Method 1: Page Count is known | Fastest Performance. | Directly uses total-pages metadata for looping. | Best for APIs that provide the total pages. |
| Method 2: Entry Count is known | Performance is slightly slower than Method 1. | Calculates total pages using total-count and count. | Best for APIs that provide the total count but not the total pages. |
| Method 3: Both Counts are Unknown | Performance is Slowest. | Dynamically generates pages, checking for data presence. | Best for APIs that do not provide a total count or total pages. |
Best Practices
- Check Available Metadata: Check the API’s documentation to check if metadata is present.
- Handle API Limits and Errors: Make sure you implement error handling and retries in Power Query if your API has limits. You can use try… otherwise expressions to catch and handle errors.
- Optimize Performance for Queries: Request larger pages to reduce the total number of requests in order to minimize the number of API calls. Performance test when scaling up larger datasets.
- Caching response in API: When you are using an API that does not change frequently, consider caching data locally or storing it in a database to minimize requests.
Conclusion
Power Query offers several solutions while handling paginated data, each solution having its benefits. Method 1 (Page Count is Known) is the most efficient when the page count is available, while Method 2 (Page Count is unknown and Entry Count is known) works well when you only have the total entry count. Method 3 ( Both Counts are Unknown) is the slowest and should be used when no metadata is available.
To learn more about Power BI and its functions, check out this Power BI Course and also explore Power BI Interview Questions prepared by industry experts.
Handling Paginated REST API Data – FAQs
1. What is paginated data?
Paginated data means dividing the larger datasets into smaller ones in order to avoid the rush at the server or client side.
2. Why do APIs paginate responses?
APIs paginate responses to manage large datasets efficiently and reduce server load by returning data in smaller parts.
3. Can Power Query automatically retrieve all pages from an API?
Yes, Power Query automatically retrieves all pages from an API by using custom functions and looping methods.
4. What is the 'GetPage' function in Power Query?
The “GetPage” is a custom Power Query function used to fetch individual pages of data.
5. What happens if the page count is unknown?
If the page count is unknown, then you can calculate the number of pages based on the total entries.