When you are working with a large dataset, RESTful APIs mostly paginate responses to avoid overloading the system with a huge load of data in a single request. Power Query in Power BI can be an excellent tool to query and process such paginated API responses. Pagination makes sure that data is delivered in controlled pieces with the help of parameters like page, offset, or limit. In this blog, let us explore how paginated API responses with Power Query in Power BI are handled.
Table of Contents:
What is a Paginated Response?
A paginated response is a technique used by APIs to split large datasets into smaller ones. The API does not return all data in a single response; it returns a limited set of results along with metadata of the next set of results so that you can fetch more data if needed.
The pagination metadata includes:
- Current page number
- Total pages
- Next page URL
- Performance Optimization: Large datasets in a table are maintained by pagination, as it prevents slow performance that occurs while handling large data in one request.
- Managing Server Load: By dividing large data into smaller chunks, the server can manage requests more easily, as it reduces the risk of the system being overloaded.
- Improved User Experience: Data pagination allows faster loading times and smoother user interaction, which improves user experience.
- Memory Efficiency: Pagination helps to reduce memory usage by loading smaller sets of data at a time.
- Reduced Network Traffic: Paginated responses minimize the amount of data transferred in a single request, which reduces network usage and improves response time
Querying RESTful APIs with Power Query
Power Query is a powerful tool as it allows you to get the data, transform it, and clean that data from a variety of sources. The most common use of Power Query is querying RESTful APIs. When you have a large dataset, these APIs return data in a paginated format. Power BI can handle such a paginated response very well by allowing you to process large amounts of data.
Custom Power Query Paging Function
Power Query allows us to create a custom function to handle dynamic pagination. This function will retrieve data from the API page by page.
To understand this function step-by-step, we will consider an API example from the U.S. Treasury Reporting Rates of Exchange.
Creating The GetPage Function
We will create a custom function named GetPage that takes a page number as an argument and fetches the data for that page.
1. Open Power Query Editor: Click on the “Home” tab and then click “Transform Data” to open Power Query Editor.
2. Create a Blank Query: Go to the “Home” tab and then click “New Source”>”Blank Query”.
3. Open Advanced Editor: Click on the “Advanced Editor” in the home tab to write custom code.
4. Write the code: Paste the following code on the tab that will appear after clicking on Advanced Editor and name it GetPage.
(page as text) =>
let
Source = try
Json.Document(
Web.Contents(
"https://api.fiscaldata.treasury.gov/services/api/fiscal_service/v1/accounting/od/rates_of_exchange?page[number]="
& page
& "&fields=country_currency_desc,exchange_rate,record_date"
)
)
otherwise
[data = null, meta = null, links = null],
#"Converted to Table" = Table.FromRecords({Source}),
#"Removed Columns" = Table.RemoveColumns(#"Converted to Table", {"meta", "links"}),
#"Expanded data List" = Table.ExpandListColumn(#"Removed Columns", "data"),
#"Expanded data Records" = Table.ExpandRecordColumn(
#"Expanded data List",
"data",
{"country_currency_desc", "exchange_rate", "record_date"},
{"data.country_currency_desc", "data.exchange_rate", "data.record_date"}
)
in
#"Expanded data Records"
This GetPage function will fetch data from a given page number.
Methods to Retrieve Pages
Now, we have a function called GetPage, and we can retrieve all pages by iterating through the pages.
Method 1: Page Count is Known
The API returns the total number of pages. Power Query will loop through all pages and fetch data.
Steps to retrieve all pages:
- Open the Power Query Editor
- Create a Blank Query
- Paste the following code in the Advanced Editor
Code for Method 1:
let
Source = try
Json.Document(
Web.Contents(
"https://api.fiscaldata.treasury.gov/services/api/fiscal_service/v1/accounting/od/rates_of_exchange?fields=country_currency_desc,exchange_rate,record_date"
)
)
otherwise
[data = null, meta = null, links = null],
#"Page List" =
if Source = null or Source[meta] = null then
{null}
else
{1 .. Source[meta][#"total-pages"]},
#"Converted to Table" = Table.FromList(
#"Page List",
Splitter.SplitByNothing(),
null,
null,
ExtraValues.Error
),
#"Renamed Columns" = Table.RenameColumns(#"Converted to Table", {{"Column1", "PAGES"}}),
#"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns", {{"PAGES", type text}}),
#"Invoked Custom Function" = Table.AddColumn(
#"Changed Type",
"GetPage",
each if [PAGES] <> null then GetPage([PAGES]) else null
),
#"Expanded GetPage" = Table.ExpandTableColumn(
#"Invoked Custom Function",
"GetPage",
{"data.country_currency_desc", "data.exchange_rate", "data.record_date"},
{"GetPage.data.country_currency_desc", "GetPage.data.exchange_rate", "GetPage.data.record_date"}
)
in
#"Expanded GetPage"
After running the query, you will get a table having all the data from both tables.
Output:
Method 2: Page Count is Unknown; Entry Count is Known
If the total number of entries and entries per page are given but the total number of pages is not provided, then we can calculate the number of pages.
let
Source = try
Json.Document(
Web.Contents(
"https://api.fiscaldata.treasury.gov/services/api/fiscal_service/v1/accounting/od/rates_of_exchange?fields=country_currency_desc,exchange_rate,record_date"
)
)
otherwise
[data = null, meta = null, links = null],
#"Page List" =
if Source = null or Source[meta] = null then
{null}
else
{1 .. Number.RoundUp(Source[meta][#"total-count"] / Source[meta][#"count"])},
#"Converted to Table" = Table.FromList(
#"Page List",
Splitter.SplitByNothing(),
null,
null,
ExtraValues.Error
),
#"Renamed Columns" = Table.RenameColumns(#"Converted to Table", {{"Column1", "PAGES"}}),
#"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns", {{"PAGES", type text}}),
#"Invoked Custom Function" = Table.AddColumn(
#"Changed Type",
"GetPage",
each if [PAGES] <> null then GetPage([PAGES]) else null
),
#"Expanded GetPage" = Table.ExpandTableColumn(
#"Invoked Custom Function",
"GetPage",
{"data.country_currency_desc", "data.exchange_rate", "data.record_date"},
{"GetPage.data.country_currency_desc", "GetPage.data.exchange_rate", "GetPage.data.record_date"}
)
in
#"Expanded GetPage"
Output:
Method 3: Both Page Count and Entry Count are Unknown
In this case, neither the total number of pages nor entries is provided, so we will use List. Generate to fetch pages until no data is returned.
let
Source = List.Generate(
() => [PAGE = 1, RESULTS = GetPage("1")],
each Table.RowCount(
Table.SelectRows(_[RESULTS], each not List.Contains(Record.ToList(_), null))
)
> 0,
each [PAGE = _[PAGE] + 1, RESULTS = GetPage(Number.ToText([PAGE] + 1))],
each _[[PAGE], [RESULTS]]
),
#"Converted to Table" = Table.FromList(
Source,
Splitter.SplitByNothing(),
null,
null,
ExtraValues.Error
),
#"Expanded Column" = Table.ExpandRecordColumn(
#"Converted to Table",
"Column1",
{"PAGE", "RESULTS"},
{"PAGES", "GetPage"}
),
#"Changed Type" = Table.TransformColumnTypes(#"Expanded Column", {{"PAGES", type text}}),
#"Expanded GetPage" = Table.ExpandTableColumn(
#"Changed Type",
"GetPage",
{"data.country_currency_desc", "data.exchange_rate", "data.record_date"},
{"GetPage.data.country_currency_desc", "GetPage.data.exchange_rate", "GetPage.data.record_date"}
)
in
#"Expanded GetPage"
Output:
Method |
Performance |
Efficiency |
Best for |
Method 1: Page Count is known | Fastest Performance. | Directly uses total-pages metadata for looping. | Best for APIs that provide the total pages. |
Method 2: Entry Count is known | Performance is slightly slower than Method 1. | Calculates total pages using total-count and count. | Best for APIs that provide the total count but not the total pages. |
Method 3: Both Counts are Unknown | Performance is Slowest. | Dynamically generates pages, checking for data presence. | Best for APIs that do not provide a total count or total pages. |
Best Practices
- Check Available Metadata: Check the API’s documentation to check if metadata is present.
- Handle API Limits and Errors: Make sure you implement error handling and retries in Power Query if your API has limits. You can use try… otherwise expressions to catch and handle errors.
- Optimize Performance for Queries: Request larger pages to reduce the total number of requests in order to minimize the number of API calls. Performance test when scaling up larger datasets.
- Caching response in API: When you are using an API that does not change frequently, consider caching data locally or storing it in a database to minimize requests.
Conclusion
Power Query offers several solutions while handling paginated data, each solution having its benefits. Method 1 (Page Count is Known) is the most efficient when the page count is available, while Method 2 (Page Count is unknown and Entry Count is known) works well when you only have the total entry count. Method 3 ( Both Counts are Unknown) is the slowest and should be used when no metadata is available.
To learn more about Power BI and its functions, check out this Power BI Course and also explore Power BI Interview Questions prepared by industry experts.
Handling Paginated REST API Data – FAQs
1. What is paginated data?
Paginated data means dividing the larger datasets into smaller ones in order to avoid the rush at the server or client side.
2. Why do APIs paginate responses?
APIs paginate responses to manage large datasets efficiently and reduce server load by returning data in smaller parts.
3. Can Power Query automatically retrieve all pages from an API?
Yes, Power Query automatically retrieves all pages from an API by using custom functions and looping methods.
4. What is the 'GetPage' function in Power Query?
The “GetPage” is a custom Power Query function used to fetch individual pages of data.
5. What happens if the page count is unknown?
If the page count is unknown, then you can calculate the number of pages based on the total entries.