Scraping Data from a website which uses Power BI - retrieving data from Power BI on a website

Question

asked Jul 16, 2019 in BI by Vaibhav Ameta (17.6k points)

I want to scrap data from this page (and pages similar to it): https://cereals.ahdb.org.uk/market-data-centre/historical-data/feed-ingredients.aspx

This page uses Power BI. Unfortunately, finding a way to scrap Power BI is hard, because everyone wants to scrap using/into Power BI, not from it. The closest answer was this question. Yet unrelated.

Firstly, I used Apache tika, and soon I realized the table data is been loading after loading the page. I need the rendered version of the page.

Therefore, I used Selenium. I wanted to Select All at the begining (sending Ctrl+A key combination), but it doesn't work. Maybe it is restricted by the page events (I also tried to remove all the events using developer tools, yet still Ctrl+A doesn't work.

I also tried to read the HTML contents, but Power BI puts div elements on the screen using position:absolute and distinguishing the location of a div in the table (both row and column) is an effortful activity.

Since Power BI uses JSON, I tried to read data from there. However, it is so complicated I couldn't find out the rules. It seems it puts keywords somewhere and uses their indices in the table.

Note: I realized that all of the data is not loaded and even shown at the same time. A div of class scroll-bar-part-bar is responsible to act as a scroll bar and moving that loads/show other parts of the data.

The code I used to read data is as follows. As mentioned, the order of the produced data differs from what is rendered on the browser:

from selenium import webdriver from selenium.webdriver.common.keys import Keys options = webdriver.ChromeOptions() options.binary_location = "C:/Program Files (x86)/Google/Chrome/Application/chrome.exe" driver = webdriver.Chrome(options=options, executable_path="C:/Drivers/chromedriver.exe") driver.get("https://app.powerbi.com/view?r=eyJrIjoiYjVjM2MyNjItZDE1Mi00OWI1LWE5YWYtODY4M2FhYjU4ZDU1IiwidCI6ImExMmNlNTRiLTNkM2QtNDM0Ni05NWVmLWZmMTNjYTVkZDQ3ZCJ9") parent = driver.find_element_by_xpath('//*[@id="pvExplorationHost"]/div/div/div/div[2]/div/div[2]/div[2]/visual-container[4]/div/div[3]/visual/div') children = parent.find_elements_by_xpath('.//*') values = [child.get_attribute('title') for child in children]

I appreciate solutions for any of the above problems. The most interesting for me though is the convention of storing Power BI data in JSON format.

1 Answer

Related questions

0 votes

1 answer

Android Web Scraping with a Headless Browser

asked Jul 18, 2019 in DevOps and Agile by Han Zhyang (19.7k points)

0 votes

0 answers

Scraping dynamic website

asked Apr 27, 2021 in R Programming by Aytan (160 points)

0 votes

1 answer

Unable to call Firefox from selenium in python on AWS machine

asked Jul 16, 2019 in DevOps and Agile by Han Zhyang (19.7k points)

0 votes

1 answer

Is Python good for web scraping?

asked Nov 12, 2020 in Python by ashely (50.2k points)

0 votes

1 answer

What is Web scraping in Python?

asked Nov 12, 2020 in Python by ashely (50.2k points)

Ashok · Answer 1 · 2019-07-17T06:08:39+0000

Putting the scroll part and the JSON aside, I managed to read the data. The key is to read all of the elements inside the parent (which is done in the question):

parent = driver.find_element_by_xpath('//*[@id="pvExplorationHost"]/div/div/div/div[2]/div/div[2]/div[2]/visual-container[4]/div/div[3]/visual/div')
children = parent.find_elements_by_xpath('.//*')

Then sort them using their location:

x = [child.location['x'] for child in children]
y = [child.location['y'] for child in children]
index = np.lexsort((x,y))

To sort what we have read in different lines, this code may help:

rows = []
row = []
last_line = y[index[0]]
for i in index:
    if last_line != y[i]:
        row.append[children[i].get_attribute('title')]
    else:
        rows.append(row)
        row = list([children[i].get_attribute('title')]
rows.append(row)

Want to learn more about Power BI, Come & join Power BI tutorial. Also, enroll in Power BI training to become proficient in this BI tool.

Scraping Data from a website which uses Power BI - retrieving data from Power BI on a website

1 Answer

Related questions

Browse Categories