From a technical perspective, how does Selenium click an element on a web page?

Question

asked Jul 26, 2019 in DevOps and Agile by Han Zhyang (19.7k points)

Problem Context

I am spearheading the development of a test automation framework for a web application which uses Web Components. This has presented a problem when testing in Internet Explorer, because Internet Explorer does not support Web Components natively; instead, a polyfill is used to provide this functionality.

A primary repercussion of this is that much of Selenium will not work as expected. It cannot 'see' the Shadow DOM in Internet Explorer the way it can in Firefox and Chrome.

The alternative is to write a test framework which provides an alternate mechanism for accessing elements via JavaScript - this allows elements to be located through the polyfill.

Our current implementation checks the WebDriver being used, and either uses the original Selenium implementation of a method (in the case of Chrome or Firefox), or our own alternative implementation (in the case of Internet Explorer).

This means that we want our implementation to be as close as possible to Selenium's implementation, at its core, browser-interaction, level.
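
The browser check described above could be sketched roughly as follows. This is only an illustration under assumptions: the class and enum names are hypothetical, the browser name is taken as a plain string (in Selenium it is typically read from the driver's capabilities, where Internet Explorer reports itself as "internet explorer"), and the Selenium calls themselves are left out.

```java
import java.util.Locale;

/** Hypothetical helper: picks how an element interaction should be carried out. */
final class InteractionStrategy {

    /** NATIVE = Selenium's own implementation; JAVASCRIPT = polyfill-aware alternative. */
    enum Kind { NATIVE, JAVASCRIPT }

    private InteractionStrategy() {}

    /**
     * Chooses a strategy from the browser name.
     * Assumption: only Internet Explorer needs the JavaScript fallback.
     */
    static Kind forBrowser(String browserName) {
        String name = browserName == null ? "" : browserName.trim().toLowerCase(Locale.ROOT);
        return name.equals("internet explorer") ? Kind.JAVASCRIPT : Kind.NATIVE;
    }
}
```

Keeping the decision in one place means each wrapped method (click, find, and so on) only has to ask for the strategy once and then delegate.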

Problem

I am trying to replicate the functionality of Actions.click(WebElement onElement) (source), in a simplified form (without following the Builder design pattern of the Actions class, and making assumptions that the click is with the left mouse button and no other keys (Ctrl, Shift, Alt) are being held down).

I want to find the core code which handles the click (specifically in Chrome, Firefox, and Internet Explorer) so that I can replicate it as closely as possible. However, I've found myself lost in a deep pit of classes and interfaces...

A new ClickAction (source) is created (to later be performed). Performing this includes a 'click()' call on an instance of the Mouse interface (source), and this is where I'm lost. I see from the generated JavaDoc that this is implemented by either EventFiringMouse (source) or HtmlUnitMouse (source), but I'm not sure which one will be used. I made an assumption (with little basis) that HtmlUnitMouse would be used, which has led me further down the rabbit hole looking at HTMLUnit code from Gargoyle Software...

In short, I am totally lost.

Any guidance would be much appreciated :)

Research

I have found that I was incorrect in my assumption that HTMLUnit is used by Chrome, Firefox, and Internet Explorer. Documentation shows that RemoteWebDriver (source) is subclassed by ChromeDriver, FirefoxDriver, and InternetExplorerDriver.

1 Answer

Prabhpreet Kaur · Answer 1 · 2019-07-26T12:46:45+0000

The Selenium WebDriver API handles communication between programming languages and browsers. Selenium supports several programming languages, such as Java, C#, and Python, and multiple browsers, such as Google Chrome, Firefox, and Internet Explorer. Every browser has its own logic for performing actions such as loading a page or closing the browser. The Selenium WebDriver architecture is described below.

There are four components of the Selenium architecture:

1. Selenium Client Library
2. JSON Wire Protocol over HTTP
3. Browser Drivers
4. Browsers

1. Selenium Client Libraries/Language Bindings:

Selenium can be used from multiple languages, such as Java, Ruby, and Python. The Selenium developers have written language bindings so that Selenium supports each of these languages. The available client libraries are listed on the official Selenium website.

2. JSON Wire Protocol over HTTP:

JSON stands for JavaScript Object Notation. It is used to transfer data between a server and a client on the web. The JSON Wire Protocol is a REST API that transfers data between the client library and the driver's HTTP server. Each browser driver (such as FirefoxDriver, ChromeDriver, etc.) has its own HTTP server.

3. Browser Drivers:

Each browser has its own browser driver. A browser driver communicates with its respective browser without revealing the internal logic of the browser's functionality. Once a browser driver has received a command, that command is executed on the respective browser, and the result is returned in the form of an HTTP response.

4. Browsers:

Selenium supports multiple browsers, such as Firefox, Chrome, IE, and Safari.

Let's see how Selenium WebDriver works internally. In practice, you write code in your IDE (say, Eclipse) using any one of the supported Selenium client libraries (say, Java).

Example:

// Launch Firefox and navigate to the site
WebDriver driver = new FirefoxDriver();
driver.get("https://intellipaat.com/");

Once your script is ready, click Run to execute the program. Based on the above statements, the Firefox browser will be launched and will navigate to Intellipaat's website.

Here is what happens internally from the moment you click Run until the Firefox browser launches.

Once you click Run, each statement in your script is converted into an HTTP request using the JSON Wire Protocol, and the requests are sent to the browser driver (in the above code, FirefoxDriver). In our case, the client library (Java) converts the statements of the script to JSON and communicates with FirefoxDriver. The request for driver.get looks roughly as shown below, where {session id} identifies the browser session that was created when the driver started.

POST http://localhost:8080/session/{session id}/url
{"url":"https://intellipaat.com/"}
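
To make the shape of that request concrete, here is a minimal sketch of how a client library might assemble it. The DriverRequest class and its method names are hypothetical and are not part of Selenium; only the HTTP method, the /session/{session id}/url path, and the {"url": ...} body come from the wire protocol.

```java
/** Hypothetical value class describing one HTTP request sent to a browser driver. */
final class DriverRequest {
    final String method;
    final String path;
    final String body;

    DriverRequest(String method, String path, String body) {
        this.method = method;
        this.path = path;
        this.body = body;
    }

    /** Builds the request for driver.get(url): POST /session/{sessionId}/url. */
    static DriverRequest navigateTo(String sessionId, String url) {
        String json = "{\"url\":\"" + escapeJson(url) + "\"}";
        return new DriverRequest("POST", "/session/" + sessionId + "/url", json);
    }

    /** Escapes backslashes and double quotes; enough for ordinary URLs. */
    static String escapeJson(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    @Override
    public String toString() {
        return method + " " + path + "\n" + body;
    }
}
```

Calling DriverRequest.navigateTo with a session id and "https://intellipaat.com/" yields a POST to /session/{that id}/url whose body is the JSON object carrying the target URL.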

Every browser driver uses an HTTP server to receive HTTP requests. Once the request reaches the browser driver, the driver passes it on to the real browser, and the commands in your Selenium script are then executed on the browser.

If the request is a POST request, an action is performed on the browser.

If the request is a GET request, the corresponding response is generated at the browser end, sent back to the browser driver, and then returned by the browser driver over the JSON Wire Protocol to the client library running in your IDE (Eclipse).
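
Applied to the original question, an element click is one of the POST commands: the client library sends a request to the driver, and the driver performs the click in the browser. The sketch below lists a few commands with the HTTP method and path each one uses in the wire protocol. The WireCommand enum and its method names are hypothetical and only serve to illustrate the GET/POST split.

```java
/** A few WebDriver commands with the HTTP method and path template each one uses. */
enum WireCommand {
    NAVIGATE("POST", "/session/%s/url"),
    CLICK_ELEMENT("POST", "/session/%s/element/%s/click"),
    GET_TITLE("GET", "/session/%s/title"),
    GET_CURRENT_URL("GET", "/session/%s/url");

    private final String method;
    private final String pathTemplate;

    WireCommand(String method, String pathTemplate) {
        this.method = method;
        this.pathTemplate = pathTemplate;
    }

    /** Fills in the session id (and the element id, where the path needs one). */
    String requestLine(String... ids) {
        return method + " " + String.format(pathTemplate, (Object[]) ids);
    }

    /** POST commands ask the browser to do something; GET commands only read state. */
    boolean performsAction() {
        return method.equals("POST");
    }
}
```

For example, WireCommand.CLICK_ELEMENT.requestLine(sessionId, elementId) produces the request line for clicking a previously located element, which is what WebElement.click() ultimately sends when a RemoteWebDriver subclass is in use.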
