in Devops and Agile by (19.8k points)

I've been testing Selenium with ChromeDriver and noticed that some pages can detect that you're using Selenium even when there's no automation at all. Even when I'm just browsing manually, using Chrome through Selenium and Xephyr, I often get a page saying that suspicious activity was detected. I've checked my user agent and my browser fingerprint, and both are identical to those of the normal Chrome browser.

When I browse to these sites in normal Chrome everything works fine, but the moment I use Selenium I'm detected.

In theory, ChromeDriver and Chrome should look exactly the same to any web server, but somehow the sites can tell them apart.

If you want some test code, try this:

from pyvirtualdisplay import Display
from selenium import webdriver

# visible=1 shows the virtual display in a window (Xephyr)
display = Display(visible=1, size=(1600, 902))
display.start()

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--disable-extensions')
chrome_options.add_argument('--profile-directory=Default')
chrome_options.add_argument('--incognito')
chrome_options.add_argument('--disable-plugins-discovery')
chrome_options.add_argument('--start-maximized')

# options= replaces the deprecated chrome_options= keyword
driver = webdriver.Chrome(options=chrome_options)
driver.delete_all_cookies()
driver.set_window_size(800, 800)
driver.set_window_position(0, 0)
print('arguments done')

driver.get('http://stubhub.com')

If you browse around StubHub, you'll get redirected and 'blocked' within one or two requests. I've been investigating this and I can't figure out how they can tell that a user is using Selenium.

How do they do it?

EDIT UPDATE:

I installed the Selenium IDE plugin in Firefox and got banned when I went to stubhub.com in the normal Firefox browser with only that additional plugin.

EDIT:

When I use Fiddler to view the HTTP requests being sent back and forth, I notice that the 'fake browser's' requests often have 'no-cache' in the response header.

EDIT:

Results like "Is there a way to detect that I'm in a Selenium Webdriver page from Javascript" suggest that there should be no way to detect when you are using a web driver. But this evidence suggests otherwise.

EDIT:

The site uploads a fingerprint to their servers, but I checked and the fingerprint under Selenium is identical to the fingerprint under plain Chrome.

EDIT:

This is one of the fingerprint payloads that they send to their servers

{"appName":"Netscape","platform":"Linuxx86_64","cookies":1,"syslang":"en-US","userlang":"en-US","cpu":"","productSub":"20030107","setTimeout":1,"setInterval":1,"plugins":{"0":"ChromePDFViewer","1":"ShockwaveFlash","2":"WidevineContentDecryptionModule","3":"NativeClient","4":"ChromePDFViewer"},"mimeTypes":{"0":"application/pdf","1":"ShockwaveFlashapplication/x-shockwave-flash","2":"FutureSplashPlayerapplication/futuresplash","3":"WidevineContentDecryptionModuleapplication/x-ppapi-widevine-cdm","4":"NativeClientExecutableapplication/x-nacl","5":"PortableNativeClientExecutableapplication/x-pnacl","6":"PortableDocumentFormatapplication/x-google-chrome-pdf"},"screen":{"width":1600,"height":900,"colorDepth":24},"fonts":{"0":"monospace","1":"DejaVuSerif","2":"Georgia","3":"DejaVuSans","4":"TrebuchetMS","5":"Verdana","6":"AndaleMono","7":"DejaVuSansMono","8":"LiberationMono","9":"NimbusMonoL","10":"CourierNew","11":"Courier"}}

It's identical in Selenium and in Chrome.
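A comparison like this is easier to trust when the two payloads are diffed key by key rather than by eye. Below is a minimal sketch, assuming both payloads were captured as JSON strings; the function name `diff_fingerprints` and the shortened sample payloads are made up for illustration and are not the site's real data.

```python
import json


def diff_fingerprints(a_json, b_json):
    """Return {path: (value_a, value_b)} for every leaf value that differs."""
    def flatten(obj, prefix=""):
        # Turn nested dicts into dotted paths, e.g. {"screen.width": 1600}
        flat = {}
        for key, value in obj.items():
            path = prefix + key
            if isinstance(value, dict):
                flat.update(flatten(value, path + "."))
            else:
                flat[path] = value
        return flat

    a = flatten(json.loads(a_json))
    b = flatten(json.loads(b_json))
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


# Hypothetical, shortened payloads standing in for the two captures.
chrome_fp = '{"appName": "Netscape", "screen": {"width": 1600, "height": 900}}'
selenium_fp = '{"appName": "Netscape", "screen": {"width": 1600, "height": 900}}'
print(diff_fingerprints(chrome_fp, selenium_fp))  # an empty dict means no difference
```

An empty result only says that the fields in this payload match. The payload above has no field for navigator.webdriver, so identical payloads do not rule out checks that happen outside it.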

EDIT:

VPNs work for single use but get detected after I load the first page. Clearly, some javascript is being run to detect Selenium.

1 Answer

by (63.4k points)

The answer is yes.

Websites can detect automation through navigator.webdriver, a property of the JavaScript navigator interface.

If the page is loaded by an automation tool such as Selenium, navigator.webdriver is set to true.

So any script running on the page, including third-party analytics code, can read this property and report the automation.
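To see the flag from the automation side, you can ask the page's JavaScript for it. The sketch below is only illustrative: `is_flagged_as_automated` is a hypothetical helper name, and `FakeDriver` is a stand-in (not part of Selenium) so the snippet runs without a browser. With a real session you would pass the Selenium driver instead.

```python
def is_flagged_as_automated(driver):
    """Return True if the page's navigator.webdriver is truthy.

    `driver` is assumed to be a Selenium WebDriver; any object with an
    execute_script(script) method works.
    """
    return bool(driver.execute_script("return navigator.webdriver"))


class FakeDriver:
    """Stand-in for a browser session; returns a canned flag value."""

    def __init__(self, flag):
        self.flag = flag

    def execute_script(self, script):
        return self.flag


print(is_flagged_as_automated(FakeDriver(True)))   # what an automated session reports
print(is_flagged_as_automated(FakeDriver(None)))   # flag absent or false

# With Selenium installed, the real call would look like:
#   driver = webdriver.Chrome()
#   driver.get('http://stubhub.com')
#   is_flagged_as_automated(driver)
```

A detection script on the page does the same read directly in JavaScript, which is why matching the user agent alone is not enough.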

by (100 points)
This is easily fixable. I just tested with StubHub: I can do what I want, no block/ban.

1) Open chromedriver.exe in a hex editor (this works on Windows; I've heard it can cause issues on Mac etc.). Search for $cdc. You'll find something like '$cdc_asdjflasutopfhvcZLmcfl_'. Replace the whole part after the $ with another string of exactly the same length. Save, and use this executable as the driver from now on. This is one of the exposed JS variables most detectors scan for. With that done, many gates are already wide open to you.

2) C#
var options = new ChromeOptions();
options.AddExcludedArguments(new List<string>() { "enable-automation" });

This makes the navigator.webdriver flag disappear, which is the one thing the most basic detectors scan for. Even more gates open now!

3) The rest is based on IP rating/classification, fingerprinting, trust scores, or a combination.
Basic rule: the more you block, the more unique you become, and thereby more trackable. Try to look as ordinary as most users in terms of setup (browser selection, user agent, etc.), and avoid cheap, overused datacenter IPs. Example: a multi-screen setup with a 4K screen combined with an outdated user agent makes you pretty unique :P
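For readers following the question's Python setup, step 1 can be scripted instead of done by hand in a hex editor. This is a rough sketch under stated assumptions: the marker is the one quoted in step 1 (other chromedriver builds may use a different string), `patch_marker` is a hypothetical helper name, and the demo runs on an in-memory byte string rather than a real binary.

```python
import secrets
import string

# Marker quoted in step 1, without the leading "$".
# Assumption: your chromedriver build may contain a different string.
OLD_MARKER = b"cdc_asdjflasutopfhvcZLmcfl_"


def patch_marker(data, old_marker=OLD_MARKER):
    """Swap old_marker for random letters of exactly the same length.

    Keeping the length identical leaves every other byte offset in the
    binary untouched, which is why step 1 insists on it.
    """
    new_marker = "".join(
        secrets.choice(string.ascii_letters) for _ in range(len(old_marker))
    ).encode("ascii")
    return data.replace(old_marker, new_marker)


# Demo on a stand-in for the binary's contents.
sample = b"\x00\x01var key = '$cdc_asdjflasutopfhvcZLmcfl_';\x00"
patched = patch_marker(sample)
print(len(patched) == len(sample))  # size is preserved
print(OLD_MARKER in patched)        # old marker should be gone

# Step 2 in Python (needs Selenium, so it is left as a comment here):
#   options = webdriver.ChromeOptions()
#   options.add_experimental_option("excludeSwitches", ["enable-automation"])
#   driver = webdriver.Chrome(options=options)
```

To apply this to an actual driver, read the file in binary mode, pass the bytes through `patch_marker`, and write the result to a copy rather than overwriting the original. Whether either step still defeats a given detector depends on the Chrome and chromedriver versions in use.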
by (100 points)
Well, it doesn't seem to work for logging in to account.google.com.

I tried everything you suggested and I'm still blocked.

Do you have any clue what else could be done?