
I have spent a day researching a library that can be used to accomplish the following:

  • Retrieve the full contents of a web page in the background, without rendering the result to a view.
  • The library should support pages that fire off AJAX requests to load additional data after the initial HTML has loaded.
  • From the resulting HTML, I need to select elements using XPath or CSS selectors.
  • In the future, I may also need to navigate to the next page (firing events, submitting forms, clicking buttons or links, etc.).

Here is what I have tried without success:

  • Jsoup: Works great, but has no support for JavaScript/AJAX (so it does not load the full page).
  • Android built-in HttpEntity: The same problem with JavaScript/AJAX as Jsoup.
  • HtmlUnit: Looks like exactly what I need, but after hours of trying I cannot get it to work on Android. (Other users failed when trying to load the 12MB+ worth of jar files. I loaded the full source code and referenced it as a project library, only to find that things such as Applets and java.awt, which HtmlUnit uses, do not exist in Android.)
  • Rhino: I find this very confusing. I don't know how to get it working in Android, or even whether it is what I am looking for.
  • Selenium Driver: Looks like it could work, but there is no straightforward way to run it headless, so that the actual HTML is not displayed in a view.

I would really like HtmlUnit to work, as it seems the best suited to my needs. Is there any way to get it running on Android, or is there another library I have missed that would be suitable?

I am currently using Android Studio 0.1.7 and can move to Eclipse if needed.

Thanks in advance!

1 Answer


I used Android's built-in WebView and overrode the onPageFinished method of the WebViewClient class to inject JavaScript that grabs all the HTML after the page has fully loaded. The WebView can also be used to run further JavaScript actions: clicking buttons, filling in forms, etc.

Code: 

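// Enable JavaScript so the page's AJAX requests can run.
webView.getSettings().setJavaScriptEnabled(true);

// Expose a Java object to the page as window.HtmlViewer.
final MyJavaScriptInterface jInterface = new MyJavaScriptInterface(context);
webView.addJavascriptInterface(jInterface, "HtmlViewer");

webView.setWebViewClient(new WebViewClient() {
    @Override
    public void onPageFinished(WebView view, String url) {
        // Hand the loaded document back to Java, wrapped in <html> tags
        // because innerHTML omits the root element itself.
        view.loadUrl("javascript:window.HtmlViewer.showHTML("
                + "'<html>'+document.getElementsByTagName('html')[0].innerHTML+'</html>');");
    }
});

webView.loadUrl(StartURL);

// Assumed to be nested in the class that defines ParseHtml.
public class MyJavaScriptInterface {
    private Context ctx;
    public String html;

    MyJavaScriptInterface(Context ctx) {
        this.ctx = ctx;
    }

    @JavascriptInterface
    public void showHTML(String _html) {
        html = _html;
        // loadUrl is asynchronous, so parse here, once the HTML has arrived,
        // rather than immediately after webView.loadUrl(StartURL).
        ParseHtml(html);
    }
}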
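
The same injection technique can drive the follow-up actions mentioned above, such as clicking a link or filling in a form field. Here is a rough sketch of a helper that builds the javascript: URLs for this. The class name JsActions, its methods, and the selectors are hypothetical and only for illustration; they are not part of Android or of the code above. It assumes the target page supports document.querySelector.

```java
public class JsActions {

    // Escape text so it can sit safely inside a single-quoted JavaScript string.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '\'': sb.append("\\'"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Build a javascript: URL that clicks the first element matching the selector.
    static String click(String cssSelector) {
        return "javascript:(function(){var e=document.querySelector('"
                + quote(cssSelector) + "');if(e){e.click();}})();";
    }

    // Build a javascript: URL that sets the value of the first matching form field.
    static String setValue(String cssSelector, String value) {
        return "javascript:(function(){var e=document.querySelector('"
                + quote(cssSelector) + "');if(e){e.value='" + quote(value) + "';}})();";
    }

    public static void main(String[] args) {
        // Print the generated URLs so they can be inspected before use.
        System.out.println(click("a.next"));
        System.out.println(setValue("input[name=q]", "it's a test"));
    }
}
```

With a helper like this, a call such as webView.loadUrl(JsActions.click("a.next")) would trigger the click inside the loaded page, and onPageFinished would fire again once the next page has loaded. On API level 19 and later, WebView.evaluateJavascript is an alternative to javascript: URLs.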
