in Java by (7k points)

I have not been able to find a good Java-based web scraping API. The site I need to scrape does not provide an API of its own either. I want to iterate over all of its pages using a page ID and extract the HTML titles and other content from their DOM trees.

Are there any ways other than web scraping?

1 Answer

0 votes
by (13.1k points)

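You can use jsoup. Extracting the title is not difficult, and it gives you several ways to do it (for example, the document's title accessor or a CSS selector). It is a lightweight library with essentially no external dependencies. It is also headless: it fetches and parses the HTML directly, so it does not need a browser. Note that it does not execute JavaScript.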
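As a rough sketch of the jsoup approach, the class below fetches a few pages by ID and prints each title. The base URL, the `id` query parameter, the user-agent string, the page range, and the one-second pause are all placeholders I made up for illustration; the real site will need its own values, and jsoup must be on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.IOException;

public class TitleScraper {

    // Parse HTML that is already in memory and return the text of its <title>.
    // jsoup returns an empty string when the document has no title element.
    public static String extractTitle(String html) {
        Document doc = Jsoup.parse(html);
        return doc.title();
    }

    // Fetch one page over HTTP and return its title.
    public static String fetchTitle(String url) throws IOException {
        Document doc = Jsoup.connect(url)
                .userAgent("example-scraper/0.1") // hypothetical user agent
                .timeout(10_000)                  // milliseconds
                .get();
        return doc.title();
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical URL pattern; replace with the real site's scheme.
        String baseUrl = "https://example.com/page?id=";

        for (int pageId = 1; pageId <= 3; pageId++) {
            try {
                System.out.println(pageId + ": " + fetchTitle(baseUrl + pageId));
            } catch (IOException e) {
                // Covers network errors and non-2xx responses for this page.
                System.err.println(pageId + ": failed (" + e.getMessage() + ")");
            }
            Thread.sleep(1000); // assumed polite delay between requests
        }
    }
}
```

For other parts of the DOM, `doc.select("css selector")` returns the matching elements, so the same loop can pull out headings, links, and so on.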

You can also use Selenium WebDriver. It gives you visual feedback while you work, and it is accurate and consistent because it directly controls a real browser. The trade-off is speed: it does not fetch pages as quickly as HtmlUnit does, although sometimes you do not want to hit a site too fast anyway.
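A comparable sketch with Selenium WebDriver might look like the following. It assumes Chrome is installed and that a matching driver is available (recent Selenium 4 releases try to resolve one automatically). The URL pattern, the page range, and the `pageUrl` helper are hypothetical names used only for this example.

```java
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public class SeleniumTitleScraper {

    // Hypothetical helper: build the address of one page from its ID.
    public static String pageUrl(String baseUrl, int pageId) {
        return baseUrl + pageId;
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical URL pattern; replace with the real site's scheme.
        String baseUrl = "https://example.com/page?id=";

        WebDriver driver = new ChromeDriver(); // opens a real browser window
        try {
            for (int pageId = 1; pageId <= 3; pageId++) {
                driver.get(pageUrl(baseUrl, pageId));
                // getTitle() reads the title after the browser has loaded the page.
                System.out.println(pageId + ": " + driver.getTitle());
                Thread.sleep(1000); // assumed pause so the site is not hit too fast
            }
        } finally {
            driver.quit(); // always close the browser, even after an error
        }
    }
}
```

Because the browser runs the page's scripts, this route is the one to try when the content you need is rendered by JavaScript.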
