in RPA by (29.5k points)
I want to scrape data from a W-2 form (PDF) so that I can save it to a database, but I am not able to get the data field by field.

1 Answer

by (12.7k points)
There are a few ways to extract the data. If you need the entire content of the PDF, you can use the Read PDF activity. If you only need particular pieces of information, you can then apply the string Split method to the text that was read and pull out just those values.
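The split idea can be sketched outside any particular RPA tool. The Python snippet below is only an illustration and assumes the PDF text has already been read into a string; the sample text, the label names, and the helper `value_after_label` are hypothetical, and real W-2 text will be laid out differently.

```python
def value_after_label(text, label):
    """Return the text that follows `label` on the same line, or None."""
    for line in text.splitlines():
        if label in line:
            # Split once on the label and keep whatever comes after it.
            return line.split(label, 1)[1].strip(" :\t")
    return None


# Hypothetical text, standing in for the output of a PDF read step.
sample_text = (
    "Employee name: Jane Doe\n"
    "Wages, tips, other compensation: 52000.00\n"
    "Federal income tax withheld: 6100.00\n"
)

name = value_after_label(sample_text, "Employee name")
wages = value_after_label(sample_text, "Wages, tips, other compensation")
```

Splitting works well when each value sits on the same line as its label. It becomes fragile if the text extraction puts labels and values on separate lines, which is common with form-style PDFs.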

A more sophisticated approach applies only if all of your PDFs share the same layout. In that case, you can use screen scraping or OCR scraping and scrape relative to anchor text. For example, if every PDF contains customer details with the name, number, and so on in the same position, use the "Name" and "Number" labels as anchors to extract the variable information next to them.
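To show the anchor idea on plain text, here is a minimal Python sketch using regular expressions. It assumes the scraped text keeps each label and its value on one line; the `ANCHORS` mapping, the function `extract_by_anchors`, and the sample text are hypothetical and would need to match your own documents.

```python
import re

# Hypothetical anchors: label text that appears identically in every PDF.
ANCHORS = {
    "name": "Name",
    "number": "Number",
}


def extract_by_anchors(text, anchors):
    """Map each field to the text found right after its anchor label."""
    result = {}
    for field, anchor in anchors.items():
        # Match the anchor, an optional colon, then capture the rest of the line.
        pattern = re.escape(anchor) + r"\s*:?\s*(.+)"
        match = re.search(pattern, text)
        result[field] = match.group(1).strip() if match else None
    return result


# Hypothetical scraped text from one customer PDF.
scraped = "Customer details\nName: Jane Doe\nNumber: 555-0100\n"
fields = extract_by_anchors(scraped, ANCHORS)
```

The resulting dictionary can then be passed to whatever code writes the record to the database. A field whose anchor is not found comes back as `None`, so a missing label can be detected instead of silently storing wrong data.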

Hope this helps.
