Can anyone help me? How can I use the Tika package in Python (2.7) to parse PDF files?

1 Answer


Click this link if you want to install the Tika server jar and run it yourself:

  1. Download the jar.
  2. Store it somewhere and run it with java -jar tika-server-x.x.jar --port xxxx
  3. In your code, you no longer need to call tika.initVM(). Set tika.TikaClientOnly = True instead.
  4. Change parsed = parser.from_file('/path/to/file') to parsed = parser.from_file('/path/to/file', '/path/to/server'). The server path is printed in step 2 when the Tika server starts; plug it in here.
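
Steps 3 and 4 might look roughly like the sketch below. It assumes the tika Python package is installed and that the server from step 2 is already running. The helper names server_url and extract_pdf_text are hypothetical and only for illustration, and port 9998 in the usage comment is an assumption, so substitute whatever port you passed to --port.

```python
def server_url(host, port):
    """Build the base URL of a running Tika server (hypothetical helper)."""
    return "http://{}:{}".format(host, port)


def extract_pdf_text(pdf_path, endpoint):
    """Send a file to an already running Tika server and return its text."""
    import tika

    # Step 3: client-only mode, used in place of tika.initVM().
    tika.TikaClientOnly = True
    from tika import parser

    # Step 4: pass the server address as the second argument.
    parsed = parser.from_file(pdf_path, endpoint)

    # The result is a dict; the extracted text is under "content",
    # the document properties under "metadata".
    return parsed.get("content")


# Example usage (assumes the server was started with --port 9998 locally):
# print(extract_pdf_text("/path/to/file", server_url("localhost", 9998)))
```

The tika import sits inside the function so that the flag is set before the parser module is loaded, which follows the order given in the steps above.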
