Intellipaat

in Python by (50.2k points)

I want to extract the position (x, y) and the properties (font/size) of every word in a document.

Is something like the following possible?

from docx import Document

main_file = Document("/tmp/file.docx")

for paragraph in main_file.paragraphs:
    for word in paragraph.text:  # <= Non-existing (yet wished) functionalities, IMHO
        print(word.x, word.y)  # <= Non-existing (yet wished) functionalities, IMHO

Does somebody have an idea?

1 Answer

by (107k points)

Regarding

for word in paragraph.text:  # <= Non-existing (yet wished) functionalities, IMHO

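This one is easy. paragraph.text is a plain string, so looping over it directly gives you single characters; split it on whitespace to get words instead:

for word in paragraph.text.split():
    ...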
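A minimal sketch that applies the same split() idea run by run, so that each word can also be paired with the font name and size the question asks about. It assumes python-docx objects: document.paragraphs, paragraph.runs, run.text, and run.font.name / run.font.size, where size is a Length (with a .pt attribute) or None when the value is inherited from a style rather than set on the run. The helper name iter_words_with_fonts is hypothetical, and the demo uses SimpleNamespace stand-ins so it runs without a .docx file.

```python
from types import SimpleNamespace
from typing import Iterator, Optional, Tuple


def iter_words_with_fonts(
    document,
) -> Iterator[Tuple[int, str, Optional[str], Optional[float]]]:
    """Yield (paragraph_index, word, font_name, font_size_pt) for every word.

    Hypothetical helper, not part of python-docx. `document` is assumed to
    behave like a python-docx Document. font_name / font_size_pt are None
    when the run does not set them directly (they are inherited from a style).
    """
    for p_idx, paragraph in enumerate(document.paragraphs):
        for run in paragraph.runs:
            size = run.font.size  # a Length, or None if not set on this run
            size_pt = size.pt if size is not None else None
            # run.text is a plain str: iterating it would give characters,
            # while split() breaks it into whitespace-separated words.
            for word in run.text.split():
                yield p_idx, word, run.font.name, size_pt


if __name__ == "__main__":
    # Stand-ins shaped like python-docx objects; with a real file you would
    # pass Document("/tmp/file.docx") instead of `doc`.
    doc = SimpleNamespace(paragraphs=[
        SimpleNamespace(runs=[
            SimpleNamespace(text="Big title ",
                            font=SimpleNamespace(name="Arial",
                                                 size=SimpleNamespace(pt=14.0))),
            SimpleNamespace(text="and body",
                            font=SimpleNamespace(name=None, size=None)),
        ]),
    ])
    for item in iter_words_with_fonts(doc):
        print(item)
```

Formatting lives on runs, not on words, so a word whose letters sit in two differently formatted runs comes out as two pieces here; stitching those back together would need extra logic this sketch does not attempt. None of this gives x/y positions.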

Regarding

print(word.x, word.y)  # <= Non-existing (yet wished) functionalities, IMHO

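It is safe to say this will never be available in python-docx, and if it ever were, it would not look like this. A .docx file does not record where each word ends up on the page: x/y positions only exist once a rendering engine has laid the document out (applying fonts, page size, margins, line breaks and so on), and python-docx has no rendering engine.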

You may be able to get these values from Word itself, since Word does have a rendering engine (which it uses for screen display and printing).

If there were such a built-in method, though, it would probably take a paragraph and a character offset within that paragraph as input, like:

document.position(paragraph, offset=42)

or perhaps:

paragraph.position(offset=42)
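No such method exists, but the input it would need, each word paired with its character offset in the paragraph, can be computed from paragraph.text alone. A minimal sketch, using re.finditer because it keeps the offsets that str.split() throws away; the helper name words_with_offsets is hypothetical.

```python
import re
from typing import List, Tuple


def words_with_offsets(text: str) -> List[Tuple[str, int]]:
    """Return (word, offset) pairs, where offset is the index of the word's
    first character in `text` (e.g. a python-docx paragraph.text).

    Hypothetical helper; a "word" here is a maximal run of non-whitespace
    characters, which is what str.split() returns for ordinary text.
    """
    # \S+ matches each maximal run of non-whitespace characters, and
    # match.start() is where that run begins within the string.
    return [(m.group(), m.start()) for m in re.finditer(r"\S+", text)]


if __name__ == "__main__":
    print(words_with_offsets("Hello  big world"))
    # [('Hello', 0), ('big', 7), ('world', 11)]
```

Each (word, offset) pair is what a call like the hypothetical paragraph.position(offset=...) would consume; turning it into x/y coordinates would still require a rendering engine.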

