XPath is a language for addressing parts of an XML document
Now the point is why to learn XPath?
- Having this skill is very important for accurate web data extraction
- It is more powerful than CSS selectors
- It also allows selection and filtering with a fine-grained look at the text content
- XPath is extensible with custom functions.
- XPath to query parts of an HTML structure
So you can use XPath with Python by using its libraries that support XML. For that, you will need to install PyXML and Libxml2 libraries.
PyXML:
The PyXML package is a collection of libraries to process XML with Python.
Below is an example of this:-
from xml.dom.ext.reader import Sax2
from xml import xpath doc = Sax2.FromXmlFile('foo.xml').documentElement
for url in xpath.Evaluate('//@Url', doc):
print (url.value)
Libxml2:
Libxml2 is the XML C parser and toolkit developed for the Gnome project it is free software. XML itself is a metalanguage to design markup languages, i.e. text language where semantic and structure are added to the content using extra "markup" information enclosed between angle brackets.
Below is the example of libxml2:-
import libxml2 doc = libxml2.parseFile('foo.xml')
for url in doc.xpathEval('//@Url'):
print (url.content)