Your task falls under information extraction (IE), which is an active area of research.
There are two ways to start working on this task:
You can extract the information directly from an HTML page or a website with a fixed template. In that case, the best approach is to study the HTML source of the pages and craft XPath or DOM selectors that reach the right fields. The disadvantage of this approach is that it does not generalize to new websites: you have to repeat the work for each site, one by one.
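For example, a minimal sketch of the fixed-template approach using requests and lxml might look like this (the URL and the XPath expressions are hypothetical placeholders, not taken from any real site):

```python
import requests
from lxml import html

# Hypothetical page URL; replace with a real page of the target site.
page = requests.get("https://example.com/product/123")
tree = html.fromstring(page.content)

# XPath expressions crafted by hand after inspecting the page's HTML;
# they are tied to this template and break if the site changes layout.
title = tree.xpath('//h1[@class="product-title"]/text()')
price = tree.xpath('//span[@class="price"]/text()')
print(title, price)
```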
You can also build a model that extracts the same information from many websites within one domain. In this case, you engineer features and let the ML-based IE algorithm "understand" the content of the pages. The most common features are the DOM path, the format of the value (attribute) to be extracted, layout cues (bold, italic, etc.), and the surrounding context words. If you label some values (you need roughly 100-300 labeled pages, depending on the domain, to reach reasonable quality), you can then train a supervised model on those pages. Alternatively, without labels, the algorithm can try to find repetitive patterns across pages.
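To illustrate what such features can look like, here is a hedged sketch of a per-node feature function over an lxml DOM tree; the feature names, the price regex, and the 3-word context window are assumptions for the example, not a fixed recipe:

```python
import re

def node_features(node, left_words, right_words):
    # `node` is assumed to be an lxml.html element; `left_words` and
    # `right_words` are the words immediately before/after it on the page.
    text = node.text_content().strip()
    return {
        "dom_path": node.getroottree().getpath(node),  # e.g. /html/body/div[2]/span
        "looks_like_price": bool(re.match(r"^\$?\d+([.,]\d{2})?$", text)),
        "is_bold": node.tag in ("b", "strong"),        # layout cues
        "is_italic": node.tag in ("i", "em"),
        "left_context": " ".join(left_words[-3:]),     # up to 3 words before
        "right_context": " ".join(right_words[:3]),    # up to 3 words after
    }
```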
Either way, you need to work with the DOM tree and generate the right features, and labeling the data properly is a careful, deliberate task. For ML models, have a look at CRF, 2D CRF, and semi-Markov CRF.
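To make the CRF suggestion concrete, here is a minimal sketch using the sklearn-crfsuite library on a tiny toy dataset; the labels (TITLE, PRICE), features, and hyperparameters are illustrative assumptions only:

```python
import sklearn_crfsuite

# Toy labeled data: two "pages", each a sequence of per-node feature
# dicts with one label per node. Real data would come from a feature
# function like the one above, applied to 100-300 labeled pages.
X_train = [
    [{"dom_path": "/html/body/h1", "is_bold": True, "looks_like_price": False},
     {"dom_path": "/html/body/span", "is_bold": False, "looks_like_price": True}],
    [{"dom_path": "/html/body/h1", "is_bold": True, "looks_like_price": False},
     {"dom_path": "/html/body/div/span", "is_bold": False, "looks_like_price": True}],
]
y_train = [["TITLE", "PRICE"], ["TITLE", "PRICE"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X_train, y_train)
print(crf.predict(X_train))  # e.g. [['TITLE', 'PRICE'], ['TITLE', 'PRICE']]
```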
In the general case, this is cutting-edge IE research, not a hack you can put together in a few evenings.
I hope this answer helps.