in Data Science by (18.4k points)

I am new to Python and have been trying to learn both Python and BeautifulSoup.

How would I add the href links as additional "key": "value" pairs in the JSON output?

from bs4 import BeautifulSoup
import json

html = """<table>
  <tbody>
      <tr>
          <td><a href="/page/some-page">Some Page Title</a></td>
          <td class="created-at">2020-08-01</td>
          <td><a href="/id/400">Text Description 1</a></td>
      </tr>
      <tr>
          <td><a href="/page/some-page-2">Some Page Title 2</a></td>
          <td class="created-at">2020-08-02</td>
          <td><a href="/id/400">Text Description 2</a></td>
      </tr>
      <tr>
          <td><a href="/page/some-page-3">Some Page Title 3</a></td>
          <td class="created-at">2020-08-03</td>
          <td><a href="/id/400">Text Description 3</a></td>
      </tr>
  </tbody>
</table>"""

data = []
soup = BeautifulSoup(html, 'html.parser')
rows = soup.select('table > tbody > tr')

for row in rows:
    keys = ["Name", "Date", "Description"]
    values = [td.get_text(strip=True) for td in row.find_all('td')]
    d = dict(zip(keys, values))
    data.append(d)

print(json.dumps(data, indent=4))

I need to add two more keys whose values are the href attributes:

keys = ["Name", "Date", "Description", "Url1", "Url2"]
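As a quick illustration of why extending the key list alone does not add the URLs: `zip` pairs items positionally and stops at the shorter input, so with only the three cell texts produced by the loop above, the two new keys are silently dropped. The values below are copied from the first table row.

```python
keys = ["Name", "Date", "Description", "Url1", "Url2"]
# Only the three cell texts, as produced by get_text() in the question's loop
values = ["Some Page Title", "2020-08-01", "Text Description 1"]

# zip() stops at the shorter input, so "Url1" and "Url2" get no value
# and never make it into the dict; no error is raised
d = dict(zip(keys, values))
print(d)
# {'Name': 'Some Page Title', 'Date': '2020-08-01', 'Description': 'Text Description 1'}
```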

1 Answer

by (36.8k points)

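Here is how to do it: extend the key list with the two URL keys, and append the href of every <a> tag in the row to the list of cell texts, so that zip has five values to pair with the five keys. The links are found in document order, so the title link becomes Url1 and the description link becomes Url2.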

for row in rows:
    keys = ["Name", "Date", "Description", "Url1", "Url2"]
    # Cell texts first, then the href of each <a> in the row (document order)
    values = [td.get_text(strip=True) for td in row.find_all('td')] + [a['href'] for a in row.find_all('a')]
    d = dict(zip(keys, values))
    data.append(d)

print(json.dumps(data, indent=4))

Output:

[
    {
        "Name": "Some Page Title",
        "Date": "2020-08-01",
        "Description": "Text Description 1",
        "Url1": "/page/some-page",
        "Url2": "/id/400"
    },
    {
        "Name": "Some Page Title 2",
        "Date": "2020-08-02",
        "Description": "Text Description 2",
        "Url1": "/page/some-page-2",
        "Url2": "/id/400"
    },
    {
        "Name": "Some Page Title 3",
        "Date": "2020-08-03",
        "Description": "Text Description 3",
        "Url1": "/page/some-page-3",
        "Url2": "/id/400"
    }
]
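The zip approach depends on every row having exactly three cells and two links in that order. If a row were missing a link, the remaining href would be paired with the wrong key or a key would be dropped, without any error. A minimal sketch, assuming the column layout is always title, date, description: look up each cell by position and read the href with `Tag.get()`, which returns `None` when the attribute is absent. The HTML is trimmed to one row of the question's table so the block runs on its own.

```python
import json
from bs4 import BeautifulSoup

# One row copied from the question's table
html = """<table><tbody>
<tr>
  <td><a href="/page/some-page">Some Page Title</a></td>
  <td class="created-at">2020-08-01</td>
  <td><a href="/id/400">Text Description 1</a></td>
</tr>
</tbody></table>"""

soup = BeautifulSoup(html, "html.parser")
data = []

for row in soup.select("table > tbody > tr"):
    # Assumption: the first three cells are title, date, description
    title_cell, date_cell, desc_cell = row.find_all("td")[:3]
    # find() returns None if the cell has no <a> at all
    title_link = title_cell.find("a")
    desc_link = desc_cell.find("a")
    data.append({
        "Name": title_cell.get_text(strip=True),
        "Date": date_cell.get_text(strip=True),
        "Description": desc_cell.get_text(strip=True),
        # .get() returns None instead of raising KeyError when href is missing
        "Url1": title_link.get("href") if title_link else None,
        "Url2": desc_link.get("href") if desc_link else None,
    })

print(json.dumps(data, indent=4))
```

A missing link now shows up as `null` in the JSON under the correct key rather than shifting the other values.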
