
I have the following text:

text = """<table class="table table-striped">\n <thead>\n <tr>\n <th data-field="placement">Placement</th>\n <th data-field="production">Production</th>\n <th data-field="application">Eng.Vol.</th>\n <th data-field="body">Body No</th>\n <th data-field="eng">Eng No</th>\n <th data-field="eng">Notes</th>\n </tr>\n <tr>\n <td data-field="placement">Front Stabilizer</td>\n <td data-field="production">Oct 16~</td>\n <td data-field="application">1.5 L</td>\n <td data-field="body">HRW18</td>\n <td data-field="eng">L15BY</td>\n <td data-field="note" class="">\n Pos:Left/Right </td>\n </tr>\n <tr>\n <td data-field="placement">Front Stabilizer</td>\n <td data-field="production">Oct 16~</td>\n <td data-field="application">1.5 L</td>\n <td data-field="body">HRW18 LHD</td>\n <td data-field="eng">L15BY</td>\n <td data-field="note" class="">\n Pos:Left/Right </td>\n </tr>\n <tr>\n <td data-field="placement">Front Stabilizer</td>\n <td data-field="production">Oct 16~</td>\n <td data-field="application">1.5 L</td>\n <td data-field="body">HRW28</td>\n <td data-field="eng">L15BY</td>\n <td data-field="note" class="">\n Pos:Left/Right </td>\n </tr>\n <tr>\n <td data-field="placement">Front Stabilizer</td>\n <td data-field="production">Oct 16~</td>\n <td data-field="application">2.0 L</td>\n <td data-field="body">HRW38 RHD</td>\n <td data-field="eng">R20A9</td>\n <td data-field="note" class="">\n Pos:Left/Right </td>\n </tr>\n </thead>\n </table>"""

This HTML is properly closed with a </table> tag and has all the required tags, but pandas still does not read it as a table.

code:

pd.read_html(text)

output:

[Empty DataFrame
 Columns: [(Placement, Front Stabilizer, Front Stabilizer, Front Stabilizer, Front Stabilizer), (Production, Oct 16~, Oct 16~, Oct 16~, Oct 16~), (Eng.Vol., 1.5 L, 1.5 L, 1.5 L, 2.0 L), (Body No, HRW18, HRW18 LHD, HRW28, HRW38 RHD), (Eng No, L15BY, L15BY, L15BY, R20A9), (Notes, Pos:Left/Right, Pos:Left/Right, Pos:Left/Right, Pos:Left/Right)]
 Index: []]

1 Answer


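Every row of your table, including the data rows, is wrapped inside <thead></thead>. pandas therefore interprets all five rows as header rows: it builds a five-level column index and finds no data rows, which is why the DataFrame is empty. You can pull the values back out of the column index: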

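tmp = pd.read_html(text)[0]

# Each column label is a 5-tuple (one entry per <tr>); to_frame() spreads those tuples into a 6 x 5 table
pd.DataFrame(tmp.columns.to_frame().values)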

Output:

    0           1                 2                 3                 4
--  ----------  ----------------  ----------------  ----------------  ----------------
 0  Placement   Front Stabilizer  Front Stabilizer  Front Stabilizer  Front Stabilizer
 1  Production  Oct 16~           Oct 16~           Oct 16~           Oct 16~
 2  Eng.Vol.    1.5 L             1.5 L             1.5 L             2.0 L
 3  Body No     HRW18             HRW18 LHD         HRW28             HRW38 RHD
 4  Eng No      L15BY             L15BY             L15BY             R20A9
 5  Notes       Pos:Left/Right    Pos:Left/Right    Pos:Left/Right    Pos:Left/Right
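
Note that the result above is transposed relative to the original HTML table: each original column has become a row. If you want the original layout back, with the first <tr> as the header and the remaining rows as data, one possible approach is to transpose the column tuples yourself. The sketch below is only an illustration: the helper name header_levels_to_frame is made up for this example, and the MultiIndex is typed in by hand from the output shown in the question so that the snippet runs without an HTML parser. In practice you would pass pd.read_html(text)[0].columns instead.

```python
import pandas as pd

# Stand-in for pd.read_html(text)[0].columns: the five-level column index
# from the question's output, one tuple per original table column.
columns = pd.MultiIndex.from_tuples([
    ("Placement", "Front Stabilizer", "Front Stabilizer", "Front Stabilizer", "Front Stabilizer"),
    ("Production", "Oct 16~", "Oct 16~", "Oct 16~", "Oct 16~"),
    ("Eng.Vol.", "1.5 L", "1.5 L", "1.5 L", "2.0 L"),
    ("Body No", "HRW18", "HRW18 LHD", "HRW28", "HRW38 RHD"),
    ("Eng No", "L15BY", "L15BY", "L15BY", "R20A9"),
    ("Notes", "Pos:Left/Right", "Pos:Left/Right", "Pos:Left/Right", "Pos:Left/Right"),
])


def header_levels_to_frame(columns):
    """Hypothetical helper: rebuild a table from a column MultiIndex.

    Iterating a MultiIndex yields one tuple per column, so zip(*columns)
    transposes those tuples back into one list per original <tr>.
    """
    rows = [list(row) for row in zip(*columns)]
    # The first row holds the real header; the remaining rows are the data.
    return pd.DataFrame(rows[1:], columns=rows[0])


df = header_levels_to_frame(columns)
print(df)
```

If you control the HTML itself, the cleaner fix is to leave only the header row inside <thead> and move the data rows into a <tbody>, so that read_html returns them as ordinary rows.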

