in Python by (19.9k points)

I am writing Python code to scrape the PDFs of meetings from this website: https://www.gmcameetings.co.uk The PDF links are inside links, which are in turn inside other links. I have scraped the first set of links from the page above; now I need to scrape the links within those new URLs. When I do this I get the following error:

AttributeError: ResultSet object has no attribute 'find_all'. You're
probably treating a list of items like a single item. Did you call
find_all() when you meant to call find()?

This is my code so far, which all runs fine and has been checked in a Jupyter notebook:

# importing libraries and defining
import requests
import urllib.request
import time
from bs4 import BeautifulSoup as bs

# set url
url = "https://www.gmcameetings.co.uk/"

# grab html
r = requests.get(url)
page = r.text
soup = bs(page, 'lxml')

# folder to store the pdfs - create a separate folder if it does not exist
folder_location = r'E:\Internship\WORK'

# getting all meeting hrefs off the url
meeting_links = soup.find_all('a', href='TRUE')
for link in meeting_links:
    print(link['href'])
    if link['href'].find('/meetings/') > 1:
        print("Meeting!")

This is the line that then raises the error:

second_links = meeting_links.find_all('a', href='TRUE')

I have tried find() as Python suggests, but that doesn't work either. I do understand that it can't treat meeting_links as a single item.

So basically: how do you search for links within each element of the new variable (meeting_links)?

I already have code to download the PDFs once I have the second set of URLs, and that seems to work fine, but obviously I need to get those URLs first. Hopefully this makes sense and I've explained it OK - I only properly started using Python on Monday, so I'm a complete beginner.

1 Answer

by (25.1k points)

To get all the meeting links, try this:

from bs4 import BeautifulSoup as bs
import requests

# set url
url = "https://www.gmcameetings.co.uk/"

# grab html
r = requests.get(url)
page = r.text
soup = bs(page, 'lxml')

# scrape to find all links
all_links = soup.find_all('a', href=True)

# loop through the links, keeping those containing '/meetings/'
meeting_links = []
for link in all_links:
    href = link['href']
    if '/meetings/' in href:
        meeting_links.append(href)
print(meeting_links)
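The nested part of your question - following each meeting link and scraping the links inside those pages - uses the same pattern: fetch each URL in meeting_links and call find_all() on that page's own soup object, never on the ResultSet itself (that is exactly what the AttributeError was telling you). Here is a sketch; the '.pdf' substring and the sample HTML are just illustrative assumptions, so adjust them to whatever the real meeting pages contain:

```python
from bs4 import BeautifulSoup as bs

def extract_hrefs(html, needle):
    """Return the href of every <a> tag whose href contains `needle`."""
    soup = bs(html, 'html.parser')  # built-in parser; 'lxml' works too if installed
    return [a['href'] for a in soup.find_all('a', href=True) if needle in a['href']]

# Demo on an inline snippet (no network needed):
sample = '<a href="/docs/agenda.pdf">Agenda</a><a href="/about">About</a>'
print(extract_hrefs(sample, '.pdf'))   # ['/docs/agenda.pdf']

# With the meeting_links list built above, the second level would then be:
# second_links = []
# for meeting_url in meeting_links:
#     page = requests.get(meeting_url).text
#     second_links.extend(extract_hrefs(page, '.pdf'))
```

Each iteration parses one fetched page into a fresh BeautifulSoup object, which does have find_all(), so the ResultSet error cannot occur.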

Note that find_all() and find() are BeautifulSoup methods that exist only on Tag and BeautifulSoup objects - a ResultSet is just a list of tags, which is why meeting_links.find_all(...) fails. The .find() you called on link['href'] is actually Python's built-in str.find(), which returns an index (-1 when the substring is absent). To test for a substring, the idiomatic check is simply: 'a' in 'abcd'.
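A quick illustration of the difference between str.find() and the in operator (the href value here is made up):

```python
# str.find returns an index, -1 when absent; `in` returns a bool
href = '/meetings/2019/july'
print(href.find('/meetings/'))   # 0
print(href.find('/minutes/'))    # -1
print('/meetings/' in href)      # True
```

This is why the original test `link['href'].find('/meetings/') > 1` can miss matches: a href that starts with '/meetings/' gives index 0, which is not greater than 1.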

Hope that helps!
