in Python by (19.9k points)

I am writing Python code to scrape the PDFs of meetings from this website: https://www.gmcameetings.co.uk The PDF links are inside links, which are in turn inside other links. I have scraped the first set of links from the page above; now I need to scrape the links within those new URLs. When I do this I get the following error:

AttributeError: ResultSet object has no attribute 'find_all'. You're
probably treating a list of items like a single item. Did you call
find_all() when you meant to call find()?

This is my code so far, which all runs fine and has been checked in a Jupyter notebook:

# importing libraries and defining
import requests
import urllib.request
import time
from bs4 import BeautifulSoup as bs

# set url
url = "https://www.gmcameetings.co.uk/"

# grab html
r = requests.get(url)
page = r.text
soup = bs(page, 'lxml')

# folder to store the pdfs - create a separate folder if it does not exist
folder_location = r'E:\Internship\WORK'

# getting all meeting hrefs off the url
meeting_links = soup.find_all('a', href='TRUE')
for link in meeting_links:
    print(link['href'])
    if link['href'].find('/meetings/') > 1:
        print("Meeting!")

This is the line that then raises the error:

second_links = meeting_links.find_all('a', href='TRUE')

I have tried find() as Python suggests, but that doesn't work either. I do understand that it can't treat meeting_links as a single item.

So basically: how do you search for links within each element of the new variable (meeting_links)?

I already have code to download the PDFs once I have the second set of URLs, and that seems to work fine, but obviously I need to get those URLs first. Hopefully this makes sense and I've explained it OK - I only properly started using Python on Monday, so I'm a complete beginner.

1 Answer

by (25.1k points)

To get all the meeting links, try this:

from bs4 import BeautifulSoup as bs
import requests

# set url
url = "https://www.gmcameetings.co.uk/"

# grab html
r = requests.get(url)
page = r.text
soup = bs(page, 'lxml')

# scrape to find all links
all_links = soup.find_all('a', href=True)

# loop through the links, keeping those containing '/meetings/'
meeting_links = []
for link in all_links:
    href = link['href']
    if '/meetings/' in href:
        meeting_links.append(href)
print(meeting_links)
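The nested part of your question - following each meeting link and scraping the links inside those pages - uses the same pattern: fetch each URL in meeting_links and call find_all() on that page's own soup object, never on the ResultSet itself (that is exactly what the AttributeError was telling you). Here is a sketch; the '.pdf' substring and the sample HTML are just illustrative assumptions, so adjust them to whatever the real meeting pages contain:

```python
from bs4 import BeautifulSoup as bs

def extract_hrefs(html, needle):
    """Return the href of every <a> tag whose href contains `needle`."""
    soup = bs(html, 'html.parser')  # built-in parser; 'lxml' works too if installed
    return [a['href'] for a in soup.find_all('a', href=True) if needle in a['href']]

# Demo on an inline snippet (no network needed):
sample = '<a href="/docs/agenda.pdf">Agenda</a><a href="/about">About</a>'
print(extract_hrefs(sample, '.pdf'))   # ['/docs/agenda.pdf']

# With the meeting_links list built above, the second level would then be:
# second_links = []
# for meeting_url in meeting_links:
#     page = requests.get(meeting_url).text
#     second_links.extend(extract_hrefs(page, '.pdf'))
```

Each iteration parses one fetched page into a fresh BeautifulSoup object, which does have find_all(), so the ResultSet error cannot occur.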

Note that find_all() and find() are BeautifulSoup methods that exist only on Tag and BeautifulSoup objects - a ResultSet is just a list of tags, which is why meeting_links.find_all(...) fails. The .find() you called on link['href'] is actually Python's built-in str.find(), which returns an index (-1 when the substring is absent). To test for a substring, the idiomatic check is simply: 'a' in 'abcd'.
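A quick illustration of the difference between str.find() and the in operator (the href value here is made up):

```python
# str.find returns an index, -1 when absent; `in` returns a bool
href = '/meetings/2019/july'
print(href.find('/meetings/'))   # 0
print(href.find('/minutes/'))    # -1
print('/meetings/' in href)      # True
```

This is why the original test `link['href'].find('/meetings/') > 1` can miss matches: a href that starts with '/meetings/' gives index 0, which is not greater than 1.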

Hope that helps!
