I have scraped a set of links off a website (https://www.gmcameetings.co.uk) - all the links containing the word 'meetings', i.e. the meeting papers - which are now contained in 'meeting_links'. I now need to follow each of those links to scrape some more links within them.
I've gone back to using the requests library and tried
r2 = requests.get("meeting_links")
But it returns the following error:
MissingSchema: Invalid URL 'list_meeting_links': No schema supplied.
Perhaps you meant http://list_meeting_links?
I tried changing it to that, but it makes no difference.
This is my code so far, showing how I got the links from the first URL I wanted.
# importing libraries and defining
import requests
import urllib.request
import time
from bs4 import BeautifulSoup as bs
# set url
url = "https://www.gmcameetings.co.uk/"
# grab html
r = requests.get(url)
page = r.text
soup = bs(page,'lxml')
# creating folder to store PDFs - create a separate folder if needed
folder_location = r'E:\Internship\WORK'
# getting all anchor tags with an href off the url
meeting_links = soup.find_all('a', href=True)
for link in meeting_links:
    print(link['href'])
    if link['href'].find('/meetings/') != -1:
        print("Meeting!")
#second set of links
r2 = requests.get("meeting_links")
Do I need to do something to 'meeting_links' before I can start using the requests library on them? I'm completely lost.
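From reading around, I think the problem is that I'm passing the literal string "meeting_links" to requests.get, and that even the real entries are Tag objects (and often relative paths) rather than full URL strings. Is something like the sketch below roughly what I need? The HTML snippet here is just a made-up stand-in for the real page, so I can see the logic:

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup as bs

base_url = "https://www.gmcameetings.co.uk/"

# hypothetical snippet standing in for the real page's HTML
html = (
    '<a href="/meetings/jan">Jan</a>'
    '<a href="https://www.gmcameetings.co.uk/meetings/feb">Feb</a>'
    '<a href="/about">About</a>'
)

soup = bs(html, "html.parser")  # 'lxml' works too if it's installed

# build a list of absolute URL strings, not Tag objects:
# - href=True (a boolean) matches any <a> that has an href attribute
# - urljoin turns relative hrefs like '/meetings/jan' into full URLs
meeting_urls = [
    urljoin(base_url, a["href"])
    for a in soup.find_all("a", href=True)
    if "/meetings/" in a["href"]
]


def fetch_meeting_pages(urls):
    # follow each collected link; each u is an actual URL string,
    # so requests.get no longer raises MissingSchema
    return [requests.get(u).text for u in urls]


# pages = fetch_meeting_pages(meeting_urls)  # then parse each page with bs
```

That way the second round of scraping would loop over real URLs instead of calling requests.get on the name of the list - but I'd like to confirm this is the right approach.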