in Python by (19.9k points)

I have scraped a set of links from a website (https://www.gmcameetings.co.uk): all the links containing the word "meetings", i.e. the meeting papers. They are now stored in 'meeting_links'. I now need to follow each of those links and scrape more links from the pages they point to.

I've gone back to using the requests library and tried

r2 = requests.get("meeting_links") 

But it returns the following error:

MissingSchema: Invalid URL 'list_meeting_links': No schema supplied. 

Perhaps you meant http://list_meeting_links?

I changed it to that, but it made no difference.

This is my code so far and how I got the links from the first url that I wanted.

# importing libraries and defining
import requests
import urllib.request
import time
from bs4 import BeautifulSoup as bs

# set url
url = "https://www.gmcameetings.co.uk/"

# grab html
r = requests.get(url)
page = r.text
soup = bs(page,'lxml')

# creating folder to store pdfs - if not create separate folder
folder_location = r'E:\Internship\WORK'

# getting all meeting href off url
meeting_links = soup.find_all('a',href='TRUE')
for link in meeting_links:
    print(link['href'])
    if link['href'].find('/meetings/')>1:
        print("Meeting!")

#second set of links
r2 = requests.get("meeting_links")

Do I need to do something with the 'meeting_links' before I can start using the requests library again? I'm completely lost.

1 Answer


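It looks like you are passing the literal string "meeting_links" to requests.get, not the links you collected. requests.get expects a single URL string that includes the scheme (http:// or https://), like this: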

requests.get('https://example.com')

meeting_links is a list of tags, so loop over it and request each matching href individually:

for link in meeting_links:
    if link['href'].find('/meetings/')>1:
        r2 = requests.get(link['href'])
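If some of the hrefs on the page are relative (for example "/meetings/..."), requests.get would still raise MissingSchema for them. Below is a minimal sketch of one way to handle that, assuming the links can be resolved against the site's base URL. The helper names absolute_meeting_urls and fetch_pages are hypothetical, not part of requests or BeautifulSoup, and the filter uses a plain substring test instead of the find(...)>1 check above, so it also keeps hrefs that start with "/meetings/".

```python
from urllib.parse import urljoin

# Base URL taken from the question; relative hrefs are resolved against it.
BASE_URL = "https://www.gmcameetings.co.uk/"


def absolute_meeting_urls(hrefs, base_url=BASE_URL):
    """Return absolute URLs for hrefs containing '/meetings/', without duplicates.

    Hypothetical helper for illustration only.
    """
    urls = []
    for href in hrefs:
        if "/meetings/" not in href:
            continue
        # urljoin leaves absolute URLs unchanged and completes relative ones.
        full = urljoin(base_url, href)
        if full not in urls:
            urls.append(full)
    return urls


def fetch_pages(urls):
    """Request each URL in turn and return a dict mapping URL to page HTML.

    Hypothetical helper; requests is imported here so the function above
    needs only the standard library.
    """
    import requests

    pages = {}
    for url in urls:
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # stop on HTTP errors instead of parsing an error page
        pages[url] = response.text
    return pages
```

With the soup from the question, the hrefs could be collected with [a['href'] for a in soup.find_all('a', href=True)]. Note the boolean True: href='TRUE' only matches anchors whose href is literally the string "TRUE", so it would likely return an empty list on this page. Each page returned by fetch_pages can then be parsed with BeautifulSoup in the same way to get the second set of links.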
