
I am working on web scraping: I am trying to pull the links from a page and then extract the data behind each link. So far I can extract the links, as shown below:

/users/sign_up

/topics

/smarties

/posts

/users/sign_in

/users/sign_up

/posts/installing-anaconda-python-data-science-platform

/topics/python

/topics/anaconda-python

/topics/machine-learning

/jordan

/posts/python-libraries-to-import-for-data-science-programs

/topics/python

/topics/data-science

/topics/machine-learning

/jordan

/posts/shortcut-for-opening-the-object-inspector-in-python-spyder

/topics/python

/topics/anaconda-python

/topics/spyder-python

/topics/machine-learning

/jordan

/posts/python-script-for-replacing-missing-data-in-a-machine-learning-algorithm

/topics/machine-learning

/topics/python

/jordan

/posts/python-script-for-pulling-in-the-same-column-from-an-array-of-arrays

/topics/python

/jordan

/posts/how-to-implement-fizzbuzz-in-python

/topics/fizzbuzz

/topics/python

/jordan

/posts/how-to-think-like-a-computer-scientist

/topics/computer-science

/topics/python

/topics/programming

/jordan

/posts/base-case-example-for-how-to-test-a-python-class

/topics/python

/topics/tdd

/jordan

/posts/installing-and-working-with-pipenv

/topics/pipenv

/topics/python

/jordan

/posts/steps-for-building-a-flask-api-application-with-python-3

/topics/flask

/topics/tutorial

/topics/python

/jordan

None

/topics/python?page=2

/topics/python?page=3

/topics/python?page=4

/topics/python?page=2

/topics/python?page=4

This is the code I ran to get that output:

import requests
from bs4 import BeautifulSoup as bs

r = requests.get('http://www.dailysmarty.com/topics/python')
soup = bs(r.text, 'html.parser')

for link in soup.find_all('a'):
    print(link.get('href'))

But when I pass the links to the generator function below:

def generator(web):
    titles = []
    for link in web:
        if 'posts' in link.get('href'):
            print(link.get('href'))
        else:
            pass

data = soup.find_all('a')
generator(data)

I get this output, followed by an error:

/posts

/posts/installing-anaconda-python-data-science-platform

/posts/python-libraries-to-import-for-data-science-programs

/posts/shortcut-for-opening-the-object-inspector-in-python-spyder

/posts/python-script-for-replacing-missing-data-in-a-machine-learning-algorithm

/posts/python-script-for-pulling-in-the-same-column-from-an-array-of-arrays

/posts/how-to-implement-fizzbuzz-in-python

/posts/how-to-think-like-a-computer-scientist

/posts/base-case-example-for-how-to-test-a-python-class

/posts/installing-and-working-with-pipenv

/posts/steps-for-building-a-flask-api-application-with-python-3

Traceback (most recent call last):
  File "C:\Users\joshu\AppData\Local\Programs\Python\Python38\classes.py", line 18, in <module>
    generator(data)
  File "C:\Users\joshu\AppData\Local\Programs\Python\Python38\classes.py", line 13, in generator
    if 'posts' in link.get('href'):
TypeError: argument of type 'NoneType' is not iterable

How can I avoid this error when I pass the links to generator? Do I need another for loop?

1 Answer


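You don't need another for loop. Some of the <a> tags on the page have no href attribute, so link.get('href') returns None for them (this is the None in your first output), and 'posts' in None raises the TypeError. Add a check to the if condition so that the membership test only runs when the tag actually has an href: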

if link.has_attr('href') and 'posts' in link.get('href'):
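For context, here is a minimal sketch of how the whole function might look with that kind of guard. The name post_links and the sample data are hypothetical, not from your code. It tests the value returned by get('href') rather than calling has_attr, which has the same effect for this purpose. Plain dicts stand in for the BeautifulSoup tags so that the sketch runs without a network request; this works because the function only calls .get('href'), which both a dict and a tag support.

```python
def post_links(links):
    """Return the hrefs that contain 'posts', skipping links without an href."""
    hrefs = []
    for link in links:
        href = link.get('href')  # None when there is no href attribute
        if href and 'posts' in href:
            hrefs.append(href)
    return hrefs


# Hypothetical stand-in data: dicts play the role of <a> tags here.
sample = [
    {'href': '/topics/python'},
    {'href': '/posts/how-to-implement-fizzbuzz-in-python'},
    {},  # an <a> tag with no href, like the one that printed None
    {'href': '/posts'},
]

print(post_links(sample))
```

With your real data you would call post_links(soup.find_all('a')). Another option is soup.find_all('a', href=True), which only returns tags that have an href, so the None case never reaches your loop.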
