I am working on web scraping. I am trying to pull the links from a page and then extract the data each link points to. So far I can extract the links, as shown below:
/users/sign_up
/topics
/smarties
/posts
/users/sign_in
/users/sign_up
/posts/installing-anaconda-python-data-science-platform
/topics/python
/topics/anaconda-python
/topics/machine-learning
/jordan
/posts/python-libraries-to-import-for-data-science-programs
/topics/python
/topics/data-science
/topics/machine-learning
/jordan
/posts/shortcut-for-opening-the-object-inspector-in-python-spyder
/topics/python
/topics/anaconda-python
/topics/spyder-python
/topics/machine-learning
/jordan
/posts/python-script-for-replacing-missing-data-in-a-machine-learning-algorithm
/topics/machine-learning
/topics/python
/jordan
/posts/python-script-for-pulling-in-the-same-column-from-an-array-of-arrays
/topics/python
/jordan
/posts/how-to-implement-fizzbuzz-in-python
/topics/fizzbuzz
/topics/python
/jordan
/posts/how-to-think-like-a-computer-scientist
/topics/computer-science
/topics/python
/topics/programming
/jordan
/posts/base-case-example-for-how-to-test-a-python-class
/topics/python
/topics/tdd
/jordan
/posts/installing-and-working-with-pipenv
/topics/pipenv
/topics/python
/jordan
/posts/steps-for-building-a-flask-api-application-with-python-3
/topics/flask
/topics/tutorial
/topics/python
/jordan
None
/topics/python?page=2
/topics/python?page=3
/topics/python?page=4
/topics/python?page=2
/topics/python?page=4
This is the code I ran to get that output:
import requests
from bs4 import BeautifulSoup as bs
r = requests.get('http://www.dailysmarty.com/topics/python')
soup = bs(r.text, 'html.parser')
for link in soup.find_all('a'):
    print(link.get('href'))
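The `None` near the end of that output is what `link.get('href')` returns for an `<a>` tag that has no `href` attribute. The sketch below reproduces that behavior offline with the standard library's `html.parser` instead of BeautifulSoup, so it runs without a network request; the class name `AnchorCollector` and the HTML fragment are made up for illustration.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect the href of every <a> tag, or None when the tag has no href."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; a missing href gives None
            self.hrefs.append(dict(attrs).get("href"))


# Hypothetical page fragment: the last <a> tag has no href attribute
sample_html = (
    '<a href="/posts">Posts</a>'
    '<a href="/topics/python">Python</a>'
    '<a name="top">Top</a>'
)

collector = AnchorCollector()
collector.feed(sample_html)
print(collector.hrefs)  # ['/posts', '/topics/python', None]
```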
But when I pass the links to the `generator` function below:
def generator(web):
    titles = []
    for link in web:
        if 'posts' in link.get('href'):
            print(link.get('href'))
        else:
            pass

data = soup.find_all('a')
generator(data)
I get this output, followed by an error:
/posts
/posts/installing-anaconda-python-data-science-platform
/posts/python-libraries-to-import-for-data-science-programs
/posts/shortcut-for-opening-the-object-inspector-in-python-spyder
/posts/python-script-for-replacing-missing-data-in-a-machine-learning-algorithm
/posts/python-script-for-pulling-in-the-same-column-from-an-array-of-arrays
/posts/how-to-implement-fizzbuzz-in-python
/posts/how-to-think-like-a-computer-scientist
/posts/base-case-example-for-how-to-test-a-python-class
/posts/installing-and-working-with-pipenv
/posts/steps-for-building-a-flask-api-application-with-python-3
Traceback (most recent call last):
  File "C:\Users\joshu\AppData\Local\Programs\Python\Python38\classes.py", line 18, in <module>
    generator(data)
  File "C:\Users\joshu\AppData\Local\Programs\Python\Python38\classes.py", line 13, in generator
    if 'posts' in link.get('href'):
TypeError: argument of type 'NoneType' is not iterable
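The last line of the traceback can be reproduced without any scraping. `link.get('href')` returns `None` for an `<a>` tag without an `href` (the `None` in the first listing), and the `in` operator cannot search inside `None`. A minimal sketch, where `href = None` stands in for that return value:

```python
# Stand-in for what link.get('href') returns when the <a> tag has no href
href = None

message = ""
try:
    # Membership test on None fails the same way as in the traceback
    _ = 'posts' in href
except TypeError as exc:
    message = str(exc)

# The exact wording can vary slightly between Python versions
print(message)
```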
How can I avoid this error when I pass the links to the generator? Do I need a different kind of for loop?
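One way to avoid the error, sketched under the assumption that only links whose `href` contains `'posts'` are wanted: read the `href` once, skip the link when it is `None`, and only then test for `'posts'`. The function name `post_links` and the sample data are hypothetical; plain dictionaries stand in for BeautifulSoup tags because `Tag.get()` behaves like `dict.get()`. The function uses `yield`, so unlike the original `generator` it is an actual Python generator.

```python
def post_links(links):
    """Yield the href of each link whose href contains 'posts'.

    `links` could be the result of soup.find_all('a'); any object whose
    .get('href') returns a string or None works.
    """
    for link in links:
        href = link.get('href')
        # Skip <a> tags without an href: 'posts' in None raises TypeError
        if href is not None and 'posts' in href:
            yield href


# Hypothetical sample data: dicts stand in for BeautifulSoup Tag objects
sample_links = [
    {'href': '/posts'},
    {'href': '/topics/python'},
    {},  # an <a> tag with no href
    {'href': '/posts/how-to-implement-fizzbuzz-in-python'},
]

for href in post_links(sample_links):
    print(href)
```

Against the real page this would be `for href in post_links(soup.find_all('a')): print(href)`. Alternatively, BeautifulSoup can drop anchors without an `href` before the loop even starts: `soup.find_all('a', href=True)` matches only tags that have the attribute.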