
I have searched thoroughly for an answer to my current problem, but I am at a loss.

My issue is the following. I have created a spider and want to crawl two different URLs. When I crawl each URL on its own, everything works fine. However, when I try to crawl both, I get the following error: httplib.BadStatusLine: ''

I have followed some advice that I read (see links mentioned above). Printing response.status works for each request, but response.url is never printed and the error is thrown. (I print both values only to try to identify the source of the error.)

I hope that this is clear.

I am using Scrapy and Selenium:

class PeoplePage(Spider):
    name = "peopleProfile"
    allowed_domains = ["blah.com"]
    handle_httpstatus_list = [200, 404]
    start_urls = [
        "url1",
        "url2"
    ]

    def __init__(self):
        self.driver = webdriver.Firefox()

    def parse(self, response):
        print response.status
        print '???????????????????????????????????'
        if response.status == 200:
            self.driver.implicitly_wait(5)
            self.driver.get(response.url)
            print response.url
            print '!!!!!!!!!!!!!!!!!!!!'
            # DO STUFF
        self.driver.close()

1 Answer


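According to the Python documentation, httplib.BadStatusLine is raised if a server responds with an HTTP status code that the client does not understand. In your error the status line is empty (''), which suggests that the request received no usable response at all. In your spider, self.driver.close() runs at the end of every parse() call, so the browser has most likely already been closed by the time the second URL is handled, and self.driver.get() then fails. You can catch and ignore this exception, but you should also close the browser through the WebDriver object only after you have requested all the URLs that you want to visit.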
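To illustrate where an empty status line can come from, here is a minimal sketch using only the standard library. It is written for Python 3, where httplib is named http.client, and it does not involve Scrapy or Selenium. The helper name provoke_empty_status_line and the throwaway local server are assumptions made up for this illustration. The server accepts one request and hangs up without replying, which is roughly what a client sees when the process it talks to has gone away.

```python
import http.client
import socket
import threading


def provoke_empty_status_line():
    """Return the exception raised when a server hangs up without replying.

    Hypothetical helper for illustration only.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    def hang_up_without_reply():
        conn, _ = listener.accept()
        data = b""
        # Read the whole request header, then close without sending anything.
        while b"\r\n\r\n" not in data:
            chunk = conn.recv(1024)
            if not chunk:
                break
            data += chunk
        conn.close()

    server = threading.Thread(target=hang_up_without_reply)
    server.start()

    client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        client.request("GET", "/")
        client.getresponse()  # no status line arrives, so this raises
    except http.client.BadStatusLine as exc:
        return exc
    finally:
        client.close()
        server.join()
        listener.close()
    return None


error = provoke_empty_status_line()
print(type(error).__name__, error)
```

In Python 3.5 and later this condition is reported as http.client.RemoteDisconnected, which is a subclass of BadStatusLine, so an except clause for BadStatusLine still catches it.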

Try this:

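import httplib  # Python 2 module; in Python 3 it is http.client

def parse(self, response):
    try:
        print response.status
        print '???????????????????????????????????'
        if response.status == 200:
            self.driver.implicitly_wait(5)
            self.driver.get(response.url)
            print response.url
            print '!!!!!!!!!!!!!!!!!!!!'
            # DO STUFF
    except httplib.BadStatusLine:
        # Ignore the error for this URL and carry on
        pass
    # self.driver.close() is no longer called here, so the
    # browser stays open for the next URL.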
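The advice to close the browser only after all URLs have been visited can be sketched as follows. This is an illustration under stated assumptions, not the original spider: FakeDriver is a hypothetical stand-in for webdriver.Firefox() so that the example runs without Selenium, and PeoplePageSketch only mirrors the shape of the spider. In a real Scrapy spider, a method named closed(reason) is called when the spider finishes, which makes it a natural place to shut the browser down. Selenium's quit() ends the whole browser session, whereas close() only closes the current window.

```python
class FakeDriver:
    """Hypothetical stand-in for selenium's webdriver.Firefox()."""

    def __init__(self):
        self.is_open = True
        self.visited = []

    def get(self, url):
        # A real driver fails in a similar way once the browser is gone.
        if not self.is_open:
            raise RuntimeError("browser already closed")
        self.visited.append(url)

    def quit(self):
        self.is_open = False


class PeoplePageSketch:
    """Mirrors the spider's shape: one driver shared by many parse() calls."""

    def __init__(self, driver):
        self.driver = driver

    def parse(self, url, status):
        if status == 200:
            self.driver.get(url)
            # DO STUFF
        # No close() here: the driver must survive until the last URL.

    def closed(self, reason):
        # Scrapy calls closed(reason) once, when the spider finishes.
        self.driver.quit()


driver = FakeDriver()
spider = PeoplePageSketch(driver)
for url in ["url1", "url2"]:
    spider.parse(url, 200)
spider.closed("finished")
print(driver.visited, driver.is_open)
```

With this layout the except clause becomes a safety net, because the second URL no longer reaches a browser that has already been closed.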


 
